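3. The daemon in detail: networking
Docker networking
Docker's network modes:
- bridge: uses a Linux bridge plus NAT
- host: shares the host's network namespace
- none: Docker configures no networking for the container; any setup is left to the user
- overlay: a layer-2 overlay network used in swarm mode
In addition, third-party projects such as flannel and calico can customize Docker networking, as can orchestrators like k8s. Docker integrates libnetwork as its networking solution.
Daemon network configuration
Open docker/daemon/config/config.go to see the daemon's configuration. The network-related parts are: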
// commonBridgeConfig stores all the platform-common bridge driver specific
// configuration.
type commonBridgeConfig struct {
	Iface     string `json:"bridge,omitempty"`
	FixedCIDR string `json:"fixed-cidr,omitempty"`
}

// NetworkConfig stores the daemon-wide networking configurations
type NetworkConfig struct {
	// Default address pools for docker networks
	DefaultAddressPools opts.PoolsOpt `json:"default-address-pools,omitempty"`
}
- By default Docker uses a Linux bridge interface (the bridge option)
- Container IPs are allocated from a fixed subnet (the fixed-cidr option)
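The JSON tags on these structs are the keys the daemon reads from its configuration file. As a rough illustration, the sketch below decodes a small daemon.json-style document into a trimmed-down struct. The type bridgeSettings and the helper parseBridgeSettings are hypothetical names used only for this example; only the two JSON tags come from the excerpt above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bridgeSettings is a hypothetical stand-in for commonBridgeConfig.
// Only the JSON tags are taken from the Docker source shown above.
type bridgeSettings struct {
	Iface     string `json:"bridge,omitempty"`
	FixedCIDR string `json:"fixed-cidr,omitempty"`
}

// parseBridgeSettings decodes a daemon.json-style document.
// Keys that do not match a tag are silently ignored by encoding/json.
func parseBridgeSettings(data []byte) (bridgeSettings, error) {
	var s bridgeSettings
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	raw := []byte(`{"bridge": "br0", "fixed-cidr": "192.168.1.0/25"}`)
	s, err := parseBridgeSettings(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("bridge=%s fixed-cidr=%s\n", s.Iface, s.FixedCIDR)
}
```

The real daemon merges file values with command-line flags and validates them afterwards; none of that is modelled here.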
config_unix.go contains the following configuration:
// BridgeConfig stores all the bridge driver specific
// configuration.
type BridgeConfig struct {
	commonBridgeConfig

	// These fields are common to all unix platforms.
	commonUnixBridgeConfig

	// Fields below here are platform specific.
	EnableIPv6          bool   `json:"ipv6,omitempty"`
	EnableIPTables      bool   `json:"iptables,omitempty"`
	EnableIPForward     bool   `json:"ip-forward,omitempty"`
	EnableIPMasq        bool   `json:"ip-masq,omitempty"`
	EnableUserlandProxy bool   `json:"userland-proxy,omitempty"`
	UserlandProxyPath   string `json:"userland-proxy-path,omitempty"`
	FixedCIDRv6         string `json:"fixed-cidr-v6,omitempty"`
}
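The daemon configuration provides the default network environment for containers. A container created without an explicit network uses the daemon's configuration; a different network mode can also be specified.
Because the user can specify docker0 settings when starting the daemon, NewDaemon() calls verifyDaemonSettings() during startup to validate the flags that were passed. The network-related checks are:
- Whether -b and --bip were both given. Only one of them may be specified: if an existing bridge is named, its IP can no longer be set.
- Whether enableIPTables and ICC are both false. This is not allowed, because disabling ICC is implemented with iptables rules.
- If enableIPTables is false while enableIPMasq is true, enableIPMasq is forced to false.
The rest of the function handles cgroup and runtime settings, which will be covered later.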
// verifyDaemonSettings performs validation of daemon config struct
func verifyDaemonSettings(conf *config.Config) error {
// Check for mutually incompatible config options
if conf.BridgeConfig.Iface != "" && conf.BridgeConfig.IP != "" {
return fmt.Errorf("You specified -b & --bip, mutually exclusive options. Please specify only one")
}
if !conf.BridgeConfig.EnableIPTables && !conf.BridgeConfig.InterContainerCommunication {
return fmt.Errorf("You specified --iptables=false with --icc=false. ICC=false uses iptables to function. Please set --icc or --iptables to true")
}
if !conf.BridgeConfig.EnableIPTables && conf.BridgeConfig.EnableIPMasq {
conf.BridgeConfig.EnableIPMasq = false
}
if err := VerifyCgroupDriver(conf); err != nil {
return err
}
if conf.CgroupParent != "" && UsingSystemd(conf) {
if len(conf.CgroupParent) <= 6 || !strings.HasSuffix(conf.CgroupParent, ".slice") {
return fmt.Errorf("cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"")
}
}
if conf.DefaultRuntime == "" {
conf.DefaultRuntime = config.StockRuntimeName
}
if conf.Runtimes == nil {
conf.Runtimes = make(map[string]types.Runtime)
}
conf.Runtimes[config.StockRuntimeName] = types.Runtime{Path: DefaultRuntimeName}
return nil
}
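The three network-related checks can be isolated into a standalone sketch. The struct bridgeFlags and the function validateBridgeFlags below are hypothetical names for this example, and the error texts are shortened; the conditions themselves mirror the ones in verifyDaemonSettings().

```go
package main

import (
	"errors"
	"fmt"
)

// bridgeFlags is a hypothetical subset of the daemon's bridge options.
type bridgeFlags struct {
	Iface          string // -b / --bridge
	IP             string // --bip
	EnableIPTables bool   // --iptables
	EnableICC      bool   // --icc
	EnableIPMasq   bool   // --ip-masq
}

// validateBridgeFlags applies the same three rules as the excerpt above.
func validateBridgeFlags(f *bridgeFlags) error {
	// An existing bridge and an explicit bridge IP cannot be combined.
	if f.Iface != "" && f.IP != "" {
		return errors.New("-b and --bip are mutually exclusive")
	}
	// Blocking inter-container traffic needs iptables rules.
	if !f.EnableIPTables && !f.EnableICC {
		return errors.New("--icc=false requires --iptables=true")
	}
	// Masquerading is done with iptables, so it is switched off silently.
	if !f.EnableIPTables && f.EnableIPMasq {
		f.EnableIPMasq = false
	}
	return nil
}

func main() {
	f := &bridgeFlags{EnableICC: true, EnableIPMasq: true}
	err := validateBridgeFlags(f)
	fmt.Println("error:", err, "ip-masq after validation:", f.EnableIPMasq)
}
```

Note that the third rule is a silent correction rather than an error, which is why the function takes a pointer.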
Next, daemon.go checks whether the bridge network is enabled. DisableNetworkBridge is 'none'; if Iface is also none, meaning the user asked for no bridge network to be created, isBridgeNetworkDisabled returns true.
func isBridgeNetworkDisabled(conf *config.Config) bool {
	return conf.BridgeConfig.Iface == config.DisableNetworkBridge
}
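If the user has networking enabled, the network environment is initialized in d.restore(), which is called from NewDaemon(). Some containers may have run before this daemon started, so the daemon restores their runtime environment and, in doing so, calls initNetworkController().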
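The logic of this function is as follows:
- Call networkOptions() to generate the network configuration; see the function itself for details.
- Call libnetwork.New() to create a libnetwork controller instance from the generated netOptions. The controller is built with:
    id: stringid.GenerateRandomID(),
    cfg: config.ParseConfigOptions(cfgOptions...),
    sandboxes: sandboxTable{},
    svcRecords: make(map[string]svcInfo),
    serviceBindings: make(map[serviceKey]*service),
    agentInitDone: make(chan struct{}),
    networkLocker: locker.New(),
    DiagnosticServer: diagnostic.New(),
  The sandbox type is defined in libnetwork/sandbox.go; a container is essentially one kind of sandboxed application. networkLocker is the "finer-grained locking" that Docker provides, similar to a mutex. DiagnosticServer is used to diagnose network errors (its support for registering user-defined HTTPHandlerFunc handlers is still incomplete).
  libnetwork.New() then performs these steps in order:
  - initStores() initializes the data store. The backends currently supported are consul, zookeeper, etcd and boltdb, and the store tracks the network state of the whole Docker instance.
  - drvregistry.New() creates a driver registry instance, and drivers are added to it. getInitializers() shows that the driver types are bridge, host, macvlan, null, remote and overlay, which correspond to the network modes Docker supports.
  - initIPAMDrivers() initializes the IP address management (IPAM) drivers, calling SetDefaultIPAddressPool() to plan the address ranges.
  - initDiscovery() initializes service discovery, which pairs with the etcd-style data store above: a watcher is set up when the data store is created, and it is used here.
  - WalkNetworks(populateSpecial) is called. populateSpecial is a NetworkWalker function that calls addNetwork(), found in /vendor/github.com/docker/libnetwork/controller.go. addNetwork() in turn calls the driver's CreateNetwork(), which for the bridge driver is where the underlying library actually creates the Linux bridge.
  - Leftover sandboxes, endpoints and networks from earlier runs are cleaned up, since stale resources have to be removed when the network is initialized.
- Initialize the networks by name. In bridge mode, removeDefaultBridgeInterface() may first need to be called to delete a leftover bridge. The bridge itself is initialized afterwards in initBridgeDriver().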
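The driver-registration step can be pictured as a map from driver name to initializer. The sketch below is a simplified model and not libnetwork's drvregistry API: the types registry and initFunc and the methods add and names are hypothetical names for this example. Only the list of driver names is taken from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// initFunc is a hypothetical driver initializer.
type initFunc func() error

// registry maps a driver name to the initializer that set it up.
type registry struct {
	drivers map[string]initFunc
}

func newRegistry() *registry {
	return &registry{drivers: make(map[string]initFunc)}
}

// add runs the initializer and records the driver; duplicate names are rejected.
func (r *registry) add(name string, init initFunc) error {
	if _, ok := r.drivers[name]; ok {
		return fmt.Errorf("driver %q already registered", name)
	}
	if err := init(); err != nil {
		return fmt.Errorf("initializing driver %q: %w", name, err)
	}
	r.drivers[name] = init
	return nil
}

// names returns the registered driver names in sorted order.
func (r *registry) names() []string {
	out := make([]string, 0, len(r.drivers))
	for n := range r.drivers {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	r := newRegistry()
	for _, n := range []string{"bridge", "host", "macvlan", "null", "remote", "overlay"} {
		if err := r.add(n, func() error { return nil }); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println(r.names())
}
```

Keeping registration behind one small interface is what lets remote plugins appear as just another driver name.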
func (daemon *Daemon) initNetworkController(config *config.Config, activeSandboxes map[string]interface{}) (libnetwork.NetworkController, error) {
netOptions, err := daemon.networkOptions(config, daemon.PluginStore, activeSandboxes)
if err != nil {
return nil, err
}
controller, err := libnetwork.New(netOptions...)
if err != nil {
return nil, fmt.Errorf("error obtaining controller instance: %v", err)
}
if len(activeSandboxes) > 0 {
logrus.Info("There are old running containers, the network config will not take affect")
return controller, nil
}
// Initialize default network on "null"
if n, _ := controller.NetworkByName("none"); n == nil {
if _, err := controller.NewNetwork("null", "none", "", libnetwork.NetworkOptionPersist(true)); err != nil {
return nil, fmt.Errorf("Error creating default \"null\" network: %v", err)
}
}
// Initialize default network on "host"
if n, _ := controller.NetworkByName("host"); n == nil {
if _, err := controller.NewNetwork("host", "host", "", libnetwork.NetworkOptionPersist(true)); err != nil {
return nil, fmt.Errorf("Error creating default \"host\" network: %v", err)
}
}
// Clear stale bridge network
if n, err := controller.NetworkByName("bridge"); err == nil {
if err = n.Delete(); err != nil {
return nil, fmt.Errorf("could not delete the default bridge network: %v", err)
}
if len(config.NetworkConfig.DefaultAddressPools.Value()) > 0 && !daemon.configStore.LiveRestoreEnabled {
removeDefaultBridgeInterface()
}
}
if !config.DisableBridge {
// Initialize default driver "bridge"
if err := initBridgeDriver(controller, config); err != nil {
return nil, err
}
} else {
removeDefaultBridgeInterface()
}
return controller, nil
}
daemon.netController, err = daemon.initNetworkController(daemon.configStore, activeSandboxes)
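The DefaultAddressPools option seen in the code above concerns how address ranges are planned for new networks. As an illustration only, assuming each pool is described by a base network and a target prefix length (as in the daemon's default-address-pools setting), the sketch below carves an IPv4 base range into equally sized subnets. The function splitPool is a hypothetical helper and does not reproduce libnetwork's own allocation code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// splitPool divides an IPv4 base network into consecutive subnets
// with the given prefix length (hypothetical helper for illustration).
func splitPool(base string, size int) ([]string, error) {
	_, ipnet, err := net.ParseCIDR(base)
	if err != nil {
		return nil, err
	}
	ip4 := ipnet.IP.To4()
	if ip4 == nil {
		return nil, fmt.Errorf("only IPv4 bases are handled: %s", base)
	}
	ones, bits := ipnet.Mask.Size()
	if size < ones || size > bits {
		return nil, fmt.Errorf("size /%d does not fit inside %s", size, base)
	}
	count := 1 << uint(size-ones)       // number of subnets
	step := uint32(1) << uint(bits-size) // addresses per subnet
	start := binary.BigEndian.Uint32(ip4)

	subnets := make([]string, 0, count)
	for i := 0; i < count; i++ {
		addr := make(net.IP, 4)
		binary.BigEndian.PutUint32(addr, start+uint32(i)*step)
		subnets = append(subnets, fmt.Sprintf("%s/%d", addr, size))
	}
	return subnets, nil
}

func main() {
	subnets, err := splitPool("10.10.0.0/22", 24)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range subnets {
		fmt.Println(s)
	}
}
```

A real allocator would hand these out lazily and skip ranges already in use on the host; the sketch only shows the arithmetic.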
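What CreateNetwork() does
- InitOSContext() sets up the namespace
- getNetworks() fetches the list of networks
- newInterface() creates a bridgeInterface instance; it calls nlh.LinkByName() from the netlink library to tie the bridge described in the high-level configuration to the underlying device
- A bridgeNetwork instance is created, with its bridge field set to the bridge interface just created
- The inter-network communication rules are added (implemented with iptables)
- newBridgeSetup() prepares the parameters needed to set up the bridge
- bridgeSetup.apply() applies those parameters to the bridge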
func (d *driver) createNetwork(config *networkConfiguration) (err error) {
defer osl.InitOSContext()()
networkList := d.getNetworks()
// Initialize handle when needed
d.Lock()
if d.nlh == nil {
d.nlh = ns.NlHandle()
}
d.Unlock()
// Create or retrieve the bridge L3 interface
bridgeIface, err := newInterface(d.nlh, config)
if err != nil {
return err
}
// Create and set network handler in driver
network := &bridgeNetwork{
id: config.ID,
endpoints: make(map[string]*bridgeEndpoint),
config: config,
portMapper: portmapper.New(d.config.UserlandProxyPath),
bridge: bridgeIface,
driver: d,
}
d.Lock()
d.networks[config.ID] = network
d.Unlock()
// On failure make sure to reset driver network handler to nil
defer func() {
if err != nil {
d.Lock()
delete(d.networks, config.ID)
d.Unlock()
}
}()
// Add inter-network communication rules.
setupNetworkIsolationRules := func(config *networkConfiguration, i *bridgeInterface) error {
if err := network.isolateNetwork(networkList, true); err != nil {
if err = network.isolateNetwork(networkList, false); err != nil {
logrus.Warnf("Failed on removing the inter-network iptables rules on cleanup: %v", err)
}
return err
}
// register the cleanup function
network.registerIptCleanFunc(func() error {
nwList := d.getNetworks()
return network.isolateNetwork(nwList, false)
})
return nil
}
// Prepare the bridge setup configuration
bridgeSetup := newBridgeSetup(config, bridgeIface)
// If the bridge interface doesn't exist, we need to start the setup steps
// by creating a new device and assigning it an IPv4 address.
bridgeAlreadyExists := bridgeIface.exists()
if !bridgeAlreadyExists {
bridgeSetup.queueStep(setupDevice)
}
// Even if a bridge exists try to setup IPv4.
bridgeSetup.queueStep(setupBridgeIPv4)
enableIPv6Forwarding := d.config.EnableIPForwarding && config.AddressIPv6 != nil
// Conditionally queue setup steps depending on configuration values.
for _, step := range []struct {
Condition bool
Fn setupStep
}{
// Enable IPv6 on the bridge if required. We do this even for a
// previously existing bridge, as it may be here from a previous
// installation where IPv6 wasn't supported yet and needs to be
// assigned an IPv6 link-local address.
{config.EnableIPv6, setupBridgeIPv6},
// We ensure that the bridge has the expectedIPv4 and IPv6 addresses in
// the case of a previously existing device.
{bridgeAlreadyExists, setupVerifyAndReconcile},
// Enable IPv6 Forwarding
{enableIPv6Forwarding, setupIPv6Forwarding},
// Setup Loopback Addresses Routing
{!d.config.EnableUserlandProxy, setupLoopbackAddressesRouting},
// Setup IPTables.
{d.config.EnableIPTables, network.setupIPTables},
//We want to track firewalld configuration so that
//if it is started/reloaded, the rules can be applied correctly
{d.config.EnableIPTables, network.setupFirewalld},
// Setup DefaultGatewayIPv4
{config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},
// Setup DefaultGatewayIPv6
{config.DefaultGatewayIPv6 != nil, setupGatewayIPv6},
// Add inter-network communication rules.
{d.config.EnableIPTables, setupNetworkIsolationRules},
//Configure bridge networking filtering if ICC is off and IP tables are enabled
{!config.EnableICC && d.config.EnableIPTables, setupBridgeNetFiltering},
} {
if step.Condition {
bridgeSetup.queueStep(step.Fn)
}
}
// Apply the prepared list of steps, and abort at the first error.
bridgeSetup.queueStep(setupDeviceUp)
return bridgeSetup.apply()
}
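The queueStep()/apply() pattern used by createNetwork() can be reduced to a few lines. The sketch below is a simplified model: the type names step and setupQueue, the helpers succeed and failWith, and the step labels are hypothetical. The real driver queues plain functions that receive the network configuration and the bridge interface.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical named setup action; the name only makes the trace readable.
type step struct {
	name string
	run  func() error
}

// setupQueue collects steps first and runs them later.
type setupQueue struct {
	steps []step
}

func (q *setupQueue) queueStep(s step) { q.steps = append(q.steps, s) }

// apply runs the queued steps in order and aborts at the first error,
// returning the names of the steps that completed.
func (q *setupQueue) apply() ([]string, error) {
	var done []string
	for _, s := range q.steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("step %q failed: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func succeed() error { return nil }

func failWith(msg string) func() error {
	return func() error { return errors.New(msg) }
}

func main() {
	bridgeExists, enableIPTables := false, true

	q := &setupQueue{}
	if !bridgeExists {
		q.queueStep(step{"create device", succeed})
	}
	q.queueStep(step{"assign IPv4", succeed})
	// Conditional steps are declared as a table and queued only when enabled.
	for _, c := range []struct {
		condition bool
		s         step
	}{
		{bridgeExists, step{"verify existing addresses", succeed}},
		{enableIPTables, step{"setup iptables", succeed}},
	} {
		if c.condition {
			q.queueStep(c.s)
		}
	}
	q.queueStep(step{"bring device up", succeed})

	done, err := q.apply()
	fmt.Println(done, err)
}
```

Separating "decide what to do" from "do it" keeps the conditions in one readable table and gives a single place to stop on failure.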
Initialization of the bridge driver
This happens in initBridgeDriver() in daemon/daemon_unix.go:
func initBridgeDriver(controller libnetwork.NetworkController, config *config.Config) error {
bridgeName := bridge.DefaultBridgeName
if config.BridgeConfig.Iface != "" {
bridgeName = config.BridgeConfig.Iface
}
netOption := map[string]string{
bridge.BridgeName: bridgeName,
bridge.DefaultBridge: strconv.FormatBool(true),
netlabel.DriverMTU: strconv.Itoa(config.Mtu),
bridge.EnableIPMasquerade: strconv.FormatBool(config.BridgeConfig.EnableIPMasq),
bridge.EnableICC: strconv.FormatBool(config.BridgeConfig.InterContainerCommunication),
}
// --ip processing
if config.BridgeConfig.DefaultIP != nil {
netOption[bridge.DefaultBindingIP] = config.BridgeConfig.DefaultIP.String()
}
var (
ipamV4Conf *libnetwork.IpamConf
ipamV6Conf *libnetwork.IpamConf
)
ipamV4Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}
nwList, nw6List, err := netutils.ElectInterfaceAddresses(bridgeName)
if err != nil {
return errors.Wrap(err, "list bridge addresses failed")
}
nw := nwList[0]
if len(nwList) > 1 && config.BridgeConfig.FixedCIDR != "" {
_, fCIDR, err := net.ParseCIDR(config.BridgeConfig.FixedCIDR)
if err != nil {
return errors.Wrap(err, "parse CIDR failed")
}
// Iterate through in case there are multiple addresses for the bridge
for _, entry := range nwList {
if fCIDR.Contains(entry.IP) {
nw = entry
break
}
}
}
ipamV4Conf.PreferredPool = lntypes.GetIPNetCanonical(nw).String()
hip, _ := lntypes.GetHostPartIP(nw.IP, nw.Mask)
if hip.IsGlobalUnicast() {
ipamV4Conf.Gateway = nw.IP.String()
}
if config.BridgeConfig.IP != "" {
ipamV4Conf.PreferredPool = config.BridgeConfig.IP
ip, _, err := net.ParseCIDR(config.BridgeConfig.IP)
if err != nil {
return err
}
ipamV4Conf.Gateway = ip.String()
} else if bridgeName == bridge.DefaultBridgeName && ipamV4Conf.PreferredPool != "" {
logrus.Infof("Default bridge (%s) is assigned with an IP address %s. Daemon option --bip can be used to set a preferred IP address", bridgeName, ipamV4Conf.PreferredPool)
}
if config.BridgeConfig.FixedCIDR != "" {
_, fCIDR, err := net.ParseCIDR(config.BridgeConfig.FixedCIDR)
if err != nil {
return err
}
ipamV4Conf.SubPool = fCIDR.String()
}
if config.BridgeConfig.DefaultGatewayIPv4 != nil {
ipamV4Conf.AuxAddresses["DefaultGatewayIPv4"] = config.BridgeConfig.DefaultGatewayIPv4.String()
}
var deferIPv6Alloc bool
if config.BridgeConfig.FixedCIDRv6 != "" {
_, fCIDRv6, err := net.ParseCIDR(config.BridgeConfig.FixedCIDRv6)
if err != nil {
return err
}
// In case user has specified the daemon flag --fixed-cidr-v6 and the passed network has
// at least 48 host bits, we need to guarantee the current behavior where the containers'
// IPv6 addresses will be constructed based on the containers' interface MAC address.
// We do so by telling libnetwork to defer the IPv6 address allocation for the endpoints
// on this network until after the driver has created the endpoint and returned the
// constructed address. Libnetwork will then reserve this address with the ipam driver.
ones, _ := fCIDRv6.Mask.Size()
deferIPv6Alloc = ones <= 80
if ipamV6Conf == nil {
ipamV6Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}
}
ipamV6Conf.PreferredPool = fCIDRv6.String()
// In case the --fixed-cidr-v6 is specified and the current docker0 bridge IPv6
// address belongs to the same network, we need to inform libnetwork about it, so
// that it can be reserved with IPAM and it will not be given away to somebody else
for _, nw6 := range nw6List {
if fCIDRv6.Contains(nw6.IP) {
ipamV6Conf.Gateway = nw6.IP.String()
break
}
}
}
if config.BridgeConfig.DefaultGatewayIPv6 != nil {
if ipamV6Conf == nil {
ipamV6Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}
}
ipamV6Conf.AuxAddresses["DefaultGatewayIPv6"] = config.BridgeConfig.DefaultGatewayIPv6.String()
}
v4Conf := []*libnetwork.IpamConf{ipamV4Conf}
v6Conf := []*libnetwork.IpamConf{}
if ipamV6Conf != nil {
v6Conf = append(v6Conf, ipamV6Conf)
}
// Initialize default network on "bridge" with the same name
_, err = controller.NewNetwork("bridge", "bridge", "",
libnetwork.NetworkOptionEnableIPv6(config.BridgeConfig.EnableIPv6),
libnetwork.NetworkOptionDriverOpts(netOption),
libnetwork.NetworkOptionIpam("default", "", v4Conf, v6Conf, nil),
libnetwork.NetworkOptionDeferIPv6Alloc(deferIPv6Alloc))
if err != nil {
return fmt.Errorf("Error creating default \"bridge\" network: %v", err)
}
return nil
}
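Code logic:
- Set the bridge name
- Call ElectInterfaceAddresses to look up the addresses of the device with that bridge name on the host. Also check whether the user specified the bridge IP in the flags, and if so use it as the preferred IP
- After checking for a bridge IP, configure IPv6, the netmask, the gateway and so on
- controller.NewNetwork() creates the default network named "bridge", backed by the bridge device whose name was set earlier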
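The address handling in initBridgeDriver() relies mostly on net.ParseCIDR. The sketch below isolates two pieces of it under simplifying assumptions: how --bip and --fixed-cidr turn into a gateway, preferred pool and sub-pool, and how the IPv6 prefix length decides whether allocation is deferred. The type ipamChoice and the functions chooseIPAM and deferIPv6Alloc are hypothetical names for this example, and the lookup of addresses already present on the bridge is left out.

```go
package main

import (
	"fmt"
	"net"
)

// ipamChoice is a hypothetical, flattened view of the IPv4 IPAM settings.
type ipamChoice struct {
	PreferredPool string
	Gateway       string
	SubPool       string
}

// chooseIPAM mirrors the --bip and --fixed-cidr branches of the excerpt:
// the bridge IP becomes the gateway, and fixed-cidr becomes the sub-pool.
func chooseIPAM(bip, fixedCIDR string) (ipamChoice, error) {
	var c ipamChoice
	if bip != "" {
		ip, _, err := net.ParseCIDR(bip)
		if err != nil {
			return c, fmt.Errorf("parse --bip: %w", err)
		}
		c.PreferredPool = bip
		c.Gateway = ip.String()
	}
	if fixedCIDR != "" {
		_, sub, err := net.ParseCIDR(fixedCIDR)
		if err != nil {
			return c, fmt.Errorf("parse --fixed-cidr: %w", err)
		}
		c.SubPool = sub.String() // canonical network form
	}
	return c, nil
}

// deferIPv6Alloc reports whether the prefix leaves at least 48 host bits,
// the condition under which the excerpt defers IPv6 allocation.
func deferIPv6Alloc(fixedCIDRv6 string) (bool, error) {
	_, n, err := net.ParseCIDR(fixedCIDRv6)
	if err != nil {
		return false, err
	}
	ones, _ := n.Mask.Size()
	return ones <= 80, nil
}

func main() {
	c, err := chooseIPAM("192.168.1.5/24", "192.168.1.70/26")
	fmt.Println(c, err)
	d, err := deferIPv6Alloc("2001:db8::/64")
	fmt.Println(d, err)
}
```

The threshold of 80 comes from 128 minus 48: with 48 or more host bits the address can embed the interface's MAC address.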
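The flow after restore completes
initNetworkController() initializes the network, fixes configuration such as the bridge IP, netmask and gateway, and returns the netController instance. After that, the daemon reloads the relationships between containers and networks, restores container context from checkpoints, and so on.
In short, Docker uses the libnetwork library to encapsulate its networking functionality, which keeps the Docker daemon from calling the netlink library directly.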