Docker study notes 5: the overlay implemented by flannel udp

2019-12-26  董泽润

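TL;DR: To meet the k8s networking requirements, one option is host-gw, which adds routes on each host, but it only works when the hosts share a layer-2 network. That is why a layer-3 overlay network is the usual choice. This post experiments with an overlay built on the udp backend.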

What is flannel

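flannel is a layer-3 network solution. Each host runs a flanneld process, which sets up the flannel0 interface and fetches the container subnet for that host from a configuration store (etcd) or the k8s API. The docker0 bridge is then configured from that subnet, which gives containers cross-host connectivity. flannel itself is only a framework; the real work is done by a backend, and there are many: host-gw, vxlan, udp, plus backends provided for cloud vendors such as alicloud-vpc-backend, aws-vpc-backend, and google gce-backend.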

udp overlay

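There are many ways to implement a layer-3 network. On kernels without vxlan support, udp overlay is one option, but its performance is so poor that it has been deprecated and is now only used for debugging.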


Figure: udp overlay

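The figure above shows the topology of the udp overlay. Packets do not fly across by themselves: overlay or underlay, they have to go through the complete network stack. Take 172.17.8.2 pinging 172.17.5.2: the packet leaves the container through docker0, is routed to flannel0, is read by flanneld and wrapped in a UDP datagram to the other host, where flanneld unwraps it and hands it to flannel0 and then docker0.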
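To make the forwarding decision concrete, here is a minimal Python sketch of the lookup flanneld has to perform for each packet it reads from flannel0: find the subnet lease that contains the destination address and send the packet to the host that owns it. The lease table is hard-coded from the addresses used in this experiment, and the names LEASES and next_hop are hypothetical, chosen for illustration; this is not flanneld's actual code.

```python
import ipaddress

# Subnet leases from this experiment: container subnet -> host IP.
# flanneld learns these from etcd; they are hard-coded here for illustration.
LEASES = {
    ipaddress.ip_network("172.17.5.0/24"): "192.168.43.161",
    ipaddress.ip_network("172.17.8.0/24"): "192.168.43.222",
}
FLANNEL_UDP_PORT = 8285  # the port that shows up in the tcpdump output of this post


def next_hop(dst_ip: str):
    """Return (host_ip, udp_port) for the host that owns dst_ip, or None."""
    addr = ipaddress.ip_address(dst_ip)
    for subnet, host in LEASES.items():
        if addr in subnet:
            return host, FLANNEL_UDP_PORT
    return None  # destination is not inside any leased subnet


print(next_hop("172.17.5.2"))  # ('192.168.43.161', 8285)
```

A destination outside every lease returns None, which roughly corresponds to a packet the overlay has no route for.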

Experiment

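Everything here was done by hand, without k8s, and there is no load-test data. Still, judging from the ping results, the udp overlay really does perform poorly, and since the flanneld process is written in Go, it would probably not hold up under heavy load.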

1. Start etcd

One complaint I have to make: flannel still uses the etcd v2 protocol, while v3 is now the mainstream.

/usr/bin/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379

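By default etcd only listens on the loopback interface, so the listen address has to be 0.0.0.0 or a specific interface address. Then configure the overall flannel network range: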

etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}'
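As a rough illustration of what this config means, the Python sketch below parses the same JSON and carves the big range into per-host subnets. The prefix length of 24 is an assumption taken from the /24 subnets that appear in this experiment, and SUBNET_LEN and candidates are hypothetical names; how flanneld actually picks a subnet for a host is not modelled here.

```python
import ipaddress
import json

# The same JSON that was written to /coreos.com/network/config.
config = json.loads('{ "Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}')

network = ipaddress.ip_network(config["Network"])
SUBNET_LEN = 24  # assumption: per-host prefix length, matching the /24 subnets in this post

# Every /24 that can be carved out of the /16.
candidates = list(network.subnets(new_prefix=SUBNET_LEN))
print(len(candidates))  # 256

# The two subnets leased in this experiment both fall inside the big range.
for leased in ("172.17.5.0/24", "172.17.8.0/24"):
    assert ipaddress.ip_network(leased).subnet_of(network)
```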

2. Start flanneld

Download flanneld; the current version is 0.11.0

wget https://github.com/coreos/flannel/releases/download/v0.11.0/flanneld-amd64 && chmod +x flanneld-amd64

Then start flanneld

./flanneld-amd64 -etcd-endpoints=http://192.168.43.161:2379 -etcd-prefix=/coreos.com/network -v=3 -etcd-username="" > /var/log/flanneld 2>&1 &

-etcd-prefix defaults to /coreos.com/network, so it can be omitted. Next, check the startup log. You can also look in etcd to see the exact configuration and what data was written.

root@ubuntu2:~# tail -f /var/log/flanneld
I1225 09:03:36.654421   23981 main.go:317] Wrote subnet file to /run/flannel/subnet.env
I1225 09:03:36.654441   23981 main.go:321] Running backend.
I1225 09:03:36.654582   23981 udp_network_amd64.go:100] Watching for new subnet leases
I1225 09:03:36.656860   23981 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I1225 09:03:36.656880   23981 iptables.go:167] Deleting iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.657763   23981 iptables.go:167] Deleting iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.659096   23981 iptables.go:155] Adding iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.662668   23981 iptables.go:155] Adding iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.667833   23981 udp_network_amd64.go:195] Subnet added: 172.17.5.0/24
I1225 09:03:36.669084   23981 main.go:429] Waiting for 22h59m59.902809261s to renew lease

5. Configure the docker subnet

After flanneld starts, a new flannel0 interface appears:

root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
    inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
       valid_lft 2848sec preferred_lft 2848sec
    inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 3571sec preferred_lft 3571sec
    inet6 fe80::a00:27ff:fe50:3fc/64 scope link
       valid_lft forever preferred_lft forever
6: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 172.17.5.0/32 scope global flannel0
       valid_lft forever preferred_lft forever
    inet6 fe80::66f8:e24e:8ce8:24f0/64 scope link stable-privacy
       valid_lft forever preferred_lft forever

flanneld also generates a configuration file describing the docker subnet for this host:

root@ubuntu1:~# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.5.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false

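These are the IP settings the docker0 bridge should use for this host's docker subnet. The official flannel GitHub repository has a script, mk-docker-opts.sh, which you can download and use to generate the docker opts configuration: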

root@ubuntu1:~# ./mk-docker-opts.sh -i
root@ubuntu1:~# cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=172.17.5.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1472"

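The values map one-to-one from subnet.env (note that --ip-masq is the inverse of FLANNEL_IPMASQ), so you can also write the file by hand without the script. Either way, /run/docker_opts.env has to be referenced from the docker startup unit file.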
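The mapping can be sketched in a few lines of Python. This is only a sketch that reproduces the output shown in this post; parse_env and docker_opts are hypothetical helper names, and the real mk-docker-opts.sh supports more options than are modelled here.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines such as those in /run/flannel/subnet.env."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key] = value
    return env


def docker_opts(env: dict) -> dict:
    """Map flannel's subnet.env values to dockerd flags."""
    # If flannel does not masquerade, docker should, and vice versa.
    ip_masq = "false" if env["FLANNEL_IPMASQ"] == "true" else "true"
    return {
        "DOCKER_OPT_BIP": f"--bip={env['FLANNEL_SUBNET']}",
        "DOCKER_OPT_IPMASQ": f"--ip-masq={ip_masq}",
        "DOCKER_OPT_MTU": f"--mtu={env['FLANNEL_MTU']}",
    }


# Contents of /run/flannel/subnet.env as shown in this post.
SUBNET_ENV = """\
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.5.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false
"""

for name, value in docker_opts(parse_env(SUBNET_ENV)).items():
    print(f'{name}="{value}"')
```

Run against the subnet.env from this host, it prints the same three lines as /run/docker_opts.env.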

root@ubuntu1:~# cat /lib/systemd/system/docker.service
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd $DOCKER_OPT_BIP $DOCKER_OPT_IPMASQ $DOCKER_OPT_MTU

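The service is managed with systemctl here, so add EnvironmentFile=/run/docker_opts.env under [Service] and append the variables from that file to the dockerd start command. After editing the unit file, run systemctl daemon-reload, then start docker; docker0 now has the correct IP: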

root@ubuntu1:~# systemctl start docker
root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
    inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
       valid_lft 2603sec preferred_lft 2603sec
    inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 3239sec preferred_lft 3239sec
    inet6 fe80::a00:27ff:fe50:3fc/64 scope link
       valid_lft forever preferred_lft forever
8: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 172.17.5.0/32 scope global flannel0
       valid_lft forever preferred_lft forever
    inet6 fe80::ca56:6554:961b:eb4d/64 scope link stable-privacy
       valid_lft forever preferred_lft forever
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:27:60:6b:cd brd ff:ff:ff:ff:ff:ff
    inet 172.17.5.1/24 brd 172.17.5.255 scope global docker0
       valid_lft forever preferred_lft forever

6. Start containers

Start a container on each of the two test machines:

root@ubuntu2:~# docker run -it myubuntu /bin/bash

7. Ping between containers

Check the IP of the current test container:

root@f00161eaa2f6:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:08:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.8.2/24 brd 172.17.8.255 scope global eth0
       valid_lft forever preferred_lft forever

Ping the docker0 bridge on the local host:

root@f00161eaa2f6:/# ping 172.17.8.1
PING 172.17.8.1 (172.17.8.1) 56(84) bytes of data.
64 bytes from 172.17.8.1: icmp_seq=1 ttl=64 time=0.122 ms
64 bytes from 172.17.8.1: icmp_seq=2 ttl=64 time=0.051 ms
^C
--- 172.17.8.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.051/0.086/0.122/0.036 ms

Ping the local host's IP:

root@f00161eaa2f6:/# ping 192.168.43.222
PING 192.168.43.222 (192.168.43.222) 56(84) bytes of data.
64 bytes from 192.168.43.222: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 192.168.43.222: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 192.168.43.222: icmp_seq=3 ttl=64 time=0.047 ms
64 bytes from 192.168.43.222: icmp_seq=4 ttl=64 time=0.045 ms
64 bytes from 192.168.43.222: icmp_seq=5 ttl=64 time=0.060 ms
^C
--- 192.168.43.222 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4083ms
rtt min/avg/max/mdev = 0.044/0.051/0.061/0.009 ms

Ping the other host's IP:

root@f00161eaa2f6:/# ping 192.168.43.161
PING 192.168.43.161 (192.168.43.161) 56(84) bytes of data.
64 bytes from 192.168.43.161: icmp_seq=1 ttl=63 time=0.471 ms
64 bytes from 192.168.43.161: icmp_seq=2 ttl=63 time=0.305 ms
64 bytes from 192.168.43.161: icmp_seq=3 ttl=63 time=0.331 ms
64 bytes from 192.168.43.161: icmp_seq=4 ttl=63 time=0.297 ms
64 bytes from 192.168.43.161: icmp_seq=5 ttl=63 time=0.337 ms
^C
--- 192.168.43.161 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.297/0.348/0.471/0.064 ms

Ping a container on the other host:

root@f00161eaa2f6:/# ping 172.17.5.2
PING 172.17.5.2 (172.17.5.2) 56(84) bytes of data.
64 bytes from 172.17.5.2: icmp_seq=1 ttl=60 time=0.532 ms

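Everything works, which means the layer-3 udp overlay network is up. The ping times also show how poorly this architecture performs: the cross-host container ping took 0.532 ms, against an average of about 0.35 ms to the other host's IP and about 0.05 ms to the local host.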
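For a rough sense of the cost, the numbers from the ping output can be compared directly. This is only back-of-the-envelope arithmetic: the cross-host container figure is a single sample, so treat the result as indicative, not as a benchmark. The variable names are made up for this sketch.

```python
# Round-trip times in milliseconds, copied from the ping output in this post.
local_host_avg = 0.051     # container -> its own host's IP (average of 5)
remote_host_avg = 0.348    # container -> the other host's IP (average of 5)
remote_container = 0.532   # container -> container on the other host (1 sample)

# Extra latency added by going through the overlay instead of plain host-to-host.
overlay_extra = remote_container - remote_host_avg
ratio = remote_container / remote_host_avg

print(f"extra latency through the overlay: {overlay_extra:.3f} ms")
print(f"relative to plain host-to-host:    {ratio:.2f}x")
```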

8. Packet capture

On each host, capture packets on the flannel0 interface and on the physical interface enp0s3:

root@ubuntu2:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.999156 ip: (tos 0x0, ttl 63, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.999920 ip: (tos 0x0, ttl 62, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.999859 ip: (tos 0x0, ttl 63, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:30.000353 ip: (tos 0x0, ttl 62, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
root@ubuntu1:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.897354 ip: (tos 0x0, ttl 61, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.897380 ip: (tos 0x0, ttl 64, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.897971 ip: (tos 0x0, ttl 61, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:29.897989 ip: (tos 0x0, ttl 64, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
09:40:30.899467 ip: (tos 0x0, ttl 61, id 16091, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 3, length 64
09:40:30.899500 ip: (tos 0x0, ttl 64, id 53127, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 3, length 64

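As you can see, the flannel0 interfaces on the two test machines behave as if they were directly connected. In fact, these packets are the ones that flanneld decodes and then forwards to the flannel0 interface on each side.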
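The ttl values in the captures give a hint about how many hops sit between the two flannel0 devices. The Python sketch below checks a simple hop model against the captured numbers. The model (each kernel routing step costs 1, crossing the tunnel between the two flannel0 devices costs 2) is an assumption inferred from these captures, not something taken from flanneld's documentation, and all names are hypothetical.

```python
INITIAL_TTL = 64  # initial ttl of the replies captured on ubuntu1's flannel0

# Assumptions inferred from the captures in this post:
TUNNEL_COST = 2   # ttl drop between the two flannel0 devices
ROUTE_COST = 1    # ttl drop for one kernel routing step (docker0 <-> flannel0)

# Echo request: container on ubuntu2 -> 172.17.5.1 on ubuntu1.
req_on_ubuntu2_flannel0 = INITIAL_TTL - ROUTE_COST  # routed docker0 -> flannel0
req_on_ubuntu1_flannel0 = req_on_ubuntu2_flannel0 - TUNNEL_COST

# Echo reply: generated by the ubuntu1 host itself, so no routing step first.
rep_on_ubuntu1_flannel0 = INITIAL_TTL
rep_on_ubuntu2_flannel0 = rep_on_ubuntu1_flannel0 - TUNNEL_COST

# Reply from container 172.17.5.2 as reported by ping inside 172.17.8.2:
# one routing step on each host plus the tunnel crossing.
container_reply = INITIAL_TTL - ROUTE_COST - TUNNEL_COST - ROUTE_COST

print(req_on_ubuntu2_flannel0, req_on_ubuntu1_flannel0)  # 63 61
print(rep_on_ubuntu1_flannel0, rep_on_ubuntu2_flannel0)  # 64 62
print(container_reply)                                   # 60
```

All five values match the ttl fields in the tcpdump and ping output, which is consistent with the packet passing through a user-space forwarder on each side.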

root@ubuntu2:~# tcpdump -e -n -v -i enp0s3
09:45:31.075384 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16455, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:31.075872 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 50966, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:31.076561 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62105, offset 0, flags [DF], proto TCP (6), length 152)
09:45:31.076761 38:f9:d3:2e:a1:6f > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 66: (tos 0x48, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
09:45:32.097916 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16597, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:32.098376 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 51097, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:32.099075 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62106, offset 0, flags [DF], proto TCP (6), length 152)
root@ubuntu1:~# tcpdump -e -n -v -i enp0s3
09:44:14.088291 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33452, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:15.114805 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1395, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:15.114923 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33515, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:16.132893 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1617, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:16.133017 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33662, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:17.156818 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1872, offset 0, flags [DF], proto UDP (17), length 112)

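Capturing on the physical interfaces shows that the actual overlay packets are carried by flanneld over UDP port 8285. Each datagram has an 84-byte UDP payload, exactly the length of the inner ICMP IP packet seen on flannel0.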
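The sizes in the two captures line up once the encapsulation headers are added. The short Python calculation below uses standard header sizes (IPv4 without options) and assumes the inner IP packet is carried unchanged as the UDP payload, which is what the matching lengths suggest. The variable names are made up for this sketch.

```python
ETH_HEADER = 14  # Ethernet II header
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8

inner_ip_packet = 84  # "length 84" of the ICMP packet seen on flannel0

udp_payload = inner_ip_packet  # assumption: inner packet carried as-is
outer_ip_packet = IP_HEADER + UDP_HEADER + udp_payload
frame_on_wire = ETH_HEADER + outer_ip_packet

physical_mtu = 1500  # mtu of enp0s3
flannel_mtu = physical_mtu - IP_HEADER - UDP_HEADER

print(udp_payload, outer_ip_packet, frame_on_wire, flannel_mtu)  # 84 112 126 1472
```

These are the "UDP, length 84", "length 112" and "length 126" values in the enp0s3 capture, and the 1472 matches both FLANNEL_MTU and the mtu shown on flannel0.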

Summary

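The article "k8s networking solutions" includes benchmarks showing that udp overlay is the slowest implementation, so it is not used in production. Next up is testing the overlay implemented with vxlan.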
