I decided to write this article because, when I investigated while troubleshooting the issue in the title, there were surprisingly few articles that treated this point as the critical one.
I created a Linux Bridge on CentOS 7, attached Docker containers under it, and tested connectivity, but the containers could not communicate.
Linux Bridge is implemented by a kernel module called bridge, but its filtering is handled by a kernel module called br_netfilter (which depends on bridge), and br_netfilter controls communication by looking at the iptables settings. Therefore, communication becomes possible by doing either of the following.
① Disable Bridge Netfilter ② Add an iptables rule that permits the traffic
OS: CentOS 7.5 / Kernel: 3.10.0-862.14.4.el7.x86_64 / docker: 18.06.1-ce
First of all, confirm that containers can communicate with each other via docker0, the bridge they are normally attached to when a docker container is deployed.
Run docker run and deploy two containers.
docker run -d --name cent1 centos/tools:latest /sbin/init
docker run -d --name cent2 centos/tools:latest /sbin/init
Confirm that the container started normally with the docker ps command.
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8126f9f72ee2 centos/tools:latest "/sbin/init" 6 seconds ago Up 3 seconds cent2
a957a097b6a5 centos/tools:latest "/sbin/init" About a minute ago Up About a minute cent1
First, check how the NICs of the deployed containers correspond to the NICs on the docker host. Checking each container's NIC gives the following output: you can see that eth0 of cent1 is paired with interface index 9 on the docker host, and eth0 of cent2 is paired with interface index 11.
docker exec cent1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
docker exec cent2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
The NICs on the docker host are as follows.
ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:18 brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
4: vlan10@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:1c:c2:6d:d0 brd ff:ff:ff:ff:ff:ff
9: vethc59a2d1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether f6:1a:1b:00:b9:b5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: vethfee6857@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff link-netnsid 1
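Incidentally, the @ifN suffix in the ip output already encodes this pairing, but if you want to confirm it explicitly, the veth driver reports its peer's interface index through ethtool (assuming ethtool is installed on the host):
# Should print peer_ifindex: 8, i.e. cent1's eth0
ethtool -S vethc59a2d1 | grep peer_ifindex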
Furthermore, checking the Linux Bridge information shows that the host-side veths of cent1 and cent2 are attached to docker0, as shown below.
brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02421cc26dd0 no vethc59a2d1
vethfee6857
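brctl comes from the bridge-utils package; if it is not available, the same port membership can be listed with iproute2 instead:
# iproute2 equivalents of the interfaces column of brctl show
bridge link show
ip link show master docker0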
The above can be summarized as shown in the picture below.
If you ping from cent1 to cent2 via docker0, communication succeeds as shown below.
docker exec cent1 ping -c 3 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=10.2 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.048 ms
64 bytes from 172.17.0.3: icmp_seq=3 ttl=64 time=0.045 ms
--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.045/3.448/10.252/4.811 ms
Now for the main topic. Next, create a new Linux Bridge, attach the docker containers to it, and check whether they can communicate as they did via docker0.
Create a new bridge named new-bridge1.
brctl addbr new-bridge1
brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02421cc26dd0 no vethc59a2d1
vethfee6857
new-bridge1 8000.000000000000 no
After creating it, bring the bridge up as follows.
ip l set dev new-bridge1 up
ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:18 brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
4: vlan10@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:1c:c2:6d:d0 brd ff:ff:ff:ff:ff:ff
9: vethc59a2d1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master new-bridge1 state UP mode DEFAULT group default
link/ether f6:1a:1b:00:b9:b5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: vethfee6857@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master new-bridge1 state UP mode DEFAULT group default
link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff link-netnsid 1
12: new-bridge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff
The NICs of the containers deployed by docker (more precisely, the host-side veths corresponding to the container NICs) are currently attached to docker0. For this verification, detach these container NICs from docker0.
brctl delif docker0 vethc59a2d1
brctl delif docker0 vethfee6857
brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02421cc26dd0 no
new-bridge1 8000.000000000000 no
Attach the container NICs to the newly created new-bridge1.
brctl addif new-bridge1 vethc59a2d1
brctl addif new-bridge1 vethfee6857
brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02421cc26dd0 no
new-bridge1 8000.8645ea11db35 no vethc59a2d1
vethfee6857
By performing the operations up to this point, the state will be as shown in the picture below.
Now ping from cent1 to cent2 via the newly created new-bridge1, just as we did via docker0 earlier, and check connectivity.
docker exec cent1 ping -c 3 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
--- 172.17.0.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
This time, unlike the earlier case via docker0, you can see that cent1 and cent2 cannot communicate.
First, capture tcpdump on each NIC. Two things can be seen from the results below. ① The ARP request from cent1 reaches cent2 normally, and cent1 receives the reply. ② The ping reaches the Linux Bridge (new-bridge1) but does not reach cent2.
cent1 NIC
tcpdump -i vethc59a2d1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethc59a2d1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:39.379638 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 1, length 64
23:20:40.378780 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 2, length 64
23:20:41.378785 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 3, length 64
23:20:44.383711 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383744 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28
cent2 NIC
tcpdump -i vethfee6857
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethfee6857, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:44.383726 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383741 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28
new-bridge1
tcpdump -i new-bridge1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on new-bridge1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:39.379638 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 1, length 64
23:20:40.378780 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 2, length 64
23:20:41.378785 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 3, length 64
23:20:44.383711 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383741 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28
Regarding ①, check the ARP cache in each container just in case. You can see that the MAC address of the peer is registered correctly.
docker exec cent1 arp -e
Address HWtype HWaddress Flags Mask Iface
172.17.0.3 ether 02:42:ac:11:00:03 C eth0
gateway (incomplete) eth0
docker exec cent2 arp -e
Address HWtype HWaddress Flags Mask Iface
172.17.0.2 ether 02:42:ac:11:00:02 C eth0
gateway (incomplete) eth0
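Incidentally, the reason the ARP exchange gets through while the ICMP echo does not is that bridge netfilter handles them separately: IPv4 frames crossing the bridge are passed to iptables (where, as shown later, the FORWARD policy set up by docker is DROP), whereas ARP frames are only subject to arptables, which is controlled by a separate kernel parameter that can be checked like this:
# ARP on a bridge is governed by arptables, not iptables (1 = pass ARP frames to arptables)
sysctl net.bridge.bridge-nf-call-arptables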
As mentioned at the beginning, Linux Bridge is implemented by the bridge kernel module, its filtering is handled by the br_netfilter kernel module (which depends on bridge), and br_netfilter appears to control communication by looking at the iptables settings. As a result, communication via the bridge is not permitted by default, which is what is happening here.
$ lsmod | grep br_netfilter
br_netfilter 24576 0
bridge 155648 1 br_netfilter
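Note that the net.bridge.bridge-nf-call-iptables parameter only exists while the br_netfilter module is loaded (in this environment it was already loaded, as the lsmod output above shows). If lsmod shows nothing, you can load the module yourself before checking the parameter:
# Load br_netfilter manually (requires root); the sysctl key appears once the module is loaded
modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables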
Communication between the containers can be restored by either of the following measures.
① Disable Bridge Netfilter
Bridge Netfilter, which controls communication over the Linux Bridge, is normally enabled, so communication can be restored by intentionally disabling it. Whether Bridge Netfilter is enabled or disabled is set with the kernel parameter net.bridge.bridge-nf-call-iptables.
Check the current state of Bridge Netfilter: it is set to 1, i.e. enabled.
sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
Change the setting: set net.bridge.bridge-nf-call-iptables = 0 in /etc/sysctl.conf and apply it.
cat /etc/sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.bridge.bridge-nf-call-iptables = 0
sysctl -p
net.bridge.bridge-nf-call-iptables = 0
sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 0
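If you just want to test the effect without editing /etc/sysctl.conf, the parameter can also be changed only for the running kernel; it reverts at reboot (or when br_netfilter is reloaded):
# Runtime-only change, not persistent across reboots
sysctl -w net.bridge.bridge-nf-call-iptables=0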
② Add an iptables rule that permits the traffic
Bridge Netfilter looks at iptables to control communication, so adding a permit rule to iptables as follows also makes communication via the Linux Bridge possible. Here, when adding the rule with the iptables command, we use -m to specify the packet matching module called physdev, which matches on the bridge ports a packet enters and leaves through, and accept all traffic that is being bridged.
Reference: iptables(8) man page (Japanese): https://kazmax.zpp.jp/cmd/i/iptables.8.html
iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
iptables -nvL --line-number
Chain INPUT (policy ACCEPT 52 packets, 3250 bytes)
num pkts bytes target prot opt in out source destination
Chain FORWARD (policy DROP 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
1 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 PHYSDEV match --physdev-is-bridged
2 2006 2508K DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
3 2006 2508K DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
4 1126 2451K ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
5 46 5840 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
6 834 51247 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
7 46 5840 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
(Omitted)
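Note that a rule added with iptables -I lives only in memory and is lost at reboot. On CentOS 7, one common way to persist it, assuming the iptables-services package is installed and in use (rather than firewalld), is:
# Write the current rule set to /etc/sysconfig/iptables so it is restored at boot
service iptables save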
Kubernetes requires net.bridge.bridge-nf-call-iptables = 1, and I ran into this problem while doing the validation below on Kubernetes, so I dealt with it by adding the iptables rule.
Play with Multus https://rheb.hatenablog.com/entry/multus_introduction
At this point one question arises: why can containers communicate via docker0 even though it is also just a Linux Bridge? The answer lies in the iptables settings: docker writes the rules it needs into iptables at installation time and whenever a docker network is created. If you check the iptables rules, you can see that FORWARD chain rules 5 and 6 ACCEPT traffic from docker0 to the outside and traffic forwarded within docker0. In addition, docker also sets up NAT rules in iptables.
iptables -nvL --line-number
Chain INPUT (policy ACCEPT 228K packets, 579M bytes)
num pkts bytes target prot opt in out source destination
Chain FORWARD (policy DROP 12 packets, 1008 bytes)
num pkts bytes target prot opt in out source destination
1 9003 12M DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
2 9003 12M DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
3 5650 12M ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
4 0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
5 3341 191K ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
6 0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 130K packets, 7700K bytes)
num pkts bytes target prot opt in out source destination
Chain DOCKER (1 references)
num pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num pkts bytes target prot opt in out source destination
1 3341 191K DOCKER-ISOLATION-STAGE-2 all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
2 9003 12M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 DROP all -- * docker0 0.0.0.0/0 0.0.0.0/0
2 3341 191K RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
num pkts bytes target prot opt in out source destination
1 9003 12M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
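In other words, instead of the blanket physdev rule used above, you could in principle keep Bridge Netfilter enabled and mirror docker's per-bridge rules for the new bridge. The following is only a sketch of the analogue of FORWARD rules 3, 5, and 6 for new-bridge1, to illustrate what docker sets up for docker0; it was not verified in this article:
# Allow return traffic into new-bridge1 (analogue of rule 3 for docker0)
iptables -I FORWARD -o new-bridge1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Allow traffic leaving new-bridge1 toward other interfaces (analogue of rule 5)
iptables -I FORWARD -i new-bridge1 ! -o new-bridge1 -j ACCEPT
# Allow traffic bridged within new-bridge1 (analogue of rule 6)
iptables -I FORWARD -i new-bridge1 -o new-bridge1 -j ACCEPT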
Since a Linux Bridge should operate purely at L2, I had assumed communication would just work without any further configuration, but that was a mistake: a Linux Bridge also has L3-like aspects, such as being able to hold an IP address. This time I experimented with containers, but I expect the same fix would apply if you attach a KVM virtual machine and run into a similar problem. (Not verified)
Many people around me helped with and advised on this investigation. I would like to take this opportunity to thank them.
KVM bridge network settings https://qiita.com/TsutomuNakamura/items/e15d2c8c02586a7ae572#bridge-%E3%83%88%E3%83%A9%E3%83%95%E3%82%A3%E3%83%83%E3%82%AF%E3%81%AEnetfilter-%E3%82%92%E7%84%A1%E5%8A%B9%E5%8C%96%E3%81%99%E3%82%8B
11.2. Bridge network with libvirt https://docs.fedoraproject.org/ja-JP/Fedora/13/html/Virtualization_Guide/sect-Virtualization-Network_Configuration-Bridged_networking_with_libvirt.html