[LINUX] Follow the communication flow of Docker's bridge connection with nftables

Introduction

I just started studying Docker on CentOS 8. I wanted to follow the packet flow between an external host and a container, so I did a lot of research.

However, while I was able to follow the request, I could not follow the response. For now, I will summarize only the request path.

Various prerequisites

Verification environment

I am tracing in the following environment.

Docker operating environment

userland proxy (docker-proxy): I did not use docker-proxy, because I wanted to see how the Docker network works with iptables / nftables alone (hairpin NAT). I verified everything with the following file in place.

/etc/docker/daemon.json


{
    "userland-proxy": false
}

Reference: https://github.com/nigelpoulton/docker/blob/master/docs/userguide/networking/default_network/binding.md
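Before tracing, it can help to confirm that the setting is actually present. The following is a minimal Python sketch that takes the contents of daemon.json as a string and reports whether docker-proxy would be used. The function name is hypothetical. It assumes that Docker's default (userland proxy enabled) applies when the key is absent, and it only inspects the file, not the running dockerd.

```python
import json


def userland_proxy_enabled(daemon_json_text: str) -> bool:
    """Return whether docker-proxy would be used according to daemon.json.

    Assumption: an empty file or a missing "userland-proxy" key means the
    default, which is enabled.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    return bool(config.get("userland-proxy", True))


# The /etc/docker/daemon.json used in this article
print(userland_proxy_enabled('{\n    "userland-proxy": false\n}'))  # False
print(userland_proxy_enabled("{}"))                                 # True (default)
```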

Docker network and containers: As the verification environment, I use the one built in the following entry. The Docker host, 10.254.10.252, runs CentOS 8, and the explanation uses the radius container as the example.

Docker Compose can create network services in 5 minutes (dhcp / radius / proxy / tftp / syslog)


The following containers will be created.

server  app         address     listen
proxy   squid       172.20.0.2  8080/tcp
syslog  rsyslog     172.20.0.3  514/udp
radius  freeRADIUS  172.20.0.4  1812/udp
dhcp    ISC-Kea     172.20.0.5  67/udp
tftp    tftp-server -           69/udp

# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                    NAMES
b11308767849        infraserv:proxy     "/usr/sbin/init"    3 minutes ago       Up 3 minutes        0.0.0.0:8080->8080/tcp   proxy
33054f8b7d58        infraserv:tftp      "/usr/sbin/init"    35 hours ago        Up 2 hours                                   tftp
851ea861d04e        infraserv:syslog    "/usr/sbin/init"    35 hours ago        Up 2 hours          0.0.0.0:514->514/udp     syslog
dd3a657cfda2        infraserv:dhcp      "/usr/sbin/init"    35 hours ago        Up 2 hours          0.0.0.0:67->67/udp       dhcp
7249b9c4f11d        infraserv:radius    "/usr/sbin/init"    35 hours ago        Up 2 hours          0.0.0.0:1812->1812/udp   radius
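Each published port in the PORTS column corresponds to one DNAT rule in the DOCKER chain of table ip nat, which appears in the prerouting section. Docker generates those rules itself. The Python sketch below only models the correspondence: it turns a PORTS value into a rule shaped like the ones in that chain, without the counter statement. The regular expression, the function name, and the hard-coded container addresses (taken from the table above) are illustrative assumptions.

```python
import re

# Container addresses from the table above (tftp uses --net=host and has none).
CONTAINER_IPS = {
    "proxy": "172.20.0.2",
    "syslog": "172.20.0.3",
    "radius": "172.20.0.4",
    "dhcp": "172.20.0.5",
}

# One entry of the PORTS column, e.g. "0.0.0.0:1812->1812/udp"
PORT_RE = re.compile(
    r"^(?P<host_ip>[\d.]+):(?P<host_port>\d+)->(?P<cport>\d+)/(?P<proto>tcp|udp)$"
)


def dnat_rule(name: str, ports: str) -> str:
    """Build a DOCKER-chain-style DNAT rule (counter omitted) for one published port."""
    m = PORT_RE.match(ports)
    if m is None:
        raise ValueError(f"unexpected PORTS value: {ports!r}")
    proto = m["proto"]
    return (f"meta l4proto {proto} {proto} dport {m['host_port']} "
            f"dnat to {CONTAINER_IPS[name]}:{m['cport']}")


print(dnat_rule("radius", "0.0.0.0:1812->1812/udp"))
# meta l4proto udp udp dport 1812 dnat to 172.20.0.4:1812
```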

A network with the following parameters is generated.

key        value
name       infraserv_infranet
subnet     172.20.0.0/24
interface  docker1

Since the tftp container runs with --net=host, it is not attached to this network, and the Docker network is in the following state.

# docker network inspect infraserv_infranet
[
    {
        "Name": "infraserv_infranet",
        "Id": "7ed8face2e4fec3110384fa3366512f8c78db6e10be6e7271b3d92452aefd254",
        "Created": "2020-02-15T05:37:59.248249755-05:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.0.0/24",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "7249b9c4f11de1f986892965671086d20957a6021269a5f5bc6dd85263bc0d70": {
                "Name": "radius",
                "EndpointID": "03ae6a9b9ff7817eea101955d2d6ff016982beb65c7dd6631c75c7299682c2dd",
                "MacAddress": "02:42:ac:14:00:04",
                "IPv4Address": "172.20.0.4/24",
                "IPv6Address": ""
            },
            "851ea861d04edeb5f5c2498cc60f58532c87a44592db1f6c51280a8ce27940bd": {
                "Name": "syslog",
                "EndpointID": "d18e466d27def913ac74b7555acc9ef79c88c62e62085b50172636546d2e72bb",
                "MacAddress": "02:42:ac:14:00:03",
                "IPv4Address": "172.20.0.3/24",
                "IPv6Address": ""
            },
            "b11308767849c7227fbde53234c1b1816859c8e871fcc98c4fcaacdf7818e89e": {
                "Name": "proxy",
                "EndpointID": "ffa6479b4f28c9c1d106970ffa43bd149461b4728b64290541643eb895a02892",
                "MacAddress": "02:42:ac:14:00:02",
                "IPv4Address": "172.20.0.2/24",
                "IPv6Address": ""
            },
            "dd3a657cfda211c08b7c5c2166f10d189986e4779f1dfea227b3afe284cbafec": {
                "Name": "dhcp",
                "EndpointID": "7371f4cf652d8b1bdbf2dc1e5e8ae97013a9a70b890c2caa36c2a7cc93b165df",
                "MacAddress": "02:42:ac:14:00:05",
                "IPv4Address": "172.20.0.5/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker1"
        },
        "Labels": {
            "com.docker.compose.network": "infranet",
            "com.docker.compose.project": "infraserv",
            "com.docker.compose.version": "1.25.3"
        }
    }
]
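The container-to-address mapping used throughout this article can be read directly from this output. Below is a minimal Python sketch, assuming the JSON text of `docker network inspect` is passed in as a string. The function name is hypothetical. As noted above, tftp does not appear because it runs with --net=host.

```python
import json
from typing import Dict


def container_addresses(inspect_output: str) -> Dict[str, str]:
    """Map container name -> IPv4 address (prefix length stripped)."""
    result: Dict[str, str] = {}
    for network in json.loads(inspect_output):  # the command prints a JSON array
        for info in (network.get("Containers") or {}).values():
            address = info.get("IPv4Address", "")
            if address:
                result[info["Name"]] = address.split("/")[0]
    return result


# Trimmed excerpt of the output above
sample = """[{"Name": "infraserv_infranet",
  "Containers": {
    "7249b9c4f11d": {"Name": "radius", "IPv4Address": "172.20.0.4/24"},
    "851ea861d04e": {"Name": "syslog", "IPv4Address": "172.20.0.3/24"}}}]"""
print(container_addresses(sample))  # {'radius': '172.20.0.4', 'syslog': '172.20.0.3'}
```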

Address family

For the sake of brevity, we have focused on IPv4.

About packet flow in Docker

Follow communication (in case of radius)

As an example, consider a RADIUS request sent from an external terminal (10.254.10.105) to the Docker host (10.254.10.252). The packet arrives at the host and is then forwarded to the container, so the hooks of interest are prerouting -> forward -> postrouting. I therefore only explain chains of type filter and nat.

The listings below omit unneeded rules from nft list ruleset. The omitted rules are not very useful, so I summarized them in the Supplement (Check required tables).
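Which of these hooks a packet visits is decided by the routing decision that follows prerouting, where DNAT has already been applied. The Python sketch below is a simplified model that assumes the host's local addresses are exactly those named in this article (lo, ens192, docker1). It ignores locally generated traffic, broadcast, and policy routing.

```python
import ipaddress
from typing import List

# Addresses owned by the Docker host in this article: lo, ens192 and docker1.
LOCAL_ADDRS = {ipaddress.ip_address(a) for a in ("127.0.0.1", "10.254.10.252", "172.20.0.1")}


def hook_path(dst_after_prerouting: str) -> List[str]:
    """Return the IPv4 netfilter hooks an incoming packet passes through.

    Simplified model: DNAT happens at prerouting, then the routing decision
    sends packets for a local address to input and everything else to forward.
    """
    if ipaddress.ip_address(dst_after_prerouting) in LOCAL_ADDRS:
        return ["prerouting", "input"]
    return ["prerouting", "forward", "postrouting"]


print(hook_path("172.20.0.4"))     # after DNAT to the radius container
print(hook_path("10.254.10.252"))  # no DNAT: delivered to a local process
```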

Request from an external terminal (prerouting)

Extracting the chains hooked at prerouting from nft list ruleset gives the following.

         table ip nat {
           chain PREROUTING {
(1)          type nat hook prerouting priority -100; policy accept;
(2)->        fib daddr type local COUNTER jump DOCKER
           }
   ->(2)   chain DOCKER {
      ↓      meta l4proto udp udp dport 514 COUNTER dnat to 172.20.0.3:514
      ↓      meta l4proto udp udp dport 67 COUNTER dnat to 172.20.0.5:67
      ↓      meta l4proto tcp tcp dport 8080 COUNTER dnat to 172.20.0.2:8080
     (3)     meta l4proto udp udp dport 1812 COUNTER dnat to 172.20.0.4:1812
           }
         }

The packet is currently 10.254.10.105:random -> 10.254.10.252:1812.
(1) PREROUTING, the chain that hooks prerouting and performs nat, is selected.
(2) The destination address is local, so jump to the DOCKER chain. `fib daddr type local` matches any address of the local host (here, the Docker host): lo 127.0.0.1, ens192 10.254.10.252 and docker1 172.20.0.1.
(3) The destination port is 1812, so DNAT the destination to 172.20.0.4:1812. There is no further processing, and the packet is accepted.

At this point the packet is 10.254.10.105:random -> 172.20.0.4:1812. Because the destination is now 172.20.0.4, which is not one of the host's own addresses, the routing decision sends the packet to the forward hook.
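Steps (1) to (3) can be sketched as a small Python model of the PREROUTING -> DOCKER evaluation. It is a simplified model, not netfilter itself. The local address set and the DNAT table are copied from the ruleset above, and the client port 50000 is a hypothetical stand-in for "random".

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str  # "tcp" or "udp"


# Addresses matched by `fib daddr type local` on this host (lo, ens192, docker1).
LOCAL_ADDRS = {"127.0.0.1", "10.254.10.252", "172.20.0.1"}

# Rules of the DOCKER chain in table ip nat: (proto, dport, new daddr, new dport).
DOCKER_DNAT = [
    ("udp", 514, "172.20.0.3", 514),
    ("udp", 67, "172.20.0.5", 67),
    ("tcp", 8080, "172.20.0.2", 8080),
    ("udp", 1812, "172.20.0.4", 1812),
]


def prerouting(pkt: Packet) -> Packet:
    """Model PREROUTING -> DOCKER and return the packet as it leaves prerouting."""
    if pkt.dst not in LOCAL_ADDRS:  # fib daddr type local did not match: no jump
        return pkt
    for proto, dport, new_dst, new_dport in DOCKER_DNAT:  # rules are checked in order
        if pkt.proto == proto and pkt.dport == dport:
            return pkt._replace(dst=new_dst, dport=new_dport)  # dnat to new_dst:new_dport
    return pkt  # no DNAT rule matched


# 50000 is a hypothetical client port standing in for "random".
request = Packet("10.254.10.105", 50000, "10.254.10.252", 1812, "udp")
print(prerouting(request))
```

Only the destination changes. The source stays 10.254.10.105:50000 until postrouting.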

Request from an external terminal (forward)

Extracting the chains hooked at forward from nft list ruleset gives the following.

                                table ip filter {
                                  chain FORWARD {
(1)                                 type filter hook forward priority 0; policy drop;
(2)->                               COUNTER jump DOCKER-USER
        ->(3)(4)->                  COUNTER jump DOCKER-ISOLATION-STAGE-1
                    ->(5)           oifname "docker1" ct state related,established COUNTER accept
                      (6)->         oifname "docker1" COUNTER jump DOCKER
                                    iifname "docker1" oifname != "docker1" COUNTER accept
                                    iifname "docker1" oifname "docker1" COUNTER accept
                                  }
               ->(4)              chain DOCKER-ISOLATION-STAGE-1 {
                 (5)->              COUNTER return
                                  }
   ->(2)                          chain DOCKER-USER {
     (3)->                          COUNTER return
                                  }
                         ->(6)    chain DOCKER {
                            ↓       iifname != "docker1" oifname "docker1" meta l4proto udp ip daddr 172.20.0.3 udp dport 514 COUNTER accept
                            ↓       iifname != "docker1" oifname "docker1" meta l4proto udp ip daddr 172.20.0.5 udp dport 67 COUNTER accept
                            ↓       iifname != "docker1" oifname "docker1" meta l4proto tcp ip daddr 172.20.0.2 tcp dport 8080 COUNTER accept
                           (7)      iifname != "docker1" oifname "docker1" meta l4proto udp ip daddr 172.20.0.4 udp dport 1812 COUNTER accept
                                  }                          
                                }
                                table inet firewalld {
                                  chain filter_FORWARD {
                           (8)      type filter hook forward priority 10; policy accept;
                            ↓       ct state established,related accept
                           (9)      ct status dnat accept
                                    iifname "lo" accept
                                    jump filter_FORWARD_IN_ZONES
                                    jump filter_FORWARD_OUT_ZONES
                                    ct state invalid drop
                                    reject with icmpx type admin-prohibited
                                  }
                                  chain filter_FORWARD_IN_ZONES {
                                    iifname "ens192" goto filter_FWDI_public
                                    goto filter_FWDI_public
                                  }
                                  chain filter_FORWARD_OUT_ZONES {
                                    oifname "ens192" goto filter_FWDO_public
                                    goto filter_FWDO_public
                                  }
                                  chain filter_FWDI_public { meta l4proto { icmp, ipv6-icmp } accept }
                                  chain filter_FWDO_public { jump filter_FWDO_public_allow }
                                  chain filter_FWDO_public_allow { ct state new,untracked accept }
                                }

The packet is currently 10.254.10.105:random -> 172.20.0.4:1812.
(1) FORWARD, a filter chain, has the highest priority among the forward-hook chains shown here (priority 0, the lowest value), so it is evaluated first.
(2) Jump unconditionally to DOCKER-USER.
(3) Return without doing anything.
(4) Jump unconditionally to DOCKER-ISOLATION-STAGE-1.
(5) Return without doing anything.
(6) The output interface is docker1, so jump to DOCKER.
(7) The input interface is ens192, the output interface is docker1, and the destination is 172.20.0.4:1812, so the packet is accepted. DOCKER is a regular chain called from the base chain FORWARD, so an accept in DOCKER also ends evaluation of FORWARD.
(8) An accept only ends the current base chain, so the next base chain on the same hook still evaluates the packet: filter_FORWARD, a filter chain with the second highest priority (priority 10).
(9) The packet has been DNATed (ct status dnat), so it is accepted.

The forward hook does not rewrite addresses, so the packet is still 10.254.10.105:random -> 172.20.0.4:1812.
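The verdict rules used in steps (1) to (9) can be sketched as a small Python interpreter. Base chains run in priority order. A jump resumes the caller on return. Accept or drop in a called chain ends the calling base chain. An accept lets the next base chain on the hook run, while a drop is final. This is a simplified model under stated assumptions: only the rules relevant to this packet are included (one DOCKER rule, the first two rules of filter_FORWARD), the two `iifname "docker1"` rules are merged, and packets are plain dictionaries with hypothetical field names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, str]
# A rule is (match, action); action is "accept", "drop", "return" or "jump:<chain>".
Rule = Tuple[Callable[[Packet], bool], str]


def run_chain(chains: Dict[str, List[Rule]], name: str, pkt: Packet,
              trace: List[str]) -> Optional[str]:
    """Evaluate one chain; return "accept"/"drop", or None if it returns or falls off the end."""
    trace.append(name)
    for match, action in chains[name]:
        if not match(pkt):
            continue
        if action == "return":
            return None
        if action.startswith("jump:"):
            verdict = run_chain(chains, action[len("jump:"):], pkt, trace)
            if verdict is not None:
                return verdict  # accept/drop in a called chain also ends the caller
            continue  # the called chain returned: resume after the jump
        return action  # "accept" or "drop"
    return None


def run_hook(base_chains: List[Tuple[int, str, str]], chains: Dict[str, List[Rule]],
             pkt: Packet, trace: List[str]) -> str:
    """Run every base chain on one hook in priority order (lowest value first)."""
    for _priority, name, policy in sorted(base_chains):
        verdict = run_chain(chains, name, pkt, trace) or policy
        if verdict == "drop":
            return "drop"  # a drop in any base chain is final
        # accept only ends this base chain; the next base chain still sees the packet
    return "accept"


def always(_pkt: Packet) -> bool:
    return True


CHAINS: Dict[str, List[Rule]] = {
    "FORWARD": [
        (always, "jump:DOCKER-USER"),
        (always, "jump:DOCKER-ISOLATION-STAGE-1"),
        (lambda p: p["oif"] == "docker1" and p["ct_state"] in ("related", "established"), "accept"),
        (lambda p: p["oif"] == "docker1", "jump:DOCKER"),
        (lambda p: p["iif"] == "docker1", "accept"),  # the two iifname "docker1" rules, merged
    ],
    "DOCKER-USER": [(always, "return")],
    "DOCKER-ISOLATION-STAGE-1": [(always, "return")],
    "DOCKER": [  # only the radius rule; the other three are analogous
        (lambda p: p["iif"] != "docker1" and p["oif"] == "docker1" and p["proto"] == "udp"
         and p["daddr"] == "172.20.0.4" and p["dport"] == "1812", "accept"),
    ],
    "filter_FORWARD": [  # only the first two rules of firewalld's chain
        (lambda p: p["ct_state"] in ("established", "related"), "accept"),
        (lambda p: p["ct_status"] == "dnat", "accept"),
    ],
}
BASE_CHAINS = [(0, "FORWARD", "drop"), (10, "filter_FORWARD", "accept")]

request = {"iif": "ens192", "oif": "docker1", "proto": "udp", "daddr": "172.20.0.4",
           "dport": "1812", "ct_state": "new", "ct_status": "dnat"}
trace: List[str] = []
print(run_hook(BASE_CHAINS, CHAINS, request, trace), trace)
```

The trace for the request visits FORWARD, DOCKER-USER, DOCKER-ISOLATION-STAGE-1, DOCKER and then filter_FORWARD, matching steps (1) to (9).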

Request from an external terminal (postrouting)

Extracting the chains hooked at postrouting from nft list ruleset gives the following.

                   table ip nat {
                     chain POSTROUTING {
(1)                    type nat hook postrouting priority 100; policy accept;
 ↓                     oifname "docker1" fib saddr type local COUNTER masquerade
 ↓                     oifname != "docker1" ip saddr 172.20.0.0/24 COUNTER masquerade
 ↓                     meta l4proto udp ip saddr 172.20.0.3 ip daddr 172.20.0.3 udp dport 514 COUNTER masquerade
 ↓                     meta l4proto udp ip saddr 172.20.0.5 ip daddr 172.20.0.5 udp dport 67 COUNTER masquerade
 ↓                     meta l4proto tcp ip saddr 172.20.0.2 ip daddr 172.20.0.2 tcp dport 8080 COUNTER masquerade
 ↓                     meta l4proto udp ip saddr 172.20.0.4 ip daddr 172.20.0.4 udp dport 1812 COUNTER masquerade
                     }
                   }
                   table ip firewalld {
                     chain nat_POSTROUTING {
(2)                    type nat hook postrouting priority 110; policy accept;
(3)->                  jump nat_POSTROUTING_ZONES
                     }
   ->(3)             chain nat_POSTROUTING_ZONES {
      ↓                oifname "ens192" goto nat_POST_public
     (4)->             goto nat_POST_public
                     }
        ->(4)        chain nat_POST_public {
          (5)->        jump nat_POST_public_allow
                     }
             ->(5)   chain nat_POST_public_allow {
               (6)     oifname != "lo" masquerade
                     }
                   }

The packet is currently 10.254.10.105:random -> 172.20.0.4:1812.
(1) POSTROUTING, a nat chain, has the highest priority among the postrouting chains shown here (priority 100), so it is evaluated first. None of its rules match, so its policy, accept, applies.
(2) nat_POSTROUTING, a nat chain with the second highest priority (priority 110), is evaluated next.
(3) Jump unconditionally to nat_POSTROUTING_ZONES.
(4) The output interface is not ens192, so the unconditional goto to nat_POST_public is taken.
(5) Jump unconditionally to nat_POST_public_allow.
(6) The output interface is docker1, not lo, so masquerade is applied.

nat_POST_public_allow is called with jump from nat_POST_public, and nat_POST_public is entered with goto from nat_POSTROUTING_ZONES. When a chain entered with goto finishes, evaluation does not return to the chain that issued the goto (nat_POSTROUTING_ZONES) but to the chain that jumped to it (nat_POSTROUTING). nat_POSTROUTING has no further rules, so its policy, accept, applies.

After masquerade, the final packet is 172.20.0.1:random -> 172.20.0.4:1812. Masquerade replaces the source address with the address of the interface the packet leaves through. Here that interface is docker1, so the source becomes 172.20.0.1.
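The difference between jump and goto in steps (3) to (6) can be traced with a short Python model. A jump pushes a return point. A goto does not, so when the goto target ends, control returns to whoever jumped to the chain that issued the goto. This is a sketch under an explicit assumption: masquerade is modeled as a plain, non-verdict statement purely to follow the return path, and NAT itself is not modeled. Interface matches are omitted because all the rules taken here match.

```python
from typing import Dict, List, Optional

# Actions per chain: "jump:<c>", "goto:<c>", "stmt:<text>" (no verdict), or a verdict.
CHAINS: Dict[str, List[str]] = {
    "nat_POSTROUTING": ["jump:nat_POSTROUTING_ZONES"],
    "nat_POSTROUTING_ZONES": ["goto:nat_POST_public"],
    "nat_POST_public": ["jump:nat_POST_public_allow"],
    # Assumption for this sketch: treated as a plain statement to trace the return path.
    "nat_POST_public_allow": ["stmt:masquerade"],
}


def run_chain(name: str, trace: List[str]) -> Optional[str]:
    """Return a verdict, or None when the chain ends without one."""
    trace.append(f"enter {name}")
    for action in CHAINS[name]:
        kind, _, arg = action.partition(":")
        if kind == "jump":
            verdict = run_chain(arg, trace)  # push: evaluation resumes here afterwards
            if verdict is not None:
                return verdict
            trace.append(f"back in {name}")
        elif kind == "goto":
            return run_chain(arg, trace)  # no push: this chain never resumes
        elif kind == "stmt":
            trace.append(arg)
        else:
            return kind  # "accept" or "drop"
    trace.append(f"end {name}")
    return None


trace: List[str] = []
verdict = run_chain("nat_POSTROUTING", trace) or "accept"  # base chain policy accept
print(verdict)
print("\n".join(trace))
```

The trace never shows "back in nat_POSTROUTING_ZONES": after nat_POST_public ends, evaluation resumes directly in nat_POSTROUTING.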
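The source rewrite performed by masquerade can be sketched as follows. This is a simplified Python model. The interface address table comes from this article. The kernel chooses the source address per outgoing interface and normally keeps the source port unless it must change it to keep the connection unique. The sketch always keeps the port, and 50000 is a hypothetical client port.

```python
from typing import Dict, Tuple

# Primary IPv4 address of each interface on the Docker host (from this article).
IFACE_ADDR: Dict[str, str] = {
    "ens192": "10.254.10.252",
    "docker1": "172.20.0.1",
}


def masquerade(src: Tuple[str, int], oif: str) -> Tuple[str, int]:
    """Rewrite the source address to the outgoing interface's address.

    Simplification: the source port is always kept unchanged.
    """
    _address, port = src
    return (IFACE_ADDR[oif], port)


print(masquerade(("10.254.10.105", 50000), "docker1"))  # ('172.20.0.1', 50000)
```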

Authentication by radius

Request received by the radius container: 172.20.0.1:random --> 172.20.0.4:1812

The RADIUS server checks whether to grant access and returns a response to the RADIUS client.

Response sent back by the radius container: 172.20.0.4:1812 --> 172.20.0.1:random

Response to external terminals

I'm running out of steam here... I added counters with nftables and recorded the addresses seen as the packet passed through each of the following chains. Since this was a single authentication exchange, each chain counted exactly one packet.

type filter hook prerouting  : 172.20.0.4:1812 --> 172.20.0.1:random
type filter hook input       : 172.20.0.4:1812 --> 10.254.10.105:random
type filter hook forward     : 172.20.0.4:1812 --> 10.254.10.105:random
type filter hook postrouting : 172.20.0.4:1812 --> 10.254.10.105:random

The response from the radius container is 172.20.0.4:1812 -> 172.20.0.1:random. When the host receives it, it looks like traffic addressed to the host itself, which would explain why it passes through hook: input. But does it then go from the local process on to forward? I'm not sure about this.

This article ends up only half finished.

I don't know the route of the response packet from radius. Why doesn't it pass through any chain of type: nat? Why does it go through both hook: input and hook: forward? It went through the type: filter hook: input chain (priority -200) of table bridge filter, but not the type: filter hook: input chain (priority 0) of table ip filter. Are the L2 bridge layer and the L3 IP layer doing different processing?
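One piece of background that may bear on the nat question: in Linux netfilter, base chains of type nat are consulted only for the first packet of a connection. Later packets in either direction, including replies, are translated by connection tracking (conntrack) using the NAT mapping recorded for that first packet. The Python sketch below models that idea for this single connection only. The client port 50000 is hypothetical, and the model does not reproduce which hook applies each half of the reverse translation, so it does not by itself explain the counter values recorded above.

```python
from typing import NamedTuple


class Tuple4(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "Tuple4":
        """The same flow seen from the other end."""
        return Tuple4(self.dst, self.dport, self.src, self.sport)


CLIENT_PORT = 50000  # hypothetical stand-in for "random"

# Request as the external terminal sent it, and as it left docker1 after DNAT + masquerade.
orig_before_nat = Tuple4("10.254.10.105", CLIENT_PORT, "10.254.10.252", 1812)
orig_after_nat = Tuple4("172.20.0.1", CLIENT_PORT, "172.20.0.4", 1812)

# The reply conntrack expects once NAT was set up on the first packet.
expected_reply = orig_after_nat.reversed()  # 172.20.0.4:1812 -> 172.20.0.1:50000


def undo_nat_for_reply(pkt: Tuple4) -> Tuple4:
    """Translate a reply of this connection back; no nat chain is evaluated."""
    if pkt != expected_reply:
        raise ValueError("packet does not belong to this connection")
    return orig_before_nat.reversed()  # 10.254.10.252:1812 -> 10.254.10.105:50000


reply = Tuple4("172.20.0.4", 1812, "172.20.0.1", CLIENT_PORT)
print(undo_nat_for_reply(reply))
```

After both halves of the translation, the external terminal receives the reply from 10.254.10.252:1812, the address it originally sent the request to.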

