Operating Linux Network Namespaces with Go

TL;DR

  1. By default, the Go runtime can move a goroutine from one OS thread to another at any time, so you need to call `runtime.LockOSThread()` before performing Linux namespace operations, which are tied to the OS thread. [^1]
  2. If you want to operate on Linux network namespaces from Go, the CNI library (`github.com/containernetworking/plugins/pkg/ns`) is convenient.

Why do this?

Running and paying for a separate VM for each tenant (200 or more) is expensive, so I decided to build a mechanism that provides an HTTP(S) reverse proxy to tenants whose address spaces overlap.

Proof of Concept

Try running the code below.

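package main

import (
        "log"
        "net"
        "net/http"
        "os"
        "runtime"

        "github.com/containernetworking/plugins/pkg/ns"
)

func main() {
        if len(os.Args) != 3 {
                log.Fatalf("usage: %s <netns-path> <listen-addr>", os.Args[0])
        }
        nspath := os.Args[1]
        addr := os.Args[2]

        var l net.Listener
        // Create the listening socket while inside the target netns.
        // The socket stays bound to that netns after the callback returns.
        err := ns.WithNetNSPath(nspath, func(_ ns.NetNS) error {
                var err error
                l, err = net.Listen("tcp", addr)
                return err
        })
        if err != nil {
                log.Fatal(err)
        }
        // No-op unless this goroutine has called runtime.LockOSThread.
        runtime.UnlockOSThread()

        // Serve HTTP on the listener created above.
        if err := http.Serve(l, nil); err != nil {
                log.Fatal(err)
        }
}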

To run this code, prepare a network-isolated container as follows.

# build binary
go build -o nsproxy nsproxy.go
# setup environment
docker run -d --net none --name pause k8s.gcr.io/pause:3.1
ns=$(docker inspect --format '{{ .NetworkSettings.SandboxKey }}' pause)
# run program
sudo ./nsproxy "$ns" 127.0.0.1:8080 &

While this binary is serving HTTP, the process itself is not in the container's network namespace (hereafter netns).

# ls -l /proc/1/ns/net #Initial netns information for host
lrwxrwxrwx 1 root root 0 Dec 24 21:42 /proc/1/ns/net -> 'net:[4026531984]'
# ls -l /proc/$(pgrep nsproxy)/task/*/ns/net #The nsproxy process is on the host netns
lrwxrwxrwx 1 root root 0 Dec 24 21:42 /proc/4377/task/4377/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4378/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4379/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4380/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4381/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4382/ns/net -> 'net:[4026531984]'
lrwxrwxrwx 1 root root 0 Dec 24 21:47 /proc/4377/task/4393/ns/net -> 'net:[4026531984]'
# ls -l /proc/$(docker inspect --format '{{.State.Pid}}' pause)/task/*/ns/net #netns information for container
lrwxrwxrwx 1 root root 0 Dec 24 21:50 /proc/3867/task/3867/ns/net -> 'net:[4026532117]'
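
The same comparison can be scripted. The following is a minimal Go sketch that was not part of the original experiment. It assumes a Linux `/proc` filesystem and link targets of the form `net:[<inode>]` as in the listing above; the helper names `parseNSInode` and `netnsInode` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSInode extracts the inode number from a namespace link target
// such as "net:[4026531984]".
func parseNSInode(link string) (uint64, error) {
	start := strings.IndexByte(link, '[')
	end := strings.IndexByte(link, ']')
	if start < 0 || end < start {
		return 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	return strconv.ParseUint(link[start+1:end], 10, 64)
}

// netnsInode reads a namespace symlink (for example /proc/1/ns/net)
// and returns the inode number it points to.
func netnsInode(path string) (uint64, error) {
	link, err := os.Readlink(path)
	if err != nil {
		return 0, err
	}
	return parseNSInode(link)
}

func main() {
	// Pass one or more paths such as /proc/1/ns/net or /proc/<pid>/task/<tid>/ns/net.
	for _, path := range os.Args[1:] {
		inode, err := netnsInode(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s -> %d\n", path, inode)
	}
}
```

Paths that print the same number on the same host refer to the same netns. Reading the links of processes owned by other users generally requires root, as in the listing above.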

However, if you use nsenter to enter the container's netns, you can see that the HTTP server is listening on `127.0.0.1:8080`.

# nsenter --net=$(docker inspect --format '{{ .NetworkSettings.SandboxKey }}' pause) bash
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
# ss -ltn
State           Recv-Q          Send-Q                   Local Address:Port                     Peer Address:Port
LISTEN          0               128                          127.0.0.1:8080                          0.0.0.0:*
# curl http://127.0.0.1:8080 -v
* Expire in 0 ms for 6 (transfer 0x5627619e7f50)
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5627619e7f50)
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Tue, 24 Dec 2019 12:58:10 GMT
< Content-Length: 19
<
404 page not found
* Connection #0 to host 127.0.0.1 left intact

Try to make it accessible from many containers

Let's see how well this approach scales. Extend the program so that it listens in multiple netns at once.

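package main

import (
        "log"
        "net"
        "net/http"
        "os"
        "runtime"
        "sync"

        "github.com/containernetworking/plugins/pkg/ns"
)

func main() {
        if len(os.Args) < 3 {
                log.Fatalf("usage: %s <listen-addr> <netns-path>...", os.Args[0])
        }
        addr := os.Args[1]
        var ls []net.Listener
        // Open one listener per netns; every listener uses the same address.
        for _, nspath := range os.Args[2:] {
                err := ns.WithNetNSPath(nspath, func(_ ns.NetNS) error {
                        l, err := net.Listen("tcp", addr)
                        if err != nil {
                                return err
                        }
                        ls = append(ls, l)
                        return nil
                })
                if err != nil {
                        log.Fatalf("%s: %v", nspath, err)
                }
        }
        // No-op unless this goroutine has called runtime.LockOSThread.
        runtime.UnlockOSThread()

        // Serve every listener concurrently and wait for all of them.
        var wg sync.WaitGroup
        for _, l := range ls {
                wg.Add(1)
                go func(l net.Listener) {
                        defer wg.Done()
                        if err := http.Serve(l, nil); err != nil {
                                log.Print(err)
                        }
                }(l)
        }
        wg.Wait()
}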

Prepare 100 containers as shown below.

# Create 100 containers (pause100 through pause199)
seq 100 199 | xargs -I '{}' docker run -d --net none --name 'pause{}' k8s.gcr.io/pause:3.1
# Listen in the netns of all 100 containers
sudo ./nsproxy 127.0.0.1:8080 $(docker inspect --format '{{.NetworkSettings.SandboxKey}}' pause{100..199} ) &

State of the process immediately after startup:

$ sudo cat /proc/$(pgrep nsproxy)/status
Name:   nsproxy
Umask:  0022
State:  S (sleeping)
Tgid:   17082
Ngid:   0
Pid:    17082
PPid:   17068
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 128
Groups: 0
NStgid: 17082
NSpid:  17082
NSpgid: 17068
NSsid:  3567
VmPeak:   618548 kB
VmSize:   561720 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     10980 kB
VmRSS:     10980 kB
RssAnon:            6608 kB
RssFile:            4372 kB
RssShmem:              0 kB
VmData:   161968 kB
VmStk:       140 kB
VmExe:      2444 kB
VmLib:      1500 kB
VmPTE:       140 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:        7
SigQ:   0/15453
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: ffffffffffc1feff
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list:      0-239
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        6
nonvoluntary_ctxt_switches:     0

Immediately after startup, the RSS is fairly small at about 11 MB (VmRSS: 10980 kB).
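
To track this figure over time rather than reading it by eye, the `VmRSS` line can be pulled out of the status text. This is a minimal sketch that was not part of the original measurement. It assumes the `VmRSS:  <number> kB` layout shown above, and the function name `vmRSSKiB` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// vmRSSKiB extracts the VmRSS value, in kB, from the contents of a
// /proc/<pid>/status file.
func vmRSSKiB(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected fields: "VmRSS:", "<number>", "kB".
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.Atoi(fields[1])
		}
	}
	return 0, fmt.Errorf("VmRSS line not found")
}

func main() {
	// Reads this process's own status; substitute /proc/<pid>/status for another process.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	kb, err := vmRSSKiB(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VmRSS: %d kB\n", kb)
}
```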

Summary

Working with network namespaces is not as scary as it sounds, so please give it a try. The CNI library itself is small, so it is also worth reading its implementation.
