[LINUX] Tetris with bpftrace

Hello everyone. Suddenly, do you know bpftrace? As the name implies, it is a tracing tool that uses bpf. The design of bpftrace is influenced by the pre-existing DTrace and SystemTap. For those familiar with those tools, bpftrace may be easier to understand (albeit very crude) as a BPF version of DTrace or SystemTap.

Generally, when performing tracing using BPF, it is necessary to write 1) the BPF program itself and 2) a userland program for displaying the tracing results, but bpftrace uses its own script language to write them. Tracing processing can be described without any particular awareness. For example, if you want to count the number of system calls issued by process, you can do as follows. It's convenient.

% bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...

@[systemd-journal]: 5
@[sudo]: 7
@[vim]: 12
@[dockerd]: 23
@[bpftrace]: 25
@[sshd]: 26
@[tmux: server]: 29
@[containerd]: 113

Internally, bpftrace parses the script, outputs an LLVM IR, compiles it into a BPF program, and loads it into the kernel. At the same time, the bpftrace program itself becomes the runtime and displays the tracing results.

by the way

In SystemTap, Tetris and other games seem to work. Wouldn't it be wondering if bpftrace could do the same thing?


I ported tetris.stp to bpftrace.



Actually, I haven't moved the original tetris.stp, so I don't know the original behavior, but it works like that, so I'm sure it's OK [^ 0].

[^ 0]: The first execution time is the compile and load time of the BPF program.


Q. What's going on?

By attaching the BPF program to the software event of the cpu clock of perf, the BPF program can be called at regular intervals. In this BPF program, the state of the board saved in the BPF map is updated and the screen is redrawn. To redraw the screen, the BPF program actually outputs a message with perf_event_output (), and the userland bpftrace program that receives the message uses the ANSI escape sequence. Wiki / ANSI_escape_code) is used for drawing.

Q. What is the key input?

In the original implementation kbd_envet () Hook the function and enter the key I got it, but this function seems to respond only to the physical keyboard and could not be used in the ssh environment, so instead [pty_write ()](https://github.com/torvalds/linux/blob/v5 .4 / drivers / tty / pty.c # L111) is hooked with kprobe to capture keystrokes. I don't know if this is the best way.

Q. It doesn't work in my environment

Linux 5.3 or above is required for operation. Otherwise, the BPF program cannot be loaded because it is caught in the upper limit of the number of instructions of the BPF program. Originally, BPF programs had a limit of 4096 instructions per program to guarantee the end of execution for a certain period of time, but since Linux 5.3, privileged users can load programs with up to 1 million instructions [^ 1]. .. For bcc, use the master branch on github, and for bpftrace, use the this branch </ del> master branch on github. Some of the latest functions are used [^ 2].

[^ 1]: It sounds a bit extreme to say that the maximum number of instructions has been raised from 4096 to 1 million, but the reality is not so simple. Whether or not the BPF program can actually be executed depends on whether the verification by the verifier is completed within a certain number, in addition to the upper limit of the number of BPF instructions. For example, a BPF program can transition to another BPF program by tail call, but in this case verifeir checks the BPF instruction including the transition. Therefore, the number of instructions checked by the verifier is greater than the number of instructions in a single BPF program. The upper limit of the number of instructions checked by this verifier (BPF_COMPLEXITY_LIMIT_INSNS) has been raised as the speed of the verifier has increased, and in Linux 5.3 it has reached 1 million. In addition, the program size limit when loading the program has also been raised to BPF_COMPLEXITY_LIMIT_INSNS for privileged users (commit).

[^ 2]: Isn't it supposed to be patched to bpftrace? Please forgive me because it will probably be merged into the main body soon ... </ del> (2019-12-6 postscript: merged In the future, bcc v0.12.0 and bpftrace v0.9.4 and above will be available (both have not been released yet).

Q. I really want to run it on Linux 5.2 or earlier

To work on Linux 5.2 or earlier, you need to reduce the size of the BPF program.

The program created this time is a port of the SystemTap version almost as it is [^ 3], but it is inefficient from the viewpoint of the BPF program. Above all, the state of the squares on the board is saved in the BPF map one by one, but this is a process that requires bpf_map_lookup_elem () every time it is accessed, which is very inefficient. Since any data structure can be stored in the BPF map, the entire board should be saved in one BPF map entry and loaded and stored in a batch. However, currently bpftrace cannot store values other than int in map. Also, unrolling all loop processing of the BPF program is one of the reasons for the increase in the number of instructions. This is because the BPF program containing the loop is played by the verifier. You can use Bounded Loop to loop under certain constraints, but this is also a feature introduced in Linux 5.3 (and bpftrace currently uses bounded loops). Not supported).

There is also a method of chaining BPF programs using bpf tail call as a technique to avoid the upper limit of the number of instructions, but bpftrace does not currently support this either. There is also a special move that divides the BPF program and attaches multiple BPF programs to one event, but when an event drops, the BPF program is not called, making it difficult to guarantee consistent operation.

So, I think it's quite difficult to create Tetris that works properly on Linux 5.2 or earlier, but if you think I'm the one, why not give it a try?

[^ 3]: Only the line elimination algorithm cannot write a loop. It was difficult to write in a BPF program, so I changed it a little.


If you do your best with bpftrace, Tetris will work.

Joking aside, bpftrace is currently under active development for v1.0. Tetris may not work, but it is available in the 4.x kernel. The main functions are complete and can be used practically enough [^ 4], but there may still be problems with the details. If you are interested, please give it a try and report any problems to issue. Here has a set of installation methods for each distribution. For how to use it, refer to Official Tutorial.

Also, just this month, a part of bpftrace is being developed Brendan Gregg's BPF book. Since the e-book version has already been released, I read it briefly, but while focusing on bpftrace as a tool, there is also a small but solid explanation of existing tracing tools (although I do not know from the title) , I thought that the book focused on improving practical performance problems. By the way, there are almost no stories about BPF other than tracing, such as XDP [^ 5], so they will hit different documents [^ 6].

[^ 4]: It seems that Facebook and Netflix are already used in the production environment (cf. https://www.slideshare.net/brendangregg/um2019-bpf-a-new-type-of-software; Since it only says that the BPF program is used, it may be a bcc tool instead of bpftrace)

[^ 5]: There is a whole chapter on network performance analysis.

[^ 6]: I think the explanation of BPF C and the story of the BPF instruction set in the appendix are useful.

Recommended Posts

Tetris with bpftrace