syzkaller: a test automation technology that automatically generates reproducer programs and has contributed to fixing more than 1870 kernel bugs


This article is the Day 23 entry of the Linux Advent Calendar 2019.


Hello, I am fujiihda of the OSS Security Technology Association. I have experience in technical research, technical lectures, development, and support related to OSS, including the Linux kernel. Recently I have had more opportunities to work with the founders and organizers of technical communities.

Themes covered in this article

I will explain syzkaller (pronounced roughly "sis-caller" [^1]), a fuzzing tool developed by Google's Dmitry Vyukov as a kernel test automation technology and released as OSS. As of December 2019, as far as I know, this is the first article in Japanese that goes into its internal implementation.

The content of this article is almost the same as the first half [^2] of the 7th study session of the OSS Security Technology Association. Strictly speaking, it includes know-how obtained from investigating the source code, but I do not discuss the source code itself; the article consists almost entirely of the explanations and Q&A that I gave verbally at the session.

[^1]: I think the reason the z is not voiced is that the developer is from Russia.

[^2]: I asked uchan_nos to take notes on the day's content. Please also see "What does the fuzzing tool syzkaller inspect?". Thank you, uchan_nos!

TL;DR

syzkaller takes a hybrid approach that uses source code coverage for test automation, and it also automates the series of steps leading up to a bug fix that humans used to handle. As a result, two years after its release, at a point when it had reached 7% coverage, it had already contributed to fixing more than 1500 kernel bugs.

syzkaller finds undiscovered bugs by continuously sending potentially problematic inputs to multiple virtual machines that it creates itself. It then retries repeatedly to reproduce each defect with minimal input, and finally tries to generate a C program that reproduces the defect.

Features are still being added to syzkaller [^3], and it is still discovering bugs at this very moment. As of December 2019, it has contributed to more than 1870 kernel bug fixes. See https://syzkaller.appspot.com/upstream/fixed for the latest status.

[^3]: Features that are useful for syzkaller are being added not only to syzkaller itself but also [on the kernel side](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=15ff2069cb7f967d). If you are of an age to attend Security Camp, you may get to meet the patch contributor.

Main subject

First, I tried running it

The result of running it is shown below. For this check I used a virtual machine [^4] running on my laptop.

[^4]: My point is that you can try it out fully without spending a lot of money on a large physical server or a bare-metal instance. However, when trying it in a virtualized environment, enabling nested virtualization is practically essential. I have not tried emulation.

# ./bin/syz-manager -config=my.cfg
2019/06/05 03:53:20 loading corpus...
2019/06/05 03:53:20 serving http on
2019/06/05 03:53:20 serving rpc on tcp://[::]:37545
2019/06/05 03:53:20 booting test machines...
2019/06/05 03:53:20 wait for the connection from test machine...
2019/06/05 03:54:08 machine check:
2019/06/05 03:54:08 syscalls                : 1380/2699
2019/06/05 03:54:08 code coverage           : enabled
2019/06/05 03:54:08 comparison tracing      : CONFIG_KCOV_ENABLE_COMPARISONS is not enabled
2019/06/05 03:54:08 extra coverage          : extra coverage is not supported by the kernel
2019/06/05 03:54:08 setuid sandbox          : enabled
2019/06/05 03:54:08 namespace sandbox       : /proc/self/ns/user does not exist
2019/06/05 03:54:08 Android sandbox         : /sys/fs/selinux/policy does not exist
2019/06/05 03:54:08 fault injection         : CONFIG_FAULT_INJECTION is not enabled
2019/06/05 03:54:08 leak checking           : CONFIG_DEBUG_KMEMLEAK is not enabled
2019/06/05 03:54:08 net packet injection    : /dev/net/tun does not exist
2019/06/05 03:54:08 net device setup        : enabled
2019/06/05 03:54:08 corpus                  : 3844 (0 deleted)
2019/06/05 03:54:10 VMs 4, executed 0, cover 0, crashes 0, repro 0
2019/06/05 03:54:20 VMs 4, executed 36, cover 3836, crashes 0, repro 0
2019/06/05 03:54:30 VMs 4, executed 776, cover 20662, crashes 0, repro 0
2019/06/05 04:10:00 VMs 4, executed 70734, cover 62967, crashes 0, repro 0
2019/06/05 04:10:05 vm-3: crash: no output from test machine
2019/06/05 04:10:10 VMs 3, executed 70918, cover 62967, crashes 1, repro 0
2019/06/05 04:14:02 VMs 4, executed 87377, cover 63959, crashes 1, repro 0
2019/06/05 04:14:05 vm-2: crash: no output from test machine
2019/06/05 04:14:12 VMs 3, executed 87614, cover 63960, crashes 2, repro 0
2019/06/05 04:14:32 VMs 4, executed 87978, cover 63995, crashes 2, repro 0
2019/06/05 04:14:40 vm-3: crash: KASAN: use-after-free Read in blk_mq_free_rqs
2019/06/05 04:14:41 vm-1: running for 20m42.241115632s, restarting
2019/06/05 04:14:41 vm-0: running for 20m33.97704277s, restarting
2019/06/05 04:14:41 vm-2: running for 9.643775512s, restarting
2019/06/05 04:14:42 reproducing crash 'KASAN: use-after-free Read in blk_mq_free_rqs': 1158 programs, 4 VMs, timeouts [15s 1m0s 6m0s]
2019/06/05 04:14:42 VMs 0, executed 87978, cover 63995, crashes 3, repro 1
(The following is omitted)

  • syzkaller started running at around 2019/06/05 03:53:20.
  • At 2019/06/05 04:10:05 and 2019/06/05 04:14:05, it failed to capture what happened at the time of the crash ("no output from test machine"). This seems to happen fairly often.
  • At 2019/06/05 04:14:40, it found what looks like a real problem. At that point only about 20 minutes had passed since startup.
  • From 2019/06/05 04:14:42, it tries to reproduce the event.
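The periodic status lines in the log above are regular enough to parse mechanically. The following is a minimal Python sketch of that idea, assuming the line format seen in this log; `parse_status` and `exec_rate` are hypothetical helper names of my own and are not part of syzkaller.

```python
import re
from datetime import datetime

# Pattern for the periodic status line printed by syz-manager in the log above.
STATUS_RE = re.compile(
    r"^(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"VMs (?P<vms>\d+), executed (?P<executed>\d+), cover (?P<cover>\d+), "
    r"crashes (?P<crashes>\d+), repro (?P<repro>\d+)$"
)


def parse_status(line):
    """Return a dict for a status line, or None for any other log line."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    stamp = datetime.strptime(fields.pop("stamp"), "%Y/%m/%d %H:%M:%S")
    return {"time": stamp, **{key: int(value) for key, value in fields.items()}}


def exec_rate(first, last):
    """Average number of executed programs per second between two status lines."""
    seconds = (last["time"] - first["time"]).total_seconds()
    return (last["executed"] - first["executed"]) / seconds


# Four lines copied from the log above (three status lines and one crash line).
LOG = """\
2019/06/05 03:54:30 VMs 4, executed 776, cover 20662, crashes 0, repro 0
2019/06/05 04:10:00 VMs 4, executed 70734, cover 62967, crashes 0, repro 0
2019/06/05 04:10:05 vm-3: crash: no output from test machine
2019/06/05 04:10:10 VMs 3, executed 70918, cover 62967, crashes 1, repro 0
"""

stats = [s for s in map(parse_status, LOG.splitlines()) if s is not None]
print(len(stats), "status lines, final crash count:", stats[-1]["crashes"])
print("programs per second:", round(exec_rate(stats[0], stats[1]), 1))
```

A summary like this makes it easy to see how quickly coverage grows in the first minutes and then flattens out.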

The reproducer program that syzkaller generated automatically from this run is shown below. Note that this bug has [already been fixed](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c3e2219216c92919a6bd1711f340f5faa98695e6).

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t r[1] = {0xffffffffffffffff};

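int main(void)
{
    // Map a fixed scratch region (PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED)
    // that holds the syscall arguments.
    syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
    intptr_t res = 0;
    memcpy((void *)0x20000040, "/dev/loop-control\000", 18);
    // 0xffffffffffffff9c is AT_FDCWD (-100).
    res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0x181000, 0);
    if (res != -1)
        r[0] = res;
    // 0x4c81 is LOOP_CTL_REMOVE.
    syscall(__NR_ioctl, r[0], 0x4c81, 0);
    return 0;
}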

The config file used for the run above is as follows.

{
    "target": "linux/amd64",
    "http": "",
    "workdir": "/root/gopath/src/github.com/google/syzkaller/workdir",
    "kernel_obj": "/root/kernel",
    "image": "/root/wheezy.img",
    "sshkey": "/root/wheezy.img.key",
    "syzkaller": "/root/gopath/src/github.com/google/syzkaller",
    "procs": 8,
    "type": "qemu",
    "vm": {
        "count": 4,
        "kernel": "/root/kernel/arch/x86/boot/bzImage",
        "cpu": 4,
        "mem": 4096
    }
}
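Since this ran on a laptop, it can be useful to estimate what the config asks of the host. The sketch below is only an illustration: `summarize` is a hypothetical helper, the embedded config is a trimmed copy of the one above, and it assumes that "procs" is counted per VM and that "mem" is in MiB.

```python
import json

# A trimmed copy of the config above; only the keys used below are kept.
CONFIG_TEXT = """
{
    "target": "linux/amd64",
    "procs": 8,
    "type": "qemu",
    "vm": {"count": 4, "cpu": 4, "mem": 4096}
}
"""


def summarize(config):
    """Estimate host-side totals, assuming 'procs' is per VM and 'mem' is in MiB."""
    vm = config["vm"]
    return {
        "vms": vm["count"],
        "test_processes": config["procs"] * vm["count"],
        "guest_cpus": vm["cpu"] * vm["count"],
        "guest_mem_mib": vm["mem"] * vm["count"],
    }


summary = summarize(json.loads(CONFIG_TEXT))
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the totals exceed what the host can provide, lowering "count", "cpu", or "mem" is the obvious first adjustment.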

Why I investigated syzkaller

In a word, the answer to why I looked into fuzzing is that it looked interesting. Broken down, the reasons are as follows.

  • There was no information available in Japanese.
  • I was interested in the mechanism, design, and implementation, which I expected to incorporate the latest techniques.
  • I expected that the ideas and techniques could be carried over to other fields.
  • I wanted to grow as a kernel engineer through syzkaller.
  • I wanted to study Go (I had never read it before).

Fuzzing and the improved security quality of Linux

Fuzzing is a testing technique well suited to detecting unknown defects and vulnerabilities. It feeds random input data to the target in order to deliberately provoke exceptions. The impact of fuzzing on security quality has become a hot topic worldwide: at Open Source Summit Japan 2019, several speakers, including leading kernel maintainer Greg Kroah-Hartman, mentioned the contribution of fuzzing.

  • From around 2017, advanced automated testing techniques such as fuzzing became widespread.
  • Security vulnerabilities are now fixed before release, and the number of CVEs is decreasing [^5].

The figure shows the trend in the number of CVEs for software as a whole and for Linux.

Figure: Trend in the number of CVEs for software as a whole and for Linux

[^5]: The number of CVEs is only an indicator for tracking the trend; it is not equal to the number of vulnerabilities. As for security vulnerabilities that never receive a CVE number... let's not go there.

The crux of fuzzing technology

One of the key technologies of advanced fuzzing is the mechanism that generates input data likely to cause problems in the target. The more sophisticated this mechanism is, the more hard-to-find defects can be discovered. Below is my own, admittedly subjective, classification of the mechanisms for generating fuzzing input data; I consider syzkaller to fall under level 4. Fuzzing targets range from those limited to the network to those that are not.

Table: Classification of mechanisms for generating fuzzing input data

| Level | How it works | Efficiency / coverage | Remarks |
|---|---|---|---|
| 1 | Completely random, ignoring protocols and the like | × | Generates input patterns that are far from real input |
| 2 | Modify part of a captured packet, using it as a template | × | Generates only input patterns within the captured range |
| 3 | Humans implement and use the protocol specification and input patterns that are likely to cause problems | | The technique used in major commercial tools |
| 4 | Use source code coverage in addition to level 3 | | The approach adopted by syzkaller; branch conditions in the source code are taken into account when mutating the input |
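To illustrate why coverage feedback (level 4) matters, here is a toy Python sketch. It is not syzkaller's algorithm: `target` is a made-up stand-in for the code under test that reports which branches an input reached, and `coverage_guided_search` is a hypothetical, fully deterministic search that keeps any input reaching a new branch and mutates it further.

```python
def target(data):
    """Toy code under test: returns the set of branch IDs that the input reached."""
    cover = {0}
    if len(data) > 0 and data[0] == ord("F"):
        cover.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cover.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cover.add(3)
                if len(data) > 3 and data[3] == ord("Z"):
                    cover.add(4)  # the deeply nested "bug"
    return cover


def coverage_guided_search(seed, bug_branch=4):
    """Mutate one byte at a time and keep every input that reaches a new branch."""
    corpus = [seed]
    total_cover = set(target(seed))
    execs = 1
    index = 0
    while index < len(corpus):
        base = corpus[index]
        index += 1
        for pos in range(len(base)):
            for value in range(256):
                candidate = base[:pos] + bytes([value]) + base[pos + 1:]
                cover = target(candidate)
                execs += 1
                if cover - total_cover:  # new branch reached: keep this input
                    total_cover |= cover
                    corpus.append(candidate)
                    if bug_branch in cover:
                        return candidate, execs
    return None, execs


found, execs = coverage_guided_search(b"\x00\x00\x00\x00")
print("input reaching the bug:", found, "after", execs, "executions")
```

Without the feedback, a blind search over four bytes would face 256**4 (about 4.3 billion) combinations, whereas here each byte is discovered independently because every partial match shows up as new coverage.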

What is syzkaller?

syzkaller is a kernel fuzzing tool developed by Google's Dmitry Vyukov. As mentioned at the beginning, two years after its release, at a point when it had reached 7% coverage, it had already contributed to more than 1500 kernel bug fixes. Its features are as follows.

  • Efficient generation of system call sequences through a hybrid mechanism
    • Templates for the protocol under test (system calls) are implemented
    • Inputs are mutated using the coverage emitted by compiler instrumentation added when the source code is compiled
  • Almost the entire flow of defect discovery is automated
    • It keeps sending potentially problematic inputs to the multiple test-target virtual machines it has created, and observes their behavior using debugging aids
    • It repeatedly retries on its own to reproduce a discovered defect with minimal input, and tries to generate a C program that reproduces the defect

Architecture diagram of syzkaller

I could not find a good block diagram of syzkaller, so I drew one.

Figure: Architecture diagram of syzkaller

First, I will explain only the overall processing flow. The details of each component are explained later in this article.

  • The user starts the manager from the command line.
  • The manager launches the fuzzer process over ssh. The manager also manages (starts, monitors, and restarts) the VMs under test. Because these VMs are the test targets, they are restarted whenever something goes wrong, so they are unstable and may go down at any time. The manager also reads, creates, and updates files under workdir.
  • The fuzzer is a process that runs inside a VM under test and starts as many executors as specified by the argument given when starting syzkaller.
  • The executor is a process, of which any number may exist inside a VM under test. It receives a sequence of syscalls from the fuzzer, executes that sequence against the kernel being fuzzed, and returns the result.

Note that while the input interface of a typical application may be an API, the input interface of the kernel is the system call. System calls are the interface between an application running in user space and the kernel, and applications use them to access kernel functionality safely. The interface consists of about 300 system calls, such as open(), which opens a file, together with their arguments. The key questions in kernel fuzzing are which system calls to select, which arguments to pass, and, furthermore, how to combine them and in what order to send them.

(Reference) What are sanitizers?

They are not components of syzkaller, but they are essential to understanding how syzkaller works, so let's take a quick look. The sanitizers are a set of debugging aids built into the kernel.

  • Dynamic testing tools
  • Compiler features (available in gcc and clang)
  • They clearly report that memory has been corrupted or accessed illegally
  • They surface problems and thereby assist fuzzing
  • With the appropriate setting, any error triggers a kernel panic regardless of its severity (a panic message is printed to the console) [^6]

[^6]: Strictly speaking, syzkaller may detect a string such as "WARNING" and stop the VM before the kernel panics.

Here are examples of the sanitizers used by syzkaller:

  • KASAN (Kernel Address Sanitizer): detects memory access errors
  • KMSAN (Kernel Memory Sanitizer): detects reads of uninitialized memory
  • KTSAN (Kernel Thread Sanitizer): detects data races between different threads
  • UBSAN (Undefined Behavior Sanitizer): detects the use of constructs that cause undefined behavior
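Because the sanitizers report problems as messages on the console, a crash can be recognized by scanning the console output for well-known strings. The following Python sketch shows the idea only: the three patterns and the sample console text are made up for illustration, and syzkaller's real report parsing handles far more formats.

```python
import re

# Illustrative markers only (an assumption, not syzkaller's actual pattern list).
CRASH_MARKERS = [
    re.compile(r"BUG: (KASAN: .+)"),
    re.compile(r"(WARNING: .+)"),
    re.compile(r"(Kernel panic - not syncing: .+)"),
]


def find_crash(console_output):
    """Return a title for the first crash-like line, or None if nothing matches."""
    for line in console_output.splitlines():
        for marker in CRASH_MARKERS:
            match = marker.search(line)
            if match:
                return match.group(1).strip()
    return None


# Made-up console output in the general style of a KASAN report.
SAMPLE_CONSOLE = """\
[  123.456789] loop: module loaded
[  124.000001] BUG: KASAN: use-after-free in blk_mq_free_rqs+0x49f/0x4b0
[  124.000002] Read of size 8 at addr ffff8880a0c4e6e0 by task syz-executor/1234
[  124.500000] Kernel panic - not syncing: panic_on_warn set ...
"""

print(find_crash(SAMPLE_CONSOLE))
```

A VM that stops producing output without any such message corresponds to the "no output from test machine" case seen in the log earlier.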

(Reference) What is KCOV (Kernel Code Coverage)?

Although it is not a component of syzkaller, I will briefly introduce KCOV (Kernel Code Coverage). It relies on instrumentation provided by the compiler (gcc). When it is enabled, the code changes from the picture on the left of the figure to the picture on the right. It comes with a performance penalty. It must be enabled in the kernel under test.

  • Produces kernel code coverage information in a format suited to coverage-guided fuzzing
  • How to enable: CONFIG_KCOV=y
  • Requirement: gcc 6.1.0 or later (revision 231296 or later)

Figure: Code with KCOV disabled and with KCOV enabled


syz-manager

  • A process that runs on the virtualization host
  • Manages the VMs under test (starts, monitors, and restarts them)
  • These VMs are rebooted after every kernel panic
  • Launches the syz-fuzzer process inside the VMs under test
  • Sends instructions to the syz-fuzzer process
  • Updates the corpus and crashes under workdir (discussed in the next section)


workdir/crashes/* and workdir/corpus/*

  • workdir/crashes/*
    • Crash-related output
    • For each "*", a folder named after a hash value is generated
    • The folder contains the following information
      • description: a title that identifies the event
      • logN (N = 0-99): syzkaller log
      • reportN (N = 0-99): kernel crash report
      • repro.cprog: C program for reproduction
      • repro.log
      • repro.prog
      • repro.report
      • repro.stats
      • reproM (M = 0-9)
  • workdir/corpus/*
    • The corpus is the set of inputs for the fuzz target, each stored as a separate file
    • The ideal corpus is a minimal input set that provides maximum code coverage

Figure: crashes and corpus
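The idea of "a minimal input set that provides maximum code coverage" can be approximated with a greedy selection. The Python sketch below is my own illustration, not the procedure syzkaller uses: `minimize_corpus` is a hypothetical function, the program names and branch IDs are invented, and a greedy pass gives a small set but not necessarily the smallest possible one.

```python
def minimize_corpus(coverage_by_input):
    """Greedily pick inputs until every covered branch is covered by a kept input."""
    remaining = set().union(*coverage_by_input.values())
    kept = []
    while remaining:
        # Pick the input that covers the most still-uncovered branches.
        # Iterating over sorted names makes ties resolve deterministically.
        best = max(
            sorted(coverage_by_input),
            key=lambda name: len(coverage_by_input[name] & remaining),
        )
        kept.append(best)
        remaining -= coverage_by_input[best]
    return kept


# Invented example: program name -> set of branch IDs it covers.
coverage = {
    "prog_a": {1, 2, 3},
    "prog_b": {2, 3},
    "prog_c": {4},
    "prog_d": {1, 4},
}

kept = minimize_corpus(coverage)
print("kept inputs:", kept)
```

Here two of the four programs are enough to cover every branch, so the other two add execution time without adding coverage.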


syz-fuzzer

  • A process that runs inside the unstable VMs under test mentioned above
  • Generates, mutates, and minimizes test cases (inputs)
  • Launches any number of syz-executor processes
  • Decides the system call sequence by calling either the Generate function or the Mutate function, branching on whether a corpus exists
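The choice between generating and mutating can be sketched as follows. This is a simplified Python illustration under my own assumptions: `TEMPLATES`, `generate`, `mutate`, and `next_program` are hypothetical names, the argument values are only examples, and the real Generate and Mutate functions in syzkaller work on typed system call descriptions rather than on simple pairs.

```python
import random

# Hypothetical templates: each syscall name maps to plausible argument values.
TEMPLATES = {
    "openat": ["/dev/loop-control", "/dev/null"],
    "ioctl": [0x4C80, 0x4C81, 0x4C82],
    "close": [0, 1, 2],
}


def generate(rng, length=3):
    """Build a brand-new program: a list of (syscall, argument) pairs."""
    prog = []
    for _ in range(length):
        name = rng.choice(sorted(TEMPLATES))
        prog.append((name, rng.choice(TEMPLATES[name])))
    return prog


def mutate(rng, prog):
    """Return a modified copy of an existing program, leaving the original intact."""
    new_prog = list(prog)
    action = rng.choice(["insert", "remove", "change_arg"])
    if action == "insert" or not new_prog:
        new_prog.insert(rng.randrange(len(new_prog) + 1), generate(rng, 1)[0])
    elif action == "remove" and len(new_prog) > 1:
        del new_prog[rng.randrange(len(new_prog))]
    else:
        i = rng.randrange(len(new_prog))
        name = new_prog[i][0]
        new_prog[i] = (name, rng.choice(TEMPLATES[name]))
    return new_prog


def next_program(rng, corpus):
    """Generate from scratch when the corpus is empty, otherwise mutate an entry."""
    if not corpus:
        return generate(rng)
    return mutate(rng, rng.choice(corpus))


rng = random.Random(0)
fresh = next_program(rng, [])
seed_prog = [("openat", "/dev/loop-control"), ("ioctl", 0x4C81)]
mutated = next_program(rng, [seed_prog])
print("generated:", fresh)
print("mutated:  ", mutated)
```

Mutating an input that is already known to reach interesting code is usually cheaper than hoping a freshly generated one gets there again.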



syz-executor

  • Any number of these processes run inside the unstable VMs under test mentioned above
  • Receives a sequence of syscalls from syz-fuzzer
  • Executes the sequence of syscalls
  • Sends the results of the syscalls back to syz-fuzzer
  • A single executor process is also called Proc (useful to know when reading the source code)
  • The data that Proc sends to the kernel is also called ProgData (useful to know when reading the source code)
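The receive, execute, and report cycle can be pictured with a small Python sketch. This is only a conceptual model under my own assumptions: `Ref`, `CALLS`, and `execute` are hypothetical names, the calls go through Python's `os` module instead of raw system calls, and the real syz-executor is a separate native program. As in the reproducer shown earlier, a later call can reuse the result of an earlier one.

```python
import errno
import os
from collections import namedtuple

# Placeholder meaning "use the return value of call number `index`".
Ref = namedtuple("Ref", "index")

# The small set of calls this toy executor knows how to run.
CALLS = {"open": os.open, "write": os.write, "close": os.close}


def execute(program):
    """Run each call in order and return a list of (return_value, errno) pairs."""
    results = []
    for name, args in program:
        resolved = [results[a.index][0] if isinstance(a, Ref) else a for a in args]
        try:
            ret = CALLS[name](*resolved)
            results.append((0 if ret is None else ret, 0))
        except OSError as err:
            # Like a C syscall wrapper: -1 plus the error code.
            results.append((-1, err.errno))
    return results


program = [
    ("open", [os.devnull, os.O_WRONLY]),
    ("write", [Ref(0), b"hello"]),
    ("close", [Ref(0)]),
    ("close", [Ref(0)]),  # closing the same descriptor twice is expected to fail
]

results = execute(program)
for (name, _), (value, err) in zip(program, results):
    print(name, "->", value, errno.errorcode.get(err, "ok"))
```

Returning the error code along with the value lets the sender see not only that a call failed but how, which is the kind of result an executor would report back.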



Summary

In this article, I gave an overview of syzkaller and its general features. The detailed know-how obtained from the source code investigation has been omitted, so if you would like to see the details, please also refer to pages 18 to 26 of the study session materials.

Bonus Q&A

Q1: Which environments other than Linux does syzkaller support?
A1: Akaros, Darwin/XNU, FreeBSD, Fuchsia, NetBSD, OpenBSD, Windows, gVisor, and others.

Q2: I want to fuzz an application that runs in user space rather than the kernel.
A2: ClusterFuzz may be what you are looking for. If you try it, I would be glad if you wrote an article and let me know.

Disclaimer and request

The content of this article is my personal opinion and does not represent the position, strategy, or opinion of any company or community to which I belong. Also, if reading this article makes you want to try fuzzing, please keep the following points in mind and do it at your own risk.

  • Do not do it on a company or campus network
  • Do not do it on a company or campus machine
  • Do it at your own risk in a completely isolated environment, such as a private one

I hope that this article will increase the number of defects found during pre-release testing and improve the quality of released products as much as possible.