[PYTHON] Try to create an execution path diff viewer with angr + bingraphvis

It's Advent Calendar season, so I'm taking the chance to give a proper send-off to a script I wrote yesterday by publishing it. This article is the day 13 entry of the Security Tools Advent Calendar 2018.

An execution path diff viewer is a tool that visualizes how the execution path differs when two inputs are given to the same program. (I came up with the name just now.) For example, suppose you have a program that accepts the string "AB", like this:

test.c


#include <stdio.h>
#include <stdlib.h>

void one_match() {
        puts("One match");
}

void all_match() {
        puts("Accepted!");
}

int main(int argc, char *argv[]) {
        FILE *fp;
        char buf[32] = {0};
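        if (argc < 2) {
                fprintf(stderr, "usage: ./test <input>\n");
                exit(1);
        }
        fp = fopen(argv[1], "r");
        if (fp == NULL) {
                /* Bail out instead of passing NULL to fread */
                perror("fopen");
                exit(1);
        }
        fread(buf, sizeof(char), 31, fp);
        fclose(fp);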
        if (buf[0] == 'A') {
                if (buf[1] == 'B') {
                        all_match();
                        return 0;
                } else {
                        one_match();
                }
        }
        puts("Not good");
        return 0;        
}

Given the string "AX", the program calls one_match() and then prints "Not good". Given the string "AB", it calls all_match(). The two inputs take different execution paths, and I want to visualize that difference in a readable way.

$ cat test1.in test2.in
AX
AB
$ ./test test1.in 
One match
Not good
$ ./test test2.in 
Accepted!

angr+Bingraphvis

There are two main things I want to do this time:

  - Record an execution trace for each of the two inputs
  - Display the difference between the resulting traces on the CFG in some way
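
As a rough illustration of the second goal, the difference between two traces can be treated as a set comparison over basic-block addresses. This is only a sketch under assumptions: `diff_traces` and the sample addresses are hypothetical and are not taken from the actual script, which gets its traces from QEMURunner.

```python
from typing import Dict, Iterable, Set


def diff_traces(trace_a: Iterable[int], trace_b: Iterable[int]) -> Dict[str, Set[int]]:
    """Split visited block addresses into three groups.

    Hypothetical helper: a trace is assumed to be a sequence of
    basic-block addresses in execution order. Order and repeat
    counts are deliberately ignored here.
    """
    a, b = set(trace_a), set(trace_b)
    return {
        "only_a": a - b,  # visited only with the first input
        "only_b": b - a,  # visited only with the second input
        "common": a & b,  # visited with both inputs
    }


# Made-up addresses standing in for the blocks of test.c.
trace_ax = [0x1000, 0x1010, 0x1030, 0x1040]  # "AX": one_match, then "Not good"
trace_ab = [0x1000, 0x1010, 0x1020]          # "AB": all_match
result = diff_traces(trace_ax, trace_ab)
```

Ignoring order keeps the comparison simple, at the cost of not noticing when both inputs visit the same blocks a different number of times.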

This time I used angr and Bingraphvis. Bingraphvis is the core library underlying angr-utils, which visualizes CFGs generated by angr; it handles CFG node manipulation, transformation, and plotting. Combining it with angr's QEMURunner (tracer), the script does the following:

  1. Record a trace for each of the two inputs with QEMURunner
  2. Generate a CFG from the main function onward with CFGEmulated
  3. Color the CFG nodes that correspond to the trace difference with Bingraphvis
  4. Save the CFG as a PNG

I didn't use the angr-utils wrapper because I wanted to define and use my own CFG variant that carries the trace diff. Running the script against the program shown earlier produces the image below.

$ python3 input-tracer.py -b ./test -i test1.in,test2.in -v
[+] Opening binary ./test
[+] CFG Cache found 
CFG size = 46
[+] Tracing .... test1.in
Size: 46079
[+] Tracing .... test2.in
Size: 46033
[+] CFG processing ....
Graph len= 30
[+] Complete! Saved to outd/input_trace_test_entire.png

input_trace_test_entire.png

Red marks the path taken with test1.in, and blue marks the path taken with test2.in. Nodes common to both are black.
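
The coloring scheme can be sketched without angr or Bingraphvis by writing Graphviz DOT text by hand. This is only an illustration under assumptions: `node_color`, `to_dot`, the edge list, and the addresses are all hypothetical, and nodes visited by neither input are simply left black here. The actual script builds its graph with CFGEmulated and renders it through Bingraphvis.

```python
from typing import List, Set, Tuple


def node_color(addr: int, trace_a: Set[int], trace_b: Set[int]) -> str:
    """Red = first input only, blue = second input only, black = everything else."""
    if addr in trace_a and addr not in trace_b:
        return "red"
    if addr in trace_b and addr not in trace_a:
        return "blue"
    return "black"


def to_dot(edges: List[Tuple[int, int]], trace_a: Set[int], trace_b: Set[int]) -> str:
    """Render a (hypothetical) CFG edge list as DOT text with colored nodes."""
    nodes = sorted({n for edge in edges for n in edge})
    lines = ["digraph cfg {"]
    for n in nodes:
        lines.append(f'  "{n:#x}" [color={node_color(n, trace_a, trace_b)}];')
    for src, dst in edges:
        lines.append(f'  "{src:#x}" -> "{dst:#x}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up CFG and traces loosely shaped like test.c.
edges = [(0x1000, 0x1010), (0x1000, 0x1040), (0x1010, 0x1020),
         (0x1010, 0x1030), (0x1030, 0x1040)]
trace_ax = {0x1000, 0x1010, 0x1030, 0x1040}  # plays the role of test1.in
trace_ab = {0x1000, 0x1010, 0x1020}          # plays the role of test2.in
dot = to_dot(edges, trace_ax, trace_ab)
```

If the string is saved to a file such as cfg.dot, Graphviz can render it with `dot -Tpng cfg.dot -o cfg.png`.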

Plot by function using CFGFast

If you do the above for a program such as a UNIX utility, you end up plotting a huge CFG (5,000 nodes or more) and generating an image tens of thousands of pixels across. Naturally, the image viewer cannot display it and crashes, which defeats the purpose of visualization. It is possible to display only the nodes that differ instead of the entire CFG (drop -v from the command above), but the diff alone can still be very large. So I also added a mode that splits the plot into one image per function of the program.

  1. Record a trace for each of the two inputs with QEMURunner
  2. Get the list of functions defined in the binary with angr's CFGFast
  3. Perform the same processing as above for each function

This mode is enabled with the -f option.

$ python3 ./input-tracer.py -b mp3_player -i invalid.mp3,1.mp3 -f                              
[+] Opening binary mp3_player                                 
[+] Searching for all the functions (using CFGFast)                             
100% |#####################################| Elapsed Time: 0:00:02 Time: 0:00:02
   ==> 106 functions to process.                                                
[+] Tracing .... invalid.mp3                                                    
Size: 1305732                                                                   
[+] Tracing .... 1.mp3                                                          
Size: 6084333                                                                   
[+] CFG processing ....                                                         
[+](0/106) Computing Accurate CFG for function _init (0x8049cd8)               
[+] CFG Cache found                                                             
Graph len= 0                                                                    
[+] Complete! Saved to outd/input_trace_mpg321-0.3.0__init.png                  
[+](1/106) Computing Accurate CFG for function sub_8049d0c (0x8049d0c)         
[+] CFG Cache found  

Here the player is given clearly invalid mp3 data (for example, "ABCD") and valid mp3 data. The CFG diff for each function is plotted into outd.

input-tracer.png
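
One way to picture the per-function split is to bucket the differing block addresses by the function that contains them. The sketch below assumes a function table of (start address, name) pairs and assigns each address to the nearest function start at or below it. That is a simplification, and `group_by_function` is a hypothetical helper rather than part of the script. The first two table entries reuse addresses from the log above, while the address for calc_length and the differing addresses are made up.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_function(addrs: Iterable[int],
                      functions: List[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Bucket addresses by the function whose start is closest at or below them."""
    table = sorted(functions)
    starts = [start for start, _ in table]
    grouped = defaultdict(set)
    for addr in addrs:
        idx = bisect.bisect_right(starts, addr) - 1
        if idx >= 0:  # skip addresses below the first known function
            grouped[table[idx][1]].add(addr)
    return dict(grouped)


functions = [
    (0x8049cd8, "_init"),
    (0x8049d0c, "sub_8049d0c"),
    (0x804a000, "calc_length"),  # made-up address for illustration
]
diff_addrs = {0x8049d10, 0x804a010, 0x804a024}  # made-up differing blocks
per_func = group_by_function(diff_addrs, functions)
```

Functions that do not appear in the result have no differing blocks, so a per-function plot for them would contain nothing to highlight.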

For example, the diff for the function calc_length in the mp3 player looks like this:

input_trace_mpg321-0.3.0_calc_length.png

Source code

The code is available at https://gist.github.com/RKX1209/3cb60b0fa0ba92da6575716680f32aa0
