Hooking Shared Libraries on Linux to Change the Behavior of Existing Binaries

Overview

Sometimes you want to intercept a Linux system call such as write, either because you have no other option or simply out of curiosity. In this article, I will briefly explain shared libraries. Then I will show how to load a shared library at runtime with dlopen and how to look up its symbols with dlsym.

With LD_PRELOAD, you can override a function in a shared library. However, calls to the original function are then redirected to your replacement. If you still want the original behavior, the hooked function has to call the original function itself. I will also explain this technique.

About shared libraries (you can skip this)

In C, you can call many functions beyond the language's own expressions and operators, for example write(2) and printf(3). These are either system calls (section 2 of man) or C standard library functions (section 3) provided by a libc such as glibc. Of course, you can also call your own code or other libraries.

C source code is compiled, and the functions spread across multiple source files are linked together into an executable. Does that executable contain the machine code of write(2) and printf(3) themselves? Usually not, although it can if you link statically. So how can your binary call those functions? The answer is shared libraries.

A shared library is binary code with the extension .so, typically stored in directories such as /lib and /usr/lib. When most programs start, the dynamic loader maps the shared libraries they need into the process's memory, and the program then refers to and executes that code. On UNIX-like OSes, you can control which shared libraries get loaded from the command line, with environment variables such as LD_PRELOAD. This lets you change the code and data that the same binary uses, and therefore change its behavior, for example by inserting your own processing.

0: Calling functions split across multiple files

In C, you can split functions into separate source files and compile them together, as shown below.

main.c


#include "lib.h"
int main()
{
  hoge();
}

lib.c


#include <stdio.h>

int static_value = 100;

int hoge(void)
{
  printf("hoge%d\n", static_value);
  return 0;
}

lib.h


int hoge(void);

0.sh


gcc -c lib.c
gcc -o a.out main.c lib.o
ldd a.out
# a.out:
#    libc.so.7 => /lib/libc.so.7 (0x2807d000)
./a.out
# hoge100

main() called hoge(). The point is that hoge() in lib.c correctly refers to static_value, which is defined in its own source file. Note that ldd lists only libc. The code of hoge() was linked directly into a.out.

1: Using a shared library

Chapter 0 linked lib.o statically into the executable. This chapter builds hoge() as a shared library instead.

1.sh


# Create the shared library lib.so
gcc -fPIC -o lib.so -shared lib.c
# Create a.out, linking it against lib.so
gcc -o a.out main.c lib.so
# Running it as-is fails because lib.so cannot be found
./a.out
# error while loading shared libraries: lib.so
# Specify the directory where the shared library exists
LD_LIBRARY_PATH=. ./a.out
# hoge100

At first glance, nothing has changed, and in fact the output is the same. Now let's rewrite the contents of the shared library. Delete the lib.so created earlier and regenerate it from a different source file.

lib2.c


#include <stdio.h>

int static_value = 1000;

int hoge(void)
{
  printf("This is LIB2! %d\n", static_value);
  return 0;
}

1_2.sh


rm lib.so
gcc -fPIC -o lib.so -shared lib2.c
LD_LIBRARY_PATH=. ./a.out
# This is LIB2! 1000

The point is that the output of hoge() changed even though main.c was not recompiled. A shared library (shared object, .so) is loaded at run time, not at compile time. This lets you update individual libraries without touching the main program or other libraries, so there is no need to relink or recompile everything. In other words, you can change the behavior of a program without rebuilding the already compiled executable.

2: Reading and rewriting shared library symbols from within a program

At runtime, the variables and functions in a shared library are visible to the program as symbols. With dlopen(3) and dlsym(3), you can look them up by name, read them, and even rewrite them.

main_load.c


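#include "lib.h"
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
  void *dl_handle;
  int  *value;
  int (*func) (void);

  // Open the shared library and get a handle to it
  dl_handle = dlopen("./lib.so", RTLD_NOW);
  if (dl_handle == NULL) {
    fprintf(stderr, "dlopen: %s\n", dlerror());
    return EXIT_FAILURE;
  }
  // Look up hoge(); dlsym returns a pointer to the function
  func = (int (*)(void))dlsym(dl_handle, "hoge");
  // Look up the variable static_value; dlsym returns a pointer to it
  value = dlsym(dl_handle, "static_value");
  if (func == NULL || value == NULL) {
    fprintf(stderr, "dlsym: %s\n", dlerror());
    dlclose(dl_handle);
    return EXIT_FAILURE;
  }
  // Read the value stored inside the shared library directly!
  printf("value is %d\n", *value);

  // Normal call, resolved because a.out was linked against lib.so
  hoge();

  // Directly rewrite the value inside the shared library
  *value = 200;
  // Equivalent to hoge(), but called through the pointer from dlsym
  (*func)();

  dlclose(dl_handle);
  return EXIT_SUCCESS;
}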

2.sh


rm lib.so
gcc -fPIC -o lib.so -shared lib.c
gcc main_load.c lib.so -ldl
LD_LIBRARY_PATH=. ./a.out
# value is 100  <- the value is read directly
# hoge100       <- a normal call
# hoge200       <- after rewriting the variable in lib.so, the function is called through the pointer

The program is simple. It opens lib.so with dlopen(3) and looks up symbols with dlsym(3). You normally never do this in source code, because the library is specified at link time. It then reads and prints static_value, an internal variable of the library that callers normally should not care about. Next, it calls hoge() as usual and gets the expected output. Finally, it changes static_value inside the shared library to 200 and calls hoge() again, this time through the function pointer returned by dlsym. Despite its name, static_value is an ordinary global variable, not one declared static.

First, let's look at the symbols contained in the .so. For this, use nm, which lists symbols from object files.

$ nm lib.so | grep -E 'hoge|static_value'
00000000000006b0 T hoge
0000000000201028 D static_value

T means the symbol is in the text (code) section, and D means it is in the initialized data section. Both symbols are found in lib.so. The values that nm prints are offsets within lib.so. When the shared library is loaded at runtime, it is mapped into memory at some base address, so hoge() ends up in the text segment and static_value in the data segment. dlsym returns the in-memory address of the requested symbol in the opened library. That is why the program could execute func in the text section and read and rewrite value in the data section.

4: Hooking functions in the shared libraries a program normally loads

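This is the main topic of this article. With LD_PRELOAD, the specified shared library is loaded before the shared libraries that would normally be loaded. As described later, the key point is that it comes first. The original shared libraries are still loaded afterwards. Symbols are looked up in load order, so the preloaded definition wins. This lets you insert arbitrary processing in front of library functions that an existing binary calls, such as libc's wrapper for a system call.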

4-1: Overriding a system call

write_hook_override.c


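#include <stdio.h>
#include <unistd.h>

// Replaces libc's write(2): prints a message and writes nothing
ssize_t write(int d, const void *buf, size_t nbytes)
{
  printf("write called.\n");
  return (ssize_t)nbytes;  // pretend the whole buffer was written
}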

write_hook_test.c


#include <unistd.h>
int main(void)
{
  // Write to standard output (fd 1)
  write(STDOUT_FILENO, "hoge\n", 5);
  return 0;
}

Let's run it. First run the test program as-is, then run it again with LD_PRELOAD=./write_hook.so.

gcc write_hook_test.c
gcc -fPIC -shared -o write_hook.so write_hook_override.c
./a.out
# hoge
LD_PRELOAD=./write_hook.so ./a.out
# write called. >The output has changed!

You can see that LD_PRELOAD loaded write_hook.so, and the program's call to write was replaced by the write defined in write_hook_override.c!

4-2: Intercepting a system call

So we managed to override the system call. However, the write in write_hook.so completely replaces the original, so the original processing never runs. That is, "hoge" is not printed.

Now let's also run the original processing, using RTLD_NEXT.

write_hook_next.c


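#define _GNU_SOURCE  // needed for RTLD_NEXT; must come before any #include
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

ssize_t write(int d, const void *buf, size_t nbytes)
{
  ssize_t (*o_write) (int d, const void *buf, size_t nbytes);

  // Find the next "write" after this library in the search order (libc's)
  o_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");

  printf("write was called.\n");

  // Call the original write so the normal output still happens
  return o_write(d, buf, nbytes);
}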

gcc -fPIC -shared -o write_hook_next.so write_hook_next.c -ldl
LD_PRELOAD=./write_hook_next.so ./a.out
# write was called.
# hoge

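The dlsym man page describes dlsym(RTLD_NEXT, ...) as follows. When dlsym() is called with the special handle RTLD_NEXT, the symbol search is limited to the shared objects loaded after the object that calls dlsym(). Thus, if the function is called from the main program, all the shared libraries are searched.

Here, write_hook_next.so is preloaded, so a call to dlsym(RTLD_NEXT, "write") from inside it skips the hook itself. The lookup finds write in a library loaded after the hook, namely libc, which would normally provide write, and returns a pointer to that original function.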

In this way, LD_PRELOAD lets you change the behavior of existing binaries. Use it responsibly!

If you are more interested, try man 8 ld.so.

appendix

a. For OS X

OS X has no LD_PRELOAD. Use DYLD_INSERT_LIBRARIES instead.

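b. Using ar

You can also use ar to bundle object files (.o) into a single static library archive (.a). In this example, lib.o was passed to ar twice, so it appears twice in the nm listing.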

ar r lib.a lib.o lib.o
nm lib.a
lib.o:
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T hoge
                 U printf
0000000000000000 D static_value

lib.o:
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T hoge
                 U printf
0000000000000000 D static_value

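c. A caveat about sudo

Modern sudo usually has env_reset enabled, which clears environment variables such as LD_PRELOAD before it runs the command. Setting LD_PRELOAD in front of sudo therefore has no effect. Pass the variable to sudo as part of the command instead, if the sudoers policy allows it.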

$ LD_PRELOAD=./write_hook_next.so sudo ./a.out
# hoge
$ sudo LD_PRELOAD=./write_hook_next.so ./a.out
# write was called.
# hoge

d. Can you hook code that is not in a shared library?

For example, take the following code:

#include <stdio.h>
int static_value = 100;
int hoge(void)
{
  printf("hoge%d\n", static_value);
  return 0;
}
int main(void) { hoge(); return 0; }

Can this hoge() be intercepted with LD_PRELOAD? In short, no. LD_PRELOAD only affects how symbols from shared libraries are resolved at runtime. A function linked into the executable itself never goes through that lookup. The call jumps directly to code in the text section at a location fixed when the program was built, so it cannot be intercepted. The same applies to a fully statically linked binary (gcc -static), which does not use the dynamic loader at all.
