This article is the Day 8 entry of the OthloTech Advent Calendar 2017.
I'm covering two days in a row, starting yesterday. Yesterday's article was about mobile, but today I'd like to change direction and get a little closer to the kernel using C. Specifically, we will write Hello World using only the system calls the OS provides.
__ Target audience __
** Article content **
Operating environment
Let's start with Hello World, probably the simplest program you can write in C. Many people have written it, for example while studying at school.
Hello.c
#include <stdio.h>

int main(void) {
    printf("Hello World!\n");
    return 0;
}
I used printf (print formatted). It displays a string just by passing it in. However, printf cannot be used properly without including stdio.h, so it is something of a magic incantation, and this program alone doesn't tell us what is happening inside.
Let's compile it and see what the executable does. I added the -g option because I will debug it with gdb, and -static because dynamic linking is a hassle to follow.
$ gcc Hello.c -static -g
$ gdb -q a.out
(gdb) break main
(gdb) layout asm
(gdb) run
(gdb) si
(gdb) si
...
I tried stepping through it one instruction at a time, but it just doesn't end. ~~ I don't have the motivation to go all the way ~~ Since I couldn't tell how it works this way, I ran strace, a command that lists the system calls a program makes.
$ strace ./a.out
~/hello
execve("./a.out", ["./a.out"], 0x7ffdd8a3ad60 /* 43 vars */) = 0
brk(NULL) = 0x1182000
brk(0x11831c0) = 0x11831c0
arch_prctl(ARCH_SET_FS, 0x1182880) = 0
uname({sysname="Linux", nodename="Juju-62q", ...}) = 0
readlink("/proc/self/exe", "/home/(username)/hello/a.out", 4096) = 23
brk(0x11a41c0) = 0x11a41c0
brk(0x11a5000) = 0x11a5000
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "Hello World\n", 12Hello World
) = 12
exit_group(0) = ?
+++ exited with 0 +++
It seems memory allocation and various other things are going on. Still, write looks like the call that actually displays the string. Next, I will use it to output a string.
When I looked up the write function, it was declared as follows (in /usr/include/unistd.h).
extern ssize_t write (int __fd, const void *__buf, size_t __n) __wur;
fd is a file descriptor. This time it is 1, because we write to standard output.
buf is the data to output and n is its length in bytes. It looks less convenient than printf, but a less convenient function also brings along less that we never use (lol)
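As a small sketch of how the three parameters fit together, the helper below computes n with strlen instead of counting characters by hand, and keeps calling write until every byte has been accepted. The name write_str is hypothetical and does not appear in the article; it only illustrates the fd / buf / n roles described above.

```c
#include <string.h>   /* strlen */
#include <unistd.h>   /* write, ssize_t */

/* write_str is a hypothetical helper, not part of the article's code. */
ssize_t write_str(int fd, const char *s)
{
    size_t len = strlen(s);   /* n is a byte count; the trailing '\0' is not sent */
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, s + done, len - done);
        if (n < 0)
            return -1;        /* errno describes the failure */
        done += (size_t)n;    /* write may accept fewer bytes than requested */
    }
    return (ssize_t)done;
}
```

Calling `write_str(1, "Hello Write!\n")` sends the 13 bytes of the string to standard output and returns the number of bytes written.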
Let's include unistd.h and call write directly.
Write.c
#include <unistd.h>

int main(void) {
    const void *string = "Hello Write!\n";
    write(1, string, 13);   /* fd 1 = standard output, 13 = number of bytes */
    return 0;
}
This displays a string on standard output. Let's run it in the debugger the same way as before.
$ gcc Write.c -static -g
$ gdb -q a.out
(gdb) break main
(gdb) layout asm
(gdb) run
(gdb) si
(gdb) si
...
This time I reached the point where the characters are displayed fairly quickly.
Apparently they are displayed when the syscall instruction is executed.
But I don't like #include <unistd.h>.
If the string is displayed by syscall, it should be enough to set things up and execute syscall ourselves, so I looked into how to use it.
Apparently, if you set the following registers of the 64-bit CPU and then execute syscall, the string is displayed.
rax = 1 (selects the write system call)
rdi = 1 (file descriptor; 1 is standard output)
rsi = address of the first character of the string (the string to display)
rdx = length of the string (number of bytes)
Next, let's call syscall directly from assembly. Then we can say goodbye to the hated unistd.h.
I referred to this system call table: http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
For the CPU registers, see https://software.intel.com/en-us/articles/introduction-to-x64-assembly
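Before moving to a separate assembly file, the register setup described above can also be tried from C with GCC-style inline assembly. This is only a sketch under some assumptions: it targets x86-64 Linux, the name raw_write is made up for illustration, and on any other platform it simply falls back to the libc write function. The clobber list reflects the fact that the syscall instruction overwrites rcx and r11.

```c
#if defined(__x86_64__) && defined(__linux__)

/* raw_write is a hypothetical name for this sketch. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)               /* the result comes back in rax */
        : "0"(1L),                /* rax = 1: write */
          "D"((long)fd),          /* rdi = file descriptor */
          "S"(buf),               /* rsi = address of the first byte */
          "d"(len)                /* rdx = number of bytes */
        : "rcx", "r11", "memory"  /* syscall overwrites rcx and r11 */
    );
    return ret;                   /* bytes written, or a negative errno value */
}

#else

#include <unistd.h>

/* Assumption: on other platforms the sketch just uses the libc wrapper. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return (long)write(fd, buf, len);
}

#endif
```

For example, `raw_write(1, "Hello Asm!\n", 11)` asks the kernel to write 11 bytes to standard output. On x86-64 Linux no header is needed at all.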
To call this from C, we need to pass at least the string and its length, so I want the function to take arguments. In 64-bit assembly, arguments are passed in the following registers.
rdi first argument
rsi second argument
rdx third argument
rcx fourth argument
r8 fifth argument
r9 sixth argument
Reference: http://p.booklog.jp/book/34047/page/613687
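The argument order can be checked with a tiny experiment. The sketch below defines a function in assembly that returns whatever is in rdx, so calling it from C shows which argument arrives there. Several things here are assumptions for illustration: the name third_arg is hypothetical, the assembly is written in the AT&T syntax that GCC expects (not the nasm syntax used elsewhere in this article), and on platforms other than x86-64 Linux a plain C fallback is used because the register names differ.

```c
#if defined(__x86_64__) && defined(__linux__)

/* third_arg is a hypothetical helper defined directly in assembly. */
__asm__(
    ".text\n"
    ".globl third_arg\n"
    ".type third_arg, @function\n"
    "third_arg:\n"
    "    movq %rdx, %rax\n"   /* the return value goes in rax */
    "    ret\n"
);
long third_arg(long a, long b, long c);

#else

/* Fallback for other platforms, where the registers are named differently. */
long third_arg(long a, long b, long c)
{
    (void)a;
    (void)b;
    return c;
}

#endif
```

Calling `third_arg(10, 20, 30)` gives back the third argument, which is what we expect if it is passed in rdx.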
Since there are two arguments this time, they arrive in rdi and rsi. The function then puts the address of the first character in rsi, the length in rdx, and 1 in both rax and rdi. The length has to be moved out of rsi before rsi is overwritten with the address. Based on the above, I wrote the assembly for nasm. The function is named hello.
syscall.asm
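bits 64
global hello
hello:
    mov rdx, rsi    ; syscall arg 3: length (second C argument)
    mov rsi, rdi    ; syscall arg 2: string address (first C argument)
    mov rax, 1      ; system call number 1 = write
    mov rdi, 1      ; syscall arg 1: file descriptor 1 = standard output
    syscall
    ret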
I wrote a C program that uses the function defined in assembly.
main.c
/* len is long because the assembly reads the full 64-bit rsi register */
void hello(const char *string, long len);

int main(void) {
    const char *string = "Hello Asm!\n";
    hello(string, 11);
    return 0;
}
Finally the include is gone!!! The C side only declares the prototype of hello and calls it. Let's compile. Since there are multiple files this time, I generate object files and link them.
$ nasm -f elf64 -o syscall.o syscall.asm
$ gcc -c main.c
$ gcc main.o syscall.o
$ ./a.out
Hello Asm!
The string I wanted was printed without any problems!
For nasm, -f elf64 is specified so that it outputs a 64-bit object file.
We now have a (provisional) Hello World that uses only a system call provided by the OS, without calling any library function!
Of course this is only a first step, but I feel I've learned a little about how to use the OS.
I received a comment pointing out that a startup routine, which prepares the arguments of main, calls it, and handles its return, is still linked in from the library.
For now, I'm posting the program I received, which does not use the startup routine, as is. I will update this again once I have properly digested it myself.
$ cat main.c
void hello(const char*, int);
void exit(int) __attribute__((noreturn));

int main(void){
    const char* string = "Hello Asm!\n";
    hello(string, __builtin_strlen(string));
    exit(0);
}
$ cat syscall.asm
bits 64

global hello

hello:
    mov rdx, rsi    ; length
    mov esi, edi    ; string address (low 32 bits only; assumes the address fits)
    mov eax, 1      ; system call number 1 = write
    mov edi, 1      ; file descriptor 1 = standard output
    syscall
    ret

global exit

exit:
    mov esi, edi    ; copies the status to esi; the system call reads it from edi
    mov eax, 60     ; system call number 60 = exit
    syscall
$ cat makefile
target:
	nasm -f elf64 -o syscall.o syscall.asm
	gcc -O2 -Wall -Wextra main.c syscall.o -nostdlib -static -Wl,-Map=main.map -Wl,-emain
$ make
nasm -f elf64 -o syscall.o syscall.asm
gcc -O2 -Wall -Wextra main.c syscall.o -nostdlib -static -Wl,-Map=main.map -Wl,-emain
$ ls -l a.out
-rwxrwxrwx 1 user user 1504 Dec 9 00:00 a.out
$ ./a.out
Hello Asm!
$
How was it? ~~ I hope you now appreciate what a high-level language C is. ~~ I would be very happy if reading this article gave you a sense of the fun and depth of system calls. There are few people in OthloTech working on the low layers, so I personally hope that number will grow lol. Happy hacking, everyone!