Write hello world in C

This is the day-10 article of the Yahoo! JAPAN 18 New Graduate Advent Calendar 2018, written by volunteers from Yahoo! JAPAN's 2018 new graduate hires.

I'm Takigadaira, a 2018 new graduate hire at Yahoo! JAPAN. In this article, I will write hello world in C.

The content of this article was verified on CentOS 7 with gcc version 4.8.5.

hello world flow

A hello world in C is usually written like this.

#include <stdio.h>

int main(void) {
  printf("hello world");
  return 0;
}

Compiling this code on x86_64 GNU/Linux with gcc 4.8.5, using the command gcc -O2 -S -o hello.S hello.c, produces the following assembly:

	.file	""
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"hello world"
	.section	.text.startup,"ax",@progbits
	.p2align 4,,15
	.globl	main
	.type	main, @function
main:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	$.LC0, %edi
	xorl	%eax, %eax
	call	printf
	xorl	%eax, %eax
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
	.size	main, .-main
	.ident	"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-28)"
	.section	.note.GNU-stack,"",@progbits

The important part is:

	movl	$.LC0, %edi
	xorl	%eax, %eax
	call	printf

The address of .LC0 is loaded into the edi register, and then the printf function is called.

This is because the x86_64 System V ABI specifies that the first argument of a function is passed in the rdi register. (The edi register is the lower 32 bits of the rdi register.)

Toward a **stronger** hello world

It would be a shame to end an Advent Calendar article here after putting in all this effort, so I would like to aim for a **stronger** hello world. What is a "strong hello world"? I wondered whether printing the string without ever writing hello world explicitly would make it different, and therefore stronger. With that in mind, I once wrote the following hello world for an internal article.

#include <unistd.h>
#include <stdlib.h>

void (*a)(void);
char h = 0x00, e = 0x00, l = 0x00, o = 0x00, w = 0x00, r = 0x00, d = 0x00, sp = 0x00;

void search() {
  /* Read the machine code of search() itself, one byte at a time. */
  char *b = (char *)a;
  while(1) {
    switch(*b) {
      case 0x68:
        h = *b;
        break;
      case 0x65:
        e = *b;
        break;
      case 0x6c:
        l = *b;
        break;
      case 0x6f:
        o = *b;
        break;
      case 0x77:
        w = *b;
        break;
      case 0x72:
        r = *b;
        break;
      case 0x64:
        d = *b;
        break;
      case 0x20:
        sp = *b;
        break;
    }
    /* Stop once every required character has been found. */
    if(h && e && l && o && w && r && d && sp) break;
    b++;
  }
}

int main() {
  a = search;
  search();
  write(1, &h, 1);
  write(1, &e, 1);
  write(1, &l, 1);
  write(1, &l, 1);
  write(1, &o, 1);
  write(1, &sp, 1);
  write(1, &w, 1);
  write(1, &o, 1);
  write(1, &r, 1);
  write(1, &l, 1);
  write(1, &d, 1);
  return 0;
}

This program starts at the address of the search function, which by then is machine language, scans for the required characters, and outputs hello world using the characters it finds.

Later, my boss improved this code into the following.

#include <unistd.h>

char *a = "";
char h = 0x00, e = 0x00, l = 0x00, o = 0x00, w = 0x00, r = 0x00, d = 0x00, sp = 0x00;

int main() {
  while(1) {
    switch(*a) {
      case 0x68:
        h = *a;
        break;
      case 0x65:
        e = *a;
        break;
      case 0x6c:
        l = *a;
        break;
      case 0x6f:
        o = *a;
        break;
      case 0x77:
        w = *a;
        break;
      case 0x72:
        r = *a;
        break;
      case 0x64:
        d = *a;
        break;
      case 0x20:
        sp = *a;
        break;
    }
    /* Stop once every required character has been found. */
    if(h && e && l && o && w && r && d && sp) break;
    a--;  /* search backward, toward lower addresses */
  }
  write(1, &h, 1);
  write(1, &e, 1);
  write(1, &l, 1);
  write(1, &l, 1);
  write(1, &o, 1);
  write(1, &sp, 1);
  write(1, &w, 1);
  write(1, &o, 1);
  write(1, &r, 1);
  write(1, &l, 1);
  write(1, &d, 1);
  return 0;
}

By searching backward from the pointer a, toward lower addresses, the required characters are found without fail.

Unless the placement is overridden with a linker script, a pragma, or a similar attribute, an initialized pointer variable and the string literal it points to are placed after the actual program code.

char *a = "";
int main(void) {
  return *a;
}

If you save this code as test.c and check it with gcc -o test test.c && objdump -D test, you can see that the main function is located in the .text section and the variable a is located in the .data section. Their addresses are:

00000000004003e0 <main>:
Disassembly of section .data:

0000000000601020 <__data_start>:

0000000000601028 <a>:
  601028:	70 05                	jo     60102f <a+0x7>
  60102a:	40 00 00             	add    %al,(%rax)

The pointer a holds the address 0x400570 (the bytes 70 05 40 00 in little-endian order), which is higher than the address of the main function once the program is loaded into memory (these values are from my environment). Checking this address against the section headers shown by readelf:

There are 31 section headers, starting at offset 0x1920:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [13] .plt.got          PROGBITS         00000000004003d0  000003d0
       0000000000000008  0000000000000000  AX       0     0     8
  [14] .text             PROGBITS         00000000004003e0  000003e0
       0000000000000172  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         0000000000400554  00000554
       0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         0000000000400560  00000560
       0000000000000011  0000000000000000   A       0     0     8
  [17] .eh_frame_hdr     PROGBITS         0000000000400574  00000574
       0000000000000034  0000000000000000   A       0     0     4

It points inside .rodata. Because the ASCII codes of the required characters appear as operands of the comparison instructions in the .text section, every required character is guaranteed to be present. Also, according to the section information, the whole address range from .text through .rodata is mapped, so the backward scan never leaves mapped memory. Therefore, the program runs without a SEGV.


But doesn't this hello world still look a little weak? After all, each character of hello world is still written in the source code.

To be more elaborate, we could, for example, take the byte value of some instruction's hexadecimal encoding and add or subtract the difference to each character. That should let us banish the characters of hello world from the code entirely and still produce hello world.

So let's take a look at the x86_64 instruction list.

Consider which of these instructions are certain to appear in the binary.

If the main function ends with return 0;, we can expect xor %eax, %eax to be included, because the following are specified:

- The ABI returns a function's value in %eax.
- According to the [Intel® 64 and IA-32 Architectures Optimization Reference Manual](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf), a register should be set to 0 by XORing it with itself.

Therefore, 0x31 and 0xc0 are candidates.

We also know that the ret instruction is executed when a function finishes and returns to its caller, so it is safe to assume that 0xc3 is present as well.

There are many other candidates and techniques, but let's start with 0xc3.

Say hello to the world

First, find the reference byte 0xc3.

char *a = "";
int main(void) {
  while(1) {
    if(*a == (char) 0xc3) break;
    a--;  /* walk backward from .rodata toward .text */
  }
  char b = *a;
  //Processing after finding
  return 0;
}

Once it is found, the first character to output is h, so include the necessary header file. Then add the difference between 0xc3 and the ASCII code of h (0x68 - 0xc3 = -0x5b) to the found byte and store the result in the output variable.

#include <unistd.h>

char *a = "";

int main(void) {
  while(1) {
    if(*a == (char) 0xc3) break;
    a--;  /* walk backward from .rodata toward .text */
  }

  char b = *a;
  b = b + (char)-0x5b;  /* 0xc3 - 0x5b = 0x68 ('h') */
  write(1, &b, 1);
  return 0;
}

Outputting the remaining characters in the same way gives the following code.

#include <unistd.h>

char *a = "";
/* Difference from the previous byte to the next character, starting at 0xc3. */
char subs[] = {(char)-0x5b, (char)-0x3, (char)0x7, (char)0x0, (char)0x3, (char)-0x4f, (char)0x57, (char)-0x8, (char)0x3, (char)-0x6, (char)-0x8};

int main(void) {
  while(1) {
    if(*a == (char) 0xc3) break;
    a--;  /* walk backward from .rodata toward .text */
  }

  char b = *a;
  for(int i=0; i < 11; i++) {
    b = b + subs[i];
    write(1, &b, 1);
  }
  return 0;
}

Writing out every step by hand is tedious, so I calculated the differences in advance and let a for statement apply them, which results in the form above.

To a stronger hello world

Generating the string at runtime is a nice idea, but it still feels unsatisfying. So this time I will generate hello world by rewriting the binary at runtime.

Output 11 bytes from the top of the function

Let's take it step by step. When rewriting the binary at runtime, rewriting an area that will be executed again may break later processing. Therefore, I will output hello world with the following method:

  1. Prepare a function for output
  2. Generate hello world by rewriting the first 11 bytes of the function, which have already been executed
  3. Execute the write system call with the address of the top of the function and a length of 11 as arguments

On x86_64 Linux, a function compiled without optimization starts with

pushq %rbp
movq %rsp, %rbp

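(Reference: x86 calling convention.) According to the x86_64 instruction list page mentioned earlier, the machine language is determined as follows.

The pushq instruction is 0x50 + r (r is the register number), and %rbp is register number 5, so it becomes 0x55.

Next, movq between 64-bit registers consists of the REX prefix 0x48, the opcode 0x89, and a ModR/M byte, which is 0xe5 for %rsp to %rbp.

Therefore, it becomes as follows.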

55 (pushq %rbp)
48 89 e5 (movq %rsp, %rbp)

With this in place, the following instruction is used to output 11 bytes (the number of characters in hello world) starting from the first address of the function.

leaq -x(%rip), %rsi

The point is to put that address in the second argument of the write system call. (x is the distance from the program counter %rip back to the top of the function.) See man syscall for the correspondence between system call arguments and registers.

The instruction leaq x(%rip), %rsi is encoded in machine language as follows.

48 8d 35 00 00 00 00

The last 4 bytes are the offset from the specified register (%rip this time). While this instruction executes, %rip already points to the next instruction, which is 11 bytes past the top of the function (1 byte for pushq, 3 for movq, and 7 for leaq itself). So loading the address 11 bytes before %rip into %rsi puts the pointer to the top of the function into the second argument. That is, we need leaq -11(%rip), %rsi, and the machine language is 48 8d 35 f5 ff ff ff. (f5 ff ff ff is the little-endian representation of -11.)

You can also confirm this by assembling the instruction and inspecting the resulting bytes.

Then set the write system call number, the output file descriptor (first argument), and the number of bytes to output (third argument), and execute the syscall instruction. In assembly, it looks like this.

movq $1, %rax
movq $1, %rdi
movq $11, %rdx
syscall

These instructions correspond to the following machine language. (Each line corresponds to the instruction on the same line above.)

48 c7 c0 01 00 00 00
48 c7 c7 01 00 00 00
48 c7 c2 0b 00 00 00
0f 05

The first three lines each consist of the REX prefix 0x48, the opcode 0xc7 for movq from an immediate to a register, a ModR/M byte, and 4 trailing bytes holding the immediate value. The ModR/M byte for each target register is as follows:

- %rax is 0b11000000 = 0xc0
  - mod is 0b11 because the operand is a register
  - reg is not used as a register; it holds the opcode extension /0, so 0b000
  - r/m is 0b000 for %rax
- %rdi is 0b11000111 = 0xc7
  - mod and reg are the same as above; r/m is 0b111
- %rdx is 0b11000010 = 0xc2
  - mod and reg are the same as above; r/m is 0b010

And syscall is a 2-byte instruction, 0x0f 0x05.

At the end of the function, we need to restore the %rbp value that was saved with pushq and return to the calling function, so we need to run

popq %rbp
retq

These correspond to the machine language

5d
c3

Both are 1-byte instructions: popq %rbp is 0x58 + r (0x5d for %rbp), and retq is 0xc3.

Putting the instructions so far in order gives

pushq %rbp
movq %rsp, %rbp
leaq -11(%rip), %rsi
movq $1, %rax
movq $1, %rdi
movq $11, %rdx
syscall
popq %rbp
retq

and the machine language is

55
48 89 e5
48 8d 35 f5 ff ff ff
48 c7 c0 01 00 00 00
48 c7 c7 01 00 00 00
48 c7 c2 0b 00 00 00
0f 05
5d
c3

Written as a C string literal, this becomes

char *code = "\x55\x48\x89\xe5\x48\x8d\x35\xf5\xff\xff\xff\x48\xc7\xc0\x01\x00\x00\x00\x48\xc7\xc7\x01\x00\x00\x00\x48\xc7\xc2\x0b\x00\x00\x00\x0f\x05\x5d\xc3";

In C, even a character string can be executed if it is cast to a function pointer (although the behavior is undefined by the specification).

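int main(void) {
  char *code = "\x55\x48\x89\xe5\x48\x8d\x35\xf5\xff\xff\xff\x48\xc7\xc0\x01\x00\x00\x00\x48\xc7\xc7\x01\x00\x00\x00\x48\xc7\xc2\x0b\x00\x00\x00\x0f\x05\x5d\xc3";
  /* Undefined behavior in standard C; works only where the memory holding
     the string literal is executable. */
  void (*f)(void) = (void (*)(void))code;
  f();
  return 0;
}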

Compiling and running this code and piping the output to xxd gives

5548 89e5 488d 35f5 ffff ff

which matches the machine language we wrote.

In other words, if we insert instructions after leaq -11(%rip), %rsi that adjust these 11 bytes, we should be able to generate the string hello world.

Adjust 11 bytes from the top of the function

Adding the following assembly in the right place does the job.

addl	$19, (%rsi)
addl	$29, 1(%rsi)
addl	$-29, 2(%rsi)
addl	$-121, 3(%rsi)
addl	$39, 4(%rsi)
addl	$-109, 5(%rsi)
addl	$66, 6(%rsi)
addl	$122, 7(%rsi)
addl	$114, 8(%rsi)
addl	$108, 9(%rsi)
addl	$100, 10(%rsi)

These instructions correspond to the following machine language:

83 06 13
83 46 01 1d
83 46 02 e3
83 46 03 87
83 46 04 27
83 46 05 93
83 46 06 42
83 46 07 7a
83 46 08 72
83 46 09 6c
83 46 0a 64

addl $19, (%rsi) uses the opcode 0x83 (add a sign-extended 8-bit immediate), and its ModR/M byte is as follows:

- mod is 0b00 because r/m is a memory address held in a register
- reg is not used as a register; it holds the opcode extension /0, so 0b000
- r/m is 0b110 for %rsi

This gives 0x06, followed by the immediate value 0x13.

When there is a 1-byte offset, as in addl $29, 1(%rsi), the opcode 0x83 is the same, and:

- mod is 0b01 because r/m is a register plus an 8-bit offset
- reg is the same, 0b000
- r/m is also 0b110

This gives 0x46, followed by 1 byte for the offset from %rsi and then 1 byte for the immediate value.

Create machine language for hello world

Arranging all the instructions so far in the right order gives

pushq %rbp
movq %rsp, %rbp
leaq -11(%rip), %rsi
addl	$19, (%rsi)
addl	$29, 1(%rsi)
addl	$-29, 2(%rsi)
addl	$-121, 3(%rsi)
addl	$39, 4(%rsi)
addl	$-109, 5(%rsi)
addl	$66, 6(%rsi)
addl	$122, 7(%rsi)
addl	$114, 8(%rsi)
addl	$108, 9(%rsi)
addl	$100, 10(%rsi)
movq $1, %rax
movq $1, %rdi
movq $11, %rdx
syscall
popq %rbp
retq

As machine language:

55
48 89 e5
48 8d 35 f5 ff ff ff
83 06 13
83 46 01 1d
83 46 02 e3
83 46 03 87
83 46 04 27
83 46 05 93
83 46 06 42
83 46 07 7a
83 46 08 72
83 46 09 6c
83 46 0a 64
48 c7 c0 01 00 00 00
48 c7 c7 01 00 00 00
48 c7 c2 0b 00 00 00
0f 05
5d
c3

Written as a string literal, this becomes

char *code = "\x55\x48\x89\xe5\x48\x8d\x35\xf5\xff\xff\xff\x83\x06\x13\x83\x46\x01\x1d\x83\x46\x02\xe3\x83\x46\x03\x87\x83\x46\x04\x27\x83\x46\x05\x93\x83\x46\x06\x42\x83\x46\x07\x7a\x83\x46\x08\x72\x83\x46\x09\x6c\x83\x46\x0a\x64\x48\xc7\xc0\x01\x00\x00\x00\x48\xc7\xc7\x01\x00\x00\x00\x48\xc7\xc2\x0b\x00\x00\x00\x0f\x05\x5d\xc3";

However, executing it as is will not work, because the area where the string is stored is not writable, so a little more work is needed.

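#include <sys/mman.h>
#include <string.h>

const unsigned int CODE_LEN = 79;

int main()
{
  char *code = "\x55\x48\x89\xe5\x48\x8d\x35\xf5\xff\xff\xff\x83\x06\x13\x83\x46\x01\x1d\x83\x46\x02\xe3\x83\x46\x03\x87\x83\x46\x04\x27\x83\x46\x05\x93\x83\x46\x06\x42\x83\x46\x07\x7a\x83\x46\x08\x72\x83\x46\x09\x6c\x83\x46\x0a\x64\x48\xc7\xc0\x01\x00\x00\x00\x48\xc7\xc7\x01\x00\x00\x00\x48\xc7\xc2\x0b\x00\x00\x00\x0f\x05\x5d\xc3";
  /* Allocate a readable, writable, and executable anonymous region. */
  char *output = mmap(NULL, CODE_LEN, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (output == MAP_FAILED) return 1;
  memcpy(output, code, CODE_LEN);
  /* Jump into the copied bytes; they rewrite their own first 11 bytes. */
  void (*o)(void) = (void (*)(void))output;
  o();
  return 0;
}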

This allocates a readable, writable, and executable area with mmap, copies code into it, and then executes it. As a result, the instructions can be rewritten at runtime without causing a segmentation fault.

An execution example on Wandbox is here. Now you have a strong hello world.

In general, few people are comfortable writing machine language by hand, and there are few occasions that prompt one to learn it, but I hope this article can be one.

Write your own hello world and show it off to your friends.

