What's happening when you "go build"?

This article is a translation of How “go build” Works.

How does go build compile the simplest Golang program?

This article aims to answer that question.

Consider the simplest program below.

// main.go
package main

func main() {}

Running go build main.go produces a 1.1 MB executable named main that does nothing when run. What did go build do to create this do-nothing binary?

The go build command offers some useful options.

  1. -work: go build creates a temporary folder for its work files. This flag prints the location of that folder and keeps it after the build
  2. -a: Golang caches previously built packages. -a makes go build ignore the cache, so the build prints all steps
  3. -p 1: This makes go build run one command at a time, so the log is printed in linear order
  4. -x: go build is a wrapper around other Golang tools such as compile. -x prints the commands and arguments passed to these tools

Running go build -work -a -p 1 -x main.go produces main along with a large amount of log output, and that log tells us what go build does to create main.

The log first outputs the following contents.

WORK=/var/folders/rw/gtb29xf92fv23f0zqsg42s840000gn/T/go-build940616988

This is a working directory with a structure similar to the following.

├── b001
│   ├── _pkg_.a
│   ├── exe
│   ├── importcfg
│   └── importcfg.link
├── b002
│   └── ...
├── b003
│   └── ...
├── b004
│   └── ...
├── b006
│   └── ...
├── b007
│   └── ...
└── b008
    └── ...

go build defines an action graph for the task that needs to be completed.

Each action in this graph gets its own subdirectory (defined in NewObjdir).

The first node b001 in the graph is the root task for compiling the main binary.

There are many dependent actions, ending with b008. (I don't know where b005 went, but I don't think it's a problem, so I'll ignore it.)
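To make the idea of an action graph concrete, here is a minimal sketch in Go. The edges are the dependencies recorded in the importcfg files of this build log; the deps map and the buildOrder helper are hypothetical names for illustration and are not taken from the go build source. A post-order walk yields one valid order in which every action runs after its dependencies. The real scheduler may choose a different valid order, as this log does.

```go
package main

import "fmt"

// deps maps each action directory to the actions it depends on,
// as recorded in the importcfg files of this build log.
var deps = map[string][]string{
	"b001": {"b002"},
	"b002": {"b003", "b004", "b006", "b007", "b008"},
	"b003": {"b004"},
	"b004": {},
	"b006": {},
	"b007": {"b008"},
	"b008": {},
}

// buildOrder returns the actions reachable from root in post-order,
// so every action appears after all of its dependencies.
func buildOrder(root string) []string {
	var order []string
	done := map[string]bool{}
	var visit func(string)
	visit = func(a string) {
		if done[a] {
			return
		}
		done[a] = true
		for _, d := range deps[a] {
			visit(d)
		}
		order = append(order, a)
	}
	visit(root)
	return order
}

func main() {
	// The root action b001 always comes last.
	fmt.Println(buildOrder("b001"))
}
```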

b008

The first action to be taken is b008 at the end of the graph.

mkdir -p $WORK/b008/

cat >$WORK/b008/importcfg << 'EOF'
# import config
EOF

cd /<..>/src/runtime/internal/sys

/<..>/compile 
  -o $WORK/b008/_pkg_.a 
  -trimpath "$WORK/b008=>" 
  -p runtime/internal/sys 
  -std 
  -+ 
  -complete 
  -buildid gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi 
  -goversion go1.14.7 
  -D "" 
  -importcfg $WORK/b008/importcfg 
  -pack 
  -c=16 
  ./arch.go ./arch_amd64.go ./intrinsics.go ./intrinsics_common.go ./stubs.go ./sys.go ./zgoarch_amd64.go ./zgoos_darwin.go ./zversion.go

/<..>/buildid -w $WORK/b008/_pkg_.a

cp $WORK/b008/_pkg_.a /<..>/Caches/go-build/01/01b...60a-d

In b008, go build does the following:

  1. Create the action directory (all actions do this, so it is omitted from here on)
  2. Create an importcfg file for use with the compile tool (it is empty here)
  3. Change directory to the source folder of the runtime/internal/sys package. This package contains constants used by the runtime
  4. Compile the package
  5. Use buildid to write the metadata to the package (-w) and copy the package to the go-build cache (all packages are cached, so this is omitted from here on)

Let's break down the arguments passed to the compile tool (also described in go tool compile --help).

  1. -o: the output file
  2. -trimpath: removes the prefix $WORK/b008 from source file paths (the => maps it to an empty string)
  3. -p: sets the package import path
  4. -std: compiling the standard library (I wasn't sure what this does at this point)
  5. -+: compiling the runtime (I didn't know this one either)
  6. -complete: the compiler is given a complete package, with no C or assembly parts
  7. -buildid: records a build id in the package metadata
  8. -goversion: the Go version required by the compiled package
  9. -D: the relative path used for local imports, here ""
  10. -importcfg: the import configuration file, which refers to other packages
  11. -pack: creates the package as an archive (.a) instead of an object file (.o)
  12. -c: how much concurrency to use during compilation
  13. The list of files in the package
Most of these arguments are the same for all compile commands, so we'll omit this description below.

The output of b008 is an archive file called $WORK/b008/_pkg_.a that corresponds to runtime/internal/sys.

buildid

Let me explain what a build id is.

The format of a buildid is <actionid>/<contentid>.

It is used as an index to cache packages and improve the performance of go build.

<actionid> is a hash of the action (all calls, arguments, and input files). <contentid> is the hash of the output .a file.

For each action, go build can search the cache for content created by another action with the same <actionid>.

This is implemented in buildid.go.
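The caching idea can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the code in buildid.go: the actionID function, the cache type, and the choice of SHA-256 are all hypothetical. The point is only that an action whose tool, arguments, and inputs hash to a known ID does not need to run again.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// actionID hashes everything that defines an action: the tool, its
// arguments, and the contents of its input files. (Hypothetical helper.)
func actionID(tool string, args []string, inputs [][]byte) string {
	h := sha256.New()
	fmt.Fprintf(h, "tool %s\n", tool)
	for _, a := range args {
		fmt.Fprintf(h, "arg %s\n", a)
	}
	for _, in := range inputs {
		fmt.Fprintf(h, "input %x\n", sha256.Sum256(in))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// cache maps an action ID to the output that action produced.
type cache map[string][]byte

// build returns the cached output for an action and runs it only on a miss.
func (c cache) build(id string, run func() []byte) ([]byte, bool) {
	if out, ok := c[id]; ok {
		return out, true
	}
	out := run()
	c[id] = out
	return out, false
}

func main() {
	c := cache{}
	src := [][]byte{[]byte("package main\n\nfunc main() {}\n")}
	id := actionID("compile", []string{"-p", "main"}, src)
	compile := func() []byte { return []byte("stand-in for an archive") }

	_, hit := c.build(id, compile)
	fmt.Println("first build, cache hit:", hit)
	_, hit = c.build(id, compile)
	fmt.Println("second build, cache hit:", hit)
}
```

Changing any input, such as one byte of a source file, changes the action ID, so the second lookup would miss and the action would run again.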

The buildid is stored in the file as metadata, so the file does not have to be hashed every time to get the <contentid>. You can read this ID with go tool buildid <file> (it also works on binaries).

In the b008 log above, the compile tool sets the buildid to gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi.

This is just a placeholder. Before the package is cached, go tool buildid -w overwrites it with the correct value, gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi.
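As a small illustration of the <actionid>/<contentid> format, the sketch below splits a build ID into its two parts. splitBuildID is a hypothetical helper written for this article. It takes the text before the first slash as the action ID and the text after the last slash as the content ID, which is an assumption chosen so that longer IDs, such as the one passed to link later in this log, are also handled.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBuildID returns the action ID (text before the first "/")
// and the content ID (text after the last "/") of a build ID.
func splitBuildID(id string) (actionID, contentID string) {
	first := strings.Index(id, "/")
	if first < 0 {
		// No separator: treat the whole string as both parts.
		return id, id
	}
	last := strings.LastIndex(id, "/")
	return id[:first], id[last+1:]
}

func main() {
	a, c := splitBuildID("gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi")
	fmt.Println("action ID: ", a)
	fmt.Println("content ID:", c)
}
```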

b007

Next is b007.

cat >$WORK/b007/importcfg << 'EOF'
# import config
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF

cd /<..>/src/runtime/internal/math

/<..>/compile 
  -o $WORK/b007/_pkg_.a 
  -p runtime/internal/math 
  -importcfg $WORK/b007/importcfg 
  ...
  ./math.go

  1. Create an importcfg containing the line packagefile runtime/internal/sys=$WORK/b008/_pkg_.a. This indicates that b007 depends on b008
  2. Compile runtime/internal/math. If you look inside math.go, you will see that it imports runtime/internal/sys, which was built in b008

The output of b007 is an archive file called $WORK/b007/_pkg_.a that corresponds to runtime/internal/math.
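The importcfg format seen in this log is simple enough to read by hand. Here is a minimal sketch in Go, assuming that only packagefile lines and # comments matter; parseImportcfg is a hypothetical helper and not part of the Go toolchain.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseImportcfg reads "packagefile <import path>=<archive>" lines and
// returns a map from import path to archive file. Blank lines, comments,
// and any other directives are skipped in this sketch.
func parseImportcfg(cfg string) map[string]string {
	pkgs := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(cfg))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, "packagefile ") {
			continue
		}
		kv := strings.SplitN(strings.TrimPrefix(line, "packagefile "), "=", 2)
		if len(kv) == 2 {
			pkgs[kv[0]] = kv[1]
		}
	}
	return pkgs
}

func main() {
	// The importcfg written for b007 in the log above.
	cfg := "# import config\npackagefile runtime/internal/sys=$WORK/b008/_pkg_.a\n"
	for path, file := range parseImportcfg(cfg) {
		fmt.Printf("%s is read from %s\n", path, file)
	}
}
```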

b006

cat >$WORK/b006/go_asm.h << 'EOF'
EOF

cd /<..>/src/runtime/internal/atomic

/<..>/asm 
  -I $WORK/b006/ 
  -I /<..>/go/1.14.7/libexec/pkg/include 
  -D GOOS_darwin 
  -D GOARCH_amd64 
  -gensymabis 
  -o $WORK/b006/symabis 
  ./asm_amd64.s

/<..>/asm 
  -I $WORK/b006/ 
  -I /<..>/go/1.14.7/libexec/pkg/include 
  -D GOOS_darwin 
  -D GOARCH_amd64 
  -o $WORK/b006/asm_amd64.o 
  ./asm_amd64.s

cat >$WORK/b006/importcfg << 'EOF'
# import config
EOF

/<..>/compile 
  -o $WORK/b006/_pkg_.a 
  -p runtime/internal/atomic 
  -symabis $WORK/b006/symabis 
  -asmhdr $WORK/b006/go_asm.h 
  -importcfg $WORK/b006/importcfg
  ...
  ./atomic_amd64.go ./stubs.go

/<..>/pack r $WORK/b006/_pkg_.a $WORK/b006/asm_amd64.o

Here we leave regular .go files behind and start dealing with low-level Go assembly .s files.

  1. Create a header file go_asm.h
  2. Change to the runtime/internal/atomic package, a collection of low-level functions
  3. Run go tool asm (described in go tool asm --help) to create the symabis (Symbol Application Binary Interfaces, ABI) file, and then the object file asm_amd64.o
  4. Use compile to create the _pkg_.a file, passing the symabis file with -symabis and the header file with -asmhdr
  5. Add asm_amd64.o to _pkg_.a with the pack command

The asm tool is called here with the following arguments:

  1. -I: include the action directory b006 and the libexec/pkg/include folder. include has three files, asm_ppc64x.h, funcdata.h and textflag.h, all with low-level definitions. For example, FIXED_FRAME defines the size of the fixed part of the stack frame
  2. -D: predefine a symbol
  3. -gensymabis: create the symabis file
  4. -o: the output file

The output of b006 is an archive file called $WORK/b006/_pkg_.a that corresponds to runtime/internal/atomic.

b004

cd /<..>/src/internal/cpu

/<..>/asm ... -o $WORK/b004/symabis ./cpu_x86.s
/<..>/asm ... -o $WORK/b004/cpu_x86.o ./cpu_x86.s

/<..>/compile ... -o $WORK/b004/_pkg_.a ./cpu.go ./cpu_amd64.go ./cpu_x86.go

/<..>/pack r $WORK/b004/_pkg_.a $WORK/b004/cpu_x86.o

b004 is the same as b006 except that the target is internal/cpu.

First create the symabis and object files by assembling cpu_x86.s, then compile the Go files, and finally combine them to create the archive _pkg_.a.

The output of b004 is an archive file called $WORK/b004/_pkg_.a that corresponds to internal/cpu.

b003

cat >$WORK/b003/go_asm.h << 'EOF'
EOF

cd /<..>/src/internal/bytealg

/<..>/asm ... -o $WORK/b003/symabis ./compare_amd64.s ./count_amd64.s ./equal_amd64.s ./index_amd64.s ./indexbyte_amd64.s

cat >$WORK/b003/importcfg << 'EOF'
# import config
packagefile internal/cpu=$WORK/b004/_pkg_.a
EOF

/<..>/compile ... -o $WORK/b003/_pkg_.a -p internal/bytealg ./bytealg.go ./compare_native.go ./count_native.go ./equal_generic.go ./equal_native.go ./index_amd64.go ./index_native.go ./indexbyte_native.go

/<..>/asm ... -o $WORK/b003/compare_amd64.o ./compare_amd64.s
/<..>/asm ... -o $WORK/b003/count_amd64.o ./count_amd64.s
/<..>/asm ... -o $WORK/b003/equal_amd64.o ./equal_amd64.s
/<..>/asm ... -o $WORK/b003/index_amd64.o ./index_amd64.s
/<..>/asm ... -o $WORK/b003/indexbyte_amd64.o ./indexbyte_amd64.s

/<..>/pack r $WORK/b003/_pkg_.a $WORK/b003/compare_amd64.o $WORK/b003/count_amd64.o $WORK/b003/equal_amd64.o $WORK/b003/index_amd64.o $WORK/b003/indexbyte_amd64.o

b003 works the same way as b004 and b006.

The main complication with this package is that it has multiple .s files, which produce many .o object files, each of which needs to be added to the _pkg_.a file.

The output of b003 is an archive file called $WORK/b003/_pkg_.a that corresponds to internal/bytealg.
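The naming pattern in the b003 log (one .o per .s file, all appended to _pkg_.a by a single pack r command) can be sketched as follows. objectFiles and packArgs are hypothetical helpers that only reproduce the command line seen in the log; they are not how go build is implemented.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// objectFiles maps each assembly source to the object file the log shows
// for it: same base name, .o extension, placed in the action directory.
func objectFiles(objdir string, sfiles []string) []string {
	var objs []string
	for _, s := range sfiles {
		base := strings.TrimSuffix(path.Base(s), ".s")
		objs = append(objs, objdir+"/"+base+".o")
	}
	return objs
}

// packArgs builds the arguments of the final "pack r" step.
func packArgs(objdir string, sfiles []string) []string {
	args := []string{"r", objdir + "/_pkg_.a"}
	return append(args, objectFiles(objdir, sfiles)...)
}

func main() {
	sfiles := []string{
		"./compare_amd64.s", "./count_amd64.s", "./equal_amd64.s",
		"./index_amd64.s", "./indexbyte_amd64.s",
	}
	fmt.Println("pack", strings.Join(packArgs("$WORK/b003", sfiles), " "))
}
```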

b002

cat >$WORK/b002/go_asm.h << 'EOF'
EOF

cd /<..>/src/runtime

/<..>/asm 
  ... 
  -o $WORK/b002/symabis 
  ./asm.s ./asm_amd64.s ./duff_amd64.s ./memclr_amd64.s ./memmove_amd64.s ./preempt_amd64.s ./rt0_darwin_amd64.s ./sys_darwin_amd64.s
  
cat >$WORK/b002/importcfg << 'EOF'
# import config
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF

/<..>/compile 
  -o $WORK/b002/_pkg_.a 
  ...
  -p runtime 
  ./alg.go ./atomic_pointer.go ./cgo.go ./cgocall.go ./cgocallback.go ./cgocheck.go ./chan.go ./checkptr.go ./compiler.go ./complex.go ./cpuflags.go ./cpuflags_amd64.go ./cpuprof.go ./cputicks.go ./debug.go ./debugcall.go ./debuglog.go ./debuglog_off.go ./defs_darwin_amd64.go ./env_posix.go ./error.go ./extern.go ./fastlog2.go ./fastlog2table.go ./float.go ./hash64.go ./heapdump.go ./iface.go ./lfstack.go ./lfstack_64bit.go ./lock_sema.go ./malloc.go ./map.go ./map_fast32.go ./map_fast64.go ./map_faststr.go ./mbarrier.go ./mbitmap.go ./mcache.go ./mcentral.go ./mem_darwin.go ./mfinal.go ./mfixalloc.go ./mgc.go ./mgcmark.go ./mgcscavenge.go ./mgcstack.go ./mgcsweep.go ./mgcsweepbuf.go ./mgcwork.go ./mheap.go ./mpagealloc.go ./mpagealloc_64bit.go ./mpagecache.go ./mpallocbits.go ./mprof.go ./mranges.go ./msan0.go ./msize.go ./mstats.go ./mwbbuf.go ./nbpipe_pipe.go ./netpoll.go ./netpoll_kqueue.go ./os_darwin.go ./os_nonopenbsd.go ./panic.go ./plugin.go ./preempt.go ./preempt_nonwindows.go ./print.go ./proc.go ./profbuf.go ./proflabel.go ./race0.go ./rdebug.go ./relax_stub.go ./runtime.go ./runtime1.go ./runtime2.go ./rwmutex.go ./select.go ./sema.go ./signal_amd64.go ./signal_darwin.go ./signal_darwin_amd64.go ./signal_unix.go ./sigqueue.go ./sizeclasses.go ./slice.go ./softfloat64.go ./stack.go ./string.go ./stubs.go ./stubs_amd64.go ./stubs_nonlinux.go ./symtab.go ./sys_darwin.go ./sys_darwin_64.go ./sys_nonppc64x.go ./sys_x86.go ./time.go ./time_nofake.go ./timestub.go ./trace.go ./traceback.go ./type.go ./typekind.go ./utf8.go ./vdso_in_none.go ./write_err.go
  
/<..>/asm ... -o $WORK/b002/asm.o ./asm.s
/<..>/asm ... -o $WORK/b002/asm_amd64.o ./asm_amd64.s
/<..>/asm ... -o $WORK/b002/duff_amd64.o ./duff_amd64.s
/<..>/asm ... -o $WORK/b002/memclr_amd64.o ./memclr_amd64.s
/<..>/asm ... -o $WORK/b002/memmove_amd64.o ./memmove_amd64.s
/<..>/asm ... -o $WORK/b002/preempt_amd64.o ./preempt_amd64.s
/<..>/asm ... -o $WORK/b002/rt0_darwin_amd64.o ./rt0_darwin_amd64.s
/<..>/asm ... -o $WORK/b002/sys_darwin_amd64.o ./sys_darwin_amd64.s
  
/<..>/pack r $WORK/b002/_pkg_.a $WORK/b002/asm.o $WORK/b002/asm_amd64.o $WORK/b002/duff_amd64.o $WORK/b002/memclr_amd64.o $WORK/b002/memmove_amd64.o $WORK/b002/preempt_amd64.o $WORK/b002/rt0_darwin_amd64.o $WORK/b002/sys_darwin_amd64.o

You can see why the previous actions were needed by looking at b002.

b002 is the runtime package, which contains everything a Go binary needs in order to run. For example, it contains mgc.go, the Go garbage collector implementation, which imports b004 (internal/cpu) and b006 (runtime/internal/atomic).

b002 may be the most complex package in the core library, but the build itself follows the same process as before: the files output by asm and compile are packed into _pkg_.a.

The output of b002 is an archive file called $WORK/b002/_pkg_.a that corresponds to runtime.

b001

cat >$WORK/b001/importcfg << 'EOF'
# import config
packagefile runtime=$WORK/b002/_pkg_.a
EOF

cd /<..>/main

/<..>/compile ... -o $WORK/b001/_pkg_.a -p main ./main.go

cat >$WORK/b001/importcfg.link << 'EOF'
packagefile command-line-arguments=$WORK/b001/_pkg_.a
packagefile runtime=$WORK/b002/_pkg_.a
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF

/<..>/link 
  -o $WORK/b001/exe/a.out 
  -importcfg $WORK/b001/importcfg.link 
  -buildmode=exe 
  -buildid=yC-qrh2sY_qI0zh2-NE7/owNzOBTqPO00FkqK0_lF/HPXqvMz_4PvKsQzqGWgD/yC-qrh2sY_qI0zh2-NE7 
  -extld=clang 
  $WORK/b001/_pkg_.a

mv $WORK/b001/exe/a.out main

In b001, go build does the following:

  1. Create an importcfg that includes the runtime built in b002, then compile main.go to create _pkg_.a
  2. Create an importcfg.link that contains command-line-arguments=$WORK/b001/_pkg_.a in addition to all the packages that appeared before, then link them with the link command to create the executable a.out
  3. Finally, rename it to main and move it to the output destination
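Unlike importcfg, which lists only direct imports, importcfg.link lists every package the binary transitively depends on. The sketch below rebuilds the file contents from the direct imports seen in this log. The pkgs table and linkConfig helper are hypothetical. A breadth-first walk happens to reproduce the line order of this particular log; that is an observation about this example, not a claim about how go build orders the file.

```go
package main

import (
	"fmt"
	"strings"
)

// pkg describes one package from the build log.
type pkg struct {
	archive string   // archive produced for the package
	imports []string // direct imports, from the package's importcfg
}

var pkgs = map[string]pkg{
	"command-line-arguments": {"$WORK/b001/_pkg_.a", []string{"runtime"}},
	"runtime": {"$WORK/b002/_pkg_.a", []string{
		"internal/bytealg", "internal/cpu", "runtime/internal/atomic",
		"runtime/internal/math", "runtime/internal/sys",
	}},
	"internal/bytealg":        {"$WORK/b003/_pkg_.a", []string{"internal/cpu"}},
	"internal/cpu":            {"$WORK/b004/_pkg_.a", nil},
	"runtime/internal/atomic": {"$WORK/b006/_pkg_.a", nil},
	"runtime/internal/math":   {"$WORK/b007/_pkg_.a", []string{"runtime/internal/sys"}},
	"runtime/internal/sys":    {"$WORK/b008/_pkg_.a", nil},
}

// linkConfig walks the import graph breadth-first from root and emits one
// packagefile line for every package reached, each exactly once.
func linkConfig(root string) string {
	var b strings.Builder
	seen := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		fmt.Fprintf(&b, "packagefile %s=%s\n", name, pkgs[name].archive)
		for _, imp := range pkgs[name].imports {
			if !seen[imp] {
				seen[imp] = true
				queue = append(queue, imp)
			}
		}
	}
	return b.String()
}

func main() {
	fmt.Print(linkConfig("command-line-arguments"))
}
```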

Here are the link arguments that have not appeared before.

  1. -buildmode: the build mode, here exe, which builds an executable
  2. -extld: the external linker to use, here clang

I finally got what I was looking for.

The main binary is born from b001.

Similarities with Bazel

Creating an action graph to enable efficient caching is the same idea that the build tool Bazel uses for fast builds.

Golang's action id and content id correspond to the action cache and content-addressable store (CAS) that Bazel uses for caching.

Bazel is a product of Google, and so is Golang. It would be very reasonable for them to have a similar philosophy on how to build software quickly and accurately.

In Bazel's rules_go package, you can see how to reimplement go build in the builder code.

This is a very clean implementation, as action graphs, folder management, and caching are handled externally by Bazel.

Next steps

go build did a lot to compile a program like this one that doesn't do anything.

I didn't go into much detail about the tools (compile, asm) and their input and output files (.a, .o, .s) this time around.

Also, this time I'm just compiling the most basic program.

You can make the compilation more complicated by doing the following:

  1. Import other packages. For example, importing fmt to print Hello world adds 23 more actions to the action graph.
  2. Use go.mod to reference external packages.
  3. Build for other architectures with different values for GOOS and GOARCH. For example, compiling for wasm uses completely different actions and arguments.

Running go build and inspecting the logs is a top-down approach to learning how the Go compiler works. If you want to learn from first principles, it is a great starting point for diving into resources such as:

  1. Introduction to the Go compiler
  2. Go: Overview of the Compiler
  3. Go at Google: Language Design in the Service of Software Engineering
  4. build.go
  5. compile/main.go
