A Go beginner's trial and error in improving cgo memory profiling

This article is the day 12 entry in the freee Engineers Advent Calendar.

Hello. I'm @taiyo, a 2017 new graduate joining freee and an engineering intern. Until now I had only used scripting languages: Ruby at freee, PHP for my university club, and Python for my thesis. I got the chance to work on Go development at freee, started using it about three months ago, and have come to like it.

I have only been an engineer for about a year and three months, but the incident described below left me with a growing desire: **I want to make C-side memory profiling easier for Go applications that use cgo**.

I would like to write about these three points.

A memory leak occurred in our Go API server

This is where it all began. A few days after we released a major change to the Go API server, a memory leak occurred. It actually happened...!!

As mentioned earlier, I had only worked with scripting languages and had never run into a memory leak before, so this first experience left me truly flustered. I didn't know what to do, but I pressed on with kind advice from the people around me.

Causes of the memory leak were found in both Go and C

To state the conclusion first, the causes turned out to be on both the Go side and the C side.

First investigation

The application already included net/http/pprof from Go's standard library, so I used it to check memory usage. I remember simply thinking "Wow!" at how convenient net/http/pprof is, and at how impressive it is that Go ships it in the standard library.

In the end, this investigation revealed one of the causes: **data created with cgo's CString, which is not subject to Go's GC and which I thought was being released, was not actually released**.

I added the following code to release it:

defer C.free(unsafe.Pointer(released))

I fixed it and released right away. That should take care of it!

Or so I thought... but the memory leak was still there.

Second investigation

I searched again with pprof but could not find anything that looked like a cause. Since this application uses cgo, I decided to examine the C code as well: **I separated the C implementation from Go and ran it under valgrind**. As a result, I found what appeared to be the source of the leak on the C side.

I fixed it and released again. The leak no longer occurred, so the situation was settled for the time being.

What I thought

I want to be able to investigate Go and C together

During the first investigation I thought net/http/pprof was amazing, but there was an unexpected pitfall. net/http/pprof only profiles memory allocated by Go; **memory allocated on the C side does not appear in its output**.

Also, valgrind is a major memory profiler for C and C++, but it does not seem to support Go and cgo so far (https://github.com/golang/go/issues/782). As mentioned above, I had to **investigate the C code separately from Go**.

This means **separate investigations for Go and for C**, which takes time. I would rather find the cause in one shot.

It would be nice to see C-side memory usage with net/http/pprof

I also want an environment that makes investigation easy the next time the same thing happens. net/http/pprof is convenient after all, so it would be nice if the same interface could show which line of C is holding memory.

Since cgo exists, Go and C should work well together, so I wondered whether something could be done. I decided to try various things.

Approach

The following three are candidates.

  1. Align Go and C allocators
  2. Create a library that will profile C memory from Go with the same interface as net/http/pprof
  3. Do your best with valgrind (maintain the status quo)

Option 2 looked like it would take a lot of time and skill, so this time I focused on option 1: investigating whether "aligning the Go and C allocators" would make C memory visible to pprof.

Hypothesis: aligning the Go and C allocators

To start, I followed the source code from net/http/pprof and eventually arrived at runtime/malloc.go, which begins with the following comment.

runtime/malloc.go


// Memory allocator.
//
// This was originally based on tcmalloc, but has diverged quite a bit.
// http://goog-perftools.sourceforge.net/doc/tcmalloc.html

// The main allocator works in runs of pages.
// Small allocation sizes (up to and including 32 kB) are
// rounded to one of about 70 size classes, each of which
// has its own free set of objects of exactly that size.
// Any free page of memory can be split into a set of objects
// of one size class, which are then managed using a free bitmap.
//
// The allocator's data structures are:
//
//	fixalloc: a free-list allocator for fixed-size off-heap objects,
//		used to manage storage used by the allocator.
//	mheap: the malloc heap, managed at page (8192-byte) granularity.
//	mspan: a run of pages managed by the mheap.
//	mcentral: collects all spans of a given size class.
//	mcache: a per-P cache of mspans with free space.
//	mstats: allocation statistics.

As this comment shows, **Go's memory allocator is based on tcmalloc, while the C side was using the ordinary malloc**. From this I formed a hypothesis: **C memory is not picked up because the two sides use different allocators, so if they used the same one, it might be picked up**.

In fact, I had originally been advised, "How about unifying the different allocators used by C and Go?" At the time I honestly didn't understand what that meant, and only after reading this far into the code did it finally click. So, for now, let's test the hypothesis: "switch C's allocator to tcmalloc".

Replacing malloc with gperftools

To replace malloc with tcmalloc, I tried gperftools, which came up most often in searches and which I hoped would be compatible with Go, since both were created by Google. (I was a little worried that there were no recent articles about gperftools, but the latest commit in the GitHub repository was from 2016/11.)

I added gperftools to the version of the Go application that had the memory leak and checked its behavior. The development environment runs in a container built with docker.

Below is the code I added (simplified).

#Dockerfile

...
RUN apt-get -y install google-perftools libgoogle-perftools-dev
...

cgo_sample.go


package main

/*
...
#cgo CFLAGS: -I/usr/include
#cgo LDFLAGS: -L/usr/lib -ltcmalloc
#include "gperftools/tcmalloc.h"
*/
import "C"
import (
  "net/http"
  _ "net/http/pprof"
)

func main() {
  http.HandleFunc("/", handler)
  http.ListenAndServe(":8800", nil)
}

func handler(w http.ResponseWriter, r *http.Request) {
  // Processing that leaks in C, etc.
}

Install google-perftools and libgoogle-perftools-dev with apt-get. Both are installed under /usr, so specify the paths in CFLAGS and LDFLAGS respectively.

Results

$ curl http://localhost:8800
$ go tool pprof http://localhost:8800/debug/pprof/heap?debug=1
(pprof) text
5836.79kB of 5836.79kB total (  100%)
Dropped 57 nodes (cum <= 29.18kB)
Showing top 10 nodes out of 23 (cum >= 512.69kB)
      flat  flat%   sum%        cum   cum%
 2430.03kB 41.63% 41.63%  2430.03kB 41.63%  go.SomeLeakMethod1
 1334.04kB 22.86% 64.49%  1334.04kB 22.86%  go.OtherLeakMethod1
 1048.02kB 17.96% 82.44%  1048.02kB 17.96%  go.SomeLeakMethod2
  512.02kB  8.77%   100%   512.02kB  8.77%  go.OtherLeakMethod2
         0     0%   100%  1048.02kB 17.96%  go.NLeakMethod
         0     0%   100%  3764.07kB 64.49%  go.NLeakMethod2

Even with the allocator switched to tcmalloc, only the Go functions at the entry points were displayed, and the result was the same as the original pprof output. Unfortunately, the hypothesis that "aligning the allocators will make C allocations appear in pprof" was rejected.

Caution

In addition, as noted in the gperftools repository itself, gperftools does not seem to work well on 64-bit Linux, so be careful there as well. Even though it is supposed to be a library that speeds up memory allocation, responses were about 1.5 times slower than usual. Even if the approach had succeeded, adopting it would apparently have come at some cost.

Why didn't it work?

The former was what I thought from the code reading of malloc.go. The latter may have come up by investigating processes and threads as well.

In any case, even ruling out one possibility is progress toward improving cgo profiling.

What I learned from trying to improve things

Improving low-level areas requires the right knowledge

Responding to my first memory leak made it clear that dealing with situations like this requires not only knowledge of Go's language specification but also broad computer science knowledge. This time I was able to learn about the CPU and memory on the computer side, and about the language features for using them on the language side. At the same time, I realized that I had been enjoying the convenient interfaces built by my predecessors as they were, without thinking deeply about them. It made me realize how much I still don't know, and it was a sobering experience.

We need to fail in a good way

freee's development culture includes the value "**Let's fail and attack**". I understand this as "each failure is a source of growth, and I take proactive action so that the same failure does not happen next time."

I think this memory leak incident is exactly such a case for me.

  1. What can I do to prevent memory leaks from happening in the future?
  2. If I can create a setup where memory profiling goes more smoothly, I should be able to run it proactively and prevent them.
  3. Let's try it

Having a sense of challenge makes me more motivated to follow through. I will not forget this way of thinking, and I will build up my skills so that I can reliably deliver improvements.

Summary

  1. ~~Align Go and C allocators~~
  2. Create a library that will profile C memory from Go with the same interface as net/http/pprof
  3. Do your best with valgrind (maintain the status quo)

I investigated option 1 this time, but unfortunately did not get good results. Still, it was a great opportunity to read the source code and run it, even though I was not familiar with either Go or C. That leaves two options (effectively one), so I will spend some time reading the Go source code and then work on option 2. I would like to post a follow-up report later.

Finally, freee Co., Ltd. is looking for long-term engineering interns and for engineers graduating in 2017 and 2018 who want to deliver real value to users with new languages and technologies such as Go. There is also a bento program where you can come to the office and have a bento. Whether you are a student or already working, please feel free to contact us if you are interested.

Tomorrow's author has been away from development for three and a half years and loves Komiya's char siu: this year our CEO, @dice-sasaki@github, joins the Advent Calendar! Looking forward to it!
