When you need a profiler

The program I wrote is slower than I expected! I don't know where the bottleneck is, and I want to find out!

In such a case, use a profiling tool.

Installing google-perftools

Download the source code from the download page linked at the bottom of http://goog-perftools.sourceforge.net/. As of October 11, 2014, the latest version is gperftools-2.2.1.tar.gz. After extracting the archive, install it as usual with ./configure, make, and make install.

Installing graphviz

You don't need graphviz if you only use google-perftools on the command line. However, google-perftools can also produce a visually clear diagram called a call graph. Because this graph is generated in dot format, it is a good idea to also install graphviz, a tool that converts dot files to formats such as EPS.

Download the source code from the Download page at http://www.graphviz.org/, extract it, and again run ./configure, make, and make install. Because graphviz depends on many libraries, ./configure is likely to complain about missing ones when you build from source. If you only want to convert the dot files generated by google-perftools to EPS, most of those libraries are not needed, so ignore the complaints and go ahead with make and make install.

How to use

Add the lib directory of your google-perftools installation to LD_LIBRARY_PATH, for example in ~/.bash_profile:

export LD_LIBRARY_PATH=/home/tanaka/lib:$LD_LIBRARY_PATH

Link libprofiler.so when compiling the program you want to profile.

$ g++ -o hoge.exe hoge.cpp -g -L/home/tanaka/lib -lprofiler

Run the program, setting the CPUPROFILE environment variable to the name of the file the profile should be written to.

$ export CPUPROFILE=prof.out; ./hoge.exe 
PROFILE: interrupts/evictions/bytes = xxx/x/xxxx

This generates prof.out. Run pprof with the original program and the profile file to display the results, and use the top command to see which functions take the most execution time.

$ pprof hoge.exe prof.out
Using local file prof.out.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 355 samples
     286  80.6%  80.6%      286  80.6% __write_nocancel
      16   4.5%  85.1%       16   4.5% __read_nocancel
      14   3.9%  89.0%       17   4.8% __lseek_nocancel
...

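In the top output, the first column is the number of samples in which the function itself was executing, and the second column is that count as a percentage of all samples, i.e. the share of execution time spent in the function. __write_nocancel is the internal glibc function that write(2) ultimately calls, so write(2) is the bottleneck of this program.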

Creating a call graph

The following commands create a call graph: a flowchart-like diagram that lays out functions in the order they call each other and draws each function larger the more execution time it accounts for.

$ pprof --dot hoge.exe prof.out > prof.dot
$ dot -T eps prof.dot > prof.eps

Other profiling tools

Other commonly used profilers include:

perf (part of the Linux kernel since 2.6.31; whether it is installed by default depends on the distribution)
gprof (the program must be compiled and linked with gcc's -pg option)
oprofile

If you can use yum or apt-get (and have permission to install packages), perf or oprofile is probably easy to install and use. When I tried to build perf from source into my home directory, I got stuck in configure or make and gave up.

Even on Linux kernel 2.6.31 or later, where perf is available, I find google-perftools easier to use because it is less likely to trip you up when you install it from source.