[PYTHON] Speed up processing by using C++ vector and unordered_map internally in Cython

Overview

Cython lets you compile Python code ahead of time into equivalent C/C++ code. Almost all Python syntax is supported, and simply compiling existing Python code with Cython can improve execution speed by a few tens of percent.

One of the interesting things about Cython is that it lets you use C and C++ data types directly from Python (Cython) code.

This time, I'll use C++'s unordered_map instead of Python's dict, and vector instead of list, directly from Cython code.

Experiment

On the Python side I use list as the array and dict as the associative array; on the C++ side I use vector and unordered_map, respectively. For each side I generate an array of 1 million int keys, then assign every key to the associative array and look every key up again, and compare the performance.

Cython code uses the .pyx extension instead of .py, so the file name is cython_cpp_experiment.pyx.

cython_cpp_experiment.pyx


# distutils: language=c++
# -*- coding: utf-8 -*-

import time
from libcpp.vector cimport vector
# unordered_map ships with Cython's bundled libcpp declarations
from libcpp.unordered_map cimport unordered_map


cdef vector[int] cpp_keys = [k for k in range(1000000)]
keys = [k for k in range(1000000)]

cdef unordered_map[int, bint] cpp_dict
py_dict = {}


cpdef test_python_dict_set():
    for key in keys:
        py_dict[key] = True


cpdef test_python_dict_lookup():
    for key in keys:
        assert py_dict[key] == True


cpdef test_cpp_umap_set():
    cdef int key
    for key in cpp_keys:
        cpp_dict[key] = True


cpdef test_cpp_umap_lookup():
    for key in cpp_keys:
        assert cpp_dict[key] == True


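def run():
    # time.clock() was removed in Python 3.8; perf_counter() is used instead.
    # print is called with a single formatted string so the output looks the
    # same whether print is treated as a statement or a function.
    start = time.perf_counter()
    test_python_dict_set()
    print('test_python_dict_set: %s' % (time.perf_counter() - start))

    start = time.perf_counter()
    test_python_dict_lookup()
    print('test_python_dict_lookup: %s' % (time.perf_counter() - start))

    start = time.perf_counter()
    test_cpp_umap_set()
    print('test_cpp_umap_set: %s' % (time.perf_counter() - start))

    start = time.perf_counter()
    test_cpp_umap_lookup()
    print('test_cpp_umap_lookup: %s' % (time.perf_counter() - start))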

Try running the above code.
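
The .pyx file has to be compiled into an extension module before it can be imported, and the post does not show that step. The following is only a minimal sketch, assuming setuptools and Cython are installed and that cython_cpp_experiment.pyx sits in the current directory. The helper name build_extension is hypothetical and is not part of the original repository.

```python
def build_extension(pyx_path="cython_cpp_experiment.pyx"):
    """Compile a .pyx file into an importable extension module, in place.

    Hypothetical helper for illustration; assumes setuptools and Cython
    are installed and a C++ compiler is available.
    """
    # Imported lazily so that defining this function needs neither package.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="cython_cpp_experiment",
        # cythonize() translates the .pyx file to C++ source (the
        # "# distutils: language=c++" header selects C++ mode) and
        # returns the Extension objects that setuptools should build.
        ext_modules=cythonize(pyx_path),
        # Same effect as passing "build_ext --inplace" on the command line.
        script_args=["build_ext", "--inplace"],
    )
```

Calling build_extension() from the directory that contains the .pyx file should leave the compiled module next to it, after which it can be imported as in the session below.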

In [1]: import cython_cpp_experiment
In [2]: cython_cpp_experiment.run()
test_python_dict_set: 0.116818
test_python_dict_lookup: 0.064986
test_cpp_umap_set: 0.094401
test_cpp_umap_lookup: 0.01363

Assignment into the associative array is only a little under 20% faster, but lookups are about 5 times faster.
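
The figures quoted above can be checked directly from the printed timings. The short sketch below only redoes that arithmetic; the variable and function names are my own, and the numbers are copied from the run shown earlier.

```python
# Timings in seconds, copied from the run shown in the article.
timings = {
    "test_python_dict_set": 0.116818,
    "test_python_dict_lookup": 0.064986,
    "test_cpp_umap_set": 0.094401,
    "test_cpp_umap_lookup": 0.01363,
}


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


def time_saved(baseline, candidate):
    """Fraction of the baseline time that the candidate saves."""
    return 1.0 - candidate / baseline


set_saved = time_saved(timings["test_python_dict_set"],
                       timings["test_cpp_umap_set"])
lookup_speedup = speedup(timings["test_python_dict_lookup"],
                         timings["test_cpp_umap_lookup"])

print("assignment time saved: {:.1%}".format(set_saved))
print("lookup speedup: {:.2f}x".format(lookup_speedup))
```

The assignment saving comes out at a little over 19% and the lookup speedup at just under 4.8x, which matches the rounded figures in the text. These are single-run numbers, so they are best read as a rough indication rather than a careful benchmark.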

Conclusion

Cython automatically converts between byte strings and std::string, between list and std::vector, and so on, according to context. This is very convenient, because you can use C++ data types directly while keeping the code relatively Python-like.

The code above also iterates over the vector with the Python-like for ... in ... syntax (internally, a C++ iterator is generated and used).

Reference: Using C++ in Cython / Standard library: http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html#standard-library

You may hear people say that you should write extensions in C or use ctypes instead, but Cython lets you speed up just part of your Python code, such as loops, so I think it has a variety of uses.

The code used in this post can be found here: https://github.com/ikuyamada/cython-cpp-experiment
