Wrap C/C++ with SWIG to speed up Python processing. [Overview]

What is SWIG?

SWIG is a tool for wrapping programs written in C/C++ so that they can be called from other languages. Supported targets include scripting languages such as JavaScript, Perl, PHP, Python, Tcl and Ruby, and non-scripting languages such as C#, D, Go, Java, Lua, OCaml, Octave, Scilab and R.

Speed comparison

It is well known that code written in C/C++ is fast, but how large is the difference in practice? The benchmarks below compare pure Python against SWIG-wrapped C under the following conditions.

  • OS: Ubuntu 18.04
  • CPU: Intel Core i7-7700K
  • Memory: 24 GB
  • Compiled with the -O3 option
  • No parallelization

Hello World

Execution speed of a function that displays "hello world!" 1000 times

python_time:2.356291e-03[sec]
swig_time__:1.398325e-03[sec]

The difference is only about 1.7 times, less than the gap usually quoted between Python and C. I have not verified the cause, but console output is probably the bottleneck in both versions, so the time spent printing hides the difference in loop speed.
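One rough way to probe that guess is to time the same loop with the output sent to an in-memory buffer instead of the console. The sketch below is not part of the original benchmark; `time_hello` and its `stream` parameter are names made up for illustration.

```python
import io
import time


def time_hello(stream, n=1000):
    """Print "hello world!" n times to `stream` and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        print("hello world!", file=stream)
    return time.perf_counter() - start


# Writing to an in-memory buffer avoids the terminal entirely.
buffer = io.StringIO()
buffer_time = time_hello(buffer)
print("buffer_time:{:e}[sec]".format(buffer_time))
```

Calling `time_hello(sys.stdout)` (after `import sys`) gives the console figure to compare against. If the buffer run is much faster, most of the measured time is being spent on output rather than in the loop itself.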

code

python


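import time
# helloWorld() (the SWIG-wrapped C function below) must also be imported from the generated module.

def hello_world():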
    for i in range(1000):
        print("hello world!")

def test1():
    # python
    start = time.time()
    hello_world()
    python_time = time.time() - start
    # swig
    start = time.time()
    helloWorld()
    swig_time = time.time() - start
    print("python_time:{:e}".format(python_time) + "[sec]")
    print("swig_time__:{:e}".format(swig_time) + "[sec]")

c


#include <stdio.h>

void helloWorld()
{
    for (int i = 0; i < 1000; i++)
    {
        printf("hello world!\n");
    }
}

Count up numbers

Execution speed of a function that counts up to 1000 and returns the result

python_time:6.842613e-05[sec]
swig_time__:5.483627e-06[sec]

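This time the difference is more than 10 times (about 12.5 times). Note that both timings include the print(res) call, and that an optimizing compiler at -O3 can typically reduce a loop like this to a constant, so the C figure is mostly call and print overhead. Even so, a gap this large from simply counting up gives a glimpse of why Python is said to be slow.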
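A single `time.time()` measurement of a function this small is noisy, so the figure can vary from run to run. As a sketch of a steadier measurement for the Python side only, the standard `timeit` module can repeat the call many times and report the best per-call time. The repeat counts below are arbitrary choices, not values from the original benchmark.

```python
import timeit


def count_up():
    res = 0
    for i in range(1000):
        res += 1
    return res


# Run count_up() 200 times per trial, over 5 trials, and keep the fastest trial.
calls_per_trial = 200
trials = timeit.repeat(count_up, number=calls_per_trial, repeat=5)
best_per_call = min(trials) / calls_per_trial
print("count_up best:{:e}[sec]".format(best_per_call))
```

The same call should work for the wrapped function by passing `countUp` in place of `count_up`, which keeps the print call out of the measurement for both sides.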

code

python


def count_up():
    res = 0
    for i in range(1000):
        res += 1
    return res

def test2():
    # python
    start = time.time()
    res = count_up()
    print(res)
    python_time = time.time() - start
    # swig
    start = time.time()
    res = countUp()
    print(res)
    swig_time = time.time() - start
    print("python_time:{:e}".format(python_time) + "[sec]")
    print("swig_time__:{:e}".format(swig_time) + "[sec]")

c


int countUp()
{
    int res = 0;
    for (int i = 0; i < 1000; i++)
    {
        res += 1;
    }
    return res;
}

Image conversion

Execution speed of the function that converts a grayscale image to an RGB image

  • Use OpenCV module on Python side
python_time:1.032352e-04[sec]
swig_time__:1.156330e-04[sec]

The results are nearly identical, which is natural given that OpenCV itself is written in C/C++. However, the SWIG timing above includes allocating the output array with np.zeros(). When you write the function yourself with SWIG, the caller owns the output buffer, so it can be allocated once and reused across calls; the cv2.cvtColor call as used here returns a newly allocated array each time. With the allocation moved outside the timed region, the results are as follows.

python_time:1.101494e-04[sec]
swig_time__:7.319450e-05[sec]

Under this condition, SWIG takes about one third less time (roughly 34%). When the conversion is run many times, writing the loop directly in C and wrapping it with SWIG appears to pay off.
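The reuse pattern looks roughly like the following. This is a pure-Python stand-in, not the SWIG module: `gray2rgb_into` is a hypothetical function that mimics the in-place style of `imgGray2RGB` by writing into a buffer supplied by the caller, with flat `bytes` and `bytearray` objects standing in for NumPy arrays.

```python
def gray2rgb_into(gray, out, height, width):
    """Copy each gray pixel into the three interleaved channels of `out`."""
    for h in range(height):
        for w in range(width):
            in_point = h * width + w
            out_point = 3 * in_point
            value = gray[in_point]
            out[out_point] = value
            out[out_point + 1] = value
            out[out_point + 2] = value


height, width = 4, 4
# Allocate the output once, outside the loop ...
out = bytearray(height * width * 3)
# Three small made-up "frames" of gray pixel values.
frames = [bytes((i + k) % 256 for i in range(height * width)) for k in range(3)]
for frame in frames:
    # ... and overwrite it on every call instead of allocating a new one.
    gray2rgb_into(frame, out, height, width)
```

Because the function never allocates, the per-call cost is only the copy itself, which is the situation measured in the second set of timings.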

code

python


import cv2
import numpy as np

def test3():
    img_size = (256, 256)
    org_img = np.random.randint(0, 256, (img_size), dtype=np.uint8)
    # python
    start = time.time()
    res_py = cv2.cvtColor(org_img, cv2.COLOR_GRAY2RGB)
    python_time = time.time() - start
    # swig
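    # To exclude allocation from the timing, uncomment the next line
    # and remove the np.zeros call inside the timed region.
    # res_swig = np.zeros((*img_size, 3), dtype=np.uint8)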
    start = time.time()
    res_swig = np.zeros((*img_size, 3), dtype=np.uint8)
    imgGray2RGB(org_img, res_swig)
    swig_time = time.time() - start
    print("array_equal: {}".format(np.array_equal(res_py, res_swig)))
    print("python_time:{:e}".format(python_time) + "[sec]")
    print("swig_time__:{:e}".format(swig_time) + "[sec]")

c


void imgGray2RGB(unsigned char *inArr, int inDim1, int inDim2,
                 unsigned char *inplaceArr, int inplaceDim1, int inplaceDim2, int inplaceDim3)
{
    int height = inplaceDim1;
    int width = inplaceDim2;
    int channel = inplaceDim3;
    int h, w;
    int in_point, out_point;

    for (h = 0; h < height; h++)
    {
        for (w = 0; w < width; w++)
        {
            in_point = h * width + w;
            out_point = channel * (h * width + w);
            inplaceArr[out_point] = inArr[in_point];
            inplaceArr[out_point + 1] = inArr[in_point];
            inplaceArr[out_point + 2] = inArr[in_point];
        }
    }
}

(Optional) Image normalization

Execution speed with image normalization in addition to grayscale to RGB conversion

  • Use OpenCV module on Python side
python_time:1.460791e-03[sec]
swig_time__:3.521442e-04[sec]

OpenCV is also implemented in C/C++, yet the SWIG version is about four times faster than the OpenCV and NumPy pipeline. The reason is that the C function completes all the processing in a single raster scan, whereas the Python side runs several separate passes over the image (color conversion, type conversion, scaling, mean subtraction, division), each producing an intermediate array. The OpenCV Python package exposes its processing as separate functions, so fusing steps into one pass like this requires writing the loop yourself in C/C++.
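To make the difference concrete, the sketch below contrasts the two approaches in plain Python for a single channel. It is only an illustration of the idea: `normalize_multi_pass` and `normalize_single_pass` are hypothetical names, and lists stand in for the image arrays. The multi-pass version builds a temporary sequence at every step, much as chaining separate library calls does, while the single-pass version visits each pixel once.

```python
def normalize_multi_pass(pixels, mean, std):
    """Three separate passes, each producing a temporary list."""
    scaled = [p / 255 for p in pixels]       # pass 1: scale to [0, 1]
    centered = [s - mean for s in scaled]    # pass 2: subtract the mean
    return [c / std for c in centered]       # pass 3: divide by the std


def normalize_single_pass(pixels, mean, std):
    """One pass: every pixel is read once and written once."""
    return [(p / 255 - mean) / std for p in pixels]


pixels = [0, 64, 128, 255]
multi = normalize_multi_pass(pixels, 0.485, 0.229)
single = normalize_single_pass(pixels, 0.485, 0.229)
print(multi == single)
```

Both functions apply the same arithmetic in the same order, so the results match; what changes is how many times the data is traversed and how many temporaries are created.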

code

python


def test4():
    img_size = (256, 256)
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    mean_np = np.array(mean, dtype=np.float32)
    std_np = np.array(std, dtype=np.float32)
    org_img = np.random.randint(0, 256, (img_size), dtype=np.uint8)
    # python
    start = time.time()
    res_py = cv2.cvtColor(org_img, cv2.COLOR_GRAY2RGB).astype(np.float32)
    res_py = ((res_py / 255) - mean_np) / std_np
    python_time = time.time() - start
    # swig
    start = time.time()
    res_swig = np.zeros((*img_size, 3), dtype=np.float32)
    imgNormalize(org_img, res_swig, *mean, *std)
    swig_time = time.time() - start
    print("array_equal: {}".format(np.array_equal(res_py, res_swig)))
    print("python_time:{:e}".format(python_time) + "[sec]")
    print("swig_time__:{:e}".format(swig_time) + "[sec]")

c


void imgNormalize(unsigned char *inArr, int inDim1, int inDim2,
                  float *inplaceArr, int inplaceDim1, int inplaceDim2, int inplaceDim3,
                  float meanR, float meanG, float meanB,
                  float stdR, float stdG, float stdB)
{
    int height = inplaceDim1;
    int width = inplaceDim2;
    int channel = inplaceDim3;
    int h, w;
    int val;
    int inPoint, outPoint;

    for (h = 0; h < height; h++)
    {
        for (w = 0; w < width; w++)
        {
            inPoint = h * width + w;
            outPoint = channel * (w + width * h);
            val = inArr[inPoint];
            inplaceArr[outPoint] = ((float)val / 255 - meanR) / stdR;
            inplaceArr[outPoint + 1] = ((float)val / 255 - meanG) / stdG;
            inplaceArr[outPoint + 2] = ((float)val / 255 - meanB) / stdB;
        }
    }
}

Next time

The next post covers implementation: basic SWIG usage, plus slightly more advanced topics such as passing NumPy arrays directly as arguments and accessing them through pointers on the C/C++ side.
