[Python3] Coarse-graining a numpy.ndarray: speed comparison etc.

Abstract

I experimented with several ways to coarse-grain a numpy.ndarray. By coarse graining I mean something like OpenCV's resize: shrinking an array by averaging blocks. Only the block-averaging case described below is covered; enlarging the array, or cases where the coarse-grained side length does not evenly divide a side of the original array, are not handled.

Preface

While implementing a lattice gas automaton, I needed to coarse-grain the result at the drawing stage. Concretely, I wanted to average a 10000 * 10000 array in 10 * 10 blocks to produce a 1000 * 1000 array. OpenCV has resize, and plenty of people must want to do exactly this, so I googled expecting something ready-made (subjective), but only found resize again. Using OpenCV's resize would be a fine ending to the story, but it is subtly annoying: it only supports unsigned 1-byte integers, and passing an array of any other type throws an error. Since this is only for drawing the result, using it would not have been so bad, but I wanted accurate values, so I decided to implement it myself.

Execution environment

Methods tried

I tried the following four approaches:

- Prepare the coarse-grained array in advance, take out blocks of the original array as slices, and assign the averaged value using a for loop
- Describe the above procedure as a list comprehension
- Slice the original array and add the slices together over rows and columns
- Describe the above procedure using np.roll

In the sources below, the original array is called array (size 10000 * 10000) and the side length of each block to be averaged is called mean_size (10 in the explanation).
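For reference, a minimal setup sketch for the snippets below (my addition, not from the original post); a smaller array such as 1000 * 1000 is enough for a quick local test.

setup_example


import numpy as np

# Original array and block side length as used in the explanation.
array = np.random.rand(10000, 10000)
mean_size = 10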

Prepare the coarse-grained array, take out blocks of the original array as slices, and assign the averaged value using a for loop

Below is the source

mean_loop


def mean_loop(array, mean_size):
    meaned_size = np.shape(array)[0] // mean_size
    return_array = np.empty((meaned_size, meaned_size))

    for y in range(meaned_size):
        for x in range(meaned_size):
            return_array[y, x] = array[mean_size*y:mean_size*(y+1), mean_size*x:mean_size*(x+1)].mean()

    return return_array

It is a straightforward implementation of what we want to do; written in C it would look much the same. But a double loop in Python feels unpleasant. Moreover, since a 10000 * 10000 array becomes 1000 * 1000 here, the Python-level loop body runs 1,000,000 times, so the execution time will be terrible.
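A minimal timing sketch, assuming the setup above (time.perf_counter is my choice here; the original measurements may have been taken differently):

timing_example


import time

start = time.perf_counter()
result = mean_loop(array, mean_size)
print(f"mean_loop: {time.perf_counter() - start:.2f} s, {array.shape} -> {result.shape}")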

Describe the above procedure in list comprehension

mean_list_comprehension


def mean_list_comprehension(array, mean_size):
    meaned_size = np.shape(array)[0] // mean_size
    return np.array([[array[mean_size*y:mean_size*(y+1), mean_size*x:mean_size*(x+1)].mean() for x in range(meaned_size)] for y in range(meaned_size)])

Speaking of Python, list comprehensions; speaking of list comprehensions, Python. They are said to be faster (subjective), so there is some hope here, although the loop still runs 1000 * 1000 times. And, as written, the double comprehension is cramped and hard to read.
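As a purely cosmetic variant (my addition, not from the original post), the double comprehension becomes a little easier to scan when broken across lines; the behavior is identical:

mean_list_comprehension_readable


def mean_list_comprehension_readable(array, mean_size):
    meaned_size = np.shape(array)[0] // mean_size
    return np.array([
        [array[mean_size*y:mean_size*(y+1), mean_size*x:mean_size*(x+1)].mean()
         for x in range(meaned_size)]
        for y in range(meaned_size)
    ])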

Slice the original array and add the slices together over rows and columns

mean_slice


def mean_slice(array, mean_size):
    # np.array() copies, so the in-place additions below do not modify the original array
    sum_row_array = np.array(array[::mean_size])

    # add every mean_size-th row: row i accumulates rows i*mean_size .. i*mean_size + mean_size - 1
    for row in range(1, mean_size):
        sum_row_array += array[row::mean_size]

    # a view into sum_row_array; the additions below read disjoint columns, so the result stays correct
    sum_column_array = sum_row_array[:,::mean_size]

    # likewise add every mean_size-th column
    for column in range(1, mean_size):
        sum_column_array += sum_row_array[:,column::mean_size]

    # divide by the number of elements per block to get the mean
    return sum_column_array / mean_size**2

A somewhat hacky method. The loop runs only 20 times in total: 10 additions over rows and 10 over columns. It should be faster than the two above by a wide margin.
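A tiny worked example (values assumed by me) of what mean_slice does, using a 4 * 4 array and mean_size = 2:

mean_slice_example


a = np.arange(16).reshape(4, 4)

# Rows 0 and 2 plus rows 1 and 3 give the row sums,
# then columns 0 and 2 plus columns 1 and 3 give the block sums.
print(mean_slice(a, 2))
# [[ 2.5  4.5]
#  [10.5 12.5]]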

Describe the above procedure using np.roll

mean_np_roll


def mean_np_roll(array, mean_size):
    sum_row_array = np.array(array)
    for row in range(1, mean_size):
        sum_row_array += np.roll(array, -row, axis=0)

    sum_column_array = np.array(sum_row_array)
    for column in range(1, mean_size):
        sum_column_array += np.roll(sum_row_array, -column, axis=1)

    return sum_column_array[::mean_size, ::mean_size] / mean_size ** 2

What it does is slightly different, but the idea is the same. I had heard that element access on an ndarray is slow, and suspected slicing might be slow too, so I made this version. However, whereas the slice method keeps shrinking the area being added, here every addition works on an array the same size as the original, so it is likely to be slow for that reason.
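To make that cost concrete: np.roll always returns an array the same size as its input, so every addition in the loops above touches the full 10000 * 10000 array. A tiny illustration (values assumed by me):

np_roll_example


a = np.arange(16).reshape(4, 4)

# Shifting by -1 along axis 0 moves every row up by one and wraps the first row to the bottom;
# the result is still 4 * 4, which is why the intermediate arrays never shrink.
print(np.roll(a, -1, axis=0))
# [[ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [ 0  1  2  3]]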

Results

I wanted to try various experimental settings, so I varied (side length of the original array, mean_size) over (3000, 10), (6000, 10), (9000, 10), (3000, 100), (6000, 100), and (3000, 1000) and measured each. The execution results are below.

(Figure: execution time of each method for the settings above)

As expected, the slice method is faster than the first two, but the np.roll method was slower than expected; when the averaged block is large, the slowness is quite striking. The list comprehension also turned out not to be that fast. Given these results, forcing everything into a comprehension seems like a bad trade from a readability standpoint. For now, the slice-based function above looks good, so I think that is the one to use.
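As a quick sanity check (my addition, not part of the original measurements), the four implementations can be compared on a small random array; they should agree up to floating-point error:

consistency_check


a = np.random.rand(100, 100)

results = [
    mean_loop(a, 10),
    mean_list_comprehension(a, 10),
    mean_slice(a, 10),
    mean_np_roll(a, 10),
]
print(all(np.allclose(results[0], r) for r in results[1:]))  # expected: True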

(Bonus?) Comparison with C

I implemented the top (loop) method in C. The file extension is cpp, but the content is plain C. Raw pointers are said to be fast for some reason, so raw pointers are used. Details differ, such as treating a one-dimensional array as two-dimensional, but that is how one would write it in C anyway, so I did not worry about fairness to that extent. It was built as an x64 Release build in Visual Studio 2017 Community.

mean_loop.cpp


#include "stdafx.h"
#include <stdio.h>
#include <time.h>
#include <Windows.h>

double* mean_loop(int mean_size, int array_size, double* array) {
	double *return_array;
	int meaned_size = array_size / mean_size;

	return_array = new double[meaned_size * meaned_size];


	for (size_t i = 0; i < meaned_size; i++)
	{
		for (size_t j = 0; j < meaned_size; j++)
		{
			double temp = 0;
			for (size_t k = 0; k < mean_size; k++)
			{
				for (size_t l = 0; l < mean_size; l++)
				{
					// row index within the original array is i * mean_size + k
					temp += array[(i * mean_size + k) * array_size + j * mean_size + l];
				}
			}
			return_array[i * meaned_size+ j] = temp / (mean_size * mean_size);
		}
	}

	return return_array;
};

int main()
{
	double *array, *return_array;
	int mean_size = 10, array_size = 3000;
	clock_t start, end, span;
	int array_sizes[] = {3000, 6000, 9000};
	int mean_sizes[] = { 10, 100, 1000 };

	for (size_t as = 0; as < 3; as++)
	{
		for (size_t ms = 0; ms < 3; ms++)
		{
			array_size = array_sizes[as];
			mean_size = mean_sizes[ms];

			array = new double[array_size * array_size];

			for (size_t i = 0; i < array_size * array_size; i++)
			{
				array[i] = i;
			}

			start = clock();

			return_array = mean_loop(mean_size, array_size, array);

			end = clock();

			span = end - start;
			printf("%d %4d took: %lf\n", array_size, mean_size, (double)span / CLOCKS_PER_SEC);

			delete[] return_array;
			delete[] array;

		}
		
	}
	
	system("pause");

	return 0;
}

The results of this run are below.

(Figure: loop_c++.PNG, execution times of the C++ version)

It really is fast after all. With raw pointers I worry about memory leaks, but that is just a matter of effort (an extreme view), and since numerical work often means large-scale computation, C/C++ is still worth using. On the other hand, weighed against that scariness, the relatively safe and reasonably fast Python code above feels like it sits in about the right place for its speed.

Summary

Don't use scripting languages for numerical calculations.
