I experimented with several ways to coarse-grain a numpy.ndarray. By coarse graining I mean the kind of reduction that OpenCV's resize performs. Since we only deal with the coarse graining described below, cases such as enlarging the array, or block sizes that do not evenly divide the side of the original array, are not covered.
While implementing a lattice gas automaton, I needed to coarse-grain the results at the drawing stage. Concretely, I wanted to average a 10000 * 10000 array over every 10 * 10 block to produce a 1000 * 1000 array. OpenCV has resize, and there must be plenty of people who want to do this, so I googled assuming something would turn up (subjective), but what came up was a different kind of resize. The obvious answer is then to use OpenCV's resize, but it is mildly annoying that it only supports unsigned 1-byte integers: feed it an array of any other type and an error is thrown. Since this is only for drawing results, it would not be so bad to just use that, but I wanted accurate values, so I decided to implement it myself.
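To pin down the operation, here is a minimal sketch (sizes shrunk for illustration; not part of the original code) of what averaging block by block means, reducing a 4 * 4 array to 2 * 2:

import numpy as np

# A 4 * 4 array averaged over every 2 * 2 block becomes a 2 * 2 array
# whose entries are the means of the four non-overlapping blocks.
a = np.arange(16, dtype=float).reshape(4, 4)
coarse = np.array([[a[0:2, 0:2].mean(), a[0:2, 2:4].mean()],
                   [a[2:4, 0:2].mean(), a[2:4, 2:4].mean()]])
print(coarse)  # [[ 2.5  4.5]
               #  [10.5 12.5]]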
I tried the following four approaches.
- Prepare the coarse-grained array in advance, take out each block of the original array as a slice, and assign its averaged value in a for loop
- Write the above procedure as a list comprehension
- Slice the original array and add the slices together over rows and columns
- Write the above procedure using np.roll
In the sources below, the original array is referred to as array (size 10000 * 10000), and the side length of each averaged block as mean_size (10 in the explanation above).
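As a concrete setup for the explanations below (the actual test data is not shown in the original, so random values are assumed here):

import numpy as np

array = np.random.rand(10000, 10000)  # the original array (random data assumed for testing)
mean_size = 10                        # side length of each block to average over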
Below are the sources.
mean_loop
import numpy as np

def mean_loop(array, mean_size):
    meaned_size = np.shape(array)[0] // mean_size
    return_array = np.empty((meaned_size, meaned_size))
    for y in range(meaned_size):
        for x in range(meaned_size):
            return_array[y, x] = array[mean_size*y:mean_size*(y+1), mean_size*x:mean_size*(x+1)].mean()
    return return_array
It is a straightforward implementation of what we want to do; written in C, it would look just like this. In Python, however, a double loop feels uncomfortable, and since a 10000 * 10000 array becomes a 1000 * 1000 one, the loop body runs a million times and the execution time will be terrible.
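As a quick sanity check (not in the original post), calling it on the tiny 4 * 4 example from the introduction should reproduce the hand-computed block means:

a = np.arange(16, dtype=float).reshape(4, 4)
print(mean_loop(a, 2))
# [[ 2.5  4.5]
#  [10.5 12.5]]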
mean_list_comprehension
def mean_list_comprehension(array, mean_size):
    meaned_size = np.shape(array)[0] // mean_size
    return np.array([[array[mean_size*y:mean_size*(y+1), mean_size*x:mean_size*(x+1)].mean()
                      for x in range(meaned_size)] for y in range(meaned_size)])
Speaking of Python, list comprehensions; speaking of list comprehensions, Python. They are said to be faster (subjective), so you can hope for something here, although the loop is still 1000 * 1000 iterations. And, as written, the doubly nested comprehension is cramped and hard to read.
mean_slice
def mean_slice(array, mean_size):
    # Sum every mean_size-th row so that row y holds the sum of rows
    # mean_size*y .. mean_size*y + mean_size - 1 of the original array.
    sum_row_array = np.array(array[::mean_size])
    for row in range(1, mean_size):
        sum_row_array += array[row::mean_size]
    # Do the same along the columns of the row sums.
    sum_column_array = sum_row_array[:, ::mean_size]
    for column in range(1, mean_size):
        sum_column_array += sum_row_array[:, column::mean_size]
    return sum_column_array / mean_size**2
A slightly hacky method. The loop runs only 20 times: 10 additions over rows and 10 over columns. It should be faster than the two above no matter how you look at it.
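A worked trace on a small array (mean_size = 2; a sketch, not part of the original post) may make the slicing clearer: array[::2] picks rows 0, 2, ..., array[1::2] picks rows 1, 3, ..., and adding them collapses each pair of rows; the columns are then collapsed the same way.

a = np.arange(16, dtype=float).reshape(4, 4)

# Collapse pairs of rows: row i of the result is a[2*i] + a[2*i+1].
sum_rows = np.array(a[::2]) + a[1::2]                       # shape (2, 4)

# Collapse pairs of columns of the row sums.
sum_cols = np.array(sum_rows[:, ::2]) + sum_rows[:, 1::2]   # shape (2, 2)

print(sum_cols / 2**2)
# [[ 2.5  4.5]
#  [10.5 12.5]]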
mean_np_roll
def mean_np_roll(array, mean_size):
    # Accumulate copies of the array shifted upward by 0 .. mean_size-1 rows.
    sum_row_array = np.array(array)
    for row in range(1, mean_size):
        sum_row_array += np.roll(array, -row, axis=0)
    # Do the same along the columns.
    sum_column_array = np.array(sum_row_array)
    for column in range(1, mean_size):
        sum_column_array += np.roll(sum_row_array, -column, axis=1)
    return sum_column_array[::mean_size, ::mean_size] / mean_size ** 2
What this does is slightly different, but the idea is the same. I had heard that element access on an ndarray is slow, so I suspected the slice version might be slow and wrote this as well. However, whereas the slice method shrinks the area being added at each step, here the arrays stay the same size as the original, so it may well be slower in that respect.
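For reference, np.roll(array, -row, axis=0) shifts the array upward by row rows with wrap-around, so after summing the shifted copies, position y holds the sum of rows y through y + mean_size - 1; the wrapped-around rows are simply discarded by the final [::mean_size, ::mean_size] slice. A tiny illustration (not from the original post):

a = np.arange(16, dtype=float).reshape(4, 4)

# After adding the copy rolled up by one row, row y holds a[y] + a[y+1]
# (the last row wraps around, but it is discarded by the final slicing).
s = a + np.roll(a, -1, axis=0)
# The same trick along the columns.
s = s + np.roll(s, -1, axis=1)

print(s[::2, ::2] / 2**2)
# [[ 2.5  4.5]
#  [10.5 12.5]]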
I wanted to compare various experimental settings, so I ran the calculation with (side length of the original array, mean_size) set to (3000, 10), (6000, 10), (9000, 10), (3000, 100), (6000, 100) and (3000, 1000). The execution results are below.
As expected, the slice version is faster than the first two, but the np.roll method was slower than expected; especially when the averaged area is large, the slowness is quite remarkable. The list comprehension also turns out not to be particularly fast. Given these results, forcing everything into a comprehension seems like a bad idea from a readability standpoint as well. For now the slice-based function above looks good, so I think that is the one to use.
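The timing code itself is not shown above; a harness along these lines (a sketch using time.perf_counter with assumed random test data) would reproduce the comparison and also check that the four functions agree:

import time
import numpy as np

functions = [mean_loop, mean_list_comprehension, mean_slice, mean_np_roll]

for array_size, mean_size in [(3000, 10), (6000, 10), (9000, 10),
                              (3000, 100), (6000, 100), (3000, 1000)]:
    array = np.random.rand(array_size, array_size)
    results = []
    for f in functions:
        start = time.perf_counter()
        results.append(f(array, mean_size))
        print(f"{f.__name__}: {array_size} {mean_size} took {time.perf_counter() - start:.3f} s")
    # All four methods should produce the same coarse-grained array.
    assert all(np.allclose(results[0], r) for r in results[1:])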
I also implemented the first (loop) method in C. The file extension is cpp, but the content is plain C. Raw pointers are said to be fast for some reason, so I used them. Details differ, such as treating a one-dimensional array as two-dimensional, but this is roughly the code you would write in C anyway, so I did not worry about fairness beyond that. I built it as an x64 Release build with Visual Studio 2017 Community.
mean_loop.cpp
#include "stdafx.h"
#include <stdio.h>
#include <time.h>
#include <Windows.h>
double* mean_loop(int mean_size, int array_size, double* array) {
double *return_array;
int meaned_size = array_size / mean_size;
return_array = new double[meaned_size * meaned_size];
for (size_t i = 0; i < meaned_size; i++)
{
for (size_t j = 0; j < meaned_size; j++)
{
double temp = 0;
for (size_t k = 0; k < mean_size; k++)
{
for (size_t l = 0; l < mean_size; l++)
{
temp += array[(i + k) * meaned_size * mean_size + j*mean_size + l];
}
}
return_array[i * meaned_size+ j] = temp / (mean_size * mean_size);
}
}
return return_array;
};
int main()
{
double *array, *return_array;
int mean_size = 10, array_size = 3000;
clock_t start, end, span;
int array_sizes[] = {3000, 6000, 9000};
int mean_sizes[] = { 10, 100, 1000 };
for (size_t as = 0; as < 3; as++)
{
for (size_t ms = 0; ms < 3; ms++)
{
array_size = array_sizes[as];
mean_size = mean_sizes[ms];
array = new double[array_size * array_size];
for (size_t i = 0; i < array_size * array_size; i++)
{
array[i] = i;
}
start = clock();
return_array = mean_loop(mean_size, array_size, array);
end = clock();
span = end - start;
printf("%d %4d took: %lf\n", array_size, mean_size, (double)span / CLOCKS_PER_SEC);
delete return_array, array;
}
}
system("pause");
return 0;
}
The result of this calculation: it really is fast! Raw pointers make me worry about memory leaks, but avoiding those is just a matter of diligence (an extreme view), and since numerical work often involves large calculations, C/C++ is still worth using. Still, weighing that risk against the speed, the relatively safe and reasonably fast Python code above feels like a fair trade.
Don't use scripting languages for numerical calculations.