[PYTHON] scipy.sparse is not optimized for dot product operations

import timeit
import numpy as np
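import scipy.sparse

def getSparse(length, size, todense = False):
    # One random column index per row, so every row has exactly one nonzero entry.
    # np.random.randint excludes the upper bound, matching the old random_integers(0, size - 1).
    array = np.random.randint(0, size, length)
    response = scipy.sparse.csr_matrix(([1]*len(array), array, range(len(array) + 1)), shape=(len(array), size), dtype = array.dtype)
    return response.todense() if todense else response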

def testDense():
    x = np.dot(np.random.rand(300000).reshape(300, 1000), getSparse(1000, 300, True))

def testSparse():
    x = np.dot(np.random.rand(300000).reshape(300, 1000), getSparse(1000, 300, False))

print(timeit.timeit(testDense, setup = 'import __main__', number = 1))
# 0.08102297782897949
print(timeit.timeit(testSparse, setup = 'import __main__', number = 1))
# 30.572995901107788

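I expected the dot product to be faster with a sparse matrix, but it was dramatically slower: about 30 seconds versus 0.08 seconds for the dense version. In theory a sparse operand should make the multiplication cheaper, but the implementation does not seem to support that, at least when it is called through np.dot as above.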
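
One likely cause, as far as I can tell, is that `np.dot` is not aware of `scipy.sparse` matrices, so the call above never reaches the sparse multiplication routine. Below is a minimal sketch, assuming the same shapes as in the benchmark, that calls the sparse matrix's own `dot` method on the transposed product instead. The helper name `dense_times_sparse` is hypothetical and not part of either library, and I have not timed it here.

```python
import numpy as np
import scipy.sparse

def dense_times_sparse(dense, sparse):
    # (D @ S) == (S.T @ D.T).T, so the multiplication is dispatched to
    # the sparse matrix's own dot method rather than to np.dot.
    return sparse.T.dot(dense.T).T

rng = np.random.default_rng(0)
dense = rng.random((300, 1000))

# Same structure as getSparse(1000, 300): one nonzero entry per row.
cols = rng.integers(0, 300, size=1000)
sparse = scipy.sparse.csr_matrix(
    (np.ones(1000), cols, np.arange(1001)), shape=(1000, 300))

result = dense_times_sparse(dense, sparse)

# Cross-check against the plain dense computation.
reference = dense @ sparse.toarray()
print(result.shape, np.allclose(result, reference))
```

Wrapping both variants in `timeit` the same way as above should show whether this closes the gap on a given machine.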

As an aside, what I really want to do is aggregate the values along one dimension of an ndarray, for example turning data of shape (store x date) into data of shape (region x date). For my current use case the ordinary dense computation was fine, but this behavior looks like it will become a problem as more data is added.
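
The aggregation described above can be sketched as a product with a sparse indicator matrix of shape (region x store), where the sparse matrix sits on the left so that its own `dot` method is used. The data and the names `store_to_region`, `indicator`, and `by_region` are made up for illustration only.

```python
import numpy as np
import scipy.sparse

# Hypothetical mapping: store i belongs to region store_to_region[i].
store_to_region = np.array([0, 0, 1, 2, 1])
n_stores = len(store_to_region)
n_regions = 3

# Toy (store x date) data with 3 dates.
data = np.arange(n_stores * 3).reshape(n_stores, 3)

# indicator[r, s] == 1 when store s belongs to region r.
indicator = scipy.sparse.csr_matrix(
    (np.ones(n_stores), (store_to_region, np.arange(n_stores))),
    shape=(n_regions, n_stores))

# (region x store) times (store x date) gives (region x date).
by_region = indicator.dot(data)
print(by_region)
```

Each row of `by_region` is the sum of the rows of `data` for the stores in that region.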
