Time comparison: Correlation coefficient calculation in Python

Introduction

In atmospheric and ocean data analysis, a correlation coefficient map is often drawn. In Python, Numpy and Scipy have functions to calculate the correlation coefficient, so you can use them to draw a map of the correlation coefficient. However, there are multiple ways to do this, so here we will compare their performance.

Development environment

How to calculate the correlation coefficient

1. How to use Scipy functions to run loops

def crr_loop_scipy(x1, x2):
    ret = numpy.empty([x1.shape[1], x2.shape[1]])
    for jj in xrange(ret.shape[1]):
        for ii in xrange(ret.shape[0]):
            ret[ii,jj], _ = scipy.stats.pearsonr(x1[:,ii], x2[:,jj])
    return ret

2. How to run a loop using Numpy functions

def crr_loop_numpy(x1, x2):
    ret = numpy.empty([x1.shape[1], x2.shape[1]])
    for jj in xrange(ret.shape[1]):
        for ii in xrange(ret.shape[0]):
            ret[ii,jj] = numpy.corrcoef(x1[:,ii], x2[:,jj])[1,0]
    return ret

3. How to use Numpy functions but not loop

def crr_numpy(x1, x2):
    return numpy.corrcoef(x1, x2, rowvar=False)[0:x1.shape[1], x1.shape[1]:]

4. How to calculate the covariance matrix yourself

def crr_original(x1, x2):
    crscov = np.dot(x1.T, x2) / x1.shape[0]
    std1 = np.std(x1, axis=0).reshape(x1.shape[1], 1)
    std2 = np.std(x2, axis=0).reshape(x2.shape[1], 1) 
    crsstd = np.dot(std1, std2.T)
    return crscov / crsstd

Experimental settings

def generate_data(nx):
    nt = 1000
    data1 = numpy.random.randn(nt, nx)
    data2 = numpy.random.randn(nt, nx)
    return data1, data2

python


import time

def measure_time(f, *args):
    start = time.time()
    f(*args)
    return time.time() - start

def measure_mean_time(f, *args):
    n = 2
    return numpy.mean([measure_time(f, *args) for i in xrange(n)])

result

crr_ventimark.png

Result is,

Was overwhelmingly fast. Also, since scipy is faster than numpy, it may be better to use scipy instead of aiming at the correlation coefficient map like this time.

The entire source code used in the experiment

import numpy as np
from scipy import stats
import time


def crr_loop_scipy(x1, x2):
    ret = np.empty([x1.shape[1], x2.shape[1]])
    for jj in xrange(ret.shape[1]):
        for ii in xrange(ret.shape[0]):
            ret[ii,jj], _ = stats.pearsonr(x1[:,ii], x2[:,jj])
    return ret


def crr_loop_numpy(x1, x2):
    ret = np.empty([x1.shape[1], x2.shape[1]])
    for jj in xrange(ret.shape[1]):
        for ii in xrange(ret.shape[0]):
            ret[ii,jj] = np.corrcoef(x1[:,ii], x2[:,jj])[1,0]
    return ret


def crr_numpy(x1, x2):
    return np.corrcoef(x1, x2, rowvar=False)[0:x1.shape[1], x1.shape[1]:]


def crr_original(x1, x2):
    crscov = np.dot(x1.T, x2) / x1.shape[0]
    std1 = np.std(x1, axis=0).reshape(x1.shape[1], 1)
    std2 = np.std(x2, axis=0).reshape(x2.shape[1], 1) 
    crsstd = np.dot(std1, std2.T)
    return crscov / crsstd


def generate_data(nx):
    nt = 1000
    data1 = np.random.randn(nt, nx)
    data2 = np.random.randn(nt, nx)
    return data1, data2


def measure_time(f, *args):
    start = time.time()
    f(*args)
    return time.time() - start


def measure_mean_time(f, *args):
    n = 2
    return np.mean([measure_time(f, *args) for i in xrange(n)])


def experiment(nxs, crrs):
    mean_times = np.empty([len(crrs), len(nxs)])
    for icrr, crr in enumerate(crrs):
        for ix, nx in enumerate(nxs):
            x1, x2 = generate_data(nx)
            mean_times[icrr, ix] = measure_mean_time(crr, x1, x2)
    return mean_times


if __name__ == '__main__:
    import matplotlib.pyplot as plt
    
    nxs1 = range(200, 200 + 1000, 200)
    nxs2 = range(200, 200 + 10000, 600)
    mean_times1 = experiment(nxs1, [crr_loop_scipy, crr_loop_numpy])
    mean_times2 = experiment(nxs2, [crr_numpy, crr_original])
    
    lebels1 = ['loop_sci', 'loop_np']
    labels2 = ['np', 'original']
    for label1, label2, mean_time1, mean_time2 in zip(lebels1, labels2, mean_times1, mean_times2):
        plt.plot(nxs1, mean_time1, label=label1)
        plt.plot(nxs2, mean_time2, label=label2)

    plt.xlabel('nx')
    plt.ylabel('time (sec)')
    plt.title('nt=1000')
    plt.xlim([0, nxs2[-1]])
    plt.ylim([0, 80])
    plt.legend()
    plt.show()

Recommended Posts

Time comparison: Correlation coefficient calculation in Python
Calculation of standard deviation and correlation coefficient in Python
Date calculation in python
Date calculation in Python
Shapley value calculation in Python
Quantum chemistry calculation in Python (Psi4)
Accelerometer Alan Variance Calculation in Python
Measure function execution time in Python
[Python] Calculation of Kappa (k) coefficient
Code tests around time in Python
Calculation of Spearman's rank correlation coefficient
I compared the calculation time of the moving average written in Python
Just print Python elapsed time in seconds
Playing card class in Python (with comparison)
[Python] Calculation of image similarity (Dice coefficient)
MongoDB for the first time in Python
Let's make a combination calculation in Python
Get time series data from k-db.com in Python
Quadtree in Python --2
Python in optimization
CURL in python
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
3 ways to parse time strings in python [Note]
First Python 3 ~ First comparison ~
Meta-analysis in Python
Unittest in python
Calculation result after the decimal point in Python
Epoch in Python
Discord in Python
python time measurement
Sudoku in Python
DCI in Python
quicksort in python
nCr in python
A clever way to time processing in Python
Survival time analysis learned in Python 2 -Kaplan-Meier estimator
First time python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Disassemble in Python
Reflection in Python
Constant in python
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Quad-tree in Python
Reflection in Python
Chemistry in Python
Hashable in python
DirectLiNGAM in Python