[PYTHON] Precautions when using for statements in pandas

This is a continuation of, or a supplement to, "Notes on speeding up Python code with Numba".

pandas is convenient for processing time series data, but if you loop over a Series with a plain for statement, it becomes extremely slow.

import numpy as np
import pandas as pd
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)

def LWMA(s, ma_period):
    y = pd.Series(0.0, index=s.index)
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += s[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return y

%time MA = LWMA(dataM1['Close'], 10)
Wall time: 3min 10s

The dataset is large, at about 370,000 rows, but 3 minutes for a moving average of 10 samples is hard to accept. For longer periods, it can take 10 minutes or more. (Core i7-6700 3.4GHz)

In the previous article, I replaced the Series with an array and then sped it up using Numba, but in fact just using an array already makes it much faster.

def LWMA1(s, ma_period):
    a = s.values
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return pd.Series(y, index=s.index)

%timeit MA = LWMA1(dataM1['Close'], 10)
1 loop, best of 3: 1.92 s per loop

Even without Numba, it is about 100 times faster than the pandas version (3min 10s versus 1.92 s).
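The comparison can be tried without the CSV file. The following is a minimal sketch, assuming a synthetic random-walk series in place of the EURUSD data; the names `lwma_series`, `lwma_array`, and `prices` are illustrative and not from the original. It uses `.iloc` for the Series loop, since recent pandas versions deprecate treating integer keys as positions on a datetime index. The check shows that both loops compute the same values, so the difference is only in the cost of element access.

```python
import numpy as np
import pandas as pd

def lwma_series(s, ma_period):
    # Loop directly over the Series with explicit positional access.
    y = pd.Series(0.0, index=s.index)
    for i in range(len(y)):
        for j in range(ma_period):
            y.iloc[i] += s.iloc[i - j] * (ma_period - j)
        y.iloc[i] /= ma_period * (ma_period + 1) / 2
    return y

def lwma_array(s, ma_period):
    # Same loop, but over the underlying NumPy array.
    a = s.to_numpy()
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i - j] * (ma_period - j)
        y[i] /= ma_period * (ma_period + 1) / 2
    return pd.Series(y, index=s.index)

# Synthetic 1-minute data (an assumption; stands in for the CSV file).
rng = np.random.default_rng(0)
index = pd.date_range("2015-01-01", periods=500, freq="min")
prices = pd.Series(1.1 + rng.normal(0, 1e-4, 500).cumsum(), index=index)

slow = lwma_series(prices, 10)
fast = lwma_array(prices, 10)
same = bool(np.allclose(slow.to_numpy(), fast.to_numpy()))
print(same)
```

As in the original functions, the first `ma_period - 1` outputs use negative indices, which wrap around to the end of the data, so those values are not meaningful averages.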

Of course, applying Numba to this array version makes it even faster.

from numba import jit
@jit
def LWMA2(s, ma_period):
    a = s.values
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return pd.Series(y, index=s.index)

%timeit MA = LWMA2(dataM1['Close'], 10)
100 loops, best of 3: 5.31 ms per loop

If you need a for statement over pandas data and the index is irrelevant to the calculation, it is wise to convert the Series to an array first. It seems that a few tricks like this are needed to get along well with pandas.
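When the index is irrelevant, the for statement can sometimes be dropped entirely. The following is a sketch of a different technique from the one in this article: it expresses the linear weighted moving average as a convolution with `np.convolve`. The name `lwma_vectorized` is illustrative. Unlike the loops above, it sets the first `ma_period - 1` values to NaN instead of wrapping around to the end of the data.

```python
import numpy as np
import pandas as pd

def lwma_vectorized(s, ma_period):
    a = s.to_numpy(dtype=float)
    # Weights ma_period, ma_period-1, ..., 1: the newest sample weighs most.
    weights = np.arange(ma_period, 0, -1, dtype=float)
    # Full convolution, truncated so that y[i] ends at sample i.
    y = np.convolve(a, weights)[:len(a)] / weights.sum()
    # Not enough history yet for a complete window.
    y[:ma_period - 1] = np.nan
    return pd.Series(y, index=s.index)

example = lwma_vectorized(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]), 3)
print(example)
```

For the third sample in this small example, the result is (3*3 + 2*2 + 1*1) / 6, which matches the loop-based definition.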
