[PYTHON] Behavior of the pandas rolling() method

Introduction

I have been reading [Advanced Python (Time Series Analysis)](https://www.amazon.co.jp/dp/4320125010). The rolling() method is used to compute moving averages, and a few points about its behavior were unclear to me, so this is a short memo verifying them.

Question 1: Is the current row itself counted in the window size?

-> Included.


First, create a Series.
```python
import pandas as pd

series = pd.Series(range(10))
print(series)

# 0    0
# 1    1
# 2    2
# 3    3
# 4    4
# 5    5
# 6    6
# 7    7
# 8    8
# 9    9
```

Try it with window=3. Each value is the average of the two rows above plus the row itself, so the first two rows, which do not have a full window yet, are NaN.
```python
series_size3 = series.rolling(window=3).mean()
print(series_size3)

# 0    NaN
# 1    NaN
# 2    1.0
# 3    2.0
# 4    3.0
# 5    4.0
# 6    5.0
# 7    6.0
# 8    7.0
# 9    8.0
```
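As an extra check, you can recompute the trailing mean by hand with plain slicing and compare it with `rolling()`. This is a minimal sketch and is not from the book. `manual_trailing_mean` is a hypothetical helper written only for this illustration. `pd.testing.assert_series_equal` raises if either the values or the NaN positions differ.

```python
import numpy as np
import pandas as pd

series = pd.Series(range(10))
rolled = series.rolling(window=3).mean()


def manual_trailing_mean(s: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: mean of rows i-window+1 .. i, current row included."""
    values = []
    for i in range(len(s)):
        if i < window - 1:
            # Not enough rows yet; rolling() also returns NaN here by default
            values.append(np.nan)
        else:
            values.append(s.iloc[i - window + 1 : i + 1].mean())
    return pd.Series(values, index=s.index)


manual = manual_trailing_mean(series, 3)
# Raises AssertionError if the two results differ (NaN positions included)
pd.testing.assert_series_equal(rolled, manual)
```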

Question 2: Is the current row still included when center=True?

-> Included.


If you pass `center=True`, each value becomes the average of the row before, the row itself, and the row after. As a result, NaN appears at both ends.
```python
series_size3 = series.rolling(window=3, center=True).mean()
print(series_size3)

# 0    NaN
# 1    1.0
# 2    2.0
# 3    3.0
# 4    4.0
# 5    5.0
# 6    6.0
# 7    7.0
# 8    8.0
# 9    NaN
```
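You can check the centered window the same way. This sketch assumes the odd window of 3 used above, so the window for row i is simply rows i-1 to i+1. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

series = pd.Series(range(10))
centered = series.rolling(window=3, center=True).mean()

# With center=True and window=3, row i averages rows i-1, i and i+1.
# The first and last rows lack a neighbor on one side, so they are NaN.
manual = pd.Series(
    [series.iloc[i - 1 : i + 2].mean() if 1 <= i <= len(series) - 2 else np.nan
     for i in range(len(series))],
    index=series.index,
)
pd.testing.assert_series_equal(centered, manual)
```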

Question 3: What happens when center=True and the window size is even?

-> The extra row is taken from above (the earlier rows).


I checked this with window=2 and window=4. With window=2, each value is the average of the row above and the row itself, which is the same as the non-centered result.
```python
series_size2 = series.rolling(window=2, center=True).mean()
print(series_size2)

# 0    NaN
# 1    0.5
# 2    1.5
# 3    2.5
# 4    3.5
# 5    4.5
# 6    5.5
# 7    6.5
# 8    7.5
# 9    8.5
```

With window=4, each value is the average of the two rows above, the row itself, and the row below.
```python
series_size4 = series.rolling(window=4, center=True).mean()
print(series_size4)

# 0    NaN
# 1    NaN
# 2    1.5
# 3    2.5
# 4    3.5
# 5    4.5
# 6    5.5
# 7    6.5
# 8    7.5
# 9    NaN
```
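The even-window results above suggest a single rule. The centered result equals the ordinary trailing result shifted up by `(window - 1) // 2` rows, so for an even window the extra row comes from above. The sketch below checks this assumed rule for windows 1 to 6 on the same Series. It is an empirical check against the outputs, not a documented guarantee.

```python
import pandas as pd

series = pd.Series(range(10))
offsets = {}

for window in range(1, 7):
    centered = series.rolling(window=window, center=True).mean()
    # Assumed rule: the centered window for row i is the trailing window
    # that ends (window - 1) // 2 rows later.
    offset = (window - 1) // 2
    trailing_shifted = series.rolling(window=window).mean().shift(-offset)
    # Raises AssertionError if the rule does not hold for this window size
    pd.testing.assert_series_equal(centered, trailing_shifted)
    offsets[window] = offset
```

With window=2 the offset is 0, which is why the centered and non-centered results are identical. With window=4 the offset is 1.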

Question 4: If the result is shifted with shift(), do the rows used in the calculation shift as well?

-> No. The windows stay the same; shift() only moves the values that have already been computed.


Set the first value to 100 for clarity.
```python
series = pd.Series([100, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(series)

# 0    100
# 1      1
# 2      2
# 3      3
# 4      4
# 5      5
# 6      6
# 7      7
# 8      8
# 9      9
```

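Compare the unshifted result with the result shifted by two rows. The value 26.5 is the average of rows 0-3, which include the 100. After the shift it moves from index 3 to index 1. The averages are computed first, and only their positions change.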
```python
series_size4 = series.rolling(window=4).mean()
series_size4_shift = series.rolling(window=4).mean().shift(-2)
print(series_size4)
print(series_size4_shift)

# 0     NaN
# 1     NaN
# 2     NaN
# 3    26.5
# 4     2.5
# 5     3.5
# 6     4.5
# 7     5.5
# 8     6.5
# 9     7.5

# 0     NaN
# 1    26.5
# 2     2.5
# 3     3.5
# 4     4.5
# 5     5.5
# 6     6.5
# 7     7.5
# 8     NaN
# 9     NaN
```
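To confirm that `shift(-2)` only relabels the averages, you can rebuild the shifted result by hand. After the shift, label i holds the trailing mean that was at label i + 2, i.e. the mean of rows i-1 to i+2. This is a minimal sketch with illustrative variable names.

```python
import numpy as np
import pandas as pd

series = pd.Series([100, 1, 2, 3, 4, 5, 6, 7, 8, 9])
series_size4_shift = series.rolling(window=4).mean().shift(-2)

# Label i now holds the mean of rows i-1 .. i+2 (the window that ended at i + 2).
# Labels 0, 8 and 9 have no complete window, so they stay NaN.
manual = pd.Series(
    [series.iloc[i - 1 : i + 3].mean() if 1 <= i <= len(series) - 3 else np.nan
     for i in range(len(series))],
    index=series.index,
)
pd.testing.assert_series_equal(series_size4_shift, manual)
```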
