[PYTHON] Convenient time series aggregation with TimeGrouper in pandas

I'm writing this up as a memo, since I could not find much information about TimeGrouper, which acts like an extension of GroupBy in pandas. Please let me know if there is a more standard way to do this!

Use case: useful when you want to aggregate values within a specific period

Example

[1] Related Stack Overflow question

Example 1: Aggregate monthly values into 6-month totals (from [1])

import numpy as np
import pandas as pd

ts = pd.date_range('7/1/2008', periods=30, freq='MS')
df = pd.DataFrame(pd.Series(range(len(ts)), index=ts))
df[0] += 1
df  # Monthly DataFrame starting from 2008/7/1, with values 1 to 30


# Dummy row with value 0, one month before the first timestamp (2008-06-01)
df2 = pd.DataFrame([0], index=[df.index.shift(-1, freq='MS')[0]])
df2
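
To see why a dummy row such as df2 helps, here is a minimal sketch of what happens when df is grouped directly. It assumes a recent pandas release, where pd.Grouper takes the place of pd.TimeGrouper, and it passes the frequency as pd.offsets.MonthEnd(6), which should correspond to the '6M' alias used in this article. The name direct is only for illustration.

```python
import pandas as pd

# Same monthly data as above: values 1..30 starting at 2008-07-01
ts = pd.date_range('7/1/2008', periods=30, freq='MS')
df = pd.DataFrame(pd.Series(range(len(ts)), index=ts))
df[0] += 1

# Group directly into 6-month bins that end on a month end (no dummy row)
direct = df.groupby(pd.Grouper(freq=pd.offsets.MonthEnd(6))).sum()

# The first bin is labelled 2008-07-31 and holds only July 2008,
# so the later bins run August to January and February to July
print(direct)
```

With the first timestamp on 2008-07-01, the bins do not line up with the start of the data, which is the problem the dummy row works around.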


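Prepending df2 to df shifts where the 6-month bins start, so the data can be aggregated over July to December and January to June. The bin that holds only the dummy row is dropped with [1:].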

df2.append(df).groupby(pd.TimeGrouper(freq='6M')).aggregate(np.sum)[1:] 
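
In recent pandas releases pd.TimeGrouper and DataFrame.append are no longer available. The following is a sketch of the same step under that assumption, using pd.concat and pd.Grouper instead. The names six_month_end, result and alt are made up for illustration, and the second half shows one possible alternative (bins that start on a month start) rather than an established replacement for the approach in this article.

```python
import pandas as pd

# Same monthly data as above: values 1..30 starting at 2008-07-01
ts = pd.date_range('7/1/2008', periods=30, freq='MS')
df = pd.DataFrame(pd.Series(range(len(ts)), index=ts))
df[0] += 1

# Dummy row with value 0, one month before the first timestamp (2008-06-01)
df2 = pd.DataFrame([0], index=[df.index[0] - pd.offsets.MonthBegin(1)])

# pd.concat stands in for DataFrame.append, pd.Grouper for pd.TimeGrouper
six_month_end = pd.offsets.MonthEnd(6)
result = pd.concat([df2, df]).groupby(pd.Grouper(freq=six_month_end)).sum()[1:]
print(result[0].tolist())  # [21, 57, 93, 129, 165]

# Alternative without a dummy row: 6-month bins that start on a month start
alt = df.resample(pd.offsets.MonthBegin(6)).sum()
print(alt[0].tolist())  # [21, 57, 93, 129, 165]
```

Both versions give the same totals. They differ in the labels: result is labelled by the last day of each bin (2008-12-31, 2009-06-30, ...), while alt is labelled by the first day (2008-07-01, 2009-01-01, ...).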


Example 2: Aggregate timestamped transaction records by day

rng = pd.date_range(start='2014-01-01', periods=100, freq='H')
df_original = pd.DataFrame({'Volume': np.random.randint(100, 2000, len(rng))}, index=rng)
df_original
Out[148]: 
	Volume
2014-01-01 00:00:00	1484
2014-01-01 01:00:00	1635
2014-01-01 02:00:00	984
2014-01-01 03:00:00	1239
2014-01-01 04:00:00	785
2014-01-01 05:00:00	871
2014-01-01 06:00:00	614
2014-01-01 07:00:00	119
2014-01-01 08:00:00	933
2014-01-01 09:00:00	624
...	...
2014-01-04 19:00:00	1832
2014-01-04 20:00:00	1996
2014-01-04 21:00:00	1040
2014-01-04 22:00:00	1867
2014-01-04 23:00:00	1098
2014-01-05 00:00:00	1397
2014-01-05 01:00:00	1996
2014-01-05 02:00:00	610
2014-01-05 03:00:00	1242
100 rows × 1 columns

# Dummy row with Volume 0, one day before the first timestamp; its bin is dropped by [1:]
df_tmp = pd.DataFrame({'Volume': [0]}, index=[df_original.index.shift(-1, freq='D')[0]])
df_daily = df_tmp.append(df_original).groupby(pd.TimeGrouper(freq='D')).aggregate(np.sum)[1:]
df_daily
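
For recent pandas releases, the daily aggregation can be sketched as follows, again assuming pd.concat and pd.Grouper in place of DataFrame.append and pd.TimeGrouper. The seeded random generator, the pd.Timedelta frequency and the name df_daily_direct are additions for this sketch, not part of the original example. It also compares the result with plain daily resampling, which suggests the dummy row may not be needed for daily bins.

```python
import numpy as np
import pandas as pd

# Hourly data as in this example; the seeded generator is an added assumption
# so that the sketch is reproducible
rng = pd.date_range(start='2014-01-01', periods=100, freq=pd.Timedelta(hours=1))
values = np.random.default_rng(0).integers(100, 2000, len(rng))
df_original = pd.DataFrame({'Volume': values}, index=rng)

# Dummy row with Volume 0, one day before the first timestamp
df_tmp = pd.DataFrame({'Volume': [0]},
                      index=[df_original.index[0] - pd.Timedelta(days=1)])

# Same steps as above, with pd.concat and pd.Grouper
df_daily = pd.concat([df_tmp, df_original]).groupby(pd.Grouper(freq='D')).sum()[1:]

# Daily bins start at midnight of the first day, so resampling the original
# data directly gives the same daily totals
df_daily_direct = df_original.resample('D').sum()
print(df_daily)
```

The 100 hourly rows cover 2014-01-01 through 2014-01-05, so both df_daily and df_daily_direct have five rows.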


P.S. I wish I could embed Jupyter notebooks in Qiita.
