Post-processing after group by

When you group a pandas DataFrame by multiple columns, the result is indexed by a MultiIndex. I got a little stuck working with it, so I'm writing down what I did as a note to myself.
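To make the MultiIndex behaviour concrete, here is a minimal sketch. The `sample` frame is a small hand-written stand-in (an assumption for illustration, not the data used later in this note); the column names match the ones used below.

```python
import pandas as pd

# Tiny hand-written sample (hypothetical data, only for illustration)
sample = pd.DataFrame({
    'wday': [0, 0, 0, 1, 1],
    'item': ['A', 'A', 'B', 'A', 'C'],
    'qty':  [1, 2, 3, 4, 5],
})

grouped = sample.groupby(['wday', 'item']).qty.sum()

# Grouping by two columns yields a Series whose index is a two-level MultiIndex
print(type(grouped.index).__name__)
print(list(grouped.index.names))

# A single value is addressed with a (wday, item) tuple
print(grouped.loc[(0, 'A')])
```

Each index level keeps the name of the column it came from, which is what makes the later reshaping step possible.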

Environment

I'm running this with:

Python == 3.8
pandas == 1.1.3

Data preparation

For example, suppose you have the following data:

import datetime
import random
import pandas as pd

# 'A' and 'C' appear more than once so that they are picked more often
item_list = ['A', 'A', 'A', 'B', 'C', 'C', 'D']
data_records = []
ts = datetime.datetime.now()
for _ in range(1000):
    # Advance the timestamp by a random interval (200 seconds to 1 hour)
    ts += datetime.timedelta(seconds=random.randint(200, 3600))
    data_records.append({
        'ts': ts,
        'wday': ts.weekday(),  # 0 = Monday ... 6 = Sunday
        'item': random.choice(item_list),
        'qty': random.randint(1, 5)
    })
df = pd.DataFrame(data_records)

You should get a df that looks something like this: Screenshot from 2020-10-10 00-35-09.png

Here, the columns are:

ts: timestamp
wday: day of the week
item: Product (ID)
qty: Quantity

Think of it as something like the order log of an e-commerce site.

What we want to do

Now suppose you want to see how many of each item are sold in total on each day of the week. In practice you would normally also restrict the period using ts, but setting that aside, you would probably do the following.

df.groupby(['wday', 'item']).qty.sum()

Then you will get something like this: Screenshot from 2020-10-10 00-40-42.png It's not bad, but it's a little hard to read. Applying `unstack` here gives a table that is easier to read:

df.groupby(['wday', 'item']).qty.sum().unstack()

The result looks like this: Screenshot from 2020-10-10 00-42-01.png
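Since the screenshots do not show exact values, here is a small sketch of what `unstack` does, again on a hand-written `sample` frame (hypothetical data, not the random log above). It also shows the `fill_value` argument of `Series.unstack`, which this note does not otherwise use but which may be handy when some (wday, item) combinations never occur.

```python
import pandas as pd

# Tiny hand-written sample (hypothetical data, only for illustration)
sample = pd.DataFrame({
    'wday': [0, 0, 0, 1, 1],
    'item': ['A', 'A', 'B', 'A', 'C'],
    'qty':  [1, 2, 3, 4, 5],
})

totals = sample.groupby(['wday', 'item']).qty.sum()

# unstack() moves the innermost index level (item) into the columns;
# combinations that never occur in the data become NaN
wide = totals.unstack()
print(wide)

# fill_value=0 fills those missing combinations with 0 instead of NaN
wide_filled = totals.unstack(fill_value=0)
print(wide_filled)
```

The rows are now days of the week and the columns are items, so each cell can be read off directly as "total qty of this item on this weekday".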

Reference

For more information, see the official pandas documentation.

