[PYTHON] [Note] pandas unstack

TL;DR

`unstack` is useful for making a Series with a MultiIndex easier to read.

Post-processing after groupby

When you `groupby` on multiple columns in pandas, the result has a MultiIndex as its index. I got a little stuck on how to handle it, so I'm writing down what I did as a memo.
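As a minimal sketch of what that means, the snippet below groups a tiny hand-made table by two columns and inspects the index of the result. The `sample` data and its values are made up for illustration and are not the data used in the rest of this note.

```python
import pandas as pd

# Tiny hand-made sample (hypothetical values, for illustration only)
sample = pd.DataFrame({
    'wday': [0, 0, 0, 1, 1],
    'item': ['A', 'B', 'A', 'A', 'C'],
    'qty':  [1, 2, 3, 4, 5],
})

# Grouping by two columns yields a Series indexed by (wday, item) pairs
totals = sample.groupby(['wday', 'item'])['qty'].sum()

print(type(totals.index).__name__)  # MultiIndex
print(list(totals.index.names))     # ['wday', 'item']
print(totals.loc[(0, 'A')])         # 4
```

Each row of `totals` is addressed by a tuple such as `(0, 'A')`, which is why the printed result reads as a long, nested list.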

Environment

I'm running this in the environment linked here.

Data preparation

For example, suppose you have the following data:

import datetime
import random
import pandas as pd

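# 'A' and 'C' appear more than once so that they are chosen more often
item_list = ['A', 'A', 'A', 'B', 'C', 'C', 'D']
data_records = []
# Start from the current time and advance by a random 200 to 3600 seconds per record
ts = datetime.datetime.now()
for _ in range(1000):
    ts += datetime.timedelta(seconds=random.randint(200, 3600))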
    data_records.append({
        'ts': ts,
        'wday': ts.weekday(),
        'item': random.choice(item_list),
        'qty': random.randint(1, 5)
    })
df = pd.DataFrame(data_records)

This should give you a `df` with the columns `ts`, `wday`, `item`, and `qty`, one row per record.

Think of it as something like the order log of an e-commerce site.

What we want to do

Now suppose you want to see the total quantity sold of each item for each day of the week. In practice you would normally also restrict the period using `ts`, but setting that aside, you would probably do something like this:

df.groupby(['wday', 'item']).qty.sum()

This gives you a long Series indexed by (`wday`, `item`) pairs. It's not bad, but it is hard to read. If you apply `unstack` here,

df.groupby(['wday', 'item']).qty.sum().unstack()

you get a table with one row per `wday` and one column per `item`, which is much easier to read.
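To make the reshaping concrete, here is a small self-contained sketch on a hand-made table (the `sample` values are hypothetical, not the random data above). It also shows the `fill_value` and `level` arguments of `Series.unstack`, which the note itself does not use.

```python
import pandas as pd

# Tiny hand-made sample (hypothetical values, for illustration only)
sample = pd.DataFrame({
    'wday': [0, 0, 0, 1, 1],
    'item': ['A', 'B', 'A', 'A', 'C'],
    'qty':  [1, 2, 3, 4, 5],
})

totals = sample.groupby(['wday', 'item'])['qty'].sum()

# unstack() moves the innermost index level (item) into the columns
wide = totals.unstack()

# (wday, item) combinations that never occurred become NaN;
# fill_value=0 puts 0 there instead
wide_filled = totals.unstack(fill_value=0)

# Passing a level name moves that level to the columns instead
by_wday = totals.unstack(level='wday', fill_value=0)

print(wide_filled)
```

With the random data in this note every combination is likely to occur, so NaN may never show up, but on sparser data `fill_value=0` keeps the totals as integers rather than floats with gaps.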

Reference

For more information, see the official pandas documentation.
