[PYTHON] How to flatten the multi-level columns produced by a Pandas groupby into single-level columns

Overview

When you run a Pandas groupby and use .agg() to compute several statistics at once, such as [max, min], the resulting DataFrame has multi-level columns (a MultiIndex). This post shows an easy way to convert those columns into single-level ones.

Preparation

As a sample, create a 5-by-2 DataFrame containing only 0s and 1s. The values are random, so your output will differ from the one shown below.

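Input:


import numpy as np
import pandas as pd

# Draw uniform random values in [0, 1) for a 5-by-2 matrix
mat = np.random.rand(5, 2)
# Binarize: values above 0.5 become 1, everything else becomes 0
mat[mat > 0.5] = 1
mat[mat <= 0.5] = 0
df = pd.DataFrame(mat, columns=['A', 'B'])
print(df)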

Output:


     A    B
0  0.0  1.0
1  1.0  0.0
2  0.0  1.0
3  0.0  1.0
4  0.0  0.0
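If you want the sample to be reproducible, the same preparation can be written more compactly. This is a minimal sketch, assuming NumPy's Generator API (np.random.default_rng, NumPy 1.17 or later) is available; the seed 0 is an arbitrary choice, so the values will not match the table above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded Generator makes the sample reproducible (seed is arbitrary)
rng = np.random.default_rng(0)

# Compare uniform [0, 1) samples with 0.5 and cast the booleans to float,
# giving the same kind of 0.0/1.0 matrix as the two masking assignments
mat = (rng.random((5, 2)) > 0.5).astype(float)
df = pd.DataFrame(mat, columns=['A', 'B'])
print(df)
```

Casting a boolean comparison replaces the two in-place assignments with a single expression.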

Problem

If you pass a list of functions such as [min, max] for a column in .agg(), the resulting columns become a MultiIndex.

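Input:


# Function names are passed as strings: newer pandas versions warn when the
# built-in min/max are passed instead. The resulting column labels are the same.
df.groupby('A').agg({'B': ['min', 'max']}).columns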

Output:


MultiIndex([('B', 'min'),
            ('B', 'max')],
           )
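To see where the two-level labels come from, here is a minimal sketch that hard-codes the sample printed earlier (the variable name agg is illustrative) and inspects the whole aggregated frame rather than just its columns.

```python
import pandas as pd

# The sample DataFrame printed above, hard-coded so the result is deterministic
df = pd.DataFrame({'A': [0.0, 1.0, 0.0, 0.0, 0.0],
                   'B': [1.0, 0.0, 1.0, 1.0, 0.0]})

# A list of functions for column 'B' yields one output column per function,
# so pandas labels each one with a (column, function) tuple
agg = df.groupby('A').agg({'B': ['min', 'max']})
print(agg)
print(type(agg.columns).__name__, agg.columns.nlevels)
```

The first level holds the source column name and the second holds the function name, which is why the columns have two levels.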

Solution

Iterating over a MultiIndex yields tuples, so unpack each tuple into two variables (level1 and level2 in the example below), just as you would when looping over zip() in a for statement, and join them into a single string with an f-string.

Input:


[f'{level1}__{level2}' for level1, level2 in df.groupby('A').agg({'B': ['min', 'max']}).columns]

Output:


['B__min', 'B__max']
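The list above still has to be assigned back to the columns to actually flatten the frame. Below is a minimal sketch of that step wrapped in a helper; flatten_columns and its sep parameter are hypothetical names for illustration, not part of pandas. It joins every level with str.join instead of unpacking exactly two variables, so it also works when there are more than two levels.

```python
import pandas as pd

def flatten_columns(frame: pd.DataFrame, sep: str = '__') -> pd.DataFrame:
    """Return a copy of frame with MultiIndex column labels joined into strings.

    flatten_columns and sep are illustrative names, not a pandas API.
    """
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Each label is a tuple such as ('B', 'min'); join all of its levels
        out.columns = [sep.join(map(str, levels)) for levels in out.columns]
    return out

# The sample DataFrame printed earlier, hard-coded for a deterministic result
df = pd.DataFrame({'A': [0.0, 1.0, 0.0, 0.0, 0.0],
                   'B': [1.0, 0.0, 1.0, 1.0, 0.0]})
agg = df.groupby('A').agg({'B': ['min', 'max']})
flat = flatten_columns(agg)
print(list(flat.columns))  # ['B__min', 'B__max']
```

As an alternative that avoids the MultiIndex altogether, pandas' named aggregation (available since pandas 0.25), for example df.groupby('A').agg(B__min=('B', 'min'), B__max=('B', 'max')), produces single-level columns directly.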
