[PYTHON] Pandas: Filling missing values per group with groupby()

Pandas study notes.

I found the example of group-wise missing-value filling in the groupby documentation (http://pandas.pydata.org/pandas-docs/stable/groupby.html) hard to follow, so here is a simpler one.

Preparation.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: key = list('ABCABCABC')

In [4]: value = [1,2,3,np.nan,np.nan,np.nan,4,4,4]

In [5]: df = pd.DataFrame({'key': key, 'value': value})

In [6]: df
Out[6]: 
  key  value
0   A    1.0
1   B    2.0
2   C    3.0
3   A    NaN
4   B    NaN
5   C    NaN
6   A    4.0
7   B    4.0
8   C    4.0

ffill() on a group-by-group basis

If you call ffill() without grouping, all three NaNs are filled with 3.0, the value at index 2.

In [7]: df.ffill()
Out[7]: 
  key  value
0   A    1.0
1   B    2.0
2   C    3.0
3   A    3.0
4   B    3.0
5   C    3.0
6   A    4.0
7   B    4.0
8   C    4.0

If you group by key and then call ffill(), each NaN is filled with the most recent preceding value in the same group. The NaNs at index 3, 4, 5 (key A, B, C respectively) are therefore filled with the values at index 0, 1, 2 (key A, B, C respectively), namely 1.0, 2.0, 3.0.

In [8]: df.groupby('key').ffill()
Out[8]: 
  key  value
0   A    1.0
1   B    2.0
2   C    3.0
3   A    1.0
4   B    2.0
5   C    3.0
6   A    4.0
7   B    4.0
8   C    4.0
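
The difference between the two calls can be checked side by side. The following is a minimal sketch, not part of the original session: it selects the value column before calling ffill(), and the column names plain_ffill and grouped_ffill are made up for illustration. Selecting the column first also avoids relying on whether the result of df.groupby('key').ffill() contains the key column, which, as far as I know, differs between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# Ungrouped: every NaN takes the last non-missing value above it,
# regardless of key (here 3.0 from index 2).
plain = df["value"].ffill()

# Grouped: every NaN takes the last non-missing value with the same key.
grouped = df.groupby("key")["value"].ffill()

# Hypothetical column names, added only to compare the two results.
result = df.assign(plain_ffill=plain, grouped_ffill=grouped)
print(result)
```

Both results keep the original index, so they can be assigned back to df directly.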

Fill with the mean of each group

Fill each NaN in value with the mean of the group it belongs to.

In [9]: f = lambda x: x.fillna(x.mean())

In [10]: transformed = df.groupby('key').transform(f)

In [11]: transformed
Out[11]: 
   value
0    1.0
1    2.0
2    3.0
3    2.5
4    3.0
5    3.5
6    4.0
7    4.0
8    4.0
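
The same fill can also be written without a lambda. This is an alternative sketch rather than the method used above: it first broadcasts each group's mean back onto the rows with transform("mean"), then passes that Series to fillna(). The variable names group_mean and filled are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# Mean of each group, repeated on every row of that group.
# NaN is skipped when the mean is computed.
group_mean = df.groupby("key")["value"].transform("mean")

# fillna() aligns on the index, so each NaN receives the mean of its own group.
filled = df["value"].fillna(group_mean)
print(filled)
```

Because the result is a Series aligned with df, it can be written back with df["value"] = filled while keeping the key column in place.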

The mean of each group is the same before and after filling, because GroupBy.mean() [excludes NaN from the calculation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.mean.html#pandas.core.groupby.GroupBy.mean).

In [12]: df.groupby('key').mean()
Out[12]: 
     value
key       
A      2.5
B      3.0
C      3.5

In [13]: transformed.groupby(key).mean()
Out[13]: 
   value
A    2.5
B    3.0
C    3.5
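
The equality holds in general, not just for this data: if a group has n non-missing values summing to S and m missing values, its mean is S/n before filling and (S + m * S/n) / (n + m) = S/n after filling. As a sketch, the comparison can be automated with pd.testing.assert_series_equal; the names before, filled and after are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# Group means of the original data (NaN is skipped).
before = df.groupby("key")["value"].mean()

# Fill each NaN with the mean of its group, then recompute the group means.
filled = df["value"].fillna(df.groupby("key")["value"].transform("mean"))
after = filled.groupby(df["key"]).mean()

# Raises AssertionError if the two Series differ beyond floating-point tolerance.
pd.testing.assert_series_equal(before, after)
print(after)
```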
