This post explains how to use pandas `groupby` in Python to extract every row of each group that satisfies a condition.
For example, given the data below, the goal is to retrieve all rows for every person whose highest score is 80 or more.
import pandas as pd
df = pd.DataFrame({"name": ["Yamada", "Yamada", "Yamada", "Suzuki", "Suzuki", "Hayashi"],
                   "score": [60, 70, 80, 60, 70, 80]})
print(df)
#       name  score
# 0   Yamada     60
# 1   Yamada     70
# 2   Yamada     80
# 3   Suzuki     60
# 4   Suzuki     70
# 5  Hayashi     80
(Corrected on 19/12/05) In a case like this, `groupby.filter` lets you do it in a single line.
new_df = df.groupby('name').filter(lambda group: group['score'].max() >= 80)
print(new_df)
#       name  score
# 0   Yamada     60
# 1   Yamada     70
# 2   Yamada     80
# 5  Hayashi     80
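The argument to `filter()` is the condition, written here as a lambda: it is called once per group with that group's rows as a DataFrame, and the group's rows are kept only if it returns `True`.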
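To make the per-group behaviour concrete, here is a rough loop-based equivalent of the one-liner above. It only illustrates the semantics and is not how pandas implements `filter` internally; the names `kept` and `loop_df` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Yamada", "Yamada", "Yamada", "Suzuki", "Suzuki", "Hayashi"],
                   "score": [60, 70, 80, 60, 70, 80]})

kept = []
for name, group in df.groupby("name"):
    # Each group is a sub-DataFrame holding only that person's rows.
    if group["score"].max() >= 80:
        kept.append(group)

# Concatenate the surviving groups and restore the original row order,
# which is the order filter() returns.
loop_df = pd.concat(kept).sort_index()
print(loop_df)
```

The printed result should match `new_df` from the `filter` example, including the original index labels 0, 1, 2 and 5.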
Incidentally, before I learned this on Qiita, I used a different approach: take the per-group maximum with `groupby`, pick out the keys (names) that satisfy the condition, and then left-join the original DataFrame onto those keys to bring back every row for those names. The code looks like this.
group_df = df.groupby('name').max().reset_index()
key = group_df[group_df['score'] >= 80]['name']
new_df = pd.merge(key, df, on='name', how='left')
print(new_df)
#       name  score
# 0  Hayashi     80
# 1   Yamada     60
# 2   Yamada     70
# 3   Yamada     80
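The key-then-join idea can also be expressed without any join. As a sketch using a different technique from `filter`, `groupby(...).transform('max')` returns each group's maximum broadcast back onto that group's rows, so a plain boolean mask selects the same rows. `group_max` and `masked_df` are illustrative names, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Yamada", "Yamada", "Yamada", "Suzuki", "Suzuki", "Hayashi"],
                   "score": [60, 70, 80, 60, 70, 80]})

# transform('max') returns a Series aligned with df's index,
# where every row holds the maximum score of its own group.
group_max = df.groupby("name")["score"].transform("max")

# Keep only the rows whose group maximum is 80 or more.
masked_df = df[group_max >= 80]
print(masked_df)
```

Like `filter`, this keeps the original index and row order, because the mask is aligned with `df` itself.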
I was impressed that a single line can replace this whole sequence: extracting the keys that satisfy the condition, then left-joining to restore the score rows that the `groupby` aggregation had discarded.
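One practical difference between the two approaches is the row labels: `filter` keeps the original index and order, while `pd.merge` returns a fresh 0..n-1 index ordered by the left-hand keys. A small check, assuming the same data as above (redefined so the block runs on its own); `filtered`, `merged`, `a` and `b` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Yamada", "Yamada", "Yamada", "Suzuki", "Suzuki", "Hayashi"],
                   "score": [60, 70, 80, 60, 70, 80]})

# One-line approach with groupby.filter
filtered = df.groupby("name").filter(lambda group: group["score"].max() >= 80)

# Key-then-left-join approach
group_df = df.groupby("name").max().reset_index()
key = group_df[group_df["score"] >= 80]["name"]
merged = pd.merge(key, df, on="name", how="left")

# filter keeps the original row labels and order
print(filtered.index.tolist())  # [0, 1, 2, 5]
# merge builds a new 0..n-1 index, ordered by the key column
print(merged.index.tolist())    # [0, 1, 2, 3]

# After sorting both the same way and resetting the index, the rows match.
a = filtered.sort_values(["name", "score"]).reset_index(drop=True)
b = merged.sort_values(["name", "score"]).reset_index(drop=True)
print(a.equals(b))  # True
```

If the original row labels matter downstream (for example, to write results back into `df`), the `filter` version is the safer choice.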