Such a multi columns data frame
Convert to a single columns data frame.
--Repair python's multi columns to single columns. --multi columns are a bit inconvenient when merging. --The value aggregated by df.groupby ("hoge"). agg ("A": {"sum", "mean"]}) becomes multi columns, so I used it when combining. ――I felt like there was an existing function, but I couldn't find it when I searched for it ... Please let me know if you are an expert.
--Select and extract any row / column from pandas MultiIndex: https://note.nkmk.me/python-pandas-multiindex-indexing/ --Handling pandas multi-columns nicely tips: https://qiita.com/tenajima/items/55bb8b5843690d464225
python
#Function that converts the column names in the first and second rows of df into concatenated column names
def rename_multicol(df):
df_col=df.columns #Copy column name
df = df.T.reset_index(drop=False).T #Reset column name once
for i in range(df.shape[1]): #Newly defined column name
rename_col = {i:"".join(df_col[i])}
df = df.rename(columns = rename_col)
df = df.drop(["level_0","level_1"],axis=0)
return df
Here is an example of actually using this function.
python
company_list =["A","B","C","D","E","F","G"]
pref_list = ["Tokyo","Okinawa", "Tokyo", "Kanagawa", "Kanagawa", "Tokyo", "Okinawa"]
num_list = [10,3,6,4,5,7,2]
df = pd.DataFrame({"company name":company_list,
"Prefectures":pref_list,
"Number of people":num_list})
#Calculate total and average number of people
number_df = df.groupby("Prefectures",as_index=False).agg({"Number of people":["sum","mean"]})
#Convert to single column
number_df_rename = rename_multicol(number_df)
Suppose you want to aggregate the total and average number of people by prefecture from df. 'df.groupby ("prefecture", as_index = False) .agg ({" number of people": ["sum", "mean"]})' is used for counting at the same time.
Number_df after aggregation. The number of people and sum and mean are multi columns.
Number_df_rename converted to single columns using the rename_multicol function. Now you can combine with another dataframe with the state as the key.
In the first place, I searched for "cancel multiple columns" without knowing multicolumns ... It's convenient to say the table alone, but it was difficult to handle when considering the join.
Recommended Posts