[PYTHON] Function to return multi columns to single column in DataFrame

Function function

Such a multi columns data frame WS000001.JPG

Convert to a single columns data frame. WS000002.JPG

Useful scene

--Repair python's multi columns to single columns. --multi columns are a bit inconvenient when merging. --The value aggregated by df.groupby ("hoge"). agg ("A": {"sum", "mean"]}) becomes multi columns, so I used it when combining. ――I felt like there was an existing function, but I couldn't find it when I searched for it ... Please let me know if you are an expert.

Multi-column reference

--Select and extract any row / column from pandas MultiIndex: https://note.nkmk.me/python-pandas-multiindex-indexing/ --Handling pandas multi-columns nicely tips: https://qiita.com/tenajima/items/55bb8b5843690d464225

Actual code

python


#Function that converts the column names in the first and second rows of df into concatenated column names
def rename_multicol(df):
    df_col=df.columns #Copy column name
    df = df.T.reset_index(drop=False).T #Reset column name once
    for  i in range(df.shape[1]): #Newly defined column name
        rename_col = {i:"".join(df_col[i])}
        df = df.rename(columns = rename_col)     
    df = df.drop(["level_0","level_1"],axis=0)
    return df

Here is an example of actually using this function.

python


company_list =["A","B","C","D","E","F","G"]
pref_list = ["Tokyo","Okinawa", "Tokyo", "Kanagawa", "Kanagawa", "Tokyo", "Okinawa"]
num_list = [10,3,6,4,5,7,2]

df = pd.DataFrame({"company name":company_list,
                   "Prefectures":pref_list,
                   "Number of people":num_list})

#Calculate total and average number of people
number_df = df.groupby("Prefectures",as_index=False).agg({"Number of people":["sum","mean"]})

#Convert to single column
number_df_rename = rename_multicol(number_df)

Suppose you want to aggregate the total and average number of people by prefecture from df. 'df.groupby ("prefecture", as_index = False) .agg ({" number of people": ["sum", "mean"]})' is used for counting at the same time. WS000000.JPG

Number_df after aggregation. The number of people and sum and mean are multi columns. WS000001.JPG

Number_df_rename converted to single columns using the rename_multicol function. Now you can combine with another dataframe with the state as the key. WS000002.JPG

At the end

In the first place, I searched for "cancel multiple columns" without knowing multicolumns ... It's convenient to say the table alone, but it was difficult to handle when considering the join.

Recommended Posts

Function to return multi columns to single column in DataFrame
A handy function to add a column anywhere in a Pandas DataFrame
To return char * in a callback function using ctypes in Python
Function to convert Excel column to number
How to get a specific column name and index name in pandas DataFrame
How to find a specific type (str, float etc) column in a DataFrame column
How to use calculated columns in CASTable
How to reassign index in pandas dataframe
Adding Series to columns in python pandas
To execute a Python enumerate function in JavaScript
Add totals to rows and columns in pandas
How to Mock a Public function in Pytest