[PYTHON] Standardize by group with pandas

Introduction

While preparing data for machine learning with pandas, I wanted to standardize some columns within each group rather than across the whole table. The group-name column itself should not be standardized, but it has to be kept while the other columns are processed. This is just a memo.

Execution environment

pandas = 0.25.3
numpy = 1.18.0

Code to standardize by group in pandas

The goal is to standardize the numeric columns separately for each value of class, in a table like the one below.

class     a     b     c
a       1.0   2.0   3.0
a       4.0   5.0   6.0
b       7.0   8.0   9.0
b      10.0  11.0  12.0

import pandas as pd
import numpy as np

# Build a sample data set (float dtype so the standardized values fit the columns)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=['col_0', 'col_1', 'col_2'],
                  index=['row_0', 'row_1', 'row_2', 'row_3'])
df["class"] = ["a", "a", "b", "b"]

# Standardize each group separately; the "class" column is never overwritten
value_cols = df.columns.drop("class")
for name in df["class"].unique():
    mask = df["class"] == name
    df_tmp = df.loc[mask, value_cols]
    # pandas std() uses the sample standard deviation (ddof=1) by default
    df.loc[mask, value_cols] = (df_tmp - df_tmp.mean()) / df_tmp.std()
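
As one possible alternative to the explicit loop, the same per-group standardization can probably be written with groupby and transform. This is only a sketch under the same assumptions as the sample above: the value_cols list and the standardized variable are names introduced here for illustration, and std() is left at its pandas default (sample standard deviation, ddof=1).

```python
import numpy as np
import pandas as pd

# Same sample data as in the loop version above
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=["col_0", "col_1", "col_2"],
                  index=["row_0", "row_1", "row_2", "row_3"])
df["class"] = ["a", "a", "b", "b"]

# Columns to standardize (illustrative name); "class" is left out on purpose
value_cols = ["col_0", "col_1", "col_2"]

# transform returns a frame with the same index as df,
# so the result can be assigned straight back to the value columns
standardized = df.copy()
standardized[value_cols] = (
    df.groupby("class")[value_cols]
      .transform(lambda x: (x - x.mean()) / x.std())
)

print(standardized)
```

Because transform keeps the original row order and index, the class column stays in place and does not need to be saved and restored.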

This is my first post, and it is only a memo. Please let me know if there is a better way.
