[PYTHON] Partial One-Hot encoding

I have the following DataFrame and want to apply one-hot encoding only to the Country column.

Country     |    Age       
--------------------------
Germany     |    23
Spain       |    25
Germany     |    24
Italy       |    30 
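For reference, this is the result partial one-hot encoding should produce for the table above. The sketch below is plain Python, and the helper partial_one_hot is hypothetical, not a library function. It assumes categories are sorted alphabetically, which matches OneHotEncoder's default, and that the encoded columns come before the untouched ones, as in ColumnTransformer's output.

```python
# Hypothetical helper (not part of any library): one-hot encode only the
# column at index `col` and leave every other column unchanged.
def partial_one_hot(rows, col):
    # Sorted unique categories, mirroring OneHotEncoder's default ordering
    categories = sorted({row[col] for row in rows})
    encoded = []
    for row in rows:
        # 1 in the slot of this row's category, 0 everywhere else
        one_hot = [1 if row[col] == c else 0 for c in categories]
        # Remaining (non-encoded) columns, in their original order
        rest = [v for i, v in enumerate(row) if i != col]
        # Encoded columns first, then the untouched ones
        encoded.append(one_hot + rest)
    return categories, encoded


rows = [
    ["Germany", 23],
    ["Spain", 25],
    ["Germany", 24],
    ["Italy", 30],
]
categories, encoded = partial_one_hot(rows, 0)
print(categories)  # ['Germany', 'Italy', 'Spain']
for r in encoded:
    print(r)
```

Each row becomes three 0/1 columns (Germany, Italy, Spain) followed by the unchanged Age value.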

Up to scikit-learn 0.21, you only had to pass the index of the column to encode in OneHotEncoder's categorical_features parameter (deprecated in 0.20 and removed in 0.22), like this:

from sklearn.preprocessing import OneHotEncoder

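# Old API: categorical_features does not exist in current scikit-learn
onehotencoder = OneHotEncoder(categorical_features=[0])  # encode column 0 (Country) only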
X = onehotencoder.fit_transform(X)

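From scikit-learn 0.22 on, categorical_features no longer exists, and this kind of per-column processing is done with ColumnTransformer (available since 0.20). Don't forget to specify remainder="passthrough" to keep the columns that are not listed in transformers: the default, remainder="drop", silently discards them (here, Age).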

from sklearn.compose import ColumnTransformer 
from sklearn.preprocessing import OneHotEncoder

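# Each transformer is (name, transformer, columns); only column 0 (Country) is one-hot encoded
column_trans = ColumnTransformer(transformers=[('categorical', OneHotEncoder(), [0])],
                                 remainder="passthrough")  # keep Age unchanged
X = column_trans.fit_transform(X)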
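A complete, runnable version of the same idea might look like the sketch below. It assumes X is a pandas DataFrame built from the table in the question, selects the column by name ("Country") rather than by index (both work for a DataFrame), and adds sparse_threshold=0 so the result is always a dense NumPy array instead of possibly a sparse matrix.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# The DataFrame from the question
df = pd.DataFrame({
    "Country": ["Germany", "Spain", "Germany", "Italy"],
    "Age": [23, 25, 24, 30],
})

column_trans = ColumnTransformer(
    # (name, transformer, columns): a DataFrame column can be selected by name
    transformers=[("categorical", OneHotEncoder(), ["Country"])],
    # Keep Age as-is instead of dropping it (the default is "drop")
    remainder="passthrough",
    # Always return a dense array, never a sparse matrix
    sparse_threshold=0,
)
X = column_trans.fit_transform(df)
# Columns: Country=Germany, Country=Italy, Country=Spain, Age
print(X)

# The fitted transformer reuses the learned categories on new rows
X_new = column_trans.transform(pd.DataFrame({"Country": ["Spain"], "Age": [40]}))
print(X_new)
```

Fitting once and then calling transform on new data keeps the column layout identical between training and prediction, which is the main reason to prefer ColumnTransformer over encoding each DataFrame by hand.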
