[PYTHON] Create dummy variables in pandas (get_dummies)

Create dummy variables in pandas

This article uses pandas 0.18.1.

If you are used to R and try to do the same thing in Python (scikit-learn), handling categorical variables can be awkward. scikit-learn cannot take categorical data as is (its input is a numeric numpy.ndarray), so convert the categories to dummy variables first.

The data is as follows. sex is 1 for men and 2 for women, and age takes the values 1 to 3, one for each age group.

df1
	id	sex	age
0	1001	1	3
1	1002	2	2
2	1003	1	3
3	1004	2	1
4	1005	2	1
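
To reproduce the table above, df1 could be built as in the following minimal sketch. It assumes that the sex and age codes are stored as strings, which the article does not state. By default get_dummies encodes only object or category columns, so integer codes would pass through unchanged unless they are listed in the columns argument.

```python
import pandas as pd

# Assumption: the category codes are strings; the article does not show the dtypes.
df1 = pd.DataFrame({
    'id':  [1001, 1002, 1003, 1004, 1005],
    'sex': ['1', '2', '1', '2', '2'],
    'age': ['3', '2', '3', '1', '1'],
}, columns=['id', 'sex', 'age'])

print(df1)
```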
import pandas as pd

df1 = df1.reset_index(drop=True)    # Reset the index, because the frames are merged on it later.

# One dummy column per category of sex and age, minus the first category of each.
dummy_df = pd.get_dummies(df1[['sex', 'age']], drop_first=True)
print(dummy_df)
	sex_2	age_2	age_3
0	0.0	0.0	1.0
1	1.0	1.0	0.0
2	0.0	0.0	1.0
3	1.0	0.0	0.0
4	1.0	0.0	0.0

The columns have been converted to dummy variables. get_dummies creates one dummy variable per category of each variable, and drop_first then removes the dummy for the first category. (If you keep all of them, the dummies of one variable always sum to 1 and are linearly dependent, which is inconvenient for modelling, so the first one is excluded here.) Note that drop_first is available in pandas 0.18.0 and later.
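
As a sketch of what drop_first changes, the following compares the column names with and without it. The data frame is a small stand-in for df1 that assumes integer codes, so the columns argument is passed explicitly. The dtype of the dummy columns (float, uint8, or bool) differs between pandas versions, so only the column names are printed.

```python
import pandas as pd

# Hypothetical stand-in for df1, with integer category codes.
df = pd.DataFrame({'sex': [1, 2, 1, 2, 2], 'age': [3, 2, 3, 1, 1]})

# columns= forces encoding of integer columns, which are skipped by default.
full = pd.get_dummies(df, columns=['sex', 'age'])
reduced = pd.get_dummies(df, columns=['sex', 'age'], drop_first=True)

print(list(full.columns))     # one dummy per category
print(list(reduced.columns))  # first category of each variable dropped
```

In the full version, sex_1 and sex_2 add up to 1 in every row, which is the dependency that drop_first removes.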


df2 = pd.merge(df1, dummy_df, left_index=True, right_index=True)
print(df2)
    id sex age  sex_2  age_2  age_3
0  1001   1   3    0.0    0.0    1.0
1  1002   2   2    1.0    1.0    0.0
2  1003   1   3    0.0    0.0    1.0
3  1004   2   1    1.0    0.0    0.0
4  1005   2   1    1.0    0.0    0.0

After merging, the dummy variables appear next to the original columns, and each row matches its sex and age values.
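
The index-based merge can also be written with DataFrame.join or pd.concat. The following is a minimal sketch, assuming both frames share the same index (which is why the index was reset earlier). The frames left and right are shortened stand-ins for df1 and dummy_df, not the article's full data.

```python
import pandas as pd

# Shortened stand-ins for df1 and dummy_df.
left = pd.DataFrame({'id': [1001, 1002], 'sex': ['1', '2']})
right = pd.DataFrame({'sex_2': [0.0, 1.0]})

merged = pd.merge(left, right, left_index=True, right_index=True)
joined = left.join(right)                    # joins on the index by default
stacked = pd.concat([left, right], axis=1)   # aligns rows on the index

print(merged)
print(joined)
print(stacked)
```

If the indexes did not match, merge (an inner join by default) would drop the unmatched rows, while join and concat would fill the gaps with NaN.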
