[PYTHON] Search / Delete Missing Values in "Kaggle Memorandum"

Purpose

A memo on how to find and delete columns that contain missing values, and how to fill those values in instead.

Environment / usage data

Data used: Kaggle Courses: Rent data for Intermediate Machine Learning (Missing Values)

Environment: Kaggle notebook

What I did

Preparing the modules and reading the data

DropColumn.py


# Import os and pandas
import os
import pandas as pd

# Read the data, using the 'Id' column as the index
X_full = pd.read_csv('../input/train.csv', index_col='Id')

X_full has the following columns:

DropColumn.py


X_full.columns
[Screenshot: output of X_full.columns]

Among them, the columns that contain missing values can be listed as follows:

DropColumn.py


cols_missing=[col for col in X_full.columns
            if X_full[col].isnull().any()]
cols_missing
[Screenshot: output of cols_missing]

Delete these columns all at once.
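As a minimal sketch of the same check, the toy DataFrame below stands in for X_full. Its column names and values are made up for illustration and are not the Kaggle data. It shows which columns `isnull().any()` flags, and how `isnull().sum()` gives the number of missing entries in each.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for X_full (not the Kaggle data)
toy = pd.DataFrame({
    "area": [8450, 9600, 11250],
    "frontage": [65.0, np.nan, 68.0],
    "garage_year": [2003.0, 1976.0, np.nan],
})

# Columns that contain at least one missing value
cols_missing = [col for col in toy.columns if toy[col].isnull().any()]
print(cols_missing)

# Number of missing values in each of those columns
missing_counts = toy[cols_missing].isnull().sum()
print(missing_counts)
```

Checking the counts before deleting helps judge how much information would be lost by dropping a column.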

DropColumn.py


reduced_X_full=X_full.drop(cols_missing,axis=1)
reduced_X_full
[Screenshot: output of reduced_X_full]
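The deletion step can be sketched on a small example as well. The toy DataFrame and its column names are hypothetical. The sketch also shows `dropna(axis=1)`, which is not used in this memo but should give the same result in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for X_full (not the Kaggle data)
toy = pd.DataFrame({
    "area": [8450, 9600, 11250],
    "frontage": [65.0, np.nan, 68.0],
    "garage_year": [2003.0, 1976.0, np.nan],
})

cols_missing = [col for col in toy.columns if toy[col].isnull().any()]

# drop() returns a new DataFrame; toy itself is left unchanged
reduced = toy.drop(columns=cols_missing)

# dropna(axis=1) removes every column that contains a missing value
reduced_alt = toy.dropna(axis=1)

print(list(reduced.columns))
print(reduced.equals(reduced_alt))
```

Here `drop(columns=...)` behaves the same as `drop(..., axis=1)`; which spelling to use is a matter of taste.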

Deletion completed.

2. Imputing missing values

How to use scikit-learn's SimpleImputer

SimpleImputer fills in missing values with a statistic computed from each column, such as the median or the mean.

For example, to fill in missing values with the median, specify `imputer = SimpleImputer(strategy='median')`.
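To make the effect of `strategy` concrete, here is a minimal sketch on a single made-up column (the name `frontage` and its values are hypothetical). It fills the same missing entry once with the median and once with the mean.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical one-column example with a single missing value
toy = pd.DataFrame({"frontage": [60.0, np.nan, 70.0, 100.0]})

median_imputer = SimpleImputer(strategy="median")
mean_imputer = SimpleImputer(strategy="mean")

# fit_transform learns the statistic per column and returns a NumPy array
filled_median = median_imputer.fit_transform(toy)
filled_mean = mean_imputer.fit_transform(toy)

print(filled_median[1, 0])  # median of 60, 70, 100
print(filled_mean[1, 0])    # mean of 60, 70, 100
```

The median is less sensitive to outliers than the mean, which is one reason it is often chosen for skewed columns.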

ImputeValue.py


# Define the imputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='median')

# Fill in the missing values of X_full
# (the median strategy requires X_full to contain only numeric columns)
imputed_X_full = pd.DataFrame(imputer.fit_transform(X_full))

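As it stands, the column names of `imputed_X_full` are just sequential integers (0, 1, 2, ...), because `fit_transform` returns a plain array without column names.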

ImputeValue.py


imputed_X_full.columns
[Screenshot: output of imputed_X_full.columns]

Restore the column names.

ImputeValue.py


imputed_X_full.columns=X_full.columns
imputed_X_full.columns
[Screenshot: output of imputed_X_full.columns after restoring the names]
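As a variation, the column names and the index can be passed when the DataFrame is built, so nothing has to be restored afterwards. The sketch below is only an illustration on hypothetical toy data. It assumes the data may contain text columns, which the median strategy cannot handle, so it selects the numeric columns first with `select_dtypes`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with an 'Id' index and one text column
toy = pd.DataFrame(
    {
        "frontage": [60.0, np.nan, 70.0],
        "street": ["Pave", "Grvl", "Pave"],
    },
    index=pd.Index([101, 102, 103], name="Id"),
)

# The median is only defined for numbers, so keep the numeric columns
numeric = toy.select_dtypes(include="number")

imputer = SimpleImputer(strategy="median")

# Pass columns and index explicitly so both survive the round trip
imputed = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
print(imputed)
```

Keeping the original index matters if the imputed data is later joined back to other columns or to the target by `Id`.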

Imputation completed.
