[PYTHON] How to deal with scikit-learn's "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')."

When standardizing data using scikit-learn's StandardScaler, the following error may occur.

`input`


from sklearn.preprocessing import StandardScaler

#Training data (pandas.DataFrame type)
X = training_data()

#Standardization
sc = StandardScaler()
sc.fit(X)

`output`


ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

To avoid this, remove NaN and infinite values from the input data before fitting.
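Before removing anything, it can help to see which columns are affected. The following is a minimal sketch, assuming X is an all-numeric pandas.DataFrame; the small frame and its column names a, b, c are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for X
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, 3.0],
})

# True for every element that is NaN, +inf, or -inf
bad = ~np.isfinite(X)

# Number of problematic values in each column
bad_counts = bad.sum()

# Show only the columns that contain at least one problematic value
print(bad_counts[bad_counts > 0])
```

np.isfinite is used here instead of np.isnan because it flags infinite values as well as NaN.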

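For example, the code below removes every column of X that contains at least one NaN. Note that np.isnan does not flag infinite values, so those need a separate check such as np.isinf.

import numpy as np

#Remove columns containing NaN from X (drop returns a new DataFrame, so reassign it)
X = X.drop(X.columns[np.isnan(X).any()], axis=1)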

Description of each function

- np.isnan(X): returns a Boolean matrix that is True for NaN elements and False for all other elements
- np.isnan(X).any(): returns True for each column containing NaN and False for the other columns
- X.columns[np.isnan(X).any()]: returns the names of the columns containing NaN
- X.drop('col', axis=1): returns a copy of X with the column named col removed
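The approach above only covers NaN, while the error message also mentions infinity. One possible way to handle both at once is sketched below; it is an alternative that uses pandas' replace and dropna instead of np.isnan, and the sample frame with columns a, b, c is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for X
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, -np.inf, 3.0],
})

# Treat +inf and -inf as missing so that one call handles both cases
X_clean = X.replace([np.inf, -np.inf], np.nan)

# Drop every column that contains at least one NaN
X_clean = X_clean.dropna(axis=1)

print(X_clean.columns.tolist())  # ['a']
```

The resulting X_clean contains only finite values, so it should be safe to pass to sc.fit. Dropping whole columns discards data; using dropna(axis=0) to remove rows instead may be preferable when only a few samples are affected.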