[PYTHON] How to deal with scikit-learn's "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')."

When standardizing data using scikit-learn's StandardScaler, the following error may occur.

input


from sklearn.preprocessing import StandardScaler

# Training data (a pandas.DataFrame); training_data() stands in for your own loading code
X = training_data()

# Standardization
sc = StandardScaler()
sc.fit(X)

output


ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
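Before cleaning anything, it can help to see which columns actually trigger the error. The sketch below counts NaN and infinite values per numeric column; it assumes X is a pandas DataFrame, and both the helper find_non_finite_columns and the toy data are hypothetical, not part of scikit-learn.

```python
import numpy as np
import pandas as pd


def find_non_finite_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count NaN and +/-inf values in each numeric column."""
    numeric = X.select_dtypes(include="number")
    counts = pd.DataFrame({
        "nan": numeric.isna().sum(),       # NaN count per column
        "inf": np.isinf(numeric).sum(),    # +inf / -inf count per column
    })
    # Keep only the columns that contain at least one NaN or infinite value
    return counts[(counts["nan"] > 0) | (counts["inf"] > 0)]


# Made-up data: column "b" has a NaN, column "c" has an infinity
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.inf, 2.0, 3.0],
})
result = find_non_finite_columns(X)
print(result)
```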

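To avoid this, remove NaN and infinity from the input data before fitting. Note that since scikit-learn 0.20, StandardScaler ignores NaN during fit, so with recent versions this error from StandardScaler usually points to infinite values.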
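Because np.isnan only detects NaN, a column holding +inf or -inf would survive a NaN-only cleanup. One minimal way to drop columns containing either kind of value is np.isfinite, which is False for NaN, +inf and -inf. This sketch assumes every column of X is numeric; the toy data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: only column "a" is entirely finite
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.inf, 2.0, 3.0],
    "d": [4.0, 5.0, -np.inf],
})

# np.isfinite is False for NaN, +inf and -inf, so .all() is True
# only for columns whose every value is a finite number
finite_columns = np.isfinite(X).all()

# Keep only those columns (assumes X is all-numeric)
X_clean = X.loc[:, finite_columns]
print(list(X_clean.columns))  # ['a']
```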

For example, the code below removes every column of X that contains at least one NaN.

import numpy as np

# Remove columns containing NaN from X (drop() returns a new DataFrame, so reassign it)
X = X.drop(X.columns[np.isnan(X).any()], axis=1)

Description of each function

- np.isnan(X): returns a boolean matrix that is True for NaN elements and False for all other elements
- np.isnan(X).any(): returns, for each column, True if the column contains at least one NaN and False otherwise
- X.columns[np.isnan(X).any()]: gets the names of the columns that contain NaN
- X.drop('col', axis=1): returns a copy of X with the column named col removed (X itself is unchanged unless the result is reassigned)
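To make each step easier to inspect, here is the same expression split into its parts on a small made-up DataFrame (the column names height, weight and age are purely illustrative).

```python
import numpy as np
import pandas as pd

# Made-up data: "height" and "age" each contain one NaN
X = pd.DataFrame({
    "height": [170.0, np.nan, 165.0],
    "weight": [60.0, 72.0, 55.0],
    "age": [30.0, 41.0, np.nan],
})

nan_mask = np.isnan(X)            # DataFrame of booleans, True where a value is NaN
has_nan = nan_mask.any()          # Series: one boolean per column
nan_columns = X.columns[has_nan]  # Index of the column names that contain NaN
X = X.drop(nan_columns, axis=1)   # drop() returns a new DataFrame, so reassign it

print(list(nan_columns))  # ['height', 'age']
print(list(X.columns))    # ['weight']
```

The remaining X contains no NaN, so it can then be passed to StandardScaler.fit.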
