When standardizing data using scikit-learn's StandardScaler, the following error may occur.
input
from sklearn.preprocessing import StandardScaler
# Training data (pandas.DataFrame); training_data() is a placeholder for your own loading code
X = training_data()
# Standardization
sc = StandardScaler()
sc.fit(X)
output
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
To avoid this, it is necessary to remove NaN and infinity from the input data.
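Before removing anything, it can help to check which columns actually contain non-finite values. The following is a minimal sketch, assuming X is an all-numeric pandas.DataFrame; the small sample frame and the helper name non_finite_columns are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd


def non_finite_columns(df):
    """Return the names of columns that contain NaN or +/-infinity."""
    # np.isfinite is False for NaN, inf and -inf
    finite = np.isfinite(df.to_numpy(dtype=float))
    # A column is flagged if at least one of its elements is not finite
    return list(df.columns[~finite.all(axis=0)])


# Hypothetical sample data standing in for the training data
X_sample = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, 3.0],
})
bad_columns = non_finite_columns(X_sample)
print(bad_columns)
```

Here columns b and c are reported, since one holds a NaN and the other an infinity.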
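For example, the code below removes every column of X that contains at least one NaN. Note that drop returns a new DataFrame rather than modifying X in place, so the result has to be assigned back.
import numpy as np
# Remove columns containing NaN from X
X = X.drop(X.columns[np.isnan(X).any()], axis=1)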
Description of each function
- np.isnan(X): returns a boolean matrix that is True where an element is NaN and False elsewhere
- np.isnan(X).any(): returns, for each column, True if the column contains NaN and False otherwise
- X.columns[np.isnan(X).any()]: gets the names of the columns that contain NaN
- X.drop('col', axis=1): returns a copy of X without the column named col
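The NaN-based column removal only handles NaN, while the error message also mentions infinity. One possible way to cover both, sketched here under the assumption that X is all numeric, is to convert infinities to NaN first and then drop the affected columns. The sample frame and the name X_clean are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data standing in for the training data
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, 3.0],
})

# Treat +/-infinity as missing, then drop every column that has a missing value
X_clean = X.replace([np.inf, -np.inf], np.nan).dropna(axis=1)

# Standardization now succeeds because only finite values remain
sc = StandardScaler()
sc.fit(X_clean)
print(list(X_clean.columns))
print(sc.mean_)
```

Dropping whole columns discards information, so depending on the data it may be preferable to drop rows instead (dropna(axis=0)) or to fill in the missing values.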