I got stuck for a while handling NaN in pandas, so here is a note.
When a column mixes str and float values, use pd.isnull() instead of math.isnan() or np.isnan().
First, read the data.
read_csv.py
```python
import pandas as pd
import numpy as np
import math

data = pd.read_csv('test.csv', encoding='utf-8')
```
data looks like this (the unlabeled first column is the row index).
|  | hoge | foo |
|---|---|---|
| 0 | 0 | NaN |
| 1 | a | 1.0 |
| 2 | NaN | b |
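The post does not show test.csv itself. As a sketch, assume the file holds only the two columns and three rows from the table above, with empty fields for the missing values. Under that assumption, the same DataFrame can be rebuilt from an in-memory string with io.StringIO, so the snippets can be tried without the file. CSV_TEXT is a hypothetical reconstruction, not the original file.

```python
import io

import pandas as pd

# Hypothetical reconstruction of test.csv from the table above.
# Empty fields are read as NaN by read_csv.
CSV_TEXT = """hoge,foo
0,
a,1.0
,b
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))
print(data)
```

Because hoge contains 'a', read_csv cannot parse that column as numbers. As a result, '0' stays the string '0' and the empty field becomes a float NaN, which is exactly the str/float mix this note is about.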
I want to replace the NaN in column 'hoge' with the string 'No data'.
type.py
```python
for i in range(len(data)):
    print(type(data['hoge'][i]))
```
result
```text
<class 'str'>
<class 'str'>
<class 'float'>
```
Checking every cell the same way gives the table below. Only the NaN values are float; everything else is str.
|  | hoge | foo |
|---|---|---|
| 0 | str | float |
| 1 | str | str |
| 2 | float | str |
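You can also generate the type table instead of filling it in by hand. The minimal sketch below reuses the same hypothetical in-memory reconstruction of test.csv and applies type() to every cell, keeping only the class name.

```python
import io

import pandas as pd

# Hypothetical reconstruction of test.csv (same contents as the table above).
data = pd.read_csv(io.StringIO("hoge,foo\n0,\na,1.0\n,b\n"))

# Iterate over each column's values and record the Python type name of every cell.
types = pd.DataFrame(
    {col: [type(v).__name__ for v in data[col]] for col in data.columns}
)
print(types)
```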
math.isnan()
math.isnan() raises a TypeError when it is passed a str, so the loop fails on the very first row ('0').
math_isnan.py
```python
for i in range(len(data)):
    if math.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'
```
result
```text
TypeError: must be real number, not str
```
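If you want to keep using math.isnan(), one workaround is to check the type first so that a str never reaches it. This note does not recommend that approach. In the sketch below, is_float_nan is a hypothetical helper name.

```python
import math


def is_float_nan(value):
    """Return True only for a float NaN; any non-float value (e.g. str) gives False."""
    # `and` short-circuits, so math.isnan() is never called on a non-float.
    return isinstance(value, float) and math.isnan(value)


values = ['0', 'a', float('nan')]
print([is_float_nan(v) for v in values])  # [False, False, True]
```

Unlike pd.isnull(), this helper does not treat None as missing. That matters if a column can contain both None and NaN.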
np.isnan()
np.isnan() also raises a TypeError for str values.
np_isnan.py
```python
for i in range(len(data)):
    if np.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'
```
result
```text
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
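np.isnan() is a NumPy ufunc that only supports numeric inputs. This small sketch shows that it works on a float array but raises a TypeError once the array has object dtype, as a mixed str/float array does.

```python
import numpy as np

# A purely floating-point array is fine.
floats = np.array([0.0, 1.0, np.nan])
nan_mask = np.isnan(floats)
print(nan_mask.tolist())  # [False, False, True]

# Mixing str and float values forces dtype=object, which np.isnan rejects.
mixed = np.array(['0', 'a', np.nan], dtype=object)
try:
    np.isnan(mixed)
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print('TypeError:', exc)
```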
pd.isnull()
pd.isnull() returns True for NaN. For a str it simply returns False instead of raising an error.
pd_isnull.py
```python
print(pd.isnull(data['hoge'][2]))
```
result
```text
True
```
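pd.isnull() (also available under the alias pd.isna()) accepts any scalar. The sketch below checks a few values. Only float NaN and None count as missing; the literal string 'NaN' and the empty string do not.

```python
import numpy as np
import pandas as pd

samples = ['0', 'a', float('nan'), np.nan, None, 'NaN', '']

# pd.isnull never raises on a str; it just reports whether the value is missing.
for v in samples:
    print(repr(v), pd.isnull(v))
```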
pd.isnull() accepts both str and float values, so the replacement loop over column 'hoge' now runs without error.
pd_isnull.py
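```python
for i in range(len(data)):
    if pd.isnull(data['hoge'][i]):
        # .loc writes back to data itself; chained assignment such as
        # data['hoge'][i] = ... may only modify a temporary copy.
        data.loc[i, 'hoge'] = 'No data'
```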
The NaN in column 'hoge' has been replaced.
|  | hoge | foo |
|---|---|---|
| 0 | 0 | NaN |
| 1 | a | 1.0 |
| 2 | No data | b |
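The loop works, but pandas can also do the same replacement for the whole column at once. Below are two sketches, both using the same hypothetical in-memory reconstruction of test.csv. The first applies a boolean mask from pd.isnull() with .loc; the second uses Series.fillna(), the pandas method for filling missing values.

```python
import io

import pandas as pd

# Hypothetical reconstruction of test.csv (same contents as the first table).
data = pd.read_csv(io.StringIO("hoge,foo\n0,\na,1.0\n,b\n"))

# Option 1: pd.isnull on the whole column gives a boolean mask; .loc assigns where it is True.
masked = data.copy()
masked.loc[pd.isnull(masked['hoge']), 'hoge'] = 'No data'

# Option 2: fillna replaces every missing value in the column in one call.
filled = data.copy()
filled['hoge'] = filled['hoge'].fillna('No data')

print(filled)
```

Both options assign through .loc or a whole column rather than through chained assignment like data['hoge'][i] = .... Chained assignment no longer updates the original DataFrame when Copy-on-Write is enabled (on by default in pandas 3.0).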