[PYTHON] Detecting NaN in pandas: When str type and float type are mixed

I got a little stuck dealing with NaN in pandas, so here is a note. When a column contains a mix of str and float values, use pd.isnull() instead of math.isnan() or np.isnan().

First, read the data.

read_csv.py


import pandas as pd
import numpy as np
import math

data = pd.read_csv('test.csv', encoding='utf-8')

The data looks like this.

  hoge  foo
0    0  NaN
1    a  1.0
2  NaN    b
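The original test.csv is not shown, so the following is only a sketch for reproducing a similar DataFrame without the file. The CSV text and the names `csv_text` and `sample` are assumptions reconstructed from the table above, not the author's actual file.

```python
import io

import pandas as pd

# Hypothetical contents of test.csv, guessed from the table above.
# Empty fields are read as NaN by read_csv.
csv_text = "hoge,foo\n0,\na,1.0\n,b\n"

# Read from an in-memory buffer instead of a file on disk.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

Because column 'hoge' contains the non-numeric value 'a', the values that are present are kept as strings, and only the empty field becomes NaN.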

I want to replace the NaN in column 'hoge' with the string 'No data'.

Examine the data type

type.py


for i in range(len(data)):
    print(type(data['hoge'][i]))

result


<class 'str'>
<class 'str'>
<class 'float'>

Summarizing the types for both columns gives the following. Only the NaN values are of float type; everything else is str.

    hoge    foo
0    str  float
1    str    str
2  float    str

math.isnan()

math.isnan() cannot be used for str type.

math_isnan.py


for i in range(len(data)):
    # Raises TypeError on the first row, because '0' is a str
    if math.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'

result


TypeError: must be real number, not str

np.isnan()

np.isnan() also cannot be used for str type.

np_isnan.py


for i in range(len(data)):
    # Raises TypeError on the first row, because '0' is a str
    if np.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'

result


TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
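Both errors come from passing a str to a function that only accepts numbers. As a rough illustration of the cause, checking the type before calling math.isnan() avoids the exception. The helper name `is_nan` is hypothetical and introduced only for this sketch.

```python
import math


def is_nan(value):
    """Return True only for a float NaN; a str never reaches math.isnan()."""
    return isinstance(value, float) and math.isnan(value)


# The same mix of types as column 'hoge'.
values = ['0', 'a', float('nan')]
flags = [is_nan(v) for v in values]
print(flags)  # [False, False, True]
```

This works, but it means writing the type check by hand every time.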

pd.isnull()

pd.isnull() returns True for NaN.

pd_isnull.py


print(pd.isnull(data['hoge'][2]))

result


True
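To see how pd.isnull() differs from the two functions above, here is a small sketch that calls it on values of different types. The Series `hoge` is a hypothetical stand-in for the column read from the CSV.

```python
import pandas as pd

# Scalars of different types: a str does not raise TypeError.
print(pd.isnull('a'))           # False
print(pd.isnull(float('nan')))  # True
print(pd.isnull(None))          # True

# Applied to a whole Series, it returns a boolean mask.
hoge = pd.Series(['0', 'a', float('nan')])
mask = pd.isnull(hoge)
print(mask.tolist())            # [False, False, True]
```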

If you use pd.isnull() to find and replace the NaN in column 'hoge', which is a mixture of str type and float type, it runs without an error.

pd_isnull.py


for i in range(len(data)):
    if pd.isnull(data['hoge'][i]):
        # Assign with .loc to avoid chained assignment
        data.loc[i, 'hoge'] = 'No data'

The NaN in column 'hoge' has been replaced.

      hoge  foo
0        0  NaN
1        a  1.0
2  No data    b
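As an alternative to the loop, the same replacement can probably be done in one step with Series.fillna(). This is a different technique from the one in this note, shown only as a sketch. The DataFrame below is a hypothetical stand-in for the data read from test.csv.

```python
import pandas as pd

# Hypothetical stand-in for the data read from test.csv.
data = pd.DataFrame({
    'hoge': ['0', 'a', float('nan')],
    'foo': [float('nan'), '1.0', 'b'],
})

# Replace NaN only in column 'hoge'; column 'foo' is left untouched.
data['hoge'] = data['hoge'].fillna('No data')
print(data['hoge'].tolist())  # ['0', 'a', 'No data']
```

fillna() handles the missing-value check internally, so the mix of str and float in the column does not matter here either.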
