I searched for a while and couldn't find this, so after working it out myself I'm writing it down as a note.

The topic is "Is there a NaN in a pandas DataFrame?" As a simple check that data is being processed properly, I want to **find out whether a DataFrame contains NaN values and where they are**. If you want to fill or drop NaN you can use fillna() / dropna(), but what I want here is to **check whether NaN is present and display the rows (columns) that contain it**.

As an example, I want to extract only rows 2 to 4, or columns 1 to 3 (b to d), of this data frame, counting from 0.

`Data creation`


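import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5))
# .ix has been removed from pandas; .loc label slices include the end label
df.loc[2:, 1:3] = np.nan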
df.columns=list('abcde')
df
#[Out]#           a         b         c         d         e
#[Out]# 0 -0.678873 -1.277486 -1.062232  0.097525 -2.386115
#[Out]# 1 -1.063709 -1.919997 -0.131733 -0.606348  0.101888
#[Out]# 2 -1.701473       NaN       NaN       NaN  0.201468
#[Out]# 3 -0.624932       NaN       NaN       NaN -0.654297
#[Out]# 4  0.345065       NaN       NaN       NaN -0.232199

Output NaN as bool value

Use isnull() / notnull() to check for NaN. Reference:

The official pandas guide to handling NaN: pandas 0.19.1 documentation » Working with missing data

Using the isnull method

`isnull()`


df.isnull()
#[Out]#        a      b      c      d      e
#[Out]# 0  False  False  False  False  False
#[Out]# 1  False  False  False  False  False
#[Out]# 2  False   True   True   True  False
#[Out]# 3  False   True   True   True  False
#[Out]# 4  False   True   True   True  False

The result is a DataFrame of the same shape as df holding bool values, True only where the value is NaN.

notnull() returns the same frame with True and False inverted.
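
To make the relationship between the two methods concrete, here is a minimal sketch. The frame `small` and the mask names are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame (not the article's random data) with two NaNs.
small = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

null_mask = small.isnull()      # True where the value is NaN
notnull_mask = small.notnull()  # True where the value is present

# The two masks are element-wise complements of each other.
print((notnull_mask == ~null_mask).all().all())
```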

This is still not quite what I want: it marks every cell rather than summarizing rows or columns.

Summarize whether each row (column) contains NaN

What I want to do, **"check for NaN and display the rows (columns) that contain it"**, breaks down into two steps:

1. Find the rows (columns) that contain one or more NaN
2. Extract those rows (columns) with slicing / loc / iloc / ...

That should do it.

**"At least one of something"** calls for **numpy's `any` method**.

`np.any()`


df.isnull().any()
#[Out]# a    False
#[Out]# b     True
#[Out]# c     True
#[Out]# d     True
#[Out]# e    False
#[Out]# dtype: bool

df.isnull().any(axis=1)
#[Out]# 0    False
#[Out]# 1    False
#[Out]# 2     True
#[Out]# 3     True
#[Out]# 4     True
#[Out]# dtype: bool

df.isnull().any(axis=0)  # same as df.isnull().any()
#[Out]# a    False
#[Out]# b     True
#[Out]# c     True
#[Out]# d     True
#[Out]# e    False
#[Out]# dtype: bool

Since the default for `any()` is `axis=0`, which scans down the rows of each column, `df.isnull().any()` returns, for each column, True if it contains at least one True from isnull() (that is, at least one NaN) and False otherwise. With `any(axis=1)` the scan runs across the columns of each row instead, checking whether each row contains a True (that is, a NaN).

In the pandas version used here `axis=` could be omitted, so `df.isnull().any(1)` was the same as `df.isnull().any(axis=1)`. Newer pandas releases accept `axis` only as a keyword argument, so writing `axis=1` explicitly is the safer form.
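
The axis direction is easy to mix up, so here is a small sketch with a single NaN at a known position. The frame `demo` and its column names are hypothetical, chosen only so that the result can be checked by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame with a single NaN at row 1, column "y".
demo = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [4.0, np.nan, 6.0],
     [7.0, 8.0, 9.0]],
    columns=list("xyz"),
)

per_column = demo.isnull().any(axis=0)  # one bool per column, indexed by column name
per_row = demo.isnull().any(axis=1)     # one bool per row, indexed by row label

print(per_column.tolist())  # only the entry for "y" is True
print(per_row.tolist())     # only the entry for row 1 is True
```

The result of `axis=0` is indexed by column name and the result of `axis=1` by row label, which is a quick way to tell which direction was reduced.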

Is there even one NaN in the matrix?

That is still a little different from what I want. To **return True if there is a NaN anywhere**, chain any() twice.

`Does it contain even one NaN?`


df.isnull().any().any()  #Contains NaN
#[Out]# True
dff=pd.DataFrame(np.random.randn(5,5))  #Does not contain NaN
dff.isnull().any().any()
#[Out]# False

The same approach appears on Stack Overflow (Python pandas: check if any value is NaN in DataFrame). Besides df.isnull().any().any(), the following are also used there:

df.isnull().values.sum()
df.isnull().sum().sum()
df.isnull().values.any()

The fastest when measured with %timeit was `df.isnull().values.any()`. **If you only want to know whether even one NaN is present**, use that.
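
For anyone who wants to repeat the comparison outside IPython, the following is a rough sketch using the standard `timeit` module. The frame size, the position of the NaN, and the names `big` and `candidates` are assumptions made for illustration. Timings depend on the machine, the data shape, and the pandas version, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 1000x50 random values with one NaN planted.
big = pd.DataFrame(np.random.randn(1000, 50))
big.iloc[500, 25] = np.nan

# The sum-based variants return a count, so they are compared against 0.
candidates = {
    "any().any()": lambda: big.isnull().any().any(),
    "values.sum()": lambda: big.isnull().values.sum() > 0,
    "sum().sum()": lambda: big.isnull().sum().sum() > 0,
    "values.any()": lambda: big.isnull().values.any(),
}

results = {name: bool(check()) for name, check in candidates.items()}
timings = {name: timeit.timeit(check, number=20) for name, check in candidates.items()}

for name in candidates:
    print(f"{name:14s} has_nan={results[name]}  {timings[name]:.4f}s for 20 runs")
```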

Extract rows (columns) containing NaN

Now I can finally do what I set out to do. `df.isnull().any(axis=1)` gives a bool value per row saying whether that row contains NaN; **index df with it** to extract only the rows containing NaN.

`Extracting rows containing NaN`


df[df.isnull().any(axis=1)]
#[Out]#           a   b   c   d         e
#[Out]# 2 -1.701473 NaN NaN NaN  0.201468
#[Out]# 3 -0.624932 NaN NaN NaN -0.654297
#[Out]# 4  0.345065 NaN NaN NaN -0.232199

`Extracting columns containing NaN`


df.loc[:, df.isnull().any()]
#[Out]#           b         c         d
#[Out]# 0 -1.277486 -1.062232  0.097525
#[Out]# 1 -1.919997 -0.131733 -0.606348
#[Out]# 2       NaN       NaN       NaN
#[Out]# 3       NaN       NaN       NaN
#[Out]# 4       NaN       NaN       NaN
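
The row mask and the column mask can also be applied together to show only the part of the frame affected by NaN. This is a sketch under stated assumptions: `df_demo` is a deterministic stand-in for df (values 0 to 24 instead of random numbers), and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's df (values 0..24 instead of random numbers).
df_demo = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5), columns=list("abcde"))
df_demo.iloc[2:, 1:4] = np.nan  # rows 2-4, columns b-d

null_mask = df_demo.isnull()
rows_with_nan = null_mask.any(axis=1)
cols_with_nan = null_mask.any(axis=0)

# Keep only the rows and columns that contain at least one NaN.
nan_block = df_demo.loc[rows_with_nan, cols_with_nan]
print(nan_block)

# Labels of the affected rows and columns.
row_labels = df_demo.index[rows_with_nan.to_numpy()].tolist()
col_labels = df_demo.columns[cols_with_nan.to_numpy()].tolist()
print(row_labels, col_labels)
```

Note that a row or column is kept if it has at least one NaN, so in general the extracted block can still contain ordinary values.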

That's all!

There is probably an easier way; if you know one, please let me know. Also, while pandas has `loc` and `iloc` for extracting rows, for columns there is `df.` or `df.ix[:, ]`, which is not pretty. Is there a nicer way (a column counterpart to the row `loc` / `iloc`)? (\*ω\*)

Update 2017/4/15: `df.icol(3)` extracts column 3, and `df.icol([0,2])` extracts columns 0 and 2. `df.icol([0:2])` does **not extract columns 0, 1 and 2; it raises an error**. (`icol` has since been removed from pandas.)
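
On the question of a column counterpart to `loc` / `iloc`: both indexers accept a column indexer as their second argument, so they can select columns by position or by label. The sketch below uses a hypothetical frame named `frame`, and it also covers the slice case.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration; column names mirror the article's.
frame = pd.DataFrame(np.arange(20).reshape(4, 5), columns=list("abcde"))

third = frame.iloc[:, 3]          # single column by position -> Series named "d"
picked = frame.iloc[:, [0, 2]]    # columns 0 and 2 -> DataFrame with "a", "c"
sliced = frame.iloc[:, 0:3]       # positions 0, 1, 2 (end exclusive) -> "a", "b", "c"
by_label = frame.loc[:, "b":"d"]  # label slice, end inclusive -> "b", "c", "d"

print(list(picked.columns), list(sliced.columns), list(by_label.columns))
```

The `:` in the first position means "all rows", which makes the column form symmetric with the row form `frame.iloc[0:3]` or `frame.iloc[0:3, :]`.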

I posted a speed comparison in the comment section.
