[PYTHON] Is there NaN in the pandas DataFrame?

I searched for a while and couldn't find this, so once I managed to work out how to do it, I wrote it down as a note.

The theme is "Is there NaN in the pandas DataFrame?" As a simple check that the data is being processed properly, I want to **find out whether a data frame contains NaN values and where they are**. If you want to fill or delete NaN you can use `fillna()` / `dropna()`, but what I want to do here is **"check whether there is NaN and display the rows (columns) that contain it"**.

As an example, I want to extract only rows 2-4 or columns 1-3 (b, c, d) of the following data frame.

Data creation


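import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5))
df.iloc[2:, 1:4] = np.nan  # rows 2-4, columns 1-3 by position (.ix was removed from pandas)
df.columns = list('abcde')
df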
#[Out]#           a         b         c         d         e
#[Out]# 0 -0.678873 -1.277486 -1.062232  0.097525 -2.386115
#[Out]# 1 -1.063709 -1.919997 -0.131733 -0.606348  0.101888
#[Out]# 2 -1.701473       NaN       NaN       NaN  0.201468
#[Out]# 3 -0.624932       NaN       NaN       NaN -0.654297
#[Out]# 4  0.345065       NaN       NaN       NaN -0.232199

Output NaN as bool value

Use `isnull()` / `notnull()` to see whether there is NaN. Reference:

Official pandas guide to handling NaN: pandas 0.19.1 documentation » Working with missing data

Use the `isnull` method.

isnull()


df.isnull()
#[Out]#        a      b      c      d      e
#[Out]# 0  False  False  False  False  False
#[Out]# 1  False  False  False  False  False
#[Out]# 2  False   True   True   True  False
#[Out]# 3  False   True   True   True  False
#[Out]# 4  False   True   True   True  False

What is returned is a data frame of the same shape as `df` containing bool values: True only where the value is NaN.

`notnull()` returns the data frame from `isnull()` with True and False inverted.
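As a small check of that relationship, here is a minimal sketch. The frame `small` and its values are made up for illustration and are not part of the example data above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame with a single NaN, used only for this illustration
small = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

is_nan = small.isnull()    # True where the value is NaN
not_nan = small.notnull()  # True where a value is present

# notnull() should be the element-wise negation of isnull()
inverse_matches = not_nan.equals(~is_nan)
print(inverse_matches)  # True
```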

This is still a little different from what I want to do.

Summarize if there is NaN in the row (column)

When I break down what I want to do, **"check for NaN and display its rows (columns)"**, I think the key question becomes whether each row (column) contains at least one NaN.

When the question is **"is there at least one of something"**, the answer is **the `any` method** (here pandas' `DataFrame.any`, the counterpart of numpy's `np.any`).

np.any()


df.isnull().any()
#[Out]# a    False
#[Out]# b     True
#[Out]# c     True
#[Out]# d     True
#[Out]# e    False
#[Out]# dtype: bool

df.isnull().any(axis=1)
#[Out]# 0    False
#[Out]# 1    False
#[Out]# 2     True
#[Out]# 3     True
#[Out]# 4     True
#[Out]# dtype: bool

df.isnull().any(axis=0)  # same as df.isnull().any()
#[Out]# a    False
#[Out]# b     True
#[Out]# c     True
#[Out]# d     True
#[Out]# e    False
#[Out]# dtype: bool

Since `any()` scans along the row direction (axis=0) by default, `df.isnull().any()` looks down each column and returns True if it contains at least one True (that is, a NaN, as converted by `isnull()`) and False if not. With `any(axis=1)` the scan direction changes to the column direction, and each row is checked for True (that is, NaN).

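In the pandas version used here, `axis=` can be omitted, so `df.isnull().any(1)` is the same as `df.isnull().any(axis=1)`. Recent pandas releases accept `axis` only as a keyword argument, so the explicit `any(axis=1)` form is the safer one.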
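To make the two scan directions concrete, here is a minimal sketch. It assumes a frame `df_demo` (a made-up name) with the same NaN layout as the example data, but with fixed values instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same NaN layout as the article's example, with fixed values (assumption for reproducibility)
df_demo = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5), columns=list("abcde"))
df_demo.iloc[2:, 1:4] = np.nan  # rows 2-4, columns b-d

# axis=0 (default): one result per column
nan_in_column = df_demo.isnull().any(axis=0)
# axis=1: one result per row
nan_in_row = df_demo.isnull().any(axis=1)

print(nan_in_column.tolist())  # [False, True, True, True, False]
print(nan_in_row.tolist())     # [False, False, True, True, True]
```

The result of `axis=0` is indexed by column label (a to e), and the result of `axis=1` is indexed by row label (0 to 4).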

Is there even one NaN in the matrix?

This is a slight detour from what I want to do, but to **return True if there is a NaN anywhere at all**, chain `any` twice.

Does it contain even one NaN?


df.isnull().any().any()  #Contains NaN
#[Out]# True
dff=pd.DataFrame(np.random.randn(5,5))  #Does not contain NaN
dff.isnull().any().any()
#[Out]# False

The same question is covered on Stack Overflow: "Python pandas: check if any value is NaN in DataFrame". Besides `df.isnull().any().any()`, several other approaches are used there. Measured with `%timeit`, the fastest was `df.isnull().values.any()`. **If all you want to know is whether even one NaN is present**, use that.
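A minimal sketch of that check wrapped in a function follows. The helper name `has_nan` and both sample frames are hypothetical, introduced only for this example. Counting with `sum().sum()` is shown as one alternative way of asking the same question, with no claim about its speed.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration
with_nan = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
without_nan = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})


def has_nan(frame):
    """Hypothetical helper: return True if the frame contains at least one NaN."""
    # .values is the underlying numpy bool array; any() checks every cell at once
    return bool(frame.isnull().values.any())


print(has_nan(with_nan))     # True
print(has_nan(without_nan))  # False

# Counting instead of testing: total number of NaN cells
nan_count = int(with_nan.isnull().sum().sum())
print(nan_count)  # 1
```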

Extract rows (columns) containing NaN

Now I can finally do what I wanted. `df.isnull().any(axis=1)` gives a bool value for each row saying whether it contains NaN, so **slicing with it** extracts only the rows containing NaN.

Row extraction including NaN


df[df.isnull().any(axis=1)]
#[Out]#           a   b   c   d         e
#[Out]# 2 -1.701473 NaN NaN NaN  0.201468
#[Out]# 3 -0.624932 NaN NaN NaN -0.654297
#[Out]# 4  0.345065 NaN NaN NaN -0.232199

Column extraction including NaN


df.loc[:, df.isnull().any()]  # .ix was removed from pandas; .loc accepts the bool mask
#[Out]#           b         c         d
#[Out]# 0 -1.277486 -1.062232  0.097525
#[Out]# 1 -1.919997 -0.131733 -0.606348
#[Out]# 2       NaN       NaN       NaN
#[Out]# 3       NaN       NaN       NaN
#[Out]# 4       NaN       NaN       NaN
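The row mask and the column mask can also be combined to answer "where is the NaN?" directly. The sketch below is one possible way, not the only one. It assumes a frame `df_demo` (a made-up name) with the same NaN layout as the example data but fixed values.

```python
import numpy as np
import pandas as pd

# Same NaN layout as the article's example, with fixed values (assumption for reproducibility)
df_demo = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5), columns=list("abcde"))
df_demo.iloc[2:, 1:4] = np.nan

mask = df_demo.isnull()
row_mask = mask.any(axis=1)  # rows containing NaN
col_mask = mask.any(axis=0)  # columns containing NaN

# Keep only the rows AND columns that contain NaN
nan_block = df_demo.loc[row_mask, col_mask]
print(nan_block.shape)  # (3, 3)

# (row label, column label) pair for every NaN cell
rows, cols = np.where(mask.values)
nan_positions = [(df_demo.index[r], df_demo.columns[c]) for r, c in zip(rows, cols)]
print(len(nan_positions))  # 9
```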

That's all!

There is probably an easier way; if you know one, please let me know. Also, while pandas has `loc` and `iloc` for row extraction, column extraction only has forms like `df.` or `df.ix[:, ]`, which are not very pretty. Is there a cleaner way, something for columns that pairs with `loc` and `iloc` for rows? (\*ω\*)

Update 2017/4/15
`df.icol(3)` extracts column 3.
`df.icol([0, 2])` extracts columns 0 and 2.
`df.icol([0:2])` does **not extract columns 0, 1 and 2; it raises an error**.
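`icol` was deprecated in favor of `iloc` and is not available in current pandas. Here is a minimal sketch of the same column selections with `iloc`, using a made-up frame `df_demo`. A slice does work here, with the end position excluded.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration
df_demo = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("abcde"))

col_3 = df_demo.iloc[:, 3]          # single column by position -> Series
cols_0_2 = df_demo.iloc[:, [0, 2]]  # columns at positions 0 and 2
cols_0_to_2 = df_demo.iloc[:, 0:3]  # positions 0, 1, 2 (slice end is exclusive)

print(col_3.name)                 # d
print(list(cols_0_2.columns))     # ['a', 'c']
print(list(cols_0_to_2.columns))  # ['a', 'b', 'c']
```

With `:` in the row position, `iloc` and `loc` serve as the positional and label-based column selectors, mirroring their use for rows.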


I posted a speed comparison in the comment section.
