[PYTHON] Missing values in pandas

This article is about missing values and how to handle them with pandas.

Contents

・ What is a missing value?
・ How to check missing values
・ How to deal with missing values

What is a missing value?

A missing value means that a value in the data was never entered. For example, a cell in a table is blank or NULL instead of holding a concrete value. When values are missing, the data gives an incomplete picture, so any graph or statistic built from it can be biased.

How to check missing values

So how do you find missing values? It is easy with Python's pandas library. The examples here assume the data comes from a CSV file.
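As a minimal sketch of loading CSV data, the snippet below reads a small made-up table. The column names and values are hypothetical, and the CSV text is held in a string so the example is self-contained; a real script would pass a file path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; empty fields stand for values that were never entered.
csv_text = """name,age,city
Alice,30,Tokyo
Bob,,Osaka
Carol,25,
"""

# read_csv turns empty fields into missing values (shown as NaN).
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```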

isnull function

pandas has a function that makes it easy to find out where data is missing: "isnull()". It returns a table of the same shape as the original, with True or False in every cell. A cell is True if its value is missing and False if a value is present.
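A short sketch of isnull() on a small made-up table (the column names and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing age and one missing city.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, np.nan, 25],
    "city": ["Tokyo", "Osaka", None],
})

# Same shape as df: True where a value is missing, False otherwise.
mask = df.isnull()
print(mask)
```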

any function

With isnull() alone you still have to look through the cells one by one. If you only want to know whether a column contains any missing values at all, add the function "any()" after isnull(). The result has one entry per column: True if the column has at least one missing value, False if it has none.
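As a sketch, chaining any() after isnull() on a made-up table (hypothetical column names) looks like this:

```python
import numpy as np
import pandas as pd

# Hypothetical table: "name" is complete, "age" and "city" each have a gap.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, np.nan, 25],
    "city": ["Tokyo", "Osaka", None],
})

# One True/False per column: does the column contain any missing value?
has_missing = df.isnull().any()
print(has_missing)

# Applying any() a second time collapses this to a single answer for the whole table.
table_has_missing = bool(df.isnull().any().any())
print(table_has_missing)
```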

sum function

Use "sum()" when you want to know how many missing values there are. As with any(), add it after isnull(); the result is the number of nulls in each column.
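A small sketch of counting nulls per column, again on made-up data. It works because each True in the isnull() result counts as 1 when summed.

```python
import numpy as np
import pandas as pd

# Hypothetical table: two missing ages, one missing city.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "age": [30, np.nan, 25, np.nan],
    "city": ["Tokyo", "Osaka", None, "Nagoya"],
})

# Number of missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)
```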

value_counts function

"value_counts()" counts how many times each distinct value appears in a specified column. For example, the output can tell you that the value 0 appears 10 times.
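The sketch below counts values in a hypothetical "score" column. By default value_counts() leaves missing values out of the result; passing dropna=False includes them as their own entry, which can be handy when checking for nulls.

```python
import numpy as np
import pandas as pd

# Hypothetical column: three zeros, one 1, and one missing value.
df = pd.DataFrame({"score": [0, 0, 1, np.nan, 0]})

# Occurrences of each distinct value; missing values are skipped by default.
counts = df["score"].value_counts()
print(counts)

# dropna=False adds an entry for the missing values as well.
counts_with_nan = df["score"].value_counts(dropna=False)
print(counts_with_nan)
```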

How to deal with missing values

Checking for missing values is only the first step; you also have to do something about them. One option is to substitute a concrete value for each missing one. This is called imputation (filling).

fillna function

Now let's fill in the nulls in the table. The function for this is "fillna". It replaces every null in the table with the value you pass as the argument, so you can choose any value you like. For example, fillna(0) replaces all nulls with 0. Note that fillna returns a new table rather than changing the original, so assign the result to a variable to keep it.
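A minimal sketch of fillna(0) on a made-up numeric table (the column names are assumptions). It also shows that the original table is left untouched unless you reassign it.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric table with two missing values.
df = pd.DataFrame({
    "age": [30, np.nan, 25],
    "score": [80, 65, np.nan],
})

# Replace every null with 0; the result is a new DataFrame.
filled = df.fillna(0)
print(filled)

# The original still contains its nulls.
print(df.isnull().sum())
```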

dropna function

If you want to delete rows that contain nulls instead of replacing them with a concrete value, use the function "dropna". By default, a row is deleted if any of its columns is null. If you want to check only specific columns, specify subset=["column name"] in the argument.
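A short sketch of both uses of dropna, on a made-up table with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical table: Bob has no age, Carol has no city.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, np.nan, 25],
    "city": ["Tokyo", "Osaka", None],
})

# Default: drop every row that has a null in any column.
dropped_any = df.dropna()
print(dropped_any)

# Only look at the "age" column when deciding which rows to drop.
dropped_age = df.dropna(subset=["age"])
print(dropped_age)
```

Like fillna, dropna returns a new table and leaves the original as it was.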