[PYTHON] Handle integer types with missing values in Pandas

Previously, the Pandas Series couldn't handle integer types with missing values.

pd.Series([1, 2, None], dtype=int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

When numeric data containing missing values is read without specifying a dtype, it is cast to float64.

pd.Series([1, 2, None])
0    1.0
1    2.0
2    NaN
dtype: float64

This happens because numpy.nan, which pandas uses to represent missing values, is itself a float value. What we actually want is a way to represent missing values that does not depend on numpy.nan.
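The cause described above can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas and NumPy are installed; the variable name s is only for illustration.

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, the None is stored as numpy.nan,
# which forces the whole Series to a floating-point dtype.
s = pd.Series([1, 2, None])

print(s.dtype)            # dtype of the resulting Series
print(type(np.nan))       # numpy.nan is a plain Python float
print(s.isna().tolist())  # which positions are treated as missing
```

The integers survive only as floats here (1.0, 2.0), which is the behavior the nullable integer type is meant to avoid.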

In response, pandas v0.24.0 added the nullable integer data type. From pandas v1.0.0 onward, missing values in this type are represented by the new pandas.NA instead of numpy.nan, which is why the output below shows <NA>.

pd.Series([1, 2, None], dtype=pd.Int64Dtype())
0       1
1       2
2    <NA>
dtype: Int64

For dtype, the string "Int64" works the same as pd.Int64Dtype(). (Note that the "I" is uppercase.)
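As a rough illustration of the two spellings, the sketch below builds the same Series both ways and compares the resulting dtypes. It assumes pandas 1.0 or later; the variable names a, b, and lowercase_failed are hypothetical and used only for this example.

```python
import pandas as pd

# Both spellings request the nullable integer dtype.
a = pd.Series([1, 2, None], dtype=pd.Int64Dtype())
b = pd.Series([1, 2, None], dtype="Int64")

print(a.dtype == b.dtype)  # the two dtypes compare equal
print(b.isna().tolist())   # the None is kept as a missing value
print(b.sum())             # missing values are skipped by default

# Lowercase "int64" is the ordinary NumPy integer dtype,
# which cannot hold a missing value.
try:
    pd.Series([1, 2, None], dtype="int64")
    lowercase_failed = False
except (TypeError, ValueError):
    lowercase_failed = True
print(lowercase_failed)
```

The only difference between the working and failing calls is the case of the first letter, so it is worth double-checking when a dtype string is typed by hand.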

The documentation also states:

IntegerArray is currently experimental.

As this says, the feature is still experimental, so take care when using it.
