What is NaN? NaN Zoya (Python) (394 days late)

Update 1 (2020-01-25): fixed the typo IEEE745 -> IEEE 754.

In [1]: from datetime import datetime  
In [2]: (datetime(2020, 1, 11) - datetime(2018, 12, 13)).days                           
Out[2]: 394

This post explains how to handle nan in Python. Below, NaN refers to the concept, while nan refers to the Python value.

Disclaimer: This post was written for the justInCase Advent Calendar 2018 and was posted about 400 days late. Despite the long maturation period, the content is not especially rich.

Summary

- NaN in Python follows IEEE 754 NaN, but there are some pitfalls.
- Note the existence of Decimal('nan'), pd.NaT, and numpy.datetime64('NaT'), which are not float nan.
- The nan objects exposed by the numpy and pandas modules and math.nan behave the same. You can use any of them (but it is better to pick one and stick to it for readability).
- Note that pandas' isna(...) returns True as a missing value not only for nan but also for None, pd.NaT, etc.
- pandas 1.0.0 introduces pd.NA as its missing value. Going forward, it is desirable to use pd.NA instead of nan as the missing value. I will write about this in another article.

The verification environment is described at the end.

Handling of NaN in IEEE 754

Please refer to the previous article, "What is NaN? NaN Zoya (R)".

Quiet NaN propagates through general numerical operations, but what do you think the following two expressions should return? In fact, the handling of NaN in min and max changed between IEEE 754-2008 and IEEE 754-2019. I will explain that in another article.

min(1.0, float('nan'))
max(1.0, float('nan'))
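
As a hedged illustration for Python's built-in min and max only (not the IEEE 754 operations themselves): CPython keeps the first argument unless a later one compares strictly smaller or larger, and every ordering comparison against NaN is False, so the result depends on argument order. A minimal sketch:

```python
import math

nan = float('nan')

# min/max keep the first argument unless a later one wins the comparison,
# and any "<" or ">" comparison involving NaN evaluates to False.
print(min(1.0, nan))  # 1.0
print(min(nan, 1.0))  # nan
print(max(1.0, nan))  # 1.0
print(max(nan, 1.0))  # nan
```

In other words, the built-ins neither consistently propagate NaN nor consistently ignore it.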

How to call NaN in Python

Python has no NaN literal. float('nan'), which requires no import, is a common choice; when numpy is already imported, np.nan tends to be used a lot.

import math
import decimal
import numpy as np
import pandas as pd

float('nan')
math.nan
0.0 * math.inf
math.inf / math.inf
# 0.0 / 0.0 raises ZeroDivisionError in Python. Many languages, such as C, R, and Julia, return NaN.
np.nan
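np.core.numeric.NaN  # internal module path; works in the verification environment below, may not exist in newer NumPy
pd.np.nan  # pd.np was deprecated in pandas 1.0 and removed in pandas 2.0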

All of these are float objects. nan is not a singleton, but the objects referenced through numpy and pandas are one and the same object (note the identical ids below).

nans = [float('nan'), math.nan, 0 * math.inf, math.inf / math.inf, np.nan, np.core.numeric.NaN, pd.np.nan]

import pprint
pprint.pprint([(type(n), id(n)) for n in nans])
# [(<class 'float'>, 4544450768),
#  (<class 'float'>, 4321186672),
#  (<class 'float'>, 4544450704),
#  (<class 'float'>, 4544450832),
#  (<class 'float'>, 4320345936),
#  (<class 'float'>, 4320345936),
#  (<class 'float'>, 4320345936)]
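
The difference between identity and equality can be checked directly. The following is a small sketch using only the standard library; the variable names a and b are illustrative.

```python
import math

a = float('nan')
b = float('nan')

print(a is b)                # False: each float('nan') call builds a new object
print(math.nan is math.nan)  # True: the module attribute is a single object
print(a == a)                # False: NaN is not equal even to itself
print(a != a)                # True
```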

float('nan') is an immutable float object and therefore hashable, so it can be used as a dictionary key. Strangely, though, a dictionary accepts several nan keys, and unless you bind the key to a variable beforehand you cannot retrieve the value again. This is presumably because NaN never compares equal, not even to itself: float('nan') == float('nan') is False, so each new nan object counts as a distinct key.

>>> d = {float('nan'): 1, float('nan'): 2, float('nan'): 3}
>>> d
{nan: 1, nan: 2, nan: 3}
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan
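
Conversely, retrieval works when the same object is reused, because dictionary lookup checks identity before equality. A minimal sketch (nan_key is an illustrative name):

```python
nan_key = float('nan')
d = {nan_key: 1}

print(d[nan_key])         # 1: the very same object is found by identity
print(nan_key in d)       # True
print(float('nan') in d)  # False: a different nan object never matches
```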

Note the existence of objects with NaN-like properties that are not of the float class. In particular, pd.NaT and np.datetime64("NaT") are different classes from each other.

decimal.Decimal('nan')
pd.NaT
np.datetime64("NaT")

# >>> type(decimal.Decimal('nan'))
# <class 'decimal.Decimal'>

# >>> type(pd.NaT)
# <class 'pandas._libs.tslibs.nattype.NaTType'>

# >>> type(np.datetime64("NaT"))
# <class 'numpy.datetime64'>

Therefore, np.isnat needs care: it is defined only for numpy datetime and timedelta values, so passing pd.NaT raises a TypeError.

>>> np.isnat(pd.NaT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnat' is only defined for datetime and timedelta.

>>> np.isnat(np.datetime64("NaT"))
True

NaN check

math.isnan
np.isnan
pd.isna

The underlying implementations of math.isnan and of the NaN check in decimal are around here:
https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Include/pymath.h#L88-L103
https://github.com/python/cpython/blob/34fd4c20198dea6ab2fe8dc6d32d744d9bde868d/Lib/_pydecimal.py#L713-L726

/* Py_IS_NAN(X)
 * Return 1 if float or double arg is a NaN, else 0.
 * Caution:
 *     X is evaluated more than once.
 *     This may not work on all platforms.  Each platform has *some*
 *     way to spell this, though -- override in pyconfig.h if you have
 *     a platform where it doesn't work.
 * Note: PC/pyconfig.h defines Py_IS_NAN as _isnan
 */
#ifndef Py_IS_NAN
#if defined HAVE_DECL_ISNAN && HAVE_DECL_ISNAN == 1
#define Py_IS_NAN(X) isnan(X)
#else
#define Py_IS_NAN(X) ((X) != (X))
#endif
#endif

def _isnan(self):
    """Returns whether the number is not actually one.
    0 if a number
    1 if NaN
    2 if sNaN
    """
    if self._is_special:
        exp = self._exp
        if exp == 'n':
            return 1
        elif exp == 'N':
            return 2
    return 0
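
To make the two excerpts concrete, here is a hedged sketch. The helper isnan_by_self_inequality is a hypothetical name that mirrors the fallback macro ((X) != (X)). The Decimal part uses the public methods is_nan, is_qnan, and is_snan rather than the private _isnan shown above.

```python
import math
from decimal import Decimal


def isnan_by_self_inequality(x: float) -> bool:
    """Hypothetical helper mirroring the fallback macro ((X) != (X))."""
    return x != x


print(isnan_by_self_inequality(float('nan')))  # True
print(isnan_by_self_inequality(1.0))           # False
print(isnan_by_self_inequality(math.inf))      # False

# Decimal distinguishes quiet NaN from signaling NaN.
q, s = Decimal('NaN'), Decimal('sNaN')
print(q.is_nan(), q.is_qnan(), q.is_snan())  # True True False
print(s.is_nan(), s.is_qnan(), s.is_snan())  # True False True
```

The self-inequality trick is applied to floats only here, since comparing a signaling Decimal NaN with == or != raises InvalidOperation.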

Note that pandas' isna (and its alias isnull) returns True as a missing value not only for float nan but also for None and pd.NaT. In addition, with pd.options.mode.use_inf_as_na = True, np.inf is also treated as a missing value.

>>> pd.isna(math.nan)
True
>>> pd.isna(None)
True
>>> pd.isna(math.inf)
False
>>> pd.options.mode.use_inf_as_na = True
>>> pd.isna(math.inf)
True

About the pandas methods

The top-level function pd.isna takes a scalar or an array-like as its argument and returns a bool, or an array of bools with the same shape as the argument. pd.DataFrame.isna, on the other hand, is a DataFrame method: it operates on a DataFrame and returns a DataFrame.

pd.isna # for scalar or array-like
pd.DataFrame.isna # for DataFrame

Here, array-like specifically refers to the following objects. (https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/dtypes/missing.py#L136-L147)

ABCSeries,
np.ndarray,
ABCIndexClass,
ABCExtensionArray,
ABCDatetimeArray,
ABCTimedeltaArray,

Note that pd.isna and pd.isnull are exactly the same function, so either may be used (though sticking to one is desirable for readability).

# https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/dtypes/missing.py#L125
>>> id(pd.isnull)
4770964688
>>> id(pd.isna)
4770964688

Summary of the is* functions

If you want to avoid unexpected errors, pd.isna is the safe choice, but be careful: it does not detect Decimal('nan') and returns False for it.

| function | math.nan | decimal.Decimal('nan') | np.datetime64("NaT") | pd.NaT | math.inf | None |
|---|---|---|---|---|---|---|
| math.isnan | True | True | error | error | False | error |
| decimal.Decimal.is_nan | error | True | error | error | error | error |
| np.isnan | True | error | True | error | False | error |
| pd.isna | True | False | True | True | False | True |
| np.isnat | error | error | True | error | error | error |
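
One way to close the Decimal('nan') gap is a small wrapper. The following is only a sketch using the standard library: is_nan_like is a hypothetical helper, not a pandas or numpy API, and it deliberately ignores pd.NaT and np.datetime64("NaT"), for which pd.isna or np.isnat would still be needed.

```python
import math
from decimal import Decimal


def is_nan_like(value) -> bool:
    """Hypothetical helper: True for None, float NaN, and Decimal NaN."""
    if value is None:
        return True
    if isinstance(value, Decimal):
        return value.is_nan()      # covers both quiet and signaling NaN
    if isinstance(value, float):
        return math.isnan(value)
    return False                   # everything else is treated as present


print(is_nan_like(float('nan')))    # True
print(is_nan_like(Decimal('nan')))  # True
print(is_nan_like(None))            # True
print(is_nan_like(math.inf))        # False
```

Checking the type before calling math.isnan avoids the errors listed in the table for unsupported arguments.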

Other

Let's check the binary representation. You can see that it is a quiet NaN.

>>> import struct
>>> xs = struct.pack('>d', math.nan)
>>> xs
b'\x7f\xf8\x00\x00\x00\x00\x00\x00'
>>> xn = struct.unpack('>Q', xs)[0]
>>> xn
9221120237041090560
>>> bin(xn)
'0b111111111111000000000000000000000000000000000000000000000000000'
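
The bit pattern can also be split into its IEEE 754 binary64 fields. This is a sketch under the assumption that the most significant fraction bit marks a quiet NaN, which matches the bytes shown above; decode_double is a hypothetical helper name.

```python
import math
import struct


def decode_double(x: float):
    """Split a binary64 value into (sign, exponent, fraction) bit fields."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, fraction


sign, exponent, fraction = decode_double(math.nan)
quiet_bit = (fraction >> 51) & 1  # most significant bit of the fraction

# NaN means: exponent all ones and a non-zero fraction.
print(exponent == 0x7FF, fraction != 0)  # True True
print(quiet_bit)                         # 1 for a quiet NaN
```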

Summary (repost)

- NaN in Python follows IEEE 754 NaN, but there are some pitfalls.
- Note the existence of Decimal('nan'), pd.NaT, and numpy.datetime64('NaT'), which are not float nan.
- The nan objects exposed by the numpy and pandas modules and math.nan behave the same. You can use any of them (but it is better to pick one and stick to it for readability).
- Note that pandas' isna(...) returns True as a missing value not only for nan but also for None, NaT, etc.
- pandas 1.0.0 introduces pd.NA as its missing value. Going forward, pd.NA will be used instead of nan as the missing value. I will write about this in another article.

Finally, if you love this kind of geeky topic, please come visit us at justInCase. https://www.wantedly.com/companies/justincase

That's all.

Verification environment

$ uname -a
Darwin MacBook-Pro-3.local 18.7.0 Darwin Kernel Version 18.7.0: Sat Oct 12 00:02:19 PDT 2019; root:xnu-4903.278.12~1/RELEASE_X86_64 x86_64

$ python
Python 3.7.4 (default, Nov 17 2019, 08:06:12) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin

$ pip list | grep -e numpy -e pandas
numpy                    1.18.0     
pandas                   0.25.3   
