update1 2020-01-25: typo fix `IEEE745` -> `IEEE 754`

```
In [1]: from datetime import datetime
In [2]: (datetime(2020, 1, 11) - datetime(2018, 12, 13)).days
Out[2]: 394
```

This post explains how to handle nan in Python. In what follows, nan as a concept is written as NaN.

Disclaimer: This post was written for the justInCase Advent Calendar 2018 and was posted about 400 days late. Despite the long maturation period, the content is not very substantial.

- NaN in Python follows IEEE 754 NaN, but there are some pitfalls.
- Note the existence of `Decimal('nan')`, `pd.NaT`, and `numpy.datetime64('NaT')`, which are not float nan.
- The nan objects exposed by the numpy and pandas modules and `math.nan` are equivalent, so you can use any of them. (It is better to stick to one for readability.)
- Note that pandas' `isna(...)` returns True not only for nan but also for `None`, `pd.NaT`, and other values it treats as missing.
- pandas introduces `pd.NA` as its missing value from pandas 1.0.0. Going forward, it is desirable to use `pd.NA` instead of nan as the missing value. I will write about this in another article.

- https://mobile.twitter.com/jorisvdbossche/status/1208476049690046465
- https://dev.pandas.io/docs/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values

[Verification environment is described at the end](#Verification environment)

For background on NaN, please refer to the previous article, "What is NaN? NaN Zoya (R)".

Quiet NaN generally propagates through numerical operations, but what do you think the following two expressions should return? In fact, the handling of NaN in min and max changed between IEEE 754-2008 and IEEE 754-2019. I will explain that in another article.

```
min(1.0, float('nan'))
max(1.0, float('nan'))
```
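As a quick check of what Python's built-in `min` and `max` do with NaN, here is a minimal sketch. Note that these built-ins are plain comparison-based functions, not the IEEE 754 minimum/maximum operations, so the result depends on the argument order. The `results` dictionary is only an illustration device.

```python
import math

nan = float('nan')

# Built-in min/max keep the first argument unless a later one compares
# strictly smaller/larger, and every ordering comparison with NaN is False.
results = {
    'min(1.0, nan)': min(1.0, nan),
    'max(1.0, nan)': max(1.0, nan),
    'min(nan, 1.0)': min(nan, 1.0),
    'max(nan, 1.0)': max(nan, 1.0),
}

for expr, value in results.items():
    print(expr, '->', value)
```

With NaN as the second argument the number is returned; with NaN as the first argument NaN is returned.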

Python has no NaN literal. `float('nan')`, which needs no module import, or `np.nan`, when numpy is already in use, tend to be used most often.

```
import math
import decimal
import numpy as np
import pandas as pd
float('nan')
math.nan
0.0 * math.inf
math.inf / math.inf
# 0.0 / 0.0 raises ZeroDivisionError in Python; many languages such as C, R, and Julia return NaN
np.nan
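np.core.numeric.NaN  # works in the verified environment; the NaN alias was removed in NumPy 2.0
pd.np.nan  # works in the verified environment; pd.np was removed in pandas 2.0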
```

All of these are float objects. NaN is not a singleton, but the objects referenced through `numpy` and `pandas` are the same object.

```
nans = [float('nan'), math.nan, 0 * math.inf, math.inf / math.inf, np.nan, np.core.numeric.NaN, pd.np.nan]
import pprint
pprint.pprint([(type(n), id(n)) for n in nans])
# [(<class 'float'>, 4544450768),
# (<class 'float'>, 4321186672),
# (<class 'float'>, 4544450704),
# (<class 'float'>, 4544450832),
# (<class 'float'>, 4320345936),
# (<class 'float'>, 4320345936),
# (<class 'float'>, 4320345936)]
```

`float('nan')` is an immutable object of the float class, so it is hashable. It can therefore be used as a dictionary key, but, strangely, a dictionary accepts multiple nan keys. Moreover, unless you bind the key to a variable beforehand, you cannot retrieve the value again. This is presumably because every equality comparison involving `NaN` is `False`, that is, `float('nan') == float('nan')` -> `False`.

```
>>> d = {float('nan'): 1, float('nan'): 2, float('nan'): 3}
>>> d
{nan: 1, nan: 2, nan: 3}
>>> d[float('nan')]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: nan
```
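The following minimal sketch shows the "bind the key to a variable" case. The names `nan_key` and `found_with_new_nan` are made up for illustration. Lookup with the same object works because dictionary lookup checks object identity before falling back to `==`.

```python
nan_key = float('nan')
d = {nan_key: 'first', float('nan'): 'second'}

# Both entries are kept: the two NaN keys are distinct objects
# and never compare equal to each other.
print(len(d))

# Lookup with the very same object succeeds (identity is checked first).
value = d[nan_key]
print(value)

# A freshly created NaN is a different object, so it finds nothing.
try:
    d[float('nan')]
    found_with_new_nan = True
except KeyError:
    found_with_new_nan = False
print(found_with_new_nan)
```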

Note that there are objects with NaN-like properties that are not instances of the float class. In particular, `pd.NaT` and `np.datetime64("NaT")` are different classes.

```
decimal.Decimal('nan')
pd.NaT
np.datetime64("NaT")
# >>> type(decimal.Decimal('nan'))
# <class 'decimal.Decimal'>
# >>> type(pd.NaT)
# <class 'pandas._libs.tslibs.nattype.NaTType'>
# >>> type(np.datetime64("NaT"))
# <class 'numpy.datetime64'>
```

Therefore, take the following precaution when using `np.isnat`.

```
>>> np.isnat(pd.NaT)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnat' is only defined for datetime and timedelta.
>>> np.isnat(np.datetime64("NaT"))
True
```
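One way to avoid the `TypeError` above is to dispatch on the type before calling `np.isnat`. The helper below is a hypothetical sketch (the name `is_nat` is not a numpy or pandas API), and it assumes `pd.NaT` can be detected with an identity check.

```python
import numpy as np
import pandas as pd


def is_nat(value):
    """Hypothetical helper: True for np.datetime64/np.timedelta64 NaT and pd.NaT."""
    # np.isnat is only defined for numpy datetime and timedelta values.
    if isinstance(value, (np.datetime64, np.timedelta64)):
        return bool(np.isnat(value))
    # pd.NaT is checked by identity instead of being passed to np.isnat.
    return value is pd.NaT


print(is_nat(pd.NaT))
print(is_nat(np.datetime64("NaT")))
print(is_nat(np.datetime64("2020-01-11")))
```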

```
math.isnan
np.isnan
pd.isna
```

The actual implementation behind `math.isnan` is around here (the second link is the corresponding check in the pure-Python `decimal` module).
https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Include/pymath.h#L88-L103
https://github.com/python/cpython/blob/34fd4c20198dea6ab2fe8dc6d32d744d9bde868d/Lib/_pydecimal.py#L713-L726

```
/* Py_IS_NAN(X)
* Return 1 if float or double arg is a NaN, else 0.
* Caution:
* X is evaluated more than once.
* This may not work on all platforms. Each platform has *some*
* way to spell this, though -- override in pyconfig.h if you have
* a platform where it doesn't work.
* Note: PC/pyconfig.h defines Py_IS_NAN as _isnan
*/
#ifndef Py_IS_NAN
#if defined HAVE_DECL_ISNAN && HAVE_DECL_ISNAN == 1
#define Py_IS_NAN(X) isnan(X)
#else
#define Py_IS_NAN(X) ((X) != (X))
#endif
#endif
```

```
def _isnan(self):
    """Returns whether the number is not actually one.
    0 if a number
    1 if NaN
    2 if sNaN
    """
    if self._is_special:
        exp = self._exp
        if exp == 'n':
            return 1
        elif exp == 'N':
            return 2
    return 0

Note that pandas' `isna` (and its alias `isnull`) returns `True`, treating the value as missing, for `None` and `pd.NaT` as well as float nan.
As a tip, if you set `pandas.options.mode.use_inf_as_na = True`, `np.inf` is also treated as a missing value.

```
>>> pd.isna(math.nan)
True
>>> pd.isna(None)
True
```

```
>>> pd.isna(math.inf)
False
>>> pd.options.mode.use_inf_as_na = True
>>> pd.isna(math.inf)
True
```

The top-level function `pd.isna` takes a scalar or an array-like as its argument and returns a bool, or a bool array of the same size as the argument. The `pd.DataFrame.isna` method, on the other hand, works on a DataFrame and returns a DataFrame.

```
pd.isna # for scalar or array-like
pd.DataFrame.isna # for DataFrame
```
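A minimal sketch of the difference, using made-up sample data. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Scalar in, single bool out.
scalar_result = pd.isna(np.nan)

# Array-like in, boolean array of the same length out.
array_result = pd.isna([1.0, None, np.nan])

# DataFrame in, DataFrame of bools with the same shape out.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [None, "x"]})
frame_result = df.isna()

print(scalar_result)
print(array_result)
print(frame_result)
```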

Array-like here specifically refers to the following types. (https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/dtypes/missing.py#L136-L147)

```
ABCSeries,
np.ndarray,
ABCIndexClass,
ABCExtensionArray,
ABCDatetimeArray,
ABCTimedeltaArray,
```

Note that `pd.isna` and `pd.isnull` are exactly the same function, so either may be used (though it is better to stick to one for readability).

```
# https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/dtypes/missing.py#L125
>>> id(pd.isnull)
4770964688
>>> id(pd.isna)
4770964688
```

If you want to avoid unexpected errors, `pd.isna` is the safe choice, but be careful: it fails to detect `Decimal('nan')`.

| | math.nan | decimal.Decimal('nan') | np.datetime64("NaT") | pd.NaT | math.inf | None |
|---|---|---|---|---|---|---|
| math.isnan | True | True | error | error | False | error |
| decimal.Decimal.is_nan | error | True | error | error | error | error |
| np.isnan | True | error | True | error | False | error |
| pd.isna | True | False | True | True | False | True |
| np.isnat | error | error | True | error | error | error |
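If you only deal with standard-library scalars, one way to avoid the errors in the table is to dispatch on the type first. The helper below is a hypothetical sketch, not part of any library. It assumes you want `None` treated as missing (as `pd.isna` does), and it does not cover `pd.NaT` or `np.datetime64("NaT")`.

```python
import math
from decimal import Decimal


def is_missing_scalar(value):
    """Hypothetical helper covering None, float nan, and Decimal NaN only."""
    if value is None:
        return True
    if isinstance(value, Decimal):
        # Decimal.is_nan() is True for both quiet and signaling NaN.
        return value.is_nan()
    if isinstance(value, float):
        return math.isnan(value)
    return False


for candidate in (math.nan, Decimal('nan'), math.inf, None, 1.0, 'text'):
    print(repr(candidate), is_missing_scalar(candidate))
```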

Check the binary representation. You can see that it is a quiet NaN: the exponent bits are all 1 and the most significant bit of the fraction is 1.

```
>>> import struct
>>> xs = struct.pack('>d', math.nan)
>>> xs
b'\x7f\xf8\x00\x00\x00\x00\x00\x00'
>>> xn = struct.unpack('>Q', xs)[0]
>>> xn
9221120237041090560
>>> bin(xn)
'0b111111111111000000000000000000000000000000000000000000000000000'
```
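The same check can be written as a small function that splits the 64 bits into sign, exponent, and fraction. This is a sketch assuming the usual IEEE 754 binary64 layout and the common convention that a set top fraction bit marks a quiet NaN. The function name `nan_fields` is made up for illustration.

```python
import math
import struct


def nan_fields(x):
    """Split a float into (sign, exponent, fraction, is_quiet) bit fields."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, all ones for NaN and inf
    fraction = bits & ((1 << 52) - 1)      # 52 bits, nonzero for NaN
    is_quiet = bool(fraction >> 51)        # top fraction bit set -> quiet NaN
    return sign, exponent, fraction, is_quiet


sign, exponent, fraction, is_quiet = nan_fields(math.nan)
print(hex(exponent), hex(fraction), is_quiet)
```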

- NaN in Python follows IEEE 754 NaN, but there are some pitfalls.
- Note the existence of `Decimal('nan')`, `pd.NaT`, and `numpy.datetime64('NaT')`, which are not float nan.
- The nan objects exposed by the numpy and pandas modules and `math.nan` are equivalent, so you can use any of them. (It is better to stick to one for readability.)
- Note that pandas' `isna(...)` returns True not only for nan but also for `None`, `pd.NaT`, and other values it treats as missing.
- pandas introduces `pd.NA` as its missing value from pandas 1.0.0. Going forward, `pd.NA` is expected to be used instead of nan as the missing value. I will write about this in another article.

- https://mobile.twitter.com/jorisvdbossche/status/1208476049690046465
- https://dev.pandas.io/docs/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values

Finally, if you love this kind of geeky topic, please come visit us at justInCase. https://www.wantedly.com/companies/justincase

That's all.

```
$ uname -a
Darwin MacBook-Pro-3.local 18.7.0 Darwin Kernel Version 18.7.0: Sat Oct 12 00:02:19 PDT 2019; root:xnu-4903.278.12~1/RELEASE_X86_64 x86_64
$ python
Python 3.7.4 (default, Nov 17 2019, 08:06:12)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
$ pip list | grep -e numpy -e pandas
numpy 1.18.0
pandas 0.25.3
```
