[Python / Pandas] An apparent bug when replacing values in a DataFrame with `None` using `replace`

What happened

I ran into what looked like a bug when I tried to replace np.nan with None using the replace method of a pandas DataFrame.


Tested on Google Colaboratory.

Source code

1. Create DataFrame before replacement

The DataFrame used to check the behavior:

import datetime

import numpy as np
import pandas as pd

# Timestamps used as the index
indexes = [
    datetime.datetime(2020, 1, 1, 11, 50),
    datetime.datetime(2020, 1, 1, 12, 50),
    datetime.datetime(2020, 1, 1, 12, 52),
    datetime.datetime(2020, 1, 1, 18, 50),
    datetime.datetime(2020, 1, 1, 19, 50),
    datetime.datetime(2020, 1, 1, 21, 50),
]
# Columns mix floats, strings, and booleans, with NaN / None as missing values
df = pd.DataFrame({
    'high': [1, np.nan, 3, np.nan, np.nan, 11],
    'close': [4, 5, 6, 7, np.nan, 2],
    'memo': ['sign', '', np.nan, 'sign2', np.nan, 'sign3'],
    'bool': [True, None, True, False, None, False],
    'stoploss': [True, None, True, False, None, False]
}, index=indexes)

->                    high   close  memo   bool   stoploss
2020-01-01 11:50:00   1.0    4.0    sign   True   True
2020-01-01 12:50:00   NaN    5.0           None   None
2020-01-01 12:52:00   3.0    6.0    NaN    True   True
2020-01-01 18:50:00   NaN    7.0    sign2  False  False
2020-01-01 19:50:00   NaN    NaN    NaN    None   None
2020-01-01 21:50:00   11.0   2.0    sign3  False  False

2. Replace method 1

The version with the bug

df.replace(np.nan, None)
->                    high   close  memo   bool   stoploss
2020-01-01 11:50:00   1.0    4.0    sign   True   True
2020-01-01 12:50:00   1.0    5.0           True   True
2020-01-01 12:52:00   3.0    6.0           True   True
2020-01-01 18:50:00   3.0    7.0    sign2  False  False
2020-01-01 19:50:00   3.0    7.0    sign2  False  False
2020-01-01 21:50:00   11.0   2.0    sign3  False  False

...What is this?! (bangs desk) Where np.nan used to be, the value is not None; it has been filled with the value from the previous row (it looks as if fillna had been applied).
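To check the "looks like fillna" impression, here is a minimal sketch that forward-fills the same data with ffill() and prints the result. It rebuilds a shortened copy of the DataFrame (the stoploss column is left out because it duplicates bool). Note that it does not call replace(np.nan, None) at all: to my understanding that call behaves differently depending on the pandas version, so the sketch only shows that a plain forward fill yields the same table as above.

```python
import datetime

import numpy as np
import pandas as pd

# Same timestamps as in the article, written compactly.
times = [(11, 50), (12, 50), (12, 52), (18, 50), (19, 50), (21, 50)]
indexes = [datetime.datetime(2020, 1, 1, h, m) for h, m in times]

df = pd.DataFrame({
    'high': [1, np.nan, 3, np.nan, np.nan, 11],
    'close': [4, 5, 6, 7, np.nan, 2],
    'memo': ['sign', '', np.nan, 'sign2', np.nan, 'sign3'],
    'bool': [True, None, True, False, None, False],
}, index=indexes)

# Forward fill: every missing value (NaN or None) takes the value of the
# nearest non-missing row above it.
filled = df.ffill()

print(filled)
```

The printed values line up with the surprising output of method 1, which supports the guess that a forward fill was happening behind the scenes.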

3. Replace method 2

The version that works (?)

df.replace({np.nan: None})
->                    high   close  memo   bool   stoploss
2020-01-01 11:50:00   1      4      sign   True   True
2020-01-01 12:50:00   None   5             None   None
2020-01-01 12:52:00   3      6      None   True   True
2020-01-01 18:50:00   None   7      sign2  False  False
2020-01-01 19:50:00   None   None   None   None   None
2020-01-01 21:50:00   11     2      sign3  False  False

As expected (?). No, wait: I noticed that the floats all seem to have turned into integers ... is that okay? (help)

... I panicked for a moment (well, for more than 30 minutes), but on closer inspection the contents were still floats.

tmp_df = df.replace({np.nan: None})
tmp_df.values

-> array([[1.0, 4.0, 'sign', True, True],
       [None, 5.0, '', None, None],
       [3.0, 6.0, None, True, True],
       [None, 7.0, 'sign2', False, False],
       [None, None, None, None, None],
       [11.0, 2.0, 'sign3', False, False]], dtype=object)
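Another way to confirm the same thing without reading the raw array is to look at the column dtype and at the Python types of the stored values. This is only a small sketch on a two-column copy of the data; the names non_missing and all_floats are mine, not from the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'high': [1, np.nan, 3, np.nan, np.nan, 11],
    'close': [4, 5, 6, 7, np.nan, 2],
})

tmp_df = df.replace({np.nan: None})

# None is not a float, so the columns are stored with the generic object dtype.
print(tmp_df.dtypes)

# Check the values that are not None: they should still be floats,
# even though the display drops the trailing ".0".
non_missing = [v for v in tmp_df['high'] if v is not None]
all_floats = all(isinstance(v, float) for v in non_missing)
print(all_floats)
```

So the integer-looking display seems to be only a matter of how object columns are printed, not a change in the stored values.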

ε-(´∀｀*) Phew, what a relief.

I need to remember this way of writing it ... φ(..) df.replace({np.nan: None})

Reference material

The official pandas documentation does in fact mention this. However, it took me a long time to find it, so I decided to write it down here.

When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):

-Excerpt from pandas.DataFrame.replace
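The excerpt is easier to follow with the small Series it talks about. The sketch below assumes that Series is [10, 'a', 'a', 'b', 'a'], which matches the rows named in the quote. It compares the dict form with a hand-written imitation of the 'pad' behavior (mask the 'a' values, then forward fill). It deliberately does not call s.replace('a', None): as far as I understand, newer pandas versions honor an explicit None there, so that result depends on the installed version.

```python
import pandas as pd

s = pd.Series([10, 'a', 'a', 'b', 'a'])

# Dict form: each 'a' is replaced by None itself.
as_none = s.replace({'a': None})

# Imitation of the quoted 'pad' behavior: blank out every 'a',
# then fill each blank with the nearest non-missing value above it.
padded = s.mask(s == 'a').ffill()

print(as_none.tolist())
print(padded.tolist())
```

In the padded result, rows 1 and 2 become 10 and row 4 becomes 'b', which is the outcome the documentation describes.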

If it had been written in Japanese, I might have noticed it a little sooner ...

Other related materials

I'm not sure how closely related this is, but going the other way and replacing None with np.nan seems to cause a different problem.

StackOverflow : Replace None with NaN in pandas dataframe

