[PYTHON] Slightly different behavior depending on version in Pandas


pandas is a very useful library, but code that used to work may stop working when the version changes. Here are two examples I ran into when moving from 0.24 to 1.0.

(1) ix can no longer be used.

First, in 0.24. Let's create a DataFrame with columns 'a' and 'b' and select the columns with ix.


>>> import pandas as pd
>>> a = pd.DataFrame([[1,2],[3,4]], index=[1,2], columns=['a', 'b'])
>>> a.ix[:, ['a','b']]
   a  b
1  1  2
2  3  4

It works as expected. Next, in 1.0.


>>> a.ix[:, ['a','b']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\generic.py", line 5273, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'ix'

It no longer works: ix was removed in 1.0. But don't panic. You can use loc instead of ix (or iloc when selecting by integer position).
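As a minimal sketch of that replacement, assuming the same DataFrame a as in the example above: loc selects by label and iloc selects by integer position, which between them cover the two ways ix could be called.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=['a', 'b'])

# Label-based selection: the direct replacement for a.ix[:, ['a', 'b']]
by_label = a.loc[:, ['a', 'b']]

# Position-based selection: use iloc where ix was given integer positions
by_position = a.iloc[:, [0, 1]]

print(by_label)
print(by_label.equals(by_position))
```

Both calls behave the same way in 0.24 and 1.0, so switching before upgrading should be safe.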

(2) loc raises an error when a non-existent column is specified.

Now let's deliberately specify a non-existent column 'c'. In 0.24:


>>> a.loc[:, ['a','c']]
   a   c
1  1 NaN
2  3 NaN

It automatically creates the column and fills it with NaN. Perhaps a little too clever?

Next, in 1.0.


>>> a.loc[:, ['a','c']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1760, in __getitem__
    return self._getitem_tuple(key)
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1287, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1952, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1593, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1550, in _get_listlike_indexer
  File "C:\ProgramData\Anaconda3\envs\padnas1\lib\site-packages\pandas\core\indexing.py", line 1652, in _validate_read_indexer
    raise KeyError(
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-


A KeyError is raised. This, in turn, may feel a little strict. To deal with it, it seems best to check in advance whether each key exists in the columns.
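A minimal sketch of that advance check, assuming the same DataFrame a as above. The variable name wanted is only illustrative. The second option uses reindex, which this post does not otherwise cover; it is shown as one way to get the old NaN-filled result explicitly when that is what you want.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=['a', 'b'])
wanted = ['a', 'c']  # 'c' does not exist in a (illustrative list)

# Option 1: check in advance and keep only the labels that exist
existing = [col for col in wanted if col in a.columns]
subset = a.loc[:, existing]
print(subset)

# Option 2: reindex creates missing columns and fills them with NaN,
# similar to what loc did implicitly in 0.24
filled = a.reindex(columns=wanted)
print(filled)
```

Option 1 silently drops the missing labels, so it may be worth logging or raising when existing is shorter than wanted.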


These are just two examples I happened to run into. Many other details have changed as well, so it is worth going through the full list of changes in the release notes: https://pandas.pydata.org/docs/whatsnew/v1.0.0.html
