[PYTHON] Note that the Pandas loc specifications have changed

import pandas as pd

data = [[1,2,3],[4,5,6],[7,8,9]]
col = ['A','C','E']
df = pd.DataFrame(data, columns=col)
#   A  C  E
#0  1  2  3
#1  4  5  6
#2  7  8  9

In pandas 0.23 and earlier, if you passed the column names A, B, C to loc, any names not present in the data were created as columns of missing values (NaN).

sel_col = ['A','B','C']
print(df.loc[:,sel_col])
# pandas 0.23 and earlier
#   A   B  C
#0  1 NaN  2
#1  4 NaN  5
#2  7 NaN  8

However, since pandas 1.0, the same code raises a KeyError.
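
The failure can be reproduced with a minimal sketch like the one below. It assumes pandas 1.0 or later is installed, and the variable name `caught` is used only for illustration. The exact error message differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'C', 'E'])
sel_col = ['A', 'B', 'C']

caught = None
try:
    # 'B' is not a column of df, so this selection fails on pandas 1.0+
    df.loc[:, sel_col]
except KeyError as exc:
    # Keep the exception so its type can be inspected afterwards
    caught = exc

print(type(caught).__name__)
```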

Apparently, you should no longer pass column names that are not in the data frame to loc. If you want missing columns to be created as before, use reindex instead.

sel_col = ['A','B','C']
print(df.reindex(columns=sel_col))
#   A   B  C
#0  1 NaN  2
#1  4 NaN  5
#2  7 NaN  8

Or, if you want to display only the columns that actually exist in the data frame, take the intersection of the data frame's columns and the specified columns:

print(df.loc[:,df.columns.intersection(sel_col)])
#   A  C
#0  1  2
#1  4  5
#2  7  8
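
The two workarounds above can be wrapped in one small function. This is only a sketch: `select_columns` and its `keep_missing` flag are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

def select_columns(df, cols, keep_missing=True):
    """Hypothetical helper: select columns without a KeyError on missing labels."""
    if keep_missing:
        # reindex creates any absent column and fills it with NaN
        return df.reindex(columns=cols)
    # intersection keeps only the labels that exist in df
    return df.loc[:, df.columns.intersection(cols)]

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'C', 'E'])
sel_col = ['A', 'B', 'C']

with_missing = select_columns(df, sel_col)                        # columns A, B, C
existing_only = select_columns(df, sel_col, keep_missing=False)   # columns A, C
print(with_missing)
print(existing_only)
```

Keeping the choice explicit in a flag makes it visible at the call site whether missing columns are silently dropped or filled with NaN.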
