[Python] Keep the column order when concatenating pandas data frames

Problem

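When concatenating pandas data frames with pd.concat, the columns of the result can come out in a different order from the inputs. In the example below, the result has its columns sorted alphabetically (A, B), even though df defines them as B, A.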

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
>>> df2 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['A', 'B'])
>>> pd.concat([df, df2])  # wanted column order: B, A (as in df)
   A  B
0  2  1
1  4  3
0  1  2
1  3  4

cf: https://github.com/pandas-dev/pandas/issues/4588
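The reordering above amounts to sorting the column labels. As a minimal sketch, assuming pandas 0.23 or later (newer than the environment this post was written on), pd.concat accepts a sort parameter that makes the two behaviours explicit. The variable names sorted_cols and unsorted_cols are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['A', 'B'])

# sort=True sorts the column labels, reproducing the A, B order shown above
sorted_cols = pd.concat([df, df2], sort=True).columns.tolist()

# sort=False keeps the labels in the order they first appear (df's order here)
unsorted_cols = pd.concat([df, df2], sort=False).columns.tolist()

print(sorted_cols)
print(unsorted_cols)
```

If the sort parameter is available in your version, passing sort=False may be all you need. The solution below does not rely on it.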

Solution

To keep the column order unchanged, use the DataFrame.append method and then select the columns in the order of df.columns.

>>> df.append(df2)[df.columns.tolist()]
   B  A
0  1  2
1  3  4
0  2  1
1  4  3

The data frames are now combined with the column order of df preserved. To combine more than two data frames, pass a list of data frames, for example df.append([df1, df2]).

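However, DataFrame.append is slow because every call copies the data into a new data frame, so it should not be used when joining many rows. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so it only works on older versions such as the one listed in the postscript.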
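As an alternative that avoids DataFrame.append, the same column selection can be applied to the result of pd.concat. The following is only a sketch, not a method from the original post: concat_keep_order is a hypothetical helper name, and placing columns that exist only in later frames after the first frame's columns is an assumption made here.

```python
import pandas as pd

def concat_keep_order(frames):
    """Concatenate frames row-wise, keeping the first frame's column order.

    Hypothetical helper for illustration; not part of pandas.
    """
    first_cols = frames[0].columns.tolist()
    combined = pd.concat(frames)
    # Columns found only in later frames go after the first frame's columns
    extra_cols = [c for c in combined.columns if c not in first_cols]
    return combined[first_cols + extra_cols]

df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['A', 'B'])

result = concat_keep_order([df, df2])
print(result)
```

Because pd.concat is called once on the whole list, this should avoid the repeated copying that makes successive append calls slow.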

Postscript

I forgot to list the execution environment, so I have added it here:

Python 3.6.1
pandas 0.19.2
