[PYTHON] Do not change the order of columns when concatenating pandas data frames.

Problem

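When concatenating pandas data frames with pd.concat, the column order can change. In the example below, df has its columns in the order B, A, but the result has them as A, B. This happens because, when the frames' columns are not in the same order, pandas versions before 1.0 sort the columns by default (pandas 1.0 changed the default of concat's sort argument to False).

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
>>> df2 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['A', 'B'])
>>> pd.concat([df, df2])
   A  B
0  2  1
1  4  3
0  1  2
1  3  4

What we want is df's column order, B, A.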
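To confirm that the reordering comes from sorting, you can pass concat's sort argument explicitly (it exists from pandas 0.23 on). A minimal sketch: sort=True reproduces the A, B order shown above, while sort=False keeps the columns in the order they first appear, which here is df's B, A.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['A', 'B'])

# sort=True: the non-concatenation axis (the columns) is sorted alphabetically.
sorted_cols = pd.concat([df, df2], sort=True)

# sort=False: columns keep their order of first appearance (df's order).
unsorted_cols = pd.concat([df, df2], sort=False)

print(list(sorted_cols.columns))    # ['A', 'B']
print(list(unsorted_cols.columns))  # ['B', 'A']
```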

Solution

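To keep df's column order, append df2 with the DataFrame.append method and then select the columns in df's order with [df.columns.tolist()]. The column selection is what restores the order.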

>>> df.append(df2)[df.columns.tolist()]
   B  A
0  1  2
1  3  4
0  2  1
1  4  3

The data frames are now combined in df's column order. To combine more than two data frames, pass a list of data frames, as in df.append([df1, df2]), and select df's columns in the same way.
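The same column-selection idea works with pd.concat for any number of frames. A minimal sketch, assuming you want the first frame's column order and want any columns that appear only in later frames placed at the end; concat_keep_order is a hypothetical helper name, not a pandas function, and df1/df2 here are made-up sample frames.

```python
import pandas as pd


def concat_keep_order(frames):
    """Hypothetical helper (not part of pandas): concatenate `frames` row-wise,
    keeping the column order of the first frame.

    Columns that appear only in later frames go after the first frame's
    columns, in the order they are first seen.
    """
    order = list(frames[0].columns)
    for frame in frames[1:]:
        for col in frame.columns:
            if col not in order:
                order.append(col)
    # Select the columns explicitly so the result does not depend on the
    # default of `sort` in the installed pandas version.
    return pd.concat(frames, sort=False)[order]


df = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['B', 'A'])
df1 = pd.DataFrame([[5, 6]], index=[2], columns=['A', 'B'])
df2 = pd.DataFrame([[7, 8, 9]], index=[3], columns=['C', 'A', 'B'])

combined = concat_keep_order([df, df1, df2])
print(combined)
```

Column C exists only in df2, so it ends up last and is NaN for the rows that came from df and df1.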

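However, DataFrame.append builds a new data frame on every call, so adding many rows one call at a time is slow. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat([df, df2])[df.columns.tolist()] gives the same result as the code above.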
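Instead of calling append repeatedly, the usual pattern is to collect the pieces in a Python list, call pd.concat once, and select the desired column order once at the end. A minimal sketch, assuming a hypothetical source of one-row frames whose columns come in the order A, B while we want B, A:

```python
import pandas as pd

column_order = ['B', 'A']  # the order we want to keep (df's order in the example above)

# Hypothetical source of rows: each piece is a one-row frame with columns A, B.
pieces = []
for i in range(5):
    pieces.append(pd.DataFrame({'A': [i * 10], 'B': [i]}))  # list.append, cheap

# One concat over the whole list, then one column selection.
result = pd.concat(pieces, ignore_index=True)[column_order]
print(result)
```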


Postscript

I had forgotten to list the execution environment, so I added it.
