[PYTHON] sort warning in the pd.concat function

If you join data frames vertically, pass the sort argument explicitly. It decides whether the columns get sorted.

To give the conclusion first: if you vertically pd.concat two data frames that have different columns or a different column order, pass sort=True or sort=False explicitly. Otherwise, pandas 0.23 through 0.25 issue the following warning. (From pandas 1.0 the default is sort=False and the warning is gone.)

pd.concat([df_1, df_2])

=============================================
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  = pd.concat([df_1, df_2])

Practical example: When only the column order is different

So what is actually the problem? At first I couldn't really picture it, so here is a simple concrete example. Prepare two data frames, df_1 and df_2.

import pandas as pd

# Both frames have columns a and b, but in a different order
df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]
                    })
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]
                    })

df_1:

         b        a
0     kiwi       NY
1  avocado       CA
2   durian  Seattle

df_2:

         a       b
0    Tokyo   apple
1    Osaka  banana
2  Sapporo  orange

df_1 is in the order of b and a, and df_2 is in the order of a and b.

Concat with sort = False as an argument

Let's pass it first with sort = False.

concat_false = pd.concat([df_1, df_2], sort=False)

concat_false:

         b        a
0     kiwi       NY
1  avocado       CA
2   durian  Seattle
0    apple    Tokyo
1   banana    Osaka
2   orange  Sapporo

The column order is the same as in df_1: b, then a. The values from df_2 are matched by column name, so they still end up in the correct columns.
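To double-check that behavior, here is a minimal sketch that rebuilds the two frames from above and looks only at the resulting column order and at column a. The variable column_order is just a name I introduced for this check.

```python
import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})

concat_false = pd.concat([df_1, df_2], sort=False)

# With sort=False the column order of the first frame (df_1) is kept.
column_order = list(concat_false.columns)
print(column_order)

# Rows from df_2 are aligned by column name, not by position.
print(concat_false["a"].tolist())
```

So even though df_2 lists its columns in the opposite order, the city names all land in column a.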

Concat with sort = True (same as concat without sort argument)

If sort = True is set here, it will be as follows.

concated_true = pd.concat([df_1, df_2], sort=True)

concated_true:

         a        b
0       NY     kiwi
1       CA  avocado
2  Seattle   durian
0    Tokyo    apple
1    Osaka   banana
2  Sapporo   orange

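In this case, the columns are in the order a, b. If you don't pass the sort argument, the pandas versions that show this warning (0.23 through 0.25) behave as if sort=True had been given, and issue the warning while doing so. From pandas 1.0 the default is sort=False, so omitting the argument there gives the sort=False result without a warning.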


concated = pd.concat([df_1, df_2])
# Use the equals function to check whether concated and concated_true are the same
print(concated.equals(concated_true))
# True
=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.

By using the equals function, we can see that the two dfs, concated_true (sort = True) and concated (without the sort argument), are equal.
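The flip side is that once sort is passed explicitly, the warning should not appear at all. The following is a rough sketch of how that could be checked with the standard warnings module. Matching the warning by a fragment of its message text is my own choice for this example, not something pandas prescribes.

```python
import warnings

import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})

# Record every warning raised while concatenating with an explicit sort value.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    concated_true = pd.concat([df_1, df_2], sort=True)
    concat_false = pd.concat([df_1, df_2], sort=False)

# Keep only the sort warning discussed in this post (matched by its message).
sort_warnings = [w for w in caught
                 if "non-concatenation axis" in str(w.message)]

print(len(sort_warnings))
print(list(concated_true.columns), list(concat_false.columns))
```

Because both calls state the sort value, the result no longer depends on which default your pandas version uses.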

Practical example: When columns are different

It is almost the same even if the columns are different.

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

df_1:

         a       b  c  e
0    Tokyo   apple  3  2
1    Osaka  banana  2  4
2  Sapporo  orange  1  8

df_2:

         b  c        a   d
0     kiwi  1       NY   2
1  avocado  3       CA  20
2   durian  5  Seattle   1

The columns that are in common are columns a, b, and c. The difference is column d and column e.

Concat with sort = False as an argument

concat_false = pd.concat([df_1, df_2], sort=False)

concat_false:

         a        b  c    e     d
0    Tokyo    apple  3  2.0   NaN
1    Osaka   banana  2  4.0   NaN
2  Sapporo   orange  1  8.0   NaN
0       NY     kiwi  1  NaN   2.0
1       CA  avocado  3  NaN  20.0
2  Seattle   durian  5  NaN   1.0

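Looking at the columns, they are not in alphabetical order: a, b, c, e, d. The result keeps columns a, b, c, e of df_1 in their original order and appends column d of df_2 on the right. Cells that have no value in the source frame become NaN, which is why columns e and d are now shown as floats.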
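The same thing can be confirmed in code. This is a small sketch using the two frames from this section. It only inspects the column order, the number of missing values, and the dtypes after the concat.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

concat_false = pd.concat([df_1, df_2], sort=False)

# df_1's columns come first, then the columns that only df_2 has.
print(list(concat_false.columns))

# Each frame lacks one column, so e and d get three NaN values each.
missing_per_column = concat_false[["e", "d"]].isna().sum()
print(missing_per_column)

# Integer columns that receive NaN are converted to float64; c stays integer.
print(concat_false.dtypes)
```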

Concat without sort argument (same as concat with sort = True)

If you concat these two data frames without the sort argument (on a pandas version that shows the warning), you get:

concat = pd.concat([df_1, df_2])

=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.

concat:

         a        b  c     d    e
0    Tokyo    apple  3   NaN  2.0
1    Osaka   banana  2   NaN  4.0
2  Sapporo   orange  1   NaN  8.0
0       NY     kiwi  1   2.0  NaN
1       CA  avocado  3  20.0  NaN
2  Seattle   durian  5   1.0  NaN

This is in alphabetical order as a, b, c, d, e. Nothing has changed regarding the content of the data.

This has the same result as doing sort = True.

concat_true = pd.concat([df_1, df_2], sort=True)
# Check whether concat_true and concat are the same
concat_true.equals(concat)
# True

# Check whether concat_true and concat_false are the same
concat_false.equals(concat_true)
# False

concat_true:

         a        b  c     d    e
0    Tokyo    apple  3   NaN  2.0
1    Osaka   banana  2   NaN  4.0
2  Sapporo   orange  1   NaN  8.0
0       NY     kiwi  1   2.0  NaN
1       CA  avocado  3  20.0  NaN
2  Seattle   durian  5   1.0  NaN
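Note that equals returned False for concat_false and concat_true only because the column order differs. As a sketch of how to confirm that the data itself is identical, you can reorder the columns of one frame to match the other before comparing. The names same_as_is, reordered, and same_after_reorder are ones I made up for this example.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

concat_true = pd.concat([df_1, df_2], sort=True)
concat_false = pd.concat([df_1, df_2], sort=False)

# equals also takes the column order into account.
same_as_is = concat_false.equals(concat_true)

# Put the columns of concat_false into concat_true's order, then compare again.
reordered = concat_false[concat_true.columns]
same_after_reorder = reordered.equals(concat_true)

print(same_as_is, same_after_reorder)
```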

Finally: If the warning is noisy, why not sort = True for the time being?

This pandas concat warning does no harm if you leave it alone, but it keeps nagging at you. The sort argument doesn't affect the data itself, only the order of the columns, so sort=True may be fine for the sake of readability.
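If neither the alphabetical order nor the order of the first frame is what you want, another option is to pass sort=False and then select the columns in the order you prefer. This is only a sketch: wanted_order is a made-up list for this example, so replace it with whatever order suits your data.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

# Hypothetical column order chosen only for this example.
wanted_order = ["c", "a", "b", "e", "d"]

# Concatenate without sorting, then pick the columns in the wanted order.
combined = pd.concat([df_1, df_2], sort=False)[wanted_order]

print(list(combined.columns))
print(combined.shape)
```

This way the column order is written down in the code and doesn't depend on the sort default of the pandas version.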

For reference, see the following Stack Overflow question.

https://stackoverflow.com/questions/50501787/python-pandas-user-warning-sorting-because-non-concatenation-axis-is-not-aligne
