[PYTHON] Memo: choosing the right method when combining pandas.DataFrame objects
When I combine data with pandas, I am never quite sure which method to use, so I have summarized a rough comparison of join, merge, and concat, along with how I personally choose between them.
API comparison
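- pandas.concat
  - Unlike merge and join, it lets you choose the axis along which the data is combined
  - axis sets the axis to concatenate along (0: index, i.e. stack rows (default); 1: columns, i.e. place frames side by side)
  - join sets how labels on the other axis are combined (inner, outer (default))
- pandas.merge
  - More options than join
  - on specifies the column(s) to use as the join key; they must exist in both DataFrames (left_on / right_on cover keys with different names)
  - how specifies the join type (left, right, outer, inner (default))
- pandas.DataFrame.join
  - Unlike concat, it exists only as a method of pandas.DataFrame; there is no top-level pandas.join function (merge is available both as pandas.merge and as DataFrame.merge)
  - Fewer options than merge
  - Joins on the index by default; on names a column of the calling DataFrame to match against the index of the other DataFrame
  - how specifies the join type (left (default), right, outer, inner)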
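The defaults listed above can be checked with a small sketch. The frames `left` and `right` and their column names are made up for illustration and do not come from the original memo.

```python
import pandas as pd

# Hypothetical sample frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: axis=0 (default) stacks rows; join="outer" (default) keeps
# the union of the column labels and fills the gaps with NaN.
stacked = pd.concat([left, right])

# concat: axis=1 aligns on the index; join="inner" keeps only the
# index labels present in every frame.
side_by_side = pd.concat(
    [left.set_index("key"), right.set_index("key")], axis=1, join="inner"
)

# merge: how="inner" is the default, so only keys found in both frames remain.
merged = pd.merge(left, right, on="key")

# join: how="left" is the default and the match is index-to-index,
# so every row of the caller is kept.
joined = left.set_index("key").join(right.set_index("key"))

# join with on: the caller's "key" column is matched against the
# index of the other frame.
joined_on = left.join(right.set_index("key"), on="key")
```

Note that `merge` and `join` give different row counts on the same data purely because of their different `how` defaults, which is easy to overlook when switching between them.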
How I use them personally
- For a relatively simple join: pandas.DataFrame.join (in most cases this seems to be enough)
- For a relatively complicated join: pandas.merge (when join alone looks like it will not be enough)
- For adding data: pandas.concat (think of it as appending data rather than joining)
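As a rough illustration of these three choices, here is a minimal sketch. The `users`, `ages`, `orders`, and `new_users` frames are hypothetical data invented for this example, and the split between "simple" and "complicated" is only one possible reading of the rule of thumb above.

```python
import pandas as pd

# Hypothetical data: two frames indexed by user_id, plus an orders table.
users = pd.DataFrame({"name": ["Ann", "Bob"]},
                     index=pd.Index([1, 2], name="user_id"))
ages = pd.DataFrame({"age": [31, 45]},
                    index=pd.Index([1, 2], name="user_id"))
orders = pd.DataFrame({"order_id": [100, 101, 102],
                       "buyer": [1, 1, 3],
                       "name": ["pen", "ink", "pad"]})

# Simple join: both frames share the same index, so DataFrame.join suffices.
profile = users.join(ages)

# More complicated join: the key columns have different names and both
# sides have a "name" column, so merge with left_on/right_on and suffixes.
detail = pd.merge(
    orders,
    users.reset_index(),
    left_on="buyer",
    right_on="user_id",
    how="left",
    suffixes=("_item", "_user"),
)

# Adding data: append new rows with concat instead of joining.
new_users = pd.DataFrame({"name": ["Cy"]},
                         index=pd.Index([3], name="user_id"))
all_users = pd.concat([users, new_users])
```

In `detail`, the order placed by buyer 3 has no matching user, so its `name_user` value is missing; `how="left"` keeps the order row anyway.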