[PYTHON] (Note) Importing Excel with the same column name

A word diary: God Excel will increase forever.

Preface

Curiously, there are still many cases in the world where Excel like the table is made directly.

name Position age name Position age
mouse or rat God 60 Rabbit Manager 24
cow Director 48 Tatsu 12
Tiger 36 Snake slave 1
Mole Lowly employee 20
bear 20 Updated: Today

Table A (〃 represents cell merge. There is one blank line between the tiger and the mole) I don't need an update date, so it would be very helpful if you could do it as shown in the table below, but there are various things in the world such as printing convenience and oracles.

name Position age
mouse or rat God 60
cow Director 48
Tiger Director 36
Mole Lowly employee 20
bear Lowly employee 20
Rabbit Manager 24
Tatsu Manager 12
Snake slave 1

Table B

subject

Consider taking Table A, converting it into Table B, and then using it as data.

For the time being, the following is what I assembled with brute force. Very elephant in my opinion. Paon.

First stage


dataframe_raw = pd.read_excel("Table A.xlsx")

df_A=dataframe_raw.iloc[:,[*range(0,3)]].dropna(how="all").fillna(method='ffill').astype({'age':int})
 #0 from raw data in iloc-Take out 2 rows.[*range()]Is convenient because you don't have to write 0, 1, 2 ... etc.
 #dropna()And 0-Deleted rows with all 3 columns NaN
 #fillna()Will pull the data directly above NaN so you can fill in the blanks created in the merged cells.
 #Use astype to change the age read by float type to int type

At this point, df_A looks like the table below. The blank line and "〃" have disappeared, so Yoshi!

name Position age
mouse or rat God 60
cow Director 48
Tiger Director 36
Mole Lowly employee 20
bear Lowly employee 20

Second stage


df_B=dataframe_raw.iloc[:,[*range(4,7)]].dropna(how="all").fillna(method='ffill')
 #Basically df_Same as A. However, normally pd_When reading with read, "For the same column name.Note that there is a distinction symbol such as 1 "
df_B.drop(df_B.tail(1).index,inplace=True)
 #Since the update date information at the end is unnecessary, it is cut off.
df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")
 #As mentioned above, df_In the column of B.Since it has a 1, rename it

dataframe_comp=pd.concat([df_A,df_B])
 #Since the columns are the same, it is completed by vertically joining with concat.

The finished product is Table B written in the introduction.

Future tasks

Where to worry


df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")

I feel that this area is very wasteful. You can't even put them in a line ... There seems to be a way to avoid specifying the int type at the time of reading and renaming the columns.

Also, ** Is there a way to handle this work itself in one shot? I suspect **. This kind of anxiety has always been around since I was taught the merge function in the past. Reference: https://qiita.com/wellwell3176/items/5e39fd44590603d13291

References

https://www.366service.com/jp/qa/ee53add5af3296037ee057e49442a821

https://www.it-swarm-ja.tech/ja/python/pandas%E3%83%87%E3%83%BC%E3%82%BF%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0%E3%81%AE%E3%83%87%E3%83%BC%E3%82%BF%E3%81%AE%E6%9C%80%E5%BE%8C%E3%81%AE%E8%A1%8C%E3%82%92%E5%89%8A%E9%99%A4%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/1050572731/

Recommended Posts

(Note) Importing Excel with the same column name
Load the module with the same name in another location
Automatically rename a node with the same name found while working
Memorandum (Add name only to people with the same surname in the list)
[Python] Get the variable name with str
When changing the table name with flask_migrate
Note: Prepare the environment of CmdStanPy with docker
[Note] Export the html of the site with python.
Read Excel name / cell range with Python VBA
Detect folders with the same image in ImageHash
Rewrite the name of the namespaced tag with lxml
Update the XSERVER website with Git push (Note)
58 The same castle
[Note] How to write QR code and description in the same image with python
Excel with Python
When a local variable with the same name as a global variable is defined in the function
Identify the name from the flower image with keras (tensorflow)
Python program that looks for the same file name
Note calling the CUDA Driver API with Python ctypes