A word diary: God Excel will increase forever.

Preface

Curiously, there are still many cases in the world where Excel like the table is made directly.

name	Position	age	name	Position	age
mouse or rat	God	60	Rabbit	Manager	24
cow	Director	48	Tatsu	〃	12
Tiger	〃	36	Snake	slave	1

Mole	Lowly employee	20
bear	〃	20			Updated: Today

Table A (〃 represents cell merge. There is one blank line between the tiger and the mole) I don't need an update date, so it would be very helpful if you could do it as shown in the table below, but there are various things in the world such as printing convenience and oracles.

name	Position	age
mouse or rat	God	60
cow	Director	48
Tiger	Director	36
Mole	Lowly employee	20
bear	Lowly employee	20
Rabbit	Manager	24
Tatsu	Manager	12
Snake	slave	1

Table B

subject

Consider taking Table A, converting it into Table B, and then using it as data.

For the time being, the following is what I assembled with brute force. Very elephant in my opinion. Paon.

`First stage`


dataframe_raw = pd.read_excel("Table A.xlsx")

df_A=dataframe_raw.iloc[:,[*range(0,3)]].dropna(how="all").fillna(method='ffill').astype({'age':int})
 #0 from raw data in iloc-Take out 2 rows.[*range()]Is convenient because you don't have to write 0, 1, 2 ... etc.
 #dropna()And 0-Deleted rows with all 3 columns NaN
 #fillna()Will pull the data directly above NaN so you can fill in the blanks created in the merged cells.
 #Use astype to change the age read by float type to int type

At this point, df_A looks like the table below. The blank line and "〃" have disappeared, so Yoshi!

name	Position	age
mouse or rat	God	60
cow	Director	48
Tiger	Director	36
Mole	Lowly employee	20
bear	Lowly employee	20

`Second stage`


df_B=dataframe_raw.iloc[:,[*range(4,7)]].dropna(how="all").fillna(method='ffill')
 #Basically df_Same as A. However, normally pd_When reading with read, "For the same column name.Note that there is a distinction symbol such as 1 "
df_B.drop(df_B.tail(1).index,inplace=True)
 #Since the update date information at the end is unnecessary, it is cut off.
df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")
 #As mentioned above, df_In the column of B.Since it has a 1, rename it

dataframe_comp=pd.concat([df_A,df_B])
 #Since the columns are the same, it is completed by vertically joining with concat.

The finished product is Table B written in the introduction.

Future tasks

`Where to worry`


df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")

I feel that this area is very wasteful. You can't even put them in a line ... There seems to be a way to avoid specifying the int type at the time of reading and renaming the columns.

Also, ** Is there a way to handle this work itself in one shot? I suspect **. This kind of anxiety has always been around since I was taught the merge function in the past. Reference: https://qiita.com/wellwell3176/items/5e39fd44590603d13291

References

https://www.366service.com/jp/qa/ee53add5af3296037ee057e49442a821

https://www.it-swarm-ja.tech/ja/python/pandas%E3%83%87%E3%83%BC%E3%82%BF%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0%E3%81%AE%E3%83%87%E3%83%BC%E3%82%BF%E3%81%AE%E6%9C%80%E5%BE%8C%E3%81%AE%E8%A1%8C%E3%82%92%E5%89%8A%E9%99%A4%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/1050572731/

[PYTHON] (Note) Importing Excel with the same column name