A word diary: God Excel will increase forever.
Curiously, there are still many cases in the world where Excel like the table is made directly.
name | Position | age | name | Position | age | |
---|---|---|---|---|---|---|
mouse or rat | God | 60 | Rabbit | Manager | 24 | |
cow | Director | 48 | Tatsu | 〃 | 12 | |
Tiger | 〃 | 36 | Snake | slave | 1 | |
Mole | Lowly employee | 20 | ||||
bear | 〃 | 20 | Updated: Today |
Table A (〃 represents cell merge. There is one blank line between the tiger and the mole) I don't need an update date, so it would be very helpful if you could do it as shown in the table below, but there are various things in the world such as printing convenience and oracles.
name | Position | age |
---|---|---|
mouse or rat | God | 60 |
cow | Director | 48 |
Tiger | Director | 36 |
Mole | Lowly employee | 20 |
bear | Lowly employee | 20 |
Rabbit | Manager | 24 |
Tatsu | Manager | 12 |
Snake | slave | 1 |
Table B
Consider taking Table A, converting it into Table B, and then using it as data.
For the time being, the following is what I assembled with brute force. Very elephant in my opinion. Paon.
First stage
dataframe_raw = pd.read_excel("Table A.xlsx")
df_A=dataframe_raw.iloc[:,[*range(0,3)]].dropna(how="all").fillna(method='ffill').astype({'age':int})
#0 from raw data in iloc-Take out 2 rows.[*range()]Is convenient because you don't have to write 0, 1, 2 ... etc.
#dropna()And 0-Deleted rows with all 3 columns NaN
#fillna()Will pull the data directly above NaN so you can fill in the blanks created in the merged cells.
#Use astype to change the age read by float type to int type
At this point, df_A looks like the table below. The blank line and "〃" have disappeared, so Yoshi!
name | Position | age |
---|---|---|
mouse or rat | God | 60 |
cow | Director | 48 |
Tiger | Director | 36 |
Mole | Lowly employee | 20 |
bear | Lowly employee | 20 |
Second stage
df_B=dataframe_raw.iloc[:,[*range(4,7)]].dropna(how="all").fillna(method='ffill')
#Basically df_Same as A. However, normally pd_When reading with read, "For the same column name.Note that there is a distinction symbol such as 1 "
df_B.drop(df_B.tail(1).index,inplace=True)
#Since the update date information at the end is unnecessary, it is cut off.
df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")
#As mentioned above, df_In the column of B.Since it has a 1, rename it
dataframe_comp=pd.concat([df_A,df_B])
#Since the columns are the same, it is completed by vertically joining with concat.
The finished product is Table B written in the introduction.
Where to worry
df_B.astype({'age.1':int})
df_B.columns=("name","Position","age")
I feel that this area is very wasteful. You can't even put them in a line ... There seems to be a way to avoid specifying the int type at the time of reading and renaming the columns.
Also, ** Is there a way to handle this work itself in one shot? I suspect **. This kind of anxiety has always been around since I was taught the merge function in the past. Reference: https://qiita.com/wellwell3176/items/5e39fd44590603d13291
https://www.366service.com/jp/qa/ee53add5af3296037ee057e49442a821
https://www.it-swarm-ja.tech/ja/python/pandas%E3%83%87%E3%83%BC%E3%82%BF%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0%E3%81%AE%E3%83%87%E3%83%BC%E3%82%BF%E3%81%AE%E6%9C%80%E5%BE%8C%E3%81%AE%E8%A1%8C%E3%82%92%E5%89%8A%E9%99%A4%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/1050572731/
Recommended Posts