When you want to ** batch extract ** the data of ** specific columns ** in the ** specific sheet ** of multiple Excel files and put each in one line, you can quickly process it using python. So I decided to organize it as a memorandum.
--This time, as a premise, about 150 files are stored in the same folder. --First, get the list of files in the folder.
import glob
files=glob.glob(r'/For review result analysis/*.xlsx')
--Next, use ** pandas.read_excel () ** to get the file names one by one from the file list and read the ** specific sheet ** of the Excel file. --In addition, ** values ** is used to extract the data of a specific column in a specific sheet. In the example, the specific column has 10 items (from the left).
import pandas as pd
for file in files:
df = pd.read_excel(file, sheet_name='List of points to be pointed out')
for row in df.values:
#Extract the data in the 10th column
s_data = str(row[9]).strip().rstrip()
--When you want to combine the acquired data into one line
--If you want to fit the acquired column data in one row, remove the line feed code in the data. This seems to be a common case in data analysis, so I will list it here.
print(s_data.replace('\n',''))
--When date data is included ――When you read an Excel file with python, you may encounter an event that the date displayed on the Excel file is converted to a 5-digit number such as 41496. It seems that this is because the date in Excel starts from January 1, 1900 (1st day), and the date data is held as a numerical value obtained by adding the number of days from that. (See related article 2). ――Therefore, it is necessary to create the following function and convert it, so I will describe the code that actually worked.
def excel_date(num):
from datetime import datetime, timedelta
return(datetime(1899, 12, 30) + timedelta(days=num))
print(pd.to_datetime(excel_date(row[11]), format='%Y year%m month%d day'))
That's it. (If there is an easier way, please comment)
Recommended Posts