[PYTHON] Extracting a specific column from a specific sheet in multiple Excel files at once and putting each value on a single line

Background

When you want to **batch extract** the data in a **specific column** of a **specific sheet** across multiple Excel files and put each value on a single line, Python handles it quickly. I am writing the steps down here as a memo.

1. Reading multiple Excel files

- As a premise, about 150 files are stored in the same folder.
- First, get the list of files in the folder.

import glob
files=glob.glob(r'/For review result analysis/*.xlsx')
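
As a small sketch of the listing step, the helper below returns the workbooks in a folder in sorted order and skips the `~$...` lock files that Excel creates while a workbook is open. The name `list_excel_files` is made up for illustration, and the sketch assumes all files sit directly in one folder, as in this memo.

```python
import glob
import os


def list_excel_files(folder):
    """Return the .xlsx paths directly under `folder`, sorted by name."""
    pattern = os.path.join(folder, '*.xlsx')
    paths = glob.glob(pattern)
    # Excel creates temporary lock files such as "~$book.xlsx" while a
    # workbook is open; they are not real workbooks, so skip them.
    paths = [p for p in paths if not os.path.basename(p).startswith('~$')]
    # glob.glob() returns files in arbitrary order, so sort for reproducibility.
    return sorted(paths)
```

Sorting is optional, but it keeps the output order stable between runs, which makes the extracted lines easier to compare.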

- Next, take the file names one by one from the file list and read the **specific sheet** of each Excel file with **pandas.read_excel()**.
- Then use **values** to pull the data of the specific column out of that sheet. In this example, the target is the 10th column from the left.

import pandas as pd
for file in files:
    df = pd.read_excel(file, sheet_name='List of points to be pointed out')
    for row in df.values:
        # Extract the data in the 10th column (index 9)
        s_data = str(row[9]).strip()
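
One thing to watch in the loop above: `str()` turns an empty cell, which pandas reads as NaN, into the string `'nan'`. A minimal sketch of one way to avoid that, assuming the sheet has already been loaded into a DataFrame with `pd.read_excel()`. The helper name `column_as_strings` and the sample data are hypothetical.

```python
import pandas as pd


def column_as_strings(df, position):
    """Return the non-empty cells of the column at `position` as stripped strings."""
    column = df.iloc[:, position]   # select the column by 0-based position
    column = column.dropna()        # drop empty cells instead of keeping 'nan'
    return [str(value).strip() for value in column]


# Small in-memory stand-in for a sheet; real data would come from pd.read_excel().
sample = pd.DataFrame({'id': [1, 2, 3], 'comment': [' fix typo ', None, 'ok\n']})
print(column_as_strings(sample, 1))
```

For the 10th column in this memo, the call would be `column_as_strings(df, 9)`.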

2. Post-processing

- When you want to combine the acquired data into one line
  - To fit each acquired value on one line, remove the line feed codes in the data. This seems to be a common case in data analysis, so I list it here.

print(s_data.replace('\n',''))
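
Line breaks inside a cell usually arrive as `\n`, but `\r\n` can also appear depending on how the file was produced. A minimal sketch that handles both; `to_single_line` is a hypothetical helper name, not part of pandas.

```python
def to_single_line(text, separator=''):
    """Join the lines of `text` into one line, using `separator` between them."""
    # splitlines() splits on \n, \r\n and \r (among other line boundaries),
    # so all of them are removed in one pass.
    return separator.join(text.splitlines())


print(to_single_line('first\nsecond\r\nthird'))
print(to_single_line('first\nsecond', separator=' '))
```

Passing `separator=' '` keeps words from running together when the line break was the only thing separating them.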

- When date data is included
  - When you read an Excel file with Python, a date shown in the Excel file may come back as a 5-digit number such as 41496. This is because Excel stores a date as a serial number: January 1, 1900 is day 1, and each later date is held as the number of days counted from there (see related article 2).
  - You therefore need a function like the following to convert it. Here is the code that actually worked.

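from datetime import datetime, timedelta

def excel_date(num):
    # Convert an Excel serial number to a datetime
    return datetime(1899, 12, 30) + timedelta(days=num)

# row[11] is the 12th column of a row from the loop above.
# strftime() controls the output format; adjust the format string as needed.
print(excel_date(row[11]).strftime('%Y-%m-%d'))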
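
As a check on the arithmetic: 41496 days after December 30, 1899 is August 10, 2013. The base is December 30 rather than December 31 because Excel's 1900 date system counts a February 29, 1900 that never existed, so this offset is only right for serial numbers from 61 (March 1, 1900) onward. pandas can also do the conversion without a helper function, through the `unit` and `origin` parameters of `pd.to_datetime()`. A sketch, assuming the workbook uses the 1900 date system rather than the 1904 one; the constant name `EXCEL_ORIGIN` is made up for illustration.

```python
import pandas as pd

# Base date for Excel's 1900 date system (assumed here; 1904-based files differ).
EXCEL_ORIGIN = '1899-12-30'

# A single serial number becomes a Timestamp.
stamp = pd.to_datetime(41496, unit='D', origin=EXCEL_ORIGIN)
print(stamp.strftime('%Y-%m-%d'))

# A whole column can be converted in one call; the fractional part is the time of day.
serials = pd.Series([41496, 41497.5])
dates = pd.to_datetime(serials, unit='D', origin=EXCEL_ORIGIN)
print(dates.dt.strftime('%Y-%m-%d %H:%M').tolist())
```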

That's all. (If there is an easier way, please leave a comment.)

Related article
