Consolidate a large number of CSV files in folders with Python (data without header)

Introduction

--Integrate a large number of CSV files into one CSV file.

Advance preparation

--Prepare CSV files that have no header row.
--Collect the CSV files you want to integrate in one folder.
--Decide the output file name for the integrated result.
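
The preparation above assumes the folder holds nothing but the CSV files to merge. As a minimal sketch, a hypothetical helper named `list_csv_files` (not part of the script below) could keep only `.csv` files and return them in sorted order, so that stray files in the folder are not picked up:

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the .csv files directly inside `folder`, sorted by name.

    `list_csv_files` is a hypothetical helper, not part of the original script.
    Subfolders and files with other extensions are skipped.
    """
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```

Calling `list_csv_files(Path(".") / "csv_folder" / "headerRemoved")` would return `Path` objects that can be passed straight to `open`.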

Code


import csv, os
import pandas as pd

# (1) Reference folder: the folder that contains the CSV files to merge
csv_folder_path = os.path.join(".","csv_folder", "headerRemoved")

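# Get the file names in the folder as a list.
# os.listdir returns names in arbitrary order, so sort them to make the merge order reproducible.
csv_files_list = sorted(os.listdir(csv_folder_path))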

#Create a list to store the lines in all csv files
csv_rows=[]

# Open each file in the list and append every row to csv_rows.
# The with statement closes each file as soon as it has been read.
for csv_filename in csv_files_list:
    csv_path = os.path.join(csv_folder_path, csv_filename)
    with open(csv_path, newline="") as csv_file_obj:
        reader_obj = csv.reader(csv_file_obj)
        for row in reader_obj:
            csv_rows.append(row)

#Convert list to dataframe type.
df = pd.DataFrame(csv_rows)

# (3) Export range: keep only the first 44 columns (positions 0 to 43)
df = df.iloc[:,range(0,44)]

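# (2) Output file name: write the DataFrame to CSV without the index column.
# Note: the column labels (0, 1, 2, ...) are still written as the first line; pass header=False to omit them.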
df.to_csv(os.path.join(".","merged_file.csv"), index=False)

Commentary

--Get a list of file names from the folder that contains the CSV files.
--For each name in the list, create a file object, create a Reader object from it, and read the file row by row.
--Repeat this for all files so that every row from every file ends up in one list.
--Convert the list to a DataFrame before exporting.
--At this point, specify the range of columns to export.
--Finally, write to a CSV file with the specified file name using df.to_csv. Setting index=False keeps the index from being written.
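
The same steps can also be sketched without pandas. This is a variant, not the method used above: it uses only the standard `csv` module and writes each row as soon as it is read, so the full data never has to sit in one list. The helper name `merge_csv_streaming` and its `n_cols` parameter are assumptions made for illustration. Unlike the pandas version, it writes no line of column labels and does not pad rows that have fewer fields.

```python
import csv
from pathlib import Path


def merge_csv_streaming(folder, output_path, n_cols=None):
    """Append the rows of every .csv file in `folder` to `output_path`.

    Hypothetical helper: rows are written as they are read.
    If `n_cols` is given, only the first `n_cols` fields of each row are kept.
    Returns the number of rows written.
    """
    output_path = Path(output_path)
    # List the sources before the output file is opened, and skip the
    # output file itself in case it already exists inside `folder`.
    sources = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
        and p.resolve() != output_path.resolve()
    )
    written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for source in sources:
            with open(source, newline="") as in_file:
                for row in csv.reader(in_file):
                    # row[:None] keeps the whole row when n_cols is not set.
                    writer.writerow(row[:n_cols])
                    written += 1
    return written
```

For example, `merge_csv_streaming("./csv_folder/headerRemoved", "./merged_file.csv", n_cols=44)` would mirror the folder, output name, and column range used in the script above.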

Impressions

--By the way, since the data is now a DataFrame, you can pull out just the columns you want. For example, the following selects the columns at positions 38 and 12 (counting from 0), in that order.

df.iloc[:,[38,12]]
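
To make the ordering behavior concrete, here is a small sketch on a stand-in DataFrame (the four-column data is made up for illustration, not taken from the merged file). It shows that `iloc` takes zero-based positions and returns the columns in the order given in the list:

```python
import pandas as pd

# Small stand-in for the merged data: 2 rows, 4 columns labeled 0..3.
df = pd.DataFrame([["a0", "a1", "a2", "a3"],
                   ["b0", "b1", "b2", "b3"]])

# Positions are zero-based; the result keeps the order of the list.
subset = df.iloc[:, [3, 1]]

print(subset.values.tolist())  # [['a3', 'a1'], ['b3', 'b1']]
```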

--Next, I want to plot various graphs from df.
