Transpose CSV file in Python Part 2: Performance measurement

Performance measurement

Continuing from the previous post, I compared the performance of two approaches: transposing with numpy.array's .T, which I had somewhat forced into use last time, and using zip, which was suggested in the comments.

The inputs are 1M.csv, with 1,000 rows and 1,000 columns, and 25M.csv, with 5,000 rows and 5,000 columns.
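The post doesn't say what the cells of these files contain. To reproduce inputs of the same shape, here is a minimal sketch. The helper name make_csv and the "r<i>c<j>" cell labels are hypothetical choices. Labels like these make it easy to check a transpose by eye, because after transposing, the cell at row j, column i should read r<i>c<j>.

```python
def make_csv(path, n_rows, n_cols):
    """Write an n_rows x n_cols CSV whose cells are labels 'r<i>c<j>'.

    The cell contents are a hypothetical choice; the original files'
    contents are not described in the post.
    """
    with open(path, "w") as f:
        for i in range(n_rows):
            f.write(",".join(f"r{i}c{j}" for j in range(n_cols)) + "\n")


# Hypothetical usage matching the sizes in the post:
# make_csv("1M.csv", 1000, 1000)
# make_csv("25M.csv", 5000, 5000)
```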

Method 1


import numpy

def csvt_1(fnin, fnout):
    # Read all lines, split each into fields, build a 2-D array of strings,
    # then write out the rows of its transpose (.T).
    with open(fnin, "r") as fin, open(fnout, "w") as fout:
        for line in numpy.array([s.strip('\n').split(',') for s in fin]).T:
            fout.write(",".join(line) + "\n")

Method 2


def csvt_2(fnin, fnout):
    # zip(*rows) groups the i-th field of every row together, yielding
    # the transposed rows one tuple at a time.
    with open(fnin, "r") as fin, open(fnout, "w") as fout:
        for line in zip(*[s.strip('\n').split(',') for s in fin]):
            fout.write(','.join(line) + '\n')

Measurement results (%time %run in IPython):

Method 1 (numpy.array.T)
  1M.csv: about 500 ms
  25M.csv: about 14 s

Method 2 (zip)
  1M.csv: about 250 ms
  25M.csv: about 11 s
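The timings above came from IPython's %time. To repeat such a measurement outside IPython, one option is a best-of-N timer built on the standard timeit module. This is a hypothetical sketch, not the setup used for the numbers above; the name best_time and the output file names in the usage comments are assumptions.

```python
import timeit


def best_time(func, *args, repeat=5):
    """Return the best of `repeat` wall-clock timings (seconds) of func(*args).

    Each sample runs the call once (number=1) because a single transpose is
    already slow; the minimum is the sample least disturbed by other processes.
    """
    samples = timeit.repeat(lambda: func(*args), number=1, repeat=repeat)
    return min(samples)


# Hypothetical usage with the two methods above (output names are assumptions):
# print(best_time(csvt_1, "1M.csv", "1M_T.csv"))
# print(best_time(csvt_2, "1M.csv", "1M_T.csv"))
```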

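I measured several times and the results were consistent: zip wins. Why? I wonder whether zip benefits from lazy evaluation. One plausible factor is that numpy.array has to copy every field into a fixed-width Unicode array before .T can be applied, whereas zip only regroups the string objects that already exist. Either way, the difference is clear.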
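A small standard-library illustration of the lazy-evaluation guess follows. It shows what zip does, not a profile of the runs above. In Python 3, zip(*rows) returns an iterator that builds each transposed row only when asked, and the tuples it yields point at the same string objects as the input. Note that in both methods the list comprehension still reads the whole file into memory first.

```python
rows = [s.split(",") for s in ["a,b,c", "d,e,f"]]

# zip(*rows) returns an iterator; no output row exists yet.
transposed = zip(*rows)

# Each next() builds one transposed row (a tuple) on demand.
first = next(transposed)
print(first)  # ('a', 'd')

# The tuple holds the very same str objects as the input rows:
# zip copies no string data, it only regroups references.
print(first[0] is rows[0][0])  # True
```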

But unfortunately

However, the data my colleague was asking about when I tried this code is about 40 GB. Both methods read the entire file into memory, so neither seems usable at that size. I ended up writing a separate application in C# to solve it.
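The C# application isn't shown, so what follows is not that solution. It is a hypothetical Python alternative that keeps memory bounded by transposing in several passes. Each pass streams the whole input but keeps only a block of columns, then writes those columns out as rows. The name transpose_in_passes and the cols_per_pass parameter are illustrative, and the sketch assumes a rectangular file with plain comma separation (no quoted fields), as in the methods above.

```python
def transpose_in_passes(fnin, fnout, cols_per_pass=1000):
    """Transpose a CSV while holding only one block of columns in memory.

    Hypothetical sketch: the input is re-read once per block, i.e.
    ceil(ncols / cols_per_pass) times, trading I/O for memory.
    """
    # Count columns from the first line (assumes every row has the same count).
    with open(fnin, "r") as fin:
        ncols = len(fin.readline().rstrip("\n").split(","))

    with open(fnout, "w") as fout:
        for start in range(0, ncols, cols_per_pass):
            stop = min(start + cols_per_pass, ncols)
            # One list per output row, i.e. per input column in this block.
            block = [[] for _ in range(stop - start)]
            with open(fnin, "r") as fin:
                for s in fin:
                    fields = s.rstrip("\n").split(",")
                    for j, value in enumerate(fields[start:stop]):
                        block[j].append(value)
            # Input columns start..stop-1 become output rows.
            for out_row in block:
                fout.write(",".join(out_row) + "\n")
```

The design choice is the cols_per_pass trade-off. Larger blocks mean fewer full reads of the input but more memory per pass. For a file as large as 40 GB, the number of passes, and therefore the total I/O, dominates the running time.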
