Sort huge files with Python

I learned how subprocess.Popen behaves, so I'm writing it down before I forget.

When dealing with huge files in Python, iterators are often used to process the data sequentially so that the whole file never has to be in memory. However, the sorted() function is not suitable for huge files, because it builds a list of every item from the iterator before it sorts.
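To make the point about sorted() concrete, here is a minimal sketch. The generator `numbers` is a hypothetical stand-in for an iterator over the lines of a file; it is not part of the original memo.

```python
def numbers():
    # A generator: values are produced one at a time, not stored up front
    for n in [3, 1, 2]:
        yield n

# sorted() consumes the whole generator and returns a new list,
# so every item is held in memory at the same time
result = sorted(numbers())

print(type(result).__name__)  # list
print(result)                 # [1, 2, 3]
```

With three numbers this is harmless, but with a file of many gigabytes the same list would have to hold every line.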

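If you want to sort a huge file, let the sort command on the Unix side do it and read its output through a pipe. GNU sort, for example, spills to temporary files when the input does not fit in memory.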


import subprocess

def sorted_file_generator(filename):
    # Run the Unix sort command as a child process and read its stdout via a pipe
    with subprocess.Popen(['sort', filename], stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:    # Receive the output line by line
            # The pipe returns bytes, so decode each line to str
            yield line.decode('utf-8').strip()

This example is for Python 3. In Python 2, bytes are handled differently, so the decoding step would need to change.
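As a usage sketch, the generator can be tried on a small temporary file. This assumes a Unix-like system with `sort` on the PATH; the sample data and the temporary file are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import os
import subprocess
import tempfile

def sorted_file_generator(filename):
    # Same approach as above: sort runs as a child process, we read its stdout
    with subprocess.Popen(['sort', filename], stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            yield line.decode('utf-8').strip()

# Hypothetical sample data written to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as f:
    f.write('banana\napple\ncherry\n')
    path = f.name

try:
    # list() is used only because the sample is tiny;
    # for a huge file, loop over the generator instead
    lines = list(sorted_file_generator(path))
finally:
    os.remove(path)

print(lines)  # ['apple', 'banana', 'cherry']
```

Note that the order sort produces depends on the locale in general. If byte order is needed, running sort with the environment variable LC_ALL=C is a common way to get it.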
