[Implementation example] Reading a file line by line from the last line with Cython (Python)

When I looked this up, I was surprised to find few examples, so here is one. If the file is small enough to fit in memory, you can simply load the whole file with read() and reverse it. Because I also wanted to handle huge files, this implementation instead reads fixed-size blocks from the end of the file, one after another. I wrote it in Cython for convenience, but it should work as plain Python once the cdef declarations are removed.
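For comparison, the simple in-memory approach mentioned above takes only a few lines of Python. This is a minimal sketch. The function name `read_lines_reversed_in_memory` is hypothetical. It holds the whole file in memory, which is exactly what the block-based reader below avoids.

```python
import os
import tempfile


def read_lines_reversed_in_memory(path):
    """Return the lines of a small file as bytes, last line first.

    Reads the whole file at once, so it only suits files that fit in memory.
    """
    with open(path, 'rb') as f:
        data = f.read()
    # bytes.splitlines() splits on LF and CRLF (and a lone CR) and does not
    # produce an empty last element for a trailing line ending.
    lines = data.splitlines()
    lines.reverse()
    return lines


# Usage: write a small CRLF file and read it back in reverse.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'first\r\nsecond\r\nthird\r\n')
print(read_lines_reversed_in_memory(tmp.name))  # [b'third', b'second', b'first']
os.remove(tmp.name)
```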


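Test results (o = passed):

| Line length \ Line ending | LF | CRLF |
|---|---|---|
| < 5 | o | o |
| = 5 | o | o |
| > 5 | o | o |

Note: Tested with _block_size = 5, so the rows cover lines shorter than, equal to, and longer than one block. All test files consist of ASCII characters only.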
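The test files themselves are not shown in the article. As a sketch, assuming each test file holds three lines of equal length, the six cases in the table could be generated as follows, with the expected reader output being the same lines in reverse order. The helpers `make_case` and `build_cases` and the three-line layout are assumptions for illustration.

```python
import itertools


def make_case(line_length, newline, n_lines=3):
    """Return (file content, expected reversed lines) for one test case.

    Hypothetical layout: every line is `line_length` ASCII bytes, and each
    line uses a different letter so that a wrong order is easy to spot.
    """
    lines = [bytes([ord('a') + i]) * line_length for i in range(n_lines)]
    content = newline.join(lines) + newline
    return content, lines[::-1]


def build_cases(block_size):
    """Build the 3 x 2 matrix: line length <, =, > block_size and LF/CRLF."""
    lengths = (block_size - 1, block_size, block_size + 1)
    newlines = (b'\n', b'\r\n')
    return {
        (length, nl): make_case(length, nl)
        for length, nl in itertools.product(lengths, newlines)
    }


cases = build_cases(5)
content, expected = cases[(4, b'\r\n')]
print(content)   # b'aaaa\r\nbbbb\r\ncccc\r\n'
print(expected)  # [b'cccc', b'bbbb', b'aaaa']
```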

```cython
# cython: language_level=3

cdef class ReversedFileReader(object):
    cdef:
        object file
        int block_size
        long long pos  # byte offset; a C int would overflow on files > 2 GiB
        bytes newline

    def __init__(self, str _path2file, int _block_size=4096):
        self.file = None
        self.block_size, self.pos = _block_size, 0
        self.newline = None
        if _path2file != '': self.open(_path2file)

    def open(self, str _path2file):
        cdef bytes s_tmp = b''
        if self.file is not None: self.close()
        self.file, self.newline, self.pos = open(_path2file, 'rb'), None, 0
        # Detect the line ending (CRLF or LF) from the first line.
        s_tmp = self.file.readline()
        if 0 != len(s_tmp):
            # A first line with no line ending (one-line file) falls back to LF.
            if s_tmp.endswith(b'\r\n'): self.newline = b'\r\n'
            else: self.newline = b'\n'
            # Start at EOF, excluding a trailing line ending if there is one.
            self.pos = self.file.seek(0, 2)
            if self.pos >= len(self.newline):
                self.file.seek(-len(self.newline), 2)
                if self.file.read(len(self.newline)) == self.newline:
                    self.pos -= len(self.newline)
        return

    def close(self):
        if self.file is not None: self.file.close()
        self.file = None
        return

    def readlines(self):
        # Returns the lines before self.pos as a list, last line first.
        # One call may return several lines; [] means the start of the file.
        cdef:
            long long tmp_pos = self.pos - self.block_size
            bytes block = b''

        lines = []
        if 0 == self.pos: return lines
        if tmp_pos < 0:
            # Less than one block remains: read it all at once.
            self.file.seek(0)
            lines = self.file.read(self.pos).split(self.newline)
            self.pos = 0
            lines.reverse()
            return lines
        while True:
            # Prepend the preceding block to the bytes read so far.
            self.file.seek(tmp_pos)
            block = self.file.read(self.block_size) + block
            lines = block.split(self.newline)
            if 0 == tmp_pos: break
            elif 1 < len(lines):
                # lines[0] may be cut off by the block boundary: keep it for
                # the next call and move pos to the line ending after it.
                tmp_pos += len(lines[0])
                lines = lines[1:]
                break
            elif tmp_pos < self.block_size:
                # No line ending found and less than a block left: read the rest.
                self.file.seek(0)
                block = self.file.read(tmp_pos) + block
                lines = block.split(self.newline)
                tmp_pos = 0
                break
            else: tmp_pos -= self.block_size
        self.pos = tmp_pos
        lines.reverse()
        return lines

    def __enter__(self):
        return self

    def __exit__(self, _exc_type, _exc_val, _exc_tb):
        self.close()
        return
```
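As noted at the start, the reader should also work as plain Python once the cdef parts are removed. Below is a sketch of such a port, assuming Python 3. The loop in readlines() is condensed (the block start is clamped at 0 instead of being handled in a separate branch), but it follows the same idea: keep prepending blocks until a line ending appears, then return the complete lines and leave the possibly cut-off first piece for the next call. Lines are returned as bytes, and the line ending is detected from the first line as in the Cython version.

```python
import os
import tempfile


class ReversedFileReader:
    """Pure-Python sketch of the Cython reader (cdef declarations removed).

    readlines() returns the next batch of complete lines as bytes, last line
    first; an empty list means the beginning of the file has been reached.
    """

    def __init__(self, path='', block_size=4096):
        self.file, self.newline = None, None
        self.block_size, self.pos = block_size, 0
        if path != '':
            self.open(path)

    def open(self, path):
        if self.file is not None:
            self.close()
        self.file, self.newline, self.pos = open(path, 'rb'), None, 0
        first = self.file.readline()
        if not first:
            return  # empty file: nothing to read
        # Detect CRLF vs LF from the first line (LF if it has no line ending).
        self.newline = b'\r\n' if first.endswith(b'\r\n') else b'\n'
        self.pos = self.file.seek(0, os.SEEK_END)
        n = len(self.newline)
        if self.pos >= n:
            self.file.seek(-n, os.SEEK_END)
            if self.file.read(n) == self.newline:
                self.pos -= n  # ignore the final line ending

    def close(self):
        if self.file is not None:
            self.file.close()
        self.file = None

    def readlines(self):
        if self.pos == 0:
            return []
        start, block = self.pos, b''
        while True:
            # Prepend up to block_size bytes that precede the current block.
            new_start = max(0, start - self.block_size)
            self.file.seek(new_start)
            block = self.file.read(start - new_start) + block
            start = new_start
            lines = block.split(self.newline)
            if start == 0:
                break  # reached the beginning: every line is complete
            if len(lines) > 1:
                # lines[0] may be cut off: leave it for the next call.
                start += len(lines[0])
                lines = lines[1:]
                break
        self.pos = start
        lines.reverse()
        return lines

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


# Usage: collect every line of a small CRLF file, last line first.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'line1\r\nline2\r\n\r\nline4\r\n')
result = []
with ReversedFileReader(tmp.name, block_size=5) as reader:
    batch = reader.readlines()
    while batch:
        result.extend(batch)
        batch = reader.readlines()
print(result)  # [b'line4', b'', b'line2', b'line1']
os.remove(tmp.name)
```

Since readlines() returns whatever complete lines the blocks read so far contain, the caller loops until it gets an empty list. A single very long line still makes the reader accumulate blocks until it reaches a line ending or the start of the file.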
