[Python, PyPDF2] A script that splits a two-page spread PDF into left and right pages

About this post

This is a memo on a script that uses PyPDF2 to split each page of a PDF, such as a document scanned as two-page spreads, into a left page and a right page.

Source code

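# This script uses the legacy PyPDF2 names (PdfFileReader, getPage, addPage, ...).
# PyPDF2 3.0 replaced them with PdfReader, PdfWriter and snake_case methods.
from PyPDF2 import PdfFileWriter, PdfFileReader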
import copy


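def split_pdf(input_pdf, output_filename):
    # Keep the input file open until the output has been written,
    # because PyPDF2 reads page data from the stream on demand.
    with open(input_pdf, 'rb') as input_stream:
        pdf_file = PdfFileReader(input_stream)
        merged = PdfFileWriter()
        page_num = pdf_file.getNumPages()

        for i in range(page_num):
            # Two page objects for the same spread, one per half.
            page_L = pdf_file.getPage(i)
            page_R = copy.copy(page_L)

            # Assumes the MediaBox starts at (0, 0), so the upper-right
            # corner gives the width and height of the spread.
            (w, h) = page_L.mediaBox.upperRight

            # Left half: move the right edge in to the center line.
            page_L.mediaBox.upperRight = (w/2, h)
            merged.addPage(page_L)

            # Right half: move the left edge out to the center line.
            page_R.mediaBox.upperLeft = (w/2, h)
            merged.addPage(page_R)

        with open(output_filename, "wb") as output_stream:
            merged.write(output_stream)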


def sample():
    input_pdf = "input.pdf"
    output_filename = "output.pdf"
    split_pdf(input_pdf, output_filename)


if __name__ == "__main__":
    sample()

How to use

Pass the PDF file to split and the output file name to split_pdf(). Each page is divided in two, and the result is written to the output file.
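As a rough illustration of what ends up in the output file: the script adds the left half and then the right half of each input page, so input page i becomes output pages 2i and 2i + 1. The helper name `source_of` below is hypothetical and only mirrors that ordering; it does not touch any PDF.

```python
def source_of(output_index):
    """Map an output page index to (input page index, side).

    Hypothetical helper: it mirrors the order in which split_pdf adds
    pages (left half first, then right half), using 0-based indices.
    """
    side = "L" if output_index % 2 == 0 else "R"
    return (output_index // 2, side)


# The first four output pages come from the first two input spreads.
print([source_of(i) for i in range(4)])
# [(0, 'L'), (0, 'R'), (1, 'L'), (1, 'R')]
```

If the scanned material is read right to left, the two addPage calls in split_pdf would need to be swapped to get the pages in reading order.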

Brief commentary

Prepare two page objects for each spread, one for the left page and one for the right page (page_L, page_R).

PDF coordinates start at the lower left, so as long as the page's lower-left corner is at (0, 0), the upper-right coordinates give the width and height (w, h).

Setting the right-edge coordinates on the left page and the left-edge coordinates on the right page crops each page accordingly.

Here, the upper-right corner of the left page is set to (w / 2, h) and the upper-left corner of the right page is set to (w / 2, h), which splits the spread into left and right halves.

For the right page, setting page_R.mediaBox.lowerLeft = (w / 2, 0) instead may be easier to understand.
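The box arithmetic can be sketched in plain Python, without PyPDF2. The function name `split_box` is hypothetical, and the page size in the example is an assumed, roughly A3 landscape spread in PDF points. Unlike the script, this sketch also handles a box whose lower-left corner is not at (0, 0) by taking the midpoint of the two x coordinates.

```python
def split_box(lower_left, upper_right):
    """Return the corners of the left and right halves of a box.

    `split_box` is a hypothetical helper, not part of PyPDF2. Each half
    is returned as a (lower_left, upper_right) pair of (x, y) tuples.
    """
    x0, y0 = lower_left
    x1, y1 = upper_right
    mid = (x0 + x1) / 2  # x coordinate of the center fold
    left_half = ((x0, y0), (mid, y1))
    right_half = ((mid, y0), (x1, y1))
    return left_half, right_half


# Assumed spread size: 1190 x 842 points, origin at the lower left.
left, right = split_box((0, 0), (1190, 842))
print(left)   # ((0, 0), (595.0, 842))
print(right)  # ((595.0, 0), (1190, 842))
```

With the origin at (0, 0), the left half's upper-right corner and the right half's lower-left corner match the (w / 2, h) and (w / 2, 0) values used in the commentary.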

Example of usage

I use it as follows. Only the additions and changes are shown.

import sys, os

def main():
    # Every command-line argument is treated as a PDF file to split.
    for input_pdf in sys.argv[1:]:
        basename_without_ext = os.path.splitext(os.path.basename(input_pdf))[0]
        output_filename = "split_{}.pdf".format(basename_without_ext)
        print("Splitting {} into two.".format(os.path.basename(input_pdf)))
        split_pdf(input_pdf, output_filename)
        

if __name__ == '__main__':
    main()

Take the command-line arguments, loop over them with a for statement, and pass each one to split_pdf. The output file name is the input file name prefixed with split_.
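The naming rule can be checked on its own with a small sketch. The function name `output_name_for` and its `prefix` parameter are hypothetical, introduced here only for illustration; the expression inside is the same one used in main().

```python
import os


def output_name_for(input_pdf, prefix="split_"):
    """Build the output file name for one input path.

    `output_name_for` and `prefix` are hypothetical names for illustration.
    """
    # Drop the directory part, then drop the extension.
    basename_without_ext = os.path.splitext(os.path.basename(input_pdf))[0]
    return "{}{}.pdf".format(prefix, basename_without_ext)


print(output_name_for("scans/book.pdf"))  # split_book.pdf
print(output_name_for("notes.PDF"))       # split_notes.pdf
```

Because the directory part is dropped, the output file is written to the current working directory rather than next to the input file.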

References

Split the PDF of the scanned center-bound material into pages (Qiita: @hrb23m)

[How to divide a spread scan PDF with misalignment into two pages (Python, pypdf) (Technical memo collection)](https://www.robotech-note.com/entry/2018/04/28/ Misalignment of center Split the spread scan PDF into two pages)
