Integrate PDF files with Python

Introduction

To practice basic Python syntax, I wrote a program that merges multiple PDF files into a single PDF file. PyPDF2 is used to combine the PDFs and write the result.

Install PyPDF2

Install PyPDF2 using pip.

> pip install PyPDF2

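Import

Import `PdfFileMerger` to merge the PDF files, and `os` and `glob` to find the PDF files automatically when a folder is specified. `PdfFileMerger` is the class name in older PyPDF2 releases; PyPDF2 3.0 and later provide it as `PdfMerger` instead.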

merge.py(1)


from PyPDF2 import PdfFileMerger
import os
import glob

Initialization

Create the merger object and an empty list that will hold the paths of the files to merge.

merge.py(2)


def main():
    merger = PdfFileMerger()

    merge_files = []

Specifying files to integrate

Ending the file specification

The program keeps asking for paths until m is entered. Entering m ends the file specification and moves on to the next step. The variable i holds the number of the next file and is shown in the prompt.

merge.py(3)


    i = 1
    while True:
        print("Merge file or Folder ", i, " (Type 'm' to merge.) -> ", sep='', end='')
        in_file = input()

        if in_file == 'm':
            break

Specify file

If the entered path is a file and its extension is .pdf, it is added to the list of files to merge. If the extension is not .pdf, a message says so.

merge.py(4)


        elif os.path.isfile(in_file):
            # splitext() returns a (root, ext) tuple, so take the second item.
            ext = os.path.splitext(in_file)[1]
            if ext.lower() == '.pdf':
                merge_files.append(in_file)
                i += 1
            else:
                print("The specified file is not a PDF file.")

Here, `os.path.isfile()` first determines whether the entered path is a file. `os.path.splitext()` then splits the path into a (root, extension) pair, and the extension is compared with .pdf. If the specified file is a PDF file, `append()` adds it to the list of files to merge.
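As a quick check of what `os.path.splitext()` returns, the following sketch shows that the result is a tuple whose second element is the extension. The helper name `is_pdf` and the sample paths are made up for illustration and are not part of merge.py.

```python
import os


def is_pdf(path):
    """Return True if the path ends with a .pdf extension (case-insensitive)."""
    # splitext() returns a (root, ext) tuple; ext includes the leading dot.
    root, ext = os.path.splitext(path)
    return ext.lower() == '.pdf'


# The sample paths below are hypothetical and do not need to exist.
print(os.path.splitext('report.pdf'))  # ('report', '.pdf')
print(is_pdf('report.pdf'))
print(is_pdf('notes.txt'))
```

Comparing the whole tuple with '.pdf' would never be true, which is why the extension has to be picked out of the pair first.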

Specify a folder

If the entered path is a folder, all PDF files directly under that folder are added to the list, and the name of each added file is printed.

merge.py(5)


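        else:
            # os.path.join() adds the separator, so a trailing slash is optional.
            for file in sorted(glob.glob(os.path.join(in_file, '*.pdf'))):
                merge_files.append(file)
                print("Add " + file)
                i += 1

`glob.glob()` lists the PDF files in the specified folder. Because it does not guarantee any particular order, the results are sorted by name before being added to the list of files to merge.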
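The folder lookup can be tried in isolation. This is a minimal sketch, assuming a hypothetical helper named `list_pdfs` and a temporary directory filled with empty placeholder files. It builds the pattern with `os.path.join()` so that the folder may be given with or without a trailing separator.

```python
import glob
import os
import tempfile


def list_pdfs(folder):
    """Return the .pdf paths directly inside folder, sorted by name."""
    # os.path.join() inserts the path separator if folder lacks one.
    pattern = os.path.join(folder, '*.pdf')
    # glob.glob() makes no ordering promise, so sort for a stable merge order.
    return sorted(glob.glob(pattern))


# Demonstration in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ('b.pdf', 'a.pdf', 'memo.txt'):
        open(os.path.join(folder, name), 'w').close()
    found = [os.path.basename(p) for p in list_pdfs(folder)]
    print(found)  # ['a.pdf', 'b.pdf']
```

The pattern only matches files directly inside the folder. Subfolders are not searched.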

Specifying the output file

Specify the name of the PDF file to output. If the specified name does not end in .pdf, .pdf is appended to it.

merge.py(6)


    print("Generated file -> ", end='')
    out_file = input()
    # splitext() returns a (root, ext) tuple, so take the second item.
    ext = os.path.splitext(out_file)[1]

    if ext != '.pdf':
        out_file = out_file + '.pdf'

Again, `os.path.splitext()` is used to look up the extension.
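The output-name rule can be written as a small function and checked against a few names. This is only a sketch: `ensure_pdf_extension` is a hypothetical helper, and treating .PDF the same as .pdf is an assumption added here.

```python
import os


def ensure_pdf_extension(name):
    """Return name unchanged if it ends in .pdf, otherwise with .pdf appended."""
    ext = os.path.splitext(name)[1]
    if ext.lower() != '.pdf':
        return name + '.pdf'
    return name


print(ensure_pdf_extension('merged'))      # merged.pdf
print(ensure_pdf_extension('merged.pdf'))  # merged.pdf
print(ensure_pdf_extension('merged.v2'))   # merged.v2.pdf
```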

File integration / output

Each file in the list is added to merger, and the result is then written to the specified output file.

merge.py(7)


    for file in merge_files:
        merger.append(file)
    
    merger.write(out_file)
    merger.close()
    print("File merge completed!!")

Each file is added to the merger with `append()`, and `write()` then creates the PDF file at the specified output destination.
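The append, write, close sequence can also be wrapped in a small function. The sketch below is an illustration, not the code from the repository: the helper name `merge_all` is hypothetical, and it accepts any object that offers `append()`, `write()` and `close()`, such as the `merger` created earlier. A stand-in class is used here so that the example runs without PyPDF2. The `try`/`finally` makes sure `close()` is called even if a file fails to load.

```python
def merge_all(merger, paths, out_path):
    """Append each path to merger, write the result, and always close it."""
    try:
        for path in paths:
            merger.append(path)
        merger.write(out_path)
    finally:
        # Release the merger's resources even when append() or write() raises.
        merger.close()


class RecordingMerger:
    """Stand-in for a real merger; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def append(self, path):
        self.calls.append(('append', path))

    def write(self, out_path):
        self.calls.append(('write', out_path))

    def close(self):
        self.calls.append(('close',))


demo = RecordingMerger()
merge_all(demo, ['a.pdf', 'b.pdf'], 'merged.pdf')
print(demo.calls)
```

With PyPDF2 installed, the equivalent call in merge.py would be `merge_all(merger, merge_files, out_file)`.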

Finally

The complete code is available on GitHub → (PDF-Handler). Please note that I am a Python beginner, so the code may be hard to read in places.
