I want to convert a table converted to PDF in Python back to CSV

I wanted the table database as it is

The PDF format is a convenient format for passing data to people or distributing it together with other materials in reports, but it is troublesome in terms of data reusability because it has been fixed. There are many. It would be nice for me to submit a table of thousands of lines in A4 format in the report myself, but I wrote this because there was no original data when I wanted to use it at a later date and I had to extract it from PDF. It was.

How to make

Please write the code below. You also need to install a Java library called tabula separately. The Pyhon module is just that trumpet.

import tabula
import PyPDF2
import pandas as pd

FILE_PATH = "./test.pdf"

with open(FILE_PATH, mode='rb') as f:
    pages = PyPDF2.PdfFileReader(f).getNumPages()

for i in range(pages+1):
    tmp = tabula.read_pdf(FILE_PATH, pages = i, encoding = "utf-8_sig", spreadsheet=True)
    df = pd.concat([df, tmp], ignore_index=True)

df = tabula.read_pdf(FILE_PATH, lattice=True, pages = '1' )
   
df[0].to_csv("./test.csv", encoding="shift_jis")

How to use

Just run the .py file above. Have a good PDF life.

Recommended Posts

I want to convert a table converted to PDF in Python back to CSV
I want to create a window in Python
I want to embed a variable in a Python string
I want to easily implement a timeout in python
I want to write in Python! (2) Let's write a test
I want to randomly sample a file in Python
I want to work with a robot in python.
Convert markdown to PDF in Python
I want to make input () a nice complement in python
I want to print in a comprehension
I want to build a Python environment
If you want to assign csv export to a variable in python
I made a plugin to generate Markdown table from csv in Vim
I want to do Dunnett's test in Python
I wrote a code to convert quaternions to z-y-x Euler angles in Python
I want to make a game with Python
I made a CLI tool to convert images in each directory to PDF
I want to batch convert the result of "string" .split () in Python
I want to merge nested dicts in Python
I want to color a part of an Excel string in Python
I made a script in python to convert .md files to Scrapbox format
I want to write to a file with Python
I want to display the progress in Python!
I want to do a monkey patch only partially safely in Python
I want to create a priority queue that can be updated in Python (2.7)
Python program is slow! I want to speed up! In such a case ...
I want to write in Python! (1) Code format check
How to convert / restore a string with [] in python
I want to iterate a Python generator many times
I want to generate a UUID quickly (memorandum) ~ Python ~
I want to transition with a button in flask
Even in JavaScript, I want to see Python `range ()`!
[Python] Convert PDF text to CSV page by page (2/24 postscript)
I want to inherit to the back with python dataclass
[Python] I want to make a nested list a tuple
I want to write in Python! (3) Utilize the mock
How to save a table scraped by python to csv
[Python] Created a method to convert radix in 1 second
I want to use the R dataset in python
I want to run a quantum computer with Python
I want to do something in Python when I finish
I want to manipulate strings in Kotlin like Python!
I want to use a python data source in Re: Dash to get query results
I want to write a triple loop and conditional branch in one line in python
I tried to implement a one-dimensional cellular automaton in Python
I want to do something like sort uniq in Python
I want to start a lot of processes from python
I made a command to generate a table comment in Django
I tried "How to get a method decorated in Python"
I want to send a message from Python to LINE Bot
I tried to make a stopwatch using tkinter in python
I want to be able to run Python in VS Code
Convert A4 PDF to A3 every 2 pages
I want to debug with Python
I want to use a wildcard that I want to shell with Python remove
I tried to convert a Python file to EXE (Recursion error supported)
Convert files written in python etc. to pdf with syntax highlighting
I want to solve APG4b with Python (only 4.01 and 4.04 in Chapter 4)
numpy: I want to convert a single type ndarray to a structured array
I want to do a full text search with elasticsearch + python
I wrote a function to load a Git extension script in Python