[Python] Extract text from a PDF and read it aloud with Open JTalk

Extract PDF text

Reference: Active engineers explain how to extract PDF text with Python's pdfminer [for beginners]

$ pip install pdfminer.six

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

input_path = 'Path to the PDF to extract'
output_path = 'result.txt'

manager = PDFResourceManager()

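# The output file is opened in binary mode, so the converter writes UTF-8 encoded bytes
with open(output_path, "wb") as output:
    # pdf_file avoids shadowing the built-in input()
    with open(input_path, 'rb') as pdf_file:
        with TextConverter(manager, output, codec='utf-8', laparams=LAParams()) as conv:
            interpreter = PDFPageInterpreter(manager, conv)
            # Process the pages one at a time; the text of each page goes to result.txt
            for page in PDFPage.get_pages(pdf_file):
                interpreter.process_page(page)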
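
The text that lands in result.txt usually still has the PDF's line breaks in it, plus a form feed character at the end of each page, so it may help to tidy it up before handing it to a speech engine. The following is only a minimal sketch of that idea, not part of the original steps. The function name `clean_extracted_text` is made up for illustration, and it assumes Japanese text, where broken lines can be joined without a space and sentences end with "。".

```python
import re
from typing import List


def clean_extracted_text(raw: str) -> List[str]:
    """Turn raw extracted text into a list of sentences (hypothetical helper)."""
    # Treat the form feed written at each page boundary as a plain line break
    text = raw.replace("\x0c", "\n")
    # Join lines that were broken by the page layout.
    # Assumption: Japanese text, so no space is inserted between joined lines.
    text = re.sub(r"\s*\n\s*", "", text)
    # Split right after each Japanese full stop, keeping the full stop
    sentences = re.split(r"(?<=。)", text)
    # Drop the empty pieces left over by the split
    return [s.strip() for s in sentences if s.strip()]
```

To use it, read result.txt with `encoding="utf-8"` and pass the contents to the function. Each returned sentence can then be read aloud one at a time. For English or other space-separated text, the joining step would need to insert a space instead.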

Install Open JTalk

I referred to the following two articles (actually, what I did is almost the same as what they describe): "How to manipulate voice with Python" and "How to read text in Python".

I changed the Open JTalk version in those instructions to 1.11.
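
Once Open JTalk is installed, reading text aloud from Python comes down to running the `open_jtalk` command and feeding it the text on standard input. The sketch below shows one way this might look. It is not the code from the referenced articles. The dictionary and voice paths are assumptions based on where Debian/Ubuntu packages usually put them, and the function names `build_open_jtalk_command` and `synthesize` are made up for illustration, so adjust them to your own environment.

```python
import subprocess

# Assumed install locations (typical for Debian/Ubuntu packages); change as needed
DIC_DIR = "/var/lib/mecab/dic/open-jtalk/naist-jdic"
VOICE_PATH = "/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice"


def build_open_jtalk_command(wav_path, dic_dir=DIC_DIR, voice_path=VOICE_PATH, speed=1.0):
    """Build the open_jtalk command line as a list of arguments."""
    return [
        "open_jtalk",
        "-x", dic_dir,      # dictionary directory
        "-m", voice_path,   # HTS voice file
        "-r", str(speed),   # speech speed rate
        "-ow", wav_path,    # output WAV file
    ]


def synthesize(text, wav_path="open_jtalk.wav"):
    """Write a WAV file that reads `text` aloud and return its path."""
    cmd = build_open_jtalk_command(wav_path)
    # open_jtalk reads the text from standard input when no input file is given
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return wav_path
```

Calling `synthesize("こんにちは")` should leave a WAV file in the current directory, which can then be played with whatever player is available, for example `aplay` on Linux.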

To make the reading sound more human, the following article looks like a good reference: "Reading Bot had emotions".
