Extract text from PowerPoint with Python! (Compatible with tables)

About this article

When emailing meeting minutes, I sometimes want only the text from the slides, so I wrote a short script. Extracting the text inside tables took a little extra work.

Sample PowerPoint file (saved as sampleFile.pptx in the same folder as the script)

[Screenshot of the sample slides: 2020-12-29 195938.png]

Result

File name:  sampleFile.pptx 

-- Page 1 --
It's the title

Subtitle

-- Page 2 --
This is the second page.
It's a text box ♪

fruit, Greengrocer A, Super B, 
banana, 100 yen, 200 yen, 
Apple, 150 yen, 140 yen, 

Table sample

Code


import pptx
from glob import glob

# Process every .pptx file in the current working directory.
for fname in glob('*.pptx'):
    print('File name: ', fname, '\n')
    prs = pptx.Presentation(fname)

    for i, sld in enumerate(prs.slides, start=1):
        print(f'-- Page {i} --')

        for shp in sld.shapes:
            # Titles, text boxes, and other shapes that hold text directly.
            if shp.has_text_frame:
                print(shp.text)

            # Tables are not text frames, so read them cell by cell.
            if shp.has_table:
                tbl = shp.table
                for r in range(len(tbl.rows)):
                    text = ''
                    for c in range(len(tbl.columns)):
                        cell = tbl.cell(r, c)
                        # Concatenate the runs of each paragraph,
                        # then append a separator.
                        for paragraph in cell.text_frame.paragraphs:
                            for run in paragraph.runs:
                                text += run.text
                            text += ', '
                    # One output line per table row.
                    print(text)
            # Blank line after each shape.
            print()

The script extracts the text from every file with the .pptx extension in the current folder (the directory the script is run from).
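As a variation on the table handling, here is a minimal sketch that reads each cell through its `text` property and joins the cells of a row with `', '.join(...)`, which avoids the trailing separator. The helper names `table_rows`, `slide_lines`, and `extract_all` are hypothetical and are not part of python-pptx or of the script in this article. The sketch assumes a python-pptx version in which table rows expose `cells` and cells expose `text`.

```python
from glob import glob


def table_rows(table):
    """Return the table as a list of rows, each row a list of cell strings."""
    # cell.text already combines the paragraphs of a cell,
    # so no explicit loop over paragraphs and runs is needed.
    return [[cell.text for cell in row.cells] for row in table.rows]


def slide_lines(slide):
    """Collect the text of one slide as a list of lines (hypothetical helper)."""
    lines = []
    for shape in slide.shapes:
        if shape.has_text_frame:
            lines.append(shape.text)
        if shape.has_table:
            # One comma-separated line per table row, no trailing separator.
            for row in table_rows(shape.table):
                lines.append(', '.join(row))
    return lines


def extract_all(pattern='*.pptx'):
    """Print the text of every presentation matching the glob pattern."""
    import pptx  # imported here so the helpers above can be used without it

    for fname in glob(pattern):
        print('File name:', fname)
        prs = pptx.Presentation(fname)
        for i, slide in enumerate(prs.slides, start=1):
            print(f'-- Page {i} --')
            print('\n'.join(slide_lines(slide)))
```

Calling `extract_all()` from the folder that contains the presentations prints the text slide by slide. A cell that contains several paragraphs keeps its line breaks inside the cell text, so such a row may span more than one output line.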

References

Scraping Powerpoint (pptx) table https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6

Method to extract text part from PDF / Word / PowerPoint / Excel file at once https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6
