Extract text from PowerPoint with Python! (Tables supported)

When sending meeting minutes by e-mail, I sometimes wanted only the text on the slides, so I wrote a short script. Extracting the text inside tables took a little extra work.

Sample PowerPoint file I prepared (saved as sampleFile.pptx in the same folder as the script)



Output

File name:  sampleFile.pptx

-- Page 1 --
It's the title


-- Page 2 --
This is the second page.
It's a text box ♪

fruit, Greengrocer A, Super B, 
banana, 100 yen, 200 yen, 
Apple, 150 yen, 140 yen, 

Table sample


Code

import pptx  # python-pptx (pip install python-pptx)
from glob import glob

for fname in glob('*.pptx'):
    print('File name: ', fname, '\n')
    prs = pptx.Presentation(fname)

    for i, sld in enumerate(prs.slides, start=1):

        print(f'-- Page {i} --')

        for shp in sld.shapes:
            # Titles, text boxes, and other shapes that hold text
            if shp.has_text_frame:
                print(shp.text)

            # Tables: print each row as one comma-separated line
            if shp.has_table:
                tbl = shp.table
                row_count = len(tbl.rows)
                col_count = len(tbl.columns)
                for r in range(0, row_count):
                    text = ''  # start a fresh line for every row
                    for c in range(0, col_count):
                        cell = tbl.cell(r, c)
                        paragraphs = cell.text_frame.paragraphs
                        for paragraph in paragraphs:
                            for run in paragraph.runs:
                                text += run.text
                            text += ', '
                    print(text)
            print()

The script extracts the text from every file with the .pptx extension in the current folder (the one it is run from).


