[Python-pptx] Output PowerPoint font information to csv with python

Background

During a review of a PowerPoint document, the reviewers pointed out that some fonts were wrong. That meant checking, one by one, which text used a different font in a deck of more than 100 slides, and then correcting it. I didn't want to do this by eye, and since I might have to do the same job again in the future, I wondered whether I could do it more efficiently.

python-pptx

These days there's hardly anything you can't do with Python! After some research, I found that python-pptx lets you work with PowerPoint files from Python. Official page: python-pptx

How the objects fit together

I skimmed the official Getting Started guide while referring to articles by the great pioneers on Qiita. I think I now have a rough picture, so I'll dump it below. (If I've got something wrong, please point it out...)

Overall bird's-eye view

Presentation > slides[] > shapes[] > text_frame.paragraphs[] > runs[]
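To make the hierarchy concrete, here is a minimal sketch that walks it in exactly that order. It only uses the attributes named in the chain above (slides, shapes, has_text_frame, text_frame.paragraphs, runs), so it should work on a python-pptx Presentation; the function name iter_runs is my own and is not part of python-pptx.

```python
from typing import Any, Iterator, Tuple


def iter_runs(prs) -> Iterator[Tuple[int, int, int, int, Any]]:
    """Walk Presentation > slides > shapes > text_frame.paragraphs > runs.

    Yields (slide_no, shape_no, paragraph_no, run_no, run), all 0-based.
    Shapes without a text frame (pictures, tables, ...) are skipped, but
    they still count toward shape_no so the numbers match slide.shapes.
    """
    for slide_no, slide in enumerate(prs.slides):
        for shape_no, shape in enumerate(slide.shapes):
            if not shape.has_text_frame:
                continue
            for para_no, paragraph in enumerate(shape.text_frame.paragraphs):
                for run_no, run in enumerate(paragraph.runs):
                    yield slide_no, shape_no, para_no, run_no, run
```

For example, `for s, sh, p, r, run in iter_runs(Presentation("deck.pptx")): print(s, sh, p, r, run.font.name, run.text)` would print one line per run, where deck.pptx is just a placeholder path.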

Correspondence between shapes[] and slides

A slide holds the objects placed on it as an array, shapes[].
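A quick way to see what is actually in shapes[] is to list each object with its index and name. This is a small sketch; describe_shapes is a hypothetical helper name, while shape.name and shape.has_text_frame are python-pptx shape properties. The index is the position in slide.shapes, which is the "object number" used later in the CSV.

```python
def describe_shapes(slide):
    """Return one (index, name, has_text_frame) tuple per object in slide.shapes."""
    return [(i, shape.name, shape.has_text_frame) for i, shape in enumerate(slide.shapes)]
```

Printing `describe_shapes(prs.slides[0])` for the first slide makes it easy to see which objects can carry text at all.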

Correspondence between text_frame.paragraphs[] and slides

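Among the shapes, only those that can hold text have text_frame.paragraphs[]. I often got errors trying to access shapes[n].paragraphs[m]... The paragraphs belong to the text frame, not to the shape itself, so the path is shapes[n].text_frame.paragraphs[m], and shapes that hold no text (pictures, for example) have no text_frame at all. shape.has_text_frame tells you which is which.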
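One catch: a table is a graphic frame whose has_text_frame is False, so its text is reachable only through its cells. Below is a sketch of a helper (iter_text_frames, a hypothetical name) that yields every text frame reachable from one shape without touching .text_frame on shapes that lack it. It assumes python-pptx's has_table, table.rows, row.cells and cell.text_frame; getattr is used for has_table just to be defensive.

```python
def iter_text_frames(shape):
    """Yield every text frame reachable from one shape.

    - Ordinary text shapes (has_text_frame is True) yield their own text_frame.
    - Table graphic frames (has_table is True) yield each cell's text_frame.
    - Anything else (pictures, charts, ...) yields nothing.
    """
    if shape.has_text_frame:
        yield shape.text_frame
    elif getattr(shape, "has_table", False):
        for row in shape.table.rows:
            for cell in row.cells:
                yield cell.text_frame
```

Looping `for tf in iter_text_frames(shape): for paragraph in tf.paragraphs: ...` then avoids the errors above and also picks up table text.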

Correspondence between runs[] and slides

You can change the font of individual characters, and I had always wondered how that is stored. It turns out the text of a paragraph is split into units called runs, and each run can have its own font.
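Because each run carries its own font, correcting a font also has to happen run by run. Here is a minimal sketch; set_font_for_runs and its only_if parameter are my own illustration, and run.font.name is the python-pptx property being set. One caveat: as far as I can tell from python-pptx, font.name reads and writes the Latin typeface only, and Japanese glyphs use a separate East Asian font setting that Font does not expose, so Japanese text may still render in another font even when font.name looks right.

```python
def set_font_for_runs(paragraph, font_name, only_if=None):
    """Set font.name on each run of a paragraph.

    only_if: optional set of font names to replace; None means every run.
    Returns the number of runs whose font name was actually changed.
    """
    changed = 0
    for run in paragraph.runs:
        if only_if is None or run.font.name in only_if:
            if run.font.name != font_name:
                run.font.name = font_name
                changed += 1
    return changed
```

For example, `set_font_for_runs(paragraph, "Meiryo", only_if={"MS Gothic"})` would replace only the runs explicitly set to MS Gothic and leave runs whose font is inherited (None) alone; remember to save the presentation afterwards with prs.save(...).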

What I made

The script dumps every paragraph and every run on every slide to a CSV file. Set TARGET_FILE_PATH to the path of the target PowerPoint file and OUTPUT_FILE_PATH to the path of the output CSV file.

python-pptx.py


from pptx import Presentation
import csv

TARGET_FILE_PATH = './targetFile/targetFile.pptx'
OUTPUT_FILE_PATH = './output.csv'

# Column order of the output CSV
HEADER = ["pptxFile name", "slide number", "object number", "paragraph number", "run number", "Object type", "font.name", "font.size", "text"]


def font_size_pt(font):
  # font.size is a Length in EMU (1 pt = 12700 EMU), or None when the size is
  # not set explicitly and is inherited; Length.pt converts it to points
  return font.size.pt if font.size is not None else None


def export_slide_fonts_and_text():
  output_rows = [HEADER]

  # Read the pptx file
  prs = Presentation(TARGET_FILE_PATH)

  # Numbering is 0-based; object numbers count every shape, including skipped ones
  for slide_number, slide in enumerate(prs.slides):
    for shape_number, shape in enumerate(slide.shapes):
      # Pictures, tables, charts, etc. have no text_frame, so skip them
      if not shape.has_text_frame:
        continue

      for paragraph_number, paragraph in enumerate(shape.text_frame.paragraphs):
        # One row for the paragraph itself ("-" in the run number column)
        output_rows.append([TARGET_FILE_PATH, slide_number, shape_number, paragraph_number, "-", "paragraph",
                            str(paragraph.font.name), str(font_size_pt(paragraph.font)), paragraph.text])

        # Then one row per run
        for run_number, run in enumerate(paragraph.runs):
          output_rows.append([TARGET_FILE_PATH, slide_number, shape_number, paragraph_number, run_number, "run",
                              str(run.font.name), str(font_size_pt(run.font)), run.text])

  # Write the CSV in Shift_JIS so Japanese Excel opens it directly;
  # errors="replace" keeps characters Shift_JIS cannot encode from raising an error
  with open(OUTPUT_FILE_PATH, 'w', encoding="shift-jis", errors="replace", newline="") as f:
    writer = csv.writer(f, lineterminator='\n', quoting=csv.QUOTE_ALL)
    writer.writerows(output_rows)


if __name__ == "__main__":
  export_slide_fonts_and_text()
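One gap in the script above: text inside grouped shapes is never reached, because a group shape itself has no text frame. A sketch of a recursive walk that could replace the plain `for shape in slide.shapes` loop is below. iter_shapes_recursive is a hypothetical name, and it assumes (duck typing) that only python-pptx group shapes expose child shapes as `.shapes`; checking `shape.shape_type == MSO_SHAPE_TYPE.GROUP` is the more explicit alternative.

```python
def iter_shapes_recursive(shapes):
    """Yield shapes depth-first, descending into group shapes.

    Assumption: a group shape exposes its children as `.shapes`;
    every other shape is yielded as-is.
    """
    for shape in shapes:
        children = getattr(shape, "shapes", None)
        if children is not None:
            yield from iter_shapes_recursive(children)
        else:
            yield shape
```

With `enumerate(iter_shapes_recursive(slide.shapes))`, the "object number" becomes the position in this flattened order rather than the index in slide.shapes.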

Process the output a little

Below is the CSV output, pasted into Excel and color-coded. Overall, it looks like it does what I wanted!
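Instead of coloring by hand, the suspicious rows could also be picked out in Python. Here is a sketch that reads the exporter's CSV and returns run rows whose font name is set but not in an allowed set. find_unexpected_fonts and the allowed-font idea are my own assumptions; it strips spaces from header names so it works whether the header says "font.name" or " font.name", and it treats "None" (no explicit font on the run) as nothing to compare yet.

```python
import csv
from typing import Iterable, List, TextIO


def find_unexpected_fonts(csv_file: TextIO, allowed_fonts: Iterable[str]) -> List[dict]:
    """Return run rows whose font.name is explicitly set but not allowed."""
    allowed = set(allowed_fonts)
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None:  # empty file
        return []
    # Normalize header names, e.g. " font.name" -> "font.name"
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    flagged = []
    for row in reader:
        font = row["font.name"]
        # "None" means the font is inherited, so there is nothing to compare
        if row["Object type"] == "run" and font != "None" and font not in allowed:
            flagged.append(row)
    return flagged
```

Usage, assuming Meiryo is the only font the deck should use: `with open(OUTPUT_FILE_PATH, encoding="shift-jis", newline="") as f: rows = find_unexpected_fonts(f, {"Meiryo"})`.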

Next steps

Get the default value

As the Excel sheet above shows, the font name and font size are "None" in many places. When a value isn't set explicitly, it is inherited from a default. I think I can get that default by looking at the placeholder... I'd like to investigate this a bit more.
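As a first step in that direction, here is a best-effort sketch, not a complete answer. It looks at the run, then the paragraph, then the matching placeholder on the slide layout (matched by placeholder_format.idx, using python-pptx's is_placeholder, slide.slide_layout.placeholders and Length.pt). The helper names are hypothetical, and it ignores paragraph levels and list styles: layout placeholders often keep their sizes in list styles or in the master's text styles, which this sketch does not read, so None can still come back.

```python
def layout_placeholder_for(shape, slide):
    """Return the layout placeholder a slide placeholder inherits from, or None."""
    if not getattr(shape, "is_placeholder", False):
        return None
    idx = shape.placeholder_format.idx
    for layout_ph in slide.slide_layout.placeholders:
        if layout_ph.placeholder_format.idx == idx:
            return layout_ph
    return None


def resolve_font_size_pt(run, paragraph, shape, slide):
    """Best-effort font size in points for one run, or None if still unknown."""
    # 1) size set directly on the run, 2) default size on the paragraph
    for font in (run.font, paragraph.font):
        if font.size is not None:
            return font.size.pt
    # 3) first explicit size found in the matching layout placeholder's text
    layout_ph = layout_placeholder_for(shape, slide)
    if layout_ph is not None and layout_ph.has_text_frame:
        for layout_para in layout_ph.text_frame.paragraphs:
            for font in [layout_para.font] + [r.font for r in layout_para.runs]:
                if font.size is not None:
                    return font.size.pt
    # The value may live further up (master text styles, theme): not handled here
    return None
```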

References (the pioneers I learned from)

Thank you... overwhelming thanks...!
python-pptx Summary
Automate reporting with python-pptx
[Python] Generate report PowerPoint at explosive speed!
Automatic report creation using Python [PowerPoint] [python-pptx]
https://qiita.com/code_440/items/22e8539da465686496d3
