Introduction

My boss mentioned that FastAPI looks promising, so I tried it out. Simply handling a GET request and returning a string would be boring, so I built an API that converts a PDF file into a TIFF image.

What is FastAPI

FastAPI is a Python web framework similar to Flask. It is built around Python type hints and runs on an ASGI server such as uvicorn.

Development environment

Windows 10 Pro
Docker for Windows

Implementation

Directory structure

root
├─app.py
├─Dockerfile
├─requirements.txt
└─test.pdf

Dockerfile

`Dockerfile`


FROM python:3.8

#Install poppler required for PDF conversion
RUN apt-get update && \
    apt-get install -y poppler-utils

#Python module installation
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt

#Create a folder to temporarily save the converted file
RUN rm -rf /app && \
    mkdir -p /app/data/

#Place the program
COPY app.py /app/app.py

EXPOSE 8000
WORKDIR /app
CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8000"]

I used the python:3.8 image here, but any image works as long as it can run Python and install poppler.

requirements.txt

`requirements.txt`


fastapi
uvicorn
python-multipart
pdf2image

`fastapi` and `uvicorn` are required to use FastAPI.
`python-multipart` is required to upload files.
`pdf2image` is required to convert PDF files to images.

app.py

`app.py`


import os
from base64 import b64encode

import uvicorn
from fastapi import FastAPI, File, UploadFile
from pdf2image import convert_from_bytes
from PIL import Image

api = FastAPI()


@api.post("/")
async def post(file: UploadFile = File(...)):
    pdf_file = await file.read()
    tif_file = convert(pdf_file)
    return tif_file


def convert(pdf_file):
    output_folder = "./data"
    file_name = "temporary"
    output_file_path = f"{output_folder}/{file_name}.tif"

    # Convert every page of the PDF to a JPEG file saved in output_folder
    image_path = convert_from_bytes(
        pdf_file=pdf_file,
        thread_count=5,
        fmt="jpg",
        output_folder=output_folder,
        output_file=file_name,
        paths_only=True,
    )

    # Open all of the JPEG page images
    images = [Image.open(image) for image in image_path]

    # Combine the page images into a single multi-page TIFF and save it
    images[0].save(
        output_file_path, format="TIFF", save_all=True, append_images=images[1:],
    )

    # Read the combined TIFF file and base64-encode it
    with open(output_file_path, "rb") as f:
        tif_file = b64encode(f.read())

    # Delete the temporary files and return the base64-encoded TIFF
    for image in image_path:
        os.remove(image)
    os.remove(output_file_path)
    return tif_file


if __name__ == "__main__":
    uvicorn.run(api)

Note that unless you pass paths_only=True to convert_from_bytes, it returns PIL Image objects instead of file paths, and holding every page as an image can consume a lot of memory.
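To get a feel for why page images are memory-hungry, here is a rough back-of-the-envelope sketch. It assumes A4 pages, 8-bit RGB, and 200 DPI (the default `dpi` of pdf2image), and it assumes every page is fully decoded at the same time. The function name `estimate_decoded_bytes` is made up for this illustration and is not part of any library.

```python
def estimate_decoded_bytes(pages, dpi=200, width_in=8.27, height_in=11.69, channels=3):
    """Rough size in bytes of `pages` fully decoded page images.

    Assumptions (not from the article): A4 paper, 8-bit RGB, no compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return pages * width_px * height_px * channels


if __name__ == "__main__":
    for pages in (1, 10, 100):
        size_mb = estimate_decoded_bytes(pages) / 1_000_000
        print(f"{pages:>3} pages: about {size_mb:,.1f} MB uncompressed")
```

Under these assumptions a single page already takes more than 10 MB once decoded, so a PDF with a few hundred pages can reach gigabytes, whereas a list of file paths stays tiny.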

Run

Start Docker

Build
```
docker build -t fastapi .
```

Run

docker run --rm -it -p 8000:8000 fastapi

API request

 > curl -X POST -F 'file=@./test.pdf' http://localhost:8000 | base64 -di > ./test.tif
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  100  206M  100  206M  100  309k  27.0M  41409  0:00:07  0:00:07 --:--:-- 47.2M

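The response is base64-encoded, so you need to decode it before writing it to a file. The `-i` option tells `base64` to ignore non-alphabet characters while decoding, such as any quotes surrounding the response body.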
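If you would rather decode the response in Python than pipe it through the `base64` command, a minimal sketch could look like this. It assumes the response body is the base64 text, possibly wrapped in double quotes as a JSON string. `decode_tiff_response` is a hypothetical helper name, and the sample bytes are only a stand-in for a real response.

```python
import base64
import json


def decode_tiff_response(body: bytes) -> bytes:
    """Turn the API response body into raw TIFF bytes."""
    text = body.decode("ascii").strip()
    # Assumption: the body may be a JSON string, i.e. base64 text in double quotes.
    if text.startswith('"'):
        text = json.loads(text)
    return base64.b64decode(text, validate=True)


if __name__ == "__main__":
    # Stand-in for a real response: the first four bytes of a little-endian TIFF.
    sample = b"II*\x00"
    fake_body = json.dumps(base64.b64encode(sample).decode("ascii")).encode("ascii")
    assert decode_tiff_response(fake_body) == sample
```

In real use you would pass the bytes returned by your HTTP client to `decode_tiff_response` and write the result to a file opened in binary mode.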