My boss told me that FastAPI looked promising, so I tried it out. Simply handling a GET request and returning a string would be boring, so I built an API that converts a PDF file into a TIF image.
FastAPI is a Python web framework similar to Flask.
root
├─app.py
├─Dockerfile
├─requirements.txt
└─test.pdf
Dockerfile
FROM python:3.8
#Install poppler required for PDF conversion
RUN apt-get update && \
    apt-get install -y poppler-utils
#Python module installation
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt
#Create a folder to temporarily save the converted file
RUN rm -rf /app && \
    mkdir -p /app/data/
#Place the program
COPY app.py /app/app.py
EXPOSE 8000
WORKDIR /app
CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8000"]
This time I used the python:3.8 image, but any image works as long as Python runs on it and poppler can be installed.
requirements.txt
fastapi
uvicorn
python-multipart
pdf2image
`fastapi` and `uvicorn` are required to use FastAPI.
`python-multipart` is required to upload files.
`pdf2image` is required to convert PDF files to images.
app.py
import os
from base64 import b64encode
import uvicorn
from fastapi import FastAPI, File, UploadFile
from pdf2image import convert_from_bytes
from PIL import Image
api = FastAPI()
@api.post("/")
async def post(file: UploadFile = File(...)):
    pdf_file = await file.read()
    tif_file = convert(pdf_file)
    return tif_file
def convert(pdf_file):
    output_folder = "./data"
    file_name = "temporary"
    output_file_path = f"{output_folder}/{file_name}.tif"
    #Convert all pages of PDF to jpg and save
    image_path = convert_from_bytes(
        pdf_file=pdf_file,
        thread_count=5,
        fmt="jpg",
        output_folder=output_folder,
        output_file=file_name,
        paths_only=True,
    )
    #Load all jpg images
    images = [Image.open(image) for image in image_path]
    #Convert all jpg images to one TIF image and save
    images[0].save(
        output_file_path, format="TIFF", save_all=True, append_images=images[1:],
    )
    #Read the combined TIFF file and base64 encode it
    with open(output_file_path, "rb") as f:
        tif_file = b64encode(f.read())
    #Delete all saved images and return the base64-encoded TIFF data
    for image in image_path:
        os.remove(image)
    os.remove(output_file_path)
    return tif_file
if __name__ == "__main__":
    uvicorn.run(api)
Note that if you do not set paths_only=True in convert_from_bytes, it returns the pages as image objects instead of file paths and consumes a lot of memory.
Build
docker build -t fastapi .
Run
docker run --rm -it -p 8000:8000 fastapi
 > curl -X POST -F 'file=@./test.pdf' http://localhost:8000 | base64 -di > ./test.tif
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  100  206M  100  206M  100  309k  27.0M  41409  0:00:07  0:00:07 --:--:-- 47.2M
The response is base64 encoded, so you need to base64 decode it before writing it to a file.
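For reference, the same upload-and-decode step could also be done from Python instead of curl. The following is only a minimal sketch using the standard library. The helper names `build_multipart`, `decode_tif_response`, and `pdf_to_tif` are hypothetical and not part of the original code. It also assumes that FastAPI serializes the returned base64 bytes as a JSON string (text wrapped in double quotes), which would be why the curl command needs the `-i` option of `base64`.

```python
import base64
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, content):
    """Encode one file as a multipart/form-data body, similar to curl -F."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def decode_tif_response(body):
    """Turn the response body back into raw TIFF bytes.

    Assumption: the body is a JSON string that contains base64 text.
    """
    encoded = json.loads(body.decode("utf-8"))
    return base64.b64decode(encoded)


def pdf_to_tif(pdf_path, tif_path, url="http://localhost:8000/"):
    """Upload a PDF to the API above and save the decoded TIFF."""
    with open(pdf_path, "rb") as f:
        body, content_type = build_multipart(
            "file", os.path.basename(pdf_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        tif_bytes = decode_tif_response(response.read())
    with open(tif_path, "wb") as f:
        f.write(tif_bytes)


# Usage, with the container from the Run step listening on port 8000:
# pdf_to_tif("./test.pdf", "./test.tif")
```

The field name "file" matches the parameter name of the post function in app.py. If the response turns out not to be a JSON string, only decode_tif_response would need to change.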
I stumbled over a few things, such as needing python-multipart to upload files, but I found FastAPI very easy to write.