Have you tried OCR (Optical Character Recognition)? This technology for reading text from images is showing up in more and more places, and GCP now makes it easy enough for ordinary users to try.
I wanted to read the text in a PDF with GCP's Cloud Vision API, but I found the official documentation a little hard to follow, so I am summarizing the steps here as a memo.
The official documentation seemed to omit several important points, which made it difficult for me as a beginner.
macOS Mojave
Python 3.7
I have not seen the bill for April yet, but I expect the cost to be low. I am also using the free credits given to new accounts, so I will update this post once I know the actual charge.
Enable the Cloud Vision API.
From APIs & Services, select Library, then search for the Cloud Vision API and enable it.
From IAM & Admin, select Service Accounts and create a new service account.
You can create a JSON key file from the Create service account screen.
This gives you a key file that contains the private key and related credentials. You will move this key file into your working directory later.
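Before moving on, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of the original procedure: the helper name `check_key_file` is hypothetical, and the field list is an assumption based on what such a key file normally contains.

```python
import json

# Fields a service account key file is expected to contain (assumed list).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file at `path` (empty if none)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in data]
    # A missing "type" is already reported above, so only flag a wrong value here.
    if data.get("type", "service_account") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

Calling `check_key_file('engaged-symbol-274611-192d61800d05.json')` with your own file name should return an empty list if the file looks right.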
From Storage, select Browser. This takes you to the Storage Browser, where you click Create Bucket.
Create a new bucket and upload the PDF file you want to OCR. My bucket this time is `environment-engineering-pdf-bucket-1`, and I uploaded `scan-001.pdf` to it.
Create another bucket to store the text that is read from the PDF file. I named it `ocr-result-bucket-qiita`.
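The script refers to these buckets through `gs://` URIs. As a small illustration of how such a URI breaks down into a bucket name and an object path, here is a sketch using `re`. The function name `split_gcs_uri` is hypothetical and is not part of any Google library.

```python
import re


def split_gcs_uri(uri):
    """Split 'gs://bucket/path' into (bucket_name, object_path).

    The object path is an empty string when the URI names only a bucket.
    """
    match = re.match(r"gs://([^/]+)/?(.*)", uri)
    if match is None:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return match.group(1), match.group(2)


print(split_gcs_uri("gs://environment-engineering-pdf-bucket-1/scan-001.pdf"))
print(split_gcs_uri("gs://ocr-result-bucket-qiita"))
```

Deriving the bucket name from the URI this way avoids having to keep a separately typed bucket name in sync with it.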
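The following three packages are required, so install them. You can also use virtualenv. Note that the code below uses `vision.types` and `vision.enums`, which were removed in google-cloud-vision 2.0, so it needs an earlier release of that library.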
pip install google-cloud-vision
pip install google-cloud-storage
pip install protobuf
https://pypi.org/project/google-cloud-storage/
https://pypi.org/project/google-cloud-vision/
https://pypi.org/project/protobuf/
import os
import json
import re
from google.cloud import vision
from google.cloud import storage
from google.protobuf import json_format
# Change these URIs to point at your own buckets and PDF file
gcs_source_uri = "gs://environment-engineering-pdf-bucket-1/scan-001.pdf"
gcs_destination_uri = "gs://ocr-result-bucket-qiita"
# Change this to the name of your own result bucket
bucket_name = "ocr-result-bucket-qiita"
# Change this to the name of your own key file
# Don't forget to put the JSON key file in the same directory as this script!
credential_path = 'engaged-symbol-274611-192d61800d05.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
mime_type = 'application/pdf'
batch_size = 2
client = vision.ImageAnnotatorClient()
feature = vision.types.Feature(
    type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
input_config = vision.types.InputConfig(
    gcs_source=gcs_source, mime_type=mime_type)
gcs_destination = vision.types.GcsDestination(uri=f"{gcs_destination_uri}/")
output_config = vision.types.OutputConfig(
    gcs_destination=gcs_destination, batch_size=batch_size)
async_request = vision.types.AsyncAnnotateFileRequest(
    features=[feature], input_config=input_config,
    output_config=output_config)
operation = client.async_batch_annotate_files(
    requests=[async_request])
print('Waiting for the operation to finish.')
operation.result(timeout=180)
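storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
# List the result files that the Vision API wrote to the output bucket
blob_list = list(bucket.list_blobs())
# Read the first result file
output = blob_list[0]
json_string = output.download_as_string()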
response = json_format.Parse(
    json_string, vision.types.AnnotateFileResponse())
# The actual response for the first page of the input file.
first_page_response = response.responses[0]
annotation = first_page_response.full_text_annotation
print(u'Full text:\n{}'.format(
    annotation.text))
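The script above prints only the first page of the first result file. With `batch_size = 2`, each result file holds at most two pages, so a longer PDF produces several JSON files. The sketch below shows one way to join the text of every page using only the standard `json` module. It rests on two assumptions that are not stated in the original: that result files are named like `output-1-to-2.json`, and that each file is JSON with a `responses` list whose items carry `fullTextAnnotation.text`. The helper names `first_page` and `collect_text` are hypothetical.

```python
import json
import re


def first_page(blob_name):
    """Return the first page number in a name such as 'output-3-to-4.json'."""
    match = re.search(r"output-(\d+)-to-(\d+)\.json$", blob_name)
    # Names that do not follow the assumed pattern sort first.
    return int(match.group(1)) if match else 0


def collect_text(outputs):
    """Join the text of every page, in page order.

    `outputs` maps a result file name to its JSON content (str or bytes).
    """
    pages = []
    # Sort numerically so that 'output-11-to-12' comes after 'output-3-to-4'.
    for name in sorted(outputs, key=first_page):
        file_response = json.loads(outputs[name])
        for page_response in file_response.get("responses", []):
            annotation = page_response.get("fullTextAnnotation", {})
            pages.append(annotation.get("text", ""))
    return "\n".join(pages)
```

With the storage client from the script, the input could be built as `{blob.name: blob.download_as_string() for blob in bucket.list_blobs()}` and passed to `collect_text`.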
I ran OCR on the following image PDF.
The title was then printed in the terminal as follows.
Gentosha Bunko
Chinese food in Kyoto
Naomi Kang
However, it fails on pages with cursive characters like the following one. "Gyoza" is recognized as "Kamako", and part of "Garlic" is missing.
output:
table of contents
《Kamako》
"dance"
Kashinnosu
Garlic
Bag child 4
Chapter fish(Marutamachi Nanahommatsu)|
34
Three-sided fish wing
Buan(Shimogamo)
Of sesame skin
Water 篮子
Numbers(Jodo-ji Temple)
04
Like a parent-child valve
Phoenix egg
Fuyoen(Kawaramachi Kajo)
Person
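One way to locate pages like this, where recognition went wrong, is to look at the confidence values stored in the result JSON. The sketch below assumes the parsed JSON of a single page response with the layout `fullTextAnnotation.pages[].blocks[].paragraphs[].words[]`, where each word has a `confidence` value and a list of `symbols`. The function name `low_confidence_words` and the threshold of 0.8 are hypothetical choices for illustration only.

```python
def low_confidence_words(page_response, threshold=0.8):
    """Return (word, confidence) pairs whose confidence is below `threshold`.

    `page_response` is one item of the "responses" list, already parsed
    from JSON into a dict.
    """
    flagged = []
    annotation = page_response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word is stored as a list of single-character symbols.
                    text = "".join(symbol.get("text", "")
                                   for symbol in word.get("symbols", []))
                    # A missing confidence is treated as 0.0 and gets flagged.
                    confidence = word.get("confidence", 0.0)
                    if confidence < threshold:
                        flagged.append((text, confidence))
    return flagged
```

A page that returns many flagged words is a good candidate for manual checking.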
If you can export this to `epub` format or similar, you can also convert it to mobi format and read it on a Kindle! I don't know how much it will cost, though...