[PYTHON] Use pyOCR to convert the description on the card into text

# Purpose

This is a memo on using pyOCR to extract the description printed on a card image as text.

# Preparation

- Install pyOCR, Tesseract, and jpn.traineddata

Quoted from "Convert image data to text with pyOCR in Mac environment":

> PyOCR installation
>
> ```
> $ sudo pip install pyocr
> ```
>
> Install Tesseract, the OCR engine
>
> ```
> $ brew install tesseract
> $ ls /usr/local/Cellar/tesseract/4.1.0/share/tessdata/
> ```
>
> Get jpn.traineddata
>
> ```
> $ wget https://github.com/tesseract-ocr/tessdata/raw/4.00/jpn.traineddata
> $ mv jpn.traineddata /usr/local/Cellar/tesseract/4.1.0/share/tessdata/
> ```
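Before running OCR, it can help to confirm that the Japanese model actually landed in the directory Tesseract reads. The following is a minimal sketch, assuming the Homebrew path used in the steps above (the version number in that path will differ on other installs); `has_traineddata` is a hypothetical helper, not part of pyOCR.

```python
from pathlib import Path

# Homebrew location from the installation steps (Tesseract 4.1.0); an assumption, adjust for your install
TESSDATA_DIR = "/usr/local/Cellar/tesseract/4.1.0/share/tessdata"


def has_traineddata(lang: str = "jpn", tessdata_dir: str = TESSDATA_DIR) -> bool:
    """Return True if <lang>.traineddata exists as a file in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    if has_traineddata("jpn"):
        print("jpn.traineddata found")
    else:
        print("jpn.traineddata missing; repeat the wget/mv step")
```

From the shell, `tesseract --list-langs` gives a similar check by listing the languages Tesseract can load.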


- Card

  (Example) We will use the following card.

  Quote: [Yu-Gi-Oh! Duel Links Strategy](https://www.google.com/url?sa=i&url=https%3A%2F%2Fkamigame.jp%2F%25E9%2581%258A%25E6%2588%25AF%25E7%258E%258B%25E3%2583%2587%25E3%2583%25A5%25E3%2582%25A8%25E3%2583%25AB%25E3%2583%25AA%25E3%2583%25B3%25E3%2582%25AF%25E3%2582%25B9%2F%25E3%2582%25AB%25E3%2583%25BC%25E3%2583%2589%2F%25E9%259D%2592%25E7%259C%25BC%25E3%2581%25AE%25E7%2599%25BD%25E9%25BE%258D.html&psig=AOvVaw3wIPO8FpnvpxrtFSCtCIN2&ust=1587370256244000&source=images&cd=vfe&ved=0CA0QjhxqFwoTCPiXxpiF9OgCFQAAAAAdAAAAABAD)

  ![card1.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/344880/8f50d346-efbd-5f03-0021-4e59f56d5df1.png)


# Code


#### **`sample.py`**
```py

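from PIL import Image
import sys
# Only needed if pyocr is installed outside the default module path; replace with your own directory
sys.path.append('/path/to/dir')

import pyocr
import pyocr.builders

# Find the OCR engines pyocr can drive (here, the Tesseract installed above)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))

# 'jpn' is listed only if jpn.traineddata is in Tesseract's tessdata directory
langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
if 'jpn' not in langs:
    print("Japanese language data (jpn.traineddata) not found")
    sys.exit(1)

# Run OCR on the card image with the Japanese model and return plain text
txt = tool.image_to_string(
    Image.open('card.png'),
    lang='jpn',
    builder=pyocr.builders.TextBuilder()
)
print(txt)
```

# Test

```
$ python sample.py
Will use tool 'Tesseract (sh)'
Available languages: eng, jpn, jpn_vert, osd, snum
An invading E-Lagon that boasts a prosperous attack. What kind of grandchildren
The monument, the anchor of the anchor, is immeasurable.
```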
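The recognized text appears to be split at the line breaks of the card's text box. If you want the description as one string, a minimal post-processing sketch could look like this. `clean_ocr_text` is a hypothetical helper; it assumes Japanese text, where wrapped lines can be joined without a separator, and it also drops any stray spaces that Tesseract's Japanese output may place between characters.

```python
import re

# One CJK character: CJK symbols/punctuation, kana, kanji, and full-width forms
_CJK = r"[\u3000-\u30ff\u4e00-\u9fff\uff00-\uffef]"
# Whitespace sitting between two CJK characters
_SPACE_BETWEEN_CJK = re.compile(rf"(?<={_CJK})\s+(?={_CJK})")


def clean_ocr_text(text: str) -> str:
    """Join OCR output lines into one string and drop spaces between Japanese characters.

    Assumes Japanese text; for English, joining lines with "" would glue words together.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return _SPACE_BETWEEN_CJK.sub("", "".join(lines))


# Usage with sample.py: print(clean_ocr_text(txt))
```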

# Remarks

How can the text be recognized with higher accuracy?
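One commonly used step, which this memo did not try, is to binarize the image (turn it into pure black and white) before OCR. The sketch below picks the threshold with Otsu's method; `otsu_threshold` and `binarize` are illustrative names, and `binarize` expects a Pillow `Image` such as the one returned by `Image.open('card.png')`. Whether it actually improves results on this card is untested.

```python
def otsu_threshold(histogram):
    """Return the gray level that best splits pixels into a dark and a light group.

    histogram: 256 pixel counts, e.g. from Image.histogram() on a mode "L" image.
    Pixels <= the returned level form the dark group (the text, for dark text on a light background).
    """
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0
    sum_dark = 0
    best_level, best_score = 0, -1.0
    for level, count in enumerate(histogram):
        weight_dark += count
        sum_dark += level * count
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (total_sum - sum_dark) / weight_light
        # Otsu's criterion: between-class variance (up to a constant factor)
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def binarize(image):
    """Convert a Pillow image to black and white using Otsu's threshold."""
    gray = image.convert("L")
    threshold = otsu_threshold(gray.histogram())
    return gray.point(lambda p: 255 if p > threshold else 0)
```

The returned image can be passed to `tool.image_to_string` in place of `Image.open('card.png')`.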

# Errors and countermeasures

- Problem: No OCR result is output.
- Countermeasure: The characters in the image may be too small to analyze. Enlarging the image made analysis possible. (If other preprocessing is needed, such as removing background noise, apply it as well.)
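The fix that worked here was enlarging the image. A minimal sketch with Pillow, assuming a 2x scale factor; the grayscale conversion and the 3x3 median filter (one simple form of the background-noise removal mentioned above) are optional additions, and `preprocess_for_ocr` is a hypothetical helper rather than anything provided by pyOCR.

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(image, scale=2, denoise=True):
    """Enlarge an image so small characters become large enough for OCR.

    scale=2 and the 3x3 median filter are illustrative defaults, not tuned values.
    """
    # Grayscale is usually enough for reading printed text
    gray = image.convert("L")
    # Enlarge with Pillow's default resampling filter
    enlarged = gray.resize((gray.width * scale, gray.height * scale))
    if denoise:
        # A small median filter removes isolated specks of background noise
        enlarged = enlarged.filter(ImageFilter.MedianFilter(3))
    return enlarged


# Usage with sample.py (commented out so this block runs on its own):
# txt = tool.image_to_string(preprocess_for_ocr(Image.open('card.png')),
#                            lang='jpn', builder=pyocr.builders.TextBuilder())
```

If the characters are still too small, a larger `scale` can be tried; enlarging further costs memory and OCR time.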

# Reference

- [Yu-Gi-Oh! Duel Links Strategy](https://www.google.com/url?sa=i&url=https%3A%2F%2Fkamigame.jp%2F%25E9%2581%258A%25E6%2588%25AF%25E7%258E%258B%25E3%2583%2587%25E3%2583%25A5%25E3%2582%25A8%25E3%2583%25AB%25E3%2583%25AA%25E3%2583%25B3%25E3%2582%25AF%25E3%2582%25B9%2F%25E3%2582%25AB%25E3%2583%25BC%25E3%2583%2589%2F%25E9%259D%2592%25E7%259C%25BC%25E3%2581%25AE%25E7%2599%25BD%25E9%25BE%258D.html&psig=AOvVaw3wIPO8FpnvpxrtFSCtCIN2&ust=1587370256244000&source=images&cd=vfe&ved=0CA0QjhxqFwoTCPiXxpiF9OgCFQAAAAAdAAAAABAD)
- Convert image data to text with pyOCR in Mac environment
- How to execute OCR in Python
