tesseract-OCR for Python [Japanese version]

tesseract-ocr for Python

Introduction: what I want to do

I want to extract Japanese text from images using OCR. The extracted text will then be used for various purposes.

Usage environment

MacBook Pro (13-inch, Mid 2012)
Processor: 2.5 GHz Intel Core i5
Memory: 4 GB 1600 MHz DDR3
OS: OS X El Capitan (version 10.11.4)

Installation reference:

- tesseract-ocr (Mac version)

"Tesseract" can be installed with either MacPorts or Homebrew.

1. What I used

Terminal (MacPorts)


sudo port install tesseract
# Replace '<langcode>' with the code of each language you want to process (English: eng, Japanese: jpn)
sudo port install tesseract-<langcode>

Terminal (Homebrew)


brew install tesseract
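
Whichever package manager is used, it can help to confirm that the language data needed later (eng and jpn) is actually installed. The following is a minimal sketch, assuming that the `tesseract` command is on the PATH and that `tesseract --list-langs` prints a header line followed by one language code per line. The function names `parse_list_langs`, `installed_languages`, and `missing_languages` are made up for this illustration.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> list[str]:
    """Run `tesseract --list-langs` and return the installed language codes."""
    proc = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: depending on the release, the list may arrive on stdout or
    # stderr, so both streams are parsed. Any warning text would be picked up too.
    return parse_list_langs(proc.stdout + "\n" + proc.stderr)


def missing_languages(wanted: list[str], available: list[str]) -> list[str]:
    """Return the codes from `wanted` that are not in `available`."""
    return [code for code in wanted if code not in available]
```

For example, `missing_languages(["eng", "jpn"], installed_languages())` should return an empty list when both languages are available.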

2. Run

This time, I prepared an image that mixes Japanese and English.

Terminal


tesseract test.png out -l eng+jpn
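
The same command can also be driven from Python. The following is only a sketch that shells out to the `tesseract` command with the standard `subprocess` module, not a dedicated Python binding. The helper names `build_command` and `run_ocr` are hypothetical, and the sketch assumes that Tesseract writes its result to `<output base>.txt` as UTF-8 text.

```python
import subprocess
from pathlib import Path


def build_command(image: str, output_base: str, langs: list[str]) -> list[str]:
    """Build the argument list for: tesseract <image> <output_base> -l <a+b>."""
    # Multiple languages are joined with "+", e.g. ["eng", "jpn"] -> "eng+jpn".
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


def run_ocr(image: str, output_base: str, langs: list[str]) -> str:
    """Run Tesseract and return the recognized text.

    Tesseract writes its result to <output_base>.txt, so that file is read back.
    """
    subprocess.run(build_command(image, output_base, langs), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

Calling `run_ocr("test.png", "out", ["eng", "jpn"])` would issue the same command as above and return the contents of `out.txt`.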

Result

Text output (out.txt)


tesseract—ocr for Python

Introduction ヽ What you want to do

I want to extract Japanese using OCR technology.
In addition, the extracted Japanese will be used for various purposes.

Reflections

I haven't tested this in detail, so I can't say anything definitive, but the results probably vary with conditions such as resolution and whitespace (including margins). If I need it someday, I'll verify this.

Incidentally, images containing only English or only Japanese gave quite good results.
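
If I do verify this later, one simple way to compare conditions would be to score each OCR result against the text that was actually in the image. The following is a rough sketch using `difflib` from the standard library. The `similarity` helper is hypothetical, and the example assumes that the first line of the test image was the article title, which came out with a dash in place of the hyphen in the result above.

```python
from difflib import SequenceMatcher


def similarity(expected: str, recognized: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for the two strings."""
    return SequenceMatcher(None, expected, recognized).ratio()


# Assumed ground truth for the title line of the test image.
expected = "tesseract-ocr for Python"
# What Tesseract returned: the hyphen was read as a dash (U+2014).
recognized = "tesseract\u2014ocr for Python"

score = similarity(expected, recognized)
print(f"similarity: {score:.3f}")
```

Running the same image at different resolutions or with different margins and comparing the scores would show which conditions matter.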
