[PYTHON] Japanese OCR memo (personal version)

Environment

AWS EC2 ubuntu16.04

Procedure

This memo follows http://qiita.com/bohemian916/items/67f22ee7aeac103dd205 almost as-is, with the download URLs and other details written out.

Install dependencies

$ sudo apt update
$ sudo apt-get install autoconf automake libtool
$ sudo apt-get install libpng12-dev
$ sudo apt-get install libjpeg62-dev
$ sudo apt-get install libtiff5-dev # slightly changed from the reference page
$ sudo apt-get install zlib1g-dev
$ sudo apt-get install autoconf-archive
$ sudo apt-get install libicu-dev      # (if you plan to make the training tools)
$ sudo apt-get install libpango1.0-dev # (if you plan to make the training tools)
$ sudo apt-get install libcairo2-dev   # (if you plan to make the training tools)

Install the Leptonica image library

$ wget http://www.leptonica.org/source/leptonica-1.74.1.tar.gz
$ gzip -dc leptonica-1.74.1.tar.gz |tar xvf -
$ cd leptonica-1.74.1/
$ ./configure
$ make
$ sudo make install

Install tesseract

$ cd
$ git clone https://github.com/tesseract-ocr/tesseract.git
$ cd tesseract
$ ./autogen.sh
$ ./configure
$ LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
$ sudo make install
$ sudo ldconfig
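
After `ldconfig` has refreshed the linker cache, it can be useful to confirm that the freshly built shared libraries are actually resolvable. The following is a minimal sketch using the standard library's `ctypes.util.find_library`. The helper name `find_shared_libs` is hypothetical, and the library names `lept` and `tesseract` are assumptions based on the usual `liblept` / `libtesseract` file names.

```python
from ctypes.util import find_library


def find_shared_libs(names=("lept", "tesseract")):
    """Map each library name to what the system lookup resolves, or None.

    Hypothetical helper; the default names are assumed, not taken from the memo.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, resolved in find_shared_libs().items():
        print(name, "->", resolved or "not found")
```

If either entry prints "not found", re-running `sudo ldconfig` or checking that `/usr/local/lib` is on the linker search path is a reasonable first step.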

#Language data
$ wget https://github.com/tesseract-ocr/tessdata/raw/master/jpn.traineddata
$ sudo mv jpn.traineddata /usr/local/share/tessdata/
$ export TESSDATA_PREFIX="/usr/local/share/tessdata/" # directory holding jpn.traineddata (3.x builds expect the parent, /usr/local/share/)
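
Before running OCR, it may help to check that the `tesseract` binary is on `PATH` and that the language file sits in the expected directory. This is a minimal sketch: the helper name `check_tesseract_setup` is hypothetical, and its default directory simply mirrors the `mv` destination used above.

```python
import os
import shutil


def check_tesseract_setup(tessdata_dir="/usr/local/share/tessdata", lang="jpn"):
    """Report whether the tesseract binary and a language file can be found.

    Hypothetical helper; tessdata_dir defaults to the mv destination in this memo.
    """
    traineddata = os.path.join(tessdata_dir, lang + ".traineddata")
    return {
        "binary": shutil.which("tesseract"),  # None if not on PATH
        "traineddata": traineddata,
        "traineddata_exists": os.path.isfile(traineddata),
    }


if __name__ == "__main__":
    for key, value in check_tesseract_setup().items():
        print(key, "=", value)
```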

Finally, run it

$ tesseract test.png out -l jpn
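
Since this memo is filed under Python, the same command can also be driven from a script. The following is a rough sketch that shells out to the `tesseract` CLI with `subprocess` and reads back the text file it writes. The function names `build_tesseract_command` and `run_ocr` are hypothetical, and the sketch assumes the default behavior of writing the result to `<output base>.txt`.

```python
import os
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="jpn"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base="out", lang="jpn"):
    """Run tesseract and return the recognized text from <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Assumption: tesseract appends .txt to the output base name by default.
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    if shutil.which("tesseract") and os.path.isfile("test.png"):
        print(run_ocr("test.png"))
    else:
        print("tesseract or test.png not available; skipping")
```

Calling the CLI keeps the script free of extra Python packages; a binding library would be an alternative, but is outside the scope of this memo.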
