[PYTHON] Japanese OCR: personal memo

Environment

AWS EC2, Ubuntu 16.04

Procedure

I followed http://qiita.com/bohemian916/items/67f22ee7aeac103dd205. For the steps on that page, I've recorded exactly what I typed, including details such as URLs.

Install dependencies

$ sudo apt update
$ sudo apt-get install autoconf automake libtool
$ sudo apt-get install libpng12-dev
$ sudo apt-get install libjpeg62-dev
$ sudo apt-get install libtiff5-dev # slightly changed from the original article
$ sudo apt-get install zlib1g-dev
$ sudo apt-get install autoconf-archive
$ sudo apt-get install libicu-dev      # (if you plan to make the training tools)
$ sudo apt-get install libpango1.0-dev # (if you plan to make the training tools)
$ sudo apt-get install libcairo2-dev   # (if you plan to make the training tools)
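
Before moving on, it can help to confirm that apt really installed everything. A minimal sketch, assuming a Debian/Ubuntu system where `dpkg-query` is available; `missing_packages` is a hypothetical helper name, not part of any tool:

```shell
# Print every package name (from the arguments) that dpkg does not report as installed.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    # dpkg-query prints "install ok installed" for an installed package and
    # exits non-zero for an unknown one; both "not installed" cases are reported.
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
      echo "$pkg"
    fi
  done
}

# Usage: no output means every listed package is present.
# missing_packages autoconf automake libtool libpng12-dev libjpeg62-dev \
#   libtiff5-dev zlib1g-dev autoconf-archive
```

Because the helper only prints names, its output can be passed straight back to `sudo apt-get install` if anything is missing.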

Install the image library Leptonica

The version available through apt is too old, so build it from source.

$ wget http://www.leptonica.org/source/leptonica-1.74.1.tar.gz
$ gzip -dc leptonica-1.74.1.tar.gz |tar xvf -
$ cd leptonica-1.74.1/
$ ./configure
$ make
$ sudo make install
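
To confirm that this source build (and not an older copy) is what the Tesseract build will pick up, the installed version can be queried through pkg-config: Leptonica's `make install` places a `lept.pc` file under `/usr/local/lib/pkgconfig`. A sketch with hypothetical helper names; the minimum of 1.74 is an assumption, not something stated in this memo:

```shell
# True (exit 0) if version $1 is >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check that Leptonica is visible to pkg-config and at least the required version.
# The 1.74 default is an assumed minimum for the Tesseract sources.
check_leptonica() {
  local required="${1:-1.74}" found
  found=$(pkg-config --modversion lept 2>/dev/null) || {
    echo "lept.pc not found; is /usr/local/lib/pkgconfig on PKG_CONFIG_PATH?" >&2
    return 1
  }
  echo "Leptonica $found"
  version_ge "$found" "$required"
}

# Usage: check_leptonica && echo "ok"
```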

Install Tesseract

$ cd
$ git clone https://github.com/tesseract-ocr/tesseract.git
$ cd tesseract
$ ./autogen.sh
$ ./configure
$ LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
$ sudo make install
$ sudo ldconfig
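
The final `sudo ldconfig` matters because the library is installed under `/usr/local/lib`; without refreshing the loader cache, running `tesseract` typically fails with an "error while loading shared libraries" message. A minimal check (the helper name is hypothetical) that both the binary and the library are visible:

```shell
# Check that the tesseract binary is on PATH and that the dynamic loader
# cache lists libtesseract (the cache is what `sudo ldconfig` refreshes).
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi
  if ! ldconfig -p | grep -q 'libtesseract'; then
    echo "libtesseract not in the loader cache; run: sudo ldconfig" >&2
    return 1
  fi
  echo "tesseract found at $(command -v tesseract)"
}

# Usage: check_tesseract && tesseract --version
```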

# Language data
$ wget https://github.com/tesseract-ocr/tessdata/raw/master/jpn.traineddata
$ sudo mv jpn.traineddata /usr/local/share/tessdata/
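$ export TESSDATA_PREFIX="/usr/local/share/tessdata/"  # the directory jpn.traineddata was moved into (Tesseract 3.x instead expects its parent, /usr/local/share/)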
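
A quick way to catch a wrong path here is to check that the language file is where Tesseract will look. This sketch assumes `TESSDATA_PREFIX` names the directory that holds the `.traineddata` files (the Tesseract 4.x convention) and falls back to `/usr/local/share/tessdata/`; `check_lang` is a hypothetical helper:

```shell
# Check that <lang>.traineddata exists in the tessdata directory.
# Assumption: TESSDATA_PREFIX is the tessdata directory itself (4.x style);
# if it is unset, fall back to /usr/local/share/tessdata/.
check_lang() {
  local lang="${1:-jpn}"
  local dir="${TESSDATA_PREFIX:-/usr/local/share/tessdata/}"
  local file="${dir%/}/${lang}.traineddata"
  if [ -f "$file" ]; then
    echo "found $file"
  else
    echo "missing $file" >&2
    return 1
  fi
}

# Usage: check_lang jpn && tesseract --list-langs
```

`tesseract --list-langs` should also list `jpn`. Note that `export` only affects the current shell; add the line to `~/.bashrc` to keep it across logins.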

Finally, run it

$ tesseract test.png out -l jpn
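
The command above writes the recognized text to `out.txt` (Tesseract appends `.txt` to the output base name). To process several scans at once, a small loop is enough; this sketch assumes PNG input in a single directory, and `ocr_dir` is a hypothetical helper:

```shell
# OCR every PNG in a directory with the given language, writing <image>.txt
# next to each image (tesseract appends ".txt" to the output base name).
ocr_dir() {
  local dir="${1:-.}" lang="${2:-jpn}" img
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue   # no matches: the glob stays literal, so skip it
    tesseract "$img" "${img%.*}" -l "$lang" || echo "failed: $img" >&2
  done
}

# Usage: ocr_dir ./scans jpn
```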