[PYTHON] How to install and use Tesseract-OCR

How to install Tesseract-OCR

・ Tesseract-OCR: follow the steps at https://gammasoft.jp/blog/tesseract-ocr-install-on-windows/
- Run tesseract-ocr-w64-setup-v5.0.0-alpha.20200223.exe
- Additional script data (download): check Japanese script and Japanese vertical script
- Additional language data (download): check Japanese and Japanese (vertical)

・ Poppler: https://poppler.freedesktop.org/
- Download the Poppler folder
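
Once the `tesseract` command can be run from a terminal, it is worth confirming that the Japanese data selected in the installer is really there. The following is a minimal sketch, assuming that `tesseract --list-langs` prints one header line followed by one language name per line; the helper names `parse_list_langs`, `installed_languages`, and `has_japanese` are made up for this example.

```python
import subprocess


def parse_list_langs(output):
    """Return the language names from `tesseract --list-langs` output.

    Assumption: the first non-empty line is a header and every
    following non-empty line is one language name.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[1:]


def installed_languages():
    """Run `tesseract --list-langs` and return the parsed language names."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_list_langs(result.stdout)


def has_japanese(languages):
    """True if the data used by lang='jpn' is present."""
    return "jpn" in languages
```

Calling `has_japanese(installed_languages())` should return `True` if the Japanese language data was installed; if it returns `False`, rerunning the installer and checking the Japanese entries is the likely fix.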

Setting environment variables

Add the following folders to your PATH:

・ Tesseract-OCR (the Tesseract installation folder)
・ poppler-0.67.0\bin (inside the downloaded Poppler folder)
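
Whether the PATH setting took effect can be checked from Python before running any OCR. This is a minimal sketch using only the standard library; `find_missing_tools` is a hypothetical helper name, and it assumes that `tesseract` and Poppler's `pdftoppm` are the commands the libraries below will look for.

```python
import shutil


def find_missing_tools(commands):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


# "tesseract" comes from Tesseract-OCR; "pdftoppm" is one of the
# Poppler command-line tools in the bin folder.
REQUIRED_COMMANDS = ["tesseract", "pdftoppm"]

missing = find_missing_tools(REQUIRED_COMMANDS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required commands were found on PATH")
```

If a command is reported as missing right after editing PATH, opening a new terminal (so that it picks up the updated environment variables) is usually worth trying first.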

How to write code (OCR tool and PDF conversion)

import os
import sys

from PIL import Image  # used by callers to open images for OCR_read
from pdf2image import convert_from_path
import pyocr
import pyocr.builders
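# OCR tool itself
def OCR_read(PIL_data):
    # pyocr lists the OCR engines it can find (Tesseract here)
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)

    tool = tools[0]

    # Specify the OCR target, language, and options here.
    # tesseract_layout=6 is Tesseract's page segmentation mode 6
    # (treat the image as a single uniform block of text).
    txt = tool.image_to_string(
        PIL_data,
        lang='jpn',
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )

    # Remove spaces, line breaks, and stray '|' characters from the result
    txt1 = txt.replace(' ', '').replace('\n', '').replace('|', '')
    return txt1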
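
The last step of OCR_read removes spaces, line breaks, and `|` characters from the recognized text. That cleanup can be tried on its own, without Tesseract installed. The following is a minimal sketch; `clean_ocr_text` is a hypothetical name, and the sample string is made up to stand in for raw OCR output.

```python
def clean_ocr_text(txt, unwanted=(' ', '\n', '|')):
    """Remove every occurrence of the unwanted substrings from txt."""
    for s in unwanted:
        txt = txt.replace(s, '')
    return txt


# Made-up string standing in for raw OCR output
raw = "請 求 書\n| 2020 年 2 月"
print(clean_ocr_text(raw))  # prints 請求書2020年2月
```

Note that this also removes the spaces between Latin-alphabet words, so it suits Japanese text better than mixed Japanese and English text.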
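# Convert PDF file to images (one PNG per page)
def pdftoimage(work_directory, path1):
    images = convert_from_path(path1)
    output_directory = os.path.join(work_directory, "Output_folder")
    os.makedirs(output_directory, exist_ok=True)  # create the folder if it is missing
    for i, image in enumerate(images):
        print("Making work{}.png ...".format(i))
        image.save(os.path.join(output_directory, "work{}.png".format(i)))

    imax = len(images)  # number of pages written
    return imax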
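
The two functions can be combined to OCR every page of a PDF. The following is a minimal sketch, assuming the page images are saved as work0.png, work1.png, ... under Output_folder. `ocr_pdf` and its parameter names are hypothetical, and the conversion and OCR steps are passed in as functions so that the loop itself can be tried without Tesseract or Poppler.

```python
import os


def ocr_pdf(pdf_path, work_directory, pdf_to_images, read_image_file):
    """Convert a PDF to page images, then OCR each page in order.

    pdf_to_images(work_directory, pdf_path) -> number of pages written
    read_image_file(path) -> recognized text for one page image
    """
    page_count = pdf_to_images(work_directory, pdf_path)
    texts = []
    for i in range(page_count):
        # Assumption: pages are named work0.png, work1.png, ...
        path = os.path.join(work_directory, "Output_folder", "work{}.png".format(i))
        texts.append(read_image_file(path))
    return texts
```

With the functions above, a call might look like `ocr_pdf("sample.pdf", ".", pdftoimage, lambda path: OCR_read(Image.open(path)))`, where `sample.pdf` is a placeholder file name. The result is a list with one string per page.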
