Extract text from images with the open-source library Tess4J
Maven: copy the dependency from mvnrepository and paste it into pom.xml
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.3.1</version>
</dependency>
Maven then downloads tess4j-4.3.1.jar.
If Maven cannot be used, download the JAR manually instead.
Get the Japanese trained data file (jpn.traineddata) from the GitHub repository.
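Since OCR depends on the language file being in the data directory, it can help to check for it before running anything. The following is a minimal sketch using only the Java standard library. The class TrainedDataCheck and its methods are hypothetical helpers, not part of tess4j, and the default directory C:\work is simply the one used in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of tess4j): checks that a language file is present.
class TrainedDataCheck {

    // Builds the expected location of <language>.traineddata inside dataDir.
    static Path trainedDataPath(String dataDir, String language) {
        return Paths.get(dataDir).resolve(language + ".traineddata");
    }

    // Returns true only if the language file exists as a regular file.
    static boolean isAvailable(String dataDir, String language) {
        return Files.isRegularFile(trainedDataPath(dataDir, language));
    }

    public static void main(String[] args) {
        // Assumed defaults: the directory and language used in this post.
        String dataDir = args.length > 0 ? args[0] : "C:\\work";
        String language = args.length > 1 ? args[1] : "jpn";

        if (isAvailable(dataDir, language)) {
            System.out.println("Found " + trainedDataPath(dataDir, language));
        } else {
            System.out.println("Missing " + trainedDataPath(dataDir, language));
        }
    }
}
```

Running this check first gives a clearer message than an OCR failure caused by a missing language file.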
OcrTrial.java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrTrial {
    public static void main(String[] args) throws IOException, TesseractException {
        // Load the image
        File file = new File("C:\\work\\INPUT.JPG");
        BufferedImage img = ImageIO.read(file);
        if (img == null) {
            // ImageIO.read returns null when no reader supports the file
            throw new IOException("Unsupported or unreadable image: " + file);
        }

        ITesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:\\work"); // Directory containing the language file (jpn.traineddata)
        tesseract.setLanguage("jpn"); // Use Japanese as the recognition language

        // Run OCR
        String str = tesseract.doOCR(img);

        // Print the result
        System.out.println(str);
    }
}
One word was misrecognized: 〇 (pictogram) came out as × (Pivot Gram).
The recognition rate seems to be high when the characters in the image are clearly legible.
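Because legibility matters, one thing to try is converting the image to pure black and white before OCR. The sketch below uses only the Java standard library. The class Binarizer, the fixed threshold of 128, and the luminance weights are assumptions chosen for illustration, and this post did not measure whether the step improves results for any particular image.

```java
import java.awt.image.BufferedImage;

// Hypothetical preprocessing helper (not part of tess4j).
class Binarizer {

    // Returns a copy of src where each pixel is white if its luminance is at or
    // above the threshold (0-255) and black otherwise.
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny in-memory image: one dark pixel and one light pixel
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x202020);
        img.setRGB(1, 0, 0xE0E0E0);

        BufferedImage bw = binarize(img, 128); // 128 is an arbitrary midpoint
        System.out.printf("%06X %06X%n",
                bw.getRGB(0, 0) & 0xFFFFFF, bw.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

The returned image could be passed to tesseract.doOCR(...) in place of the original img to compare the output.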
- [ ] Try various images