I want to build an application that extracts the characters in an image with Java, so I tried GCP's Cloud Vision API.
Cloud Vision API is a machine learning service provided by Google Cloud Platform that performs various kinds of image analysis.
Let's run it following the official sample. (As of April 8, 2019)
--Implemented as a Maven project
--A GCP account is required (needed for using the API and issuing an authentication key)
Add the Vision API library. Since we are using Maven this time, add the following to pom.xml.
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-vision</artifactId>
<version>1.49.0</version>
</dependency>
Authentication credentials are required when calling the GCP API. Create credentials from the GCP console and download the JSON key file.
Place the downloaded JSON file at any path, then point the following environment variable at that path.
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
Note that the command shown for Windows in the official documentation uses set.
A variable set with set is only valid within that command prompt: it cannot be referenced from another process, and the value is lost when the prompt is closed.
If you want the setting to persist, you need to use the Windows system properties screen or setx instead.
On macOS, you likewise need to add the export to ~/.bash_profile.
I didn't know this behavior of environment variables and was running from Eclipse, so I got stuck on it for a while.
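Before calling the API, it is easy to check from Java whether the variable is actually visible to your process (for example, when launching from Eclipse). This is a small check of my own, not part of the official sample:

```java
public class CredentialCheck {
    // Returns true if GOOGLE_APPLICATION_CREDENTIALS is set to a non-empty value.
    public static boolean isConfigured() {
        String path = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
        return path != null && !path.isEmpty();
    }

    public static void main(String[] args) {
        if (isConfigured()) {
            System.out.println("Credentials file: " + System.getenv("GOOGLE_APPLICATION_CREDENTIALS"));
        } else {
            System.out.println("GOOGLE_APPLICATION_CREDENTIALS is not set for this process");
        }
    }
}
```

If this prints "not set" when run from your IDE, the variable was defined only in a command prompt (or defined after the IDE was started) and the IDE process cannot see it.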
Let's run the sample Detecting text in a local image.
It looks like you just pass the image file path and the output destination for the analysis result to detectText(String filePath, PrintStream out).
For now, let's hard-code the file path and run it.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

import com.google.cloud.vision.v1.AnnotateImageRequest;
import com.google.cloud.vision.v1.AnnotateImageResponse;
import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
import com.google.cloud.vision.v1.EntityAnnotation;
import com.google.cloud.vision.v1.Feature;
import com.google.cloud.vision.v1.Feature.Type;
import com.google.cloud.vision.v1.Image;
import com.google.cloud.vision.v1.ImageAnnotatorClient;
import com.google.protobuf.ByteString;

public class ImgDetect {

    public static void main(String[] args) {
        try {
            // Specify the image to read
            String inputImgPath = "{Absolute path}.jpeg";
            // Write the analysis result out as a text file
            PrintStream outputResultPath = new PrintStream(new FileOutputStream("{Absolute path}/Result.txt"), true);
            detectText(inputImgPath, outputResultPath);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Quoted from the sample code
    public static void detectText(String filePath, PrintStream out) throws IOException {
        List<AnnotateImageRequest> requests = new ArrayList<>();
        ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
        Image img = Image.newBuilder().setContent(imgBytes).build();
        Feature feat = Feature.newBuilder().setType(Type.TEXT_DETECTION).build();
        AnnotateImageRequest request =
                AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
        requests.add(request);
        try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
            BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
            List<AnnotateImageResponse> responses = response.getResponsesList();
            for (AnnotateImageResponse res : responses) {
                if (res.hasError()) {
                    out.printf("Error: %s\n", res.getError().getMessage());
                    return;
                }
                // For the full list of available annotations, see http://g.co/cloud/vision/docs
                for (EntityAnnotation annotation : res.getTextAnnotationsList()) {
                    out.printf("Text: %s\n", annotation.getDescription());
                    out.printf("Position : %s\n", annotation.getBoundingPoly());
                }
            }
        }
    }
}
This time I tested it with the following image.
Below are the analysis results.
Text:Harumi Triton Store
TEL 03-6221-0309
佲
receipt# 16701
Store#136 terminals# 1
2019/04/08 (Mon) 18:20
Handler# 230011 1
#330
#330
Target amount ¥ 330 ¥ 26
#356
1, 056
¥700
M cafe latte
Subtotal
Tax 2(8%)
total(1point)
Custody
Change
Thanks
The official app of St-Marc Cafe
Download now from the QR below
Save points and save
Get a coupon!
The text is detected almost correctly! The points of concern are:
--The store logo at the top was not read. Perhaps it was not recognized as characters?
--Misrecognition: 1 person → 佲
--Misrecognition: ¥ → #
--The detection order is unclear; some lines are not in top-to-bottom order.
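One possible mitigation for the ¥ → # misreads is a small post-processing step on the extracted text. This is my own heuristic sketch, specific to this receipt (the name fixYenSign is mine, not part of the API), and it would also rewrite legitimate "#"-prefixed numbers, so it is only a starting point:

```java
public class OcrCleanup {
    // Replace a "#" that is immediately followed by a digit with "¥",
    // since the Vision API misread the yen sign on this receipt
    // (e.g. "#330" should be "¥330"). A "# " with a space is left alone.
    public static String fixYenSign(String line) {
        return line.replaceAll("#(?=\\d)", "¥");
    }

    public static void main(String[] args) {
        System.out.println(fixYenSign("#330"));           // → ¥330
        System.out.println(fixYenSign("receipt# 16701")); // unchanged
    }
}
```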
After that, you can also get the coordinate information of each character.
Position : vertices {
x: 334
y: 1100
}
vertices {
x: 2624
y: 1100
}
vertices {
x: 2624
y: 3877
}
vertices {
x: 334
y: 3877
}
Text:Harumi
Position : vertices {
x: 994
y: 1104
}
vertices {
x: 1271
y: 1104
}
vertices {
x: 1271
y: 1243
}
vertices {
x: 994
y: 1243
}
Text:Triton
Position : vertices {
x: 1313
y: 1112
}
vertices {
x: 1834
y: 1112
}
vertices {
x: 1834
y: 1243
}
vertices {
x: 1313
y: 1243
}
(remaining output omitted)
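The vertex output above can be turned into simple measurements. As a sketch, here is how to compute the width and height of the "Harumi" box from its four vertices, using plain int arrays instead of the API's BoundingPoly type:

```java
public class BoundingBoxDemo {
    // Given the four corner vertices of a detected word, return {width, height}
    // of the axis-aligned box that encloses them.
    public static int[] size(int[][] vertices) {
        int minX = Integer.MAX_VALUE, maxX = Integer.MIN_VALUE;
        int minY = Integer.MAX_VALUE, maxY = Integer.MIN_VALUE;
        for (int[] v : vertices) {
            minX = Math.min(minX, v[0]);
            maxX = Math.max(maxX, v[0]);
            minY = Math.min(minY, v[1]);
            maxY = Math.max(maxY, v[1]);
        }
        return new int[] { maxX - minX, maxY - minY };
    }

    public static void main(String[] args) {
        // The "Harumi" vertices from the output above
        int[][] harumi = { {994, 1104}, {1271, 1104}, {1271, 1243}, {994, 1243} };
        int[] s = size(harumi);
        System.out.println("width=" + s[0] + ", height=" + s[1]);
        // prints: width=277, height=139
    }
}
```

Measurements like this are useful for sorting annotations top-to-bottom yourself, since the detection order is not guaranteed.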
This time I ran everything locally; for remote execution, it appears you upload the image to be analyzed to Cloud Storage and analyze it there.
In the future, I would like to create a simple image analysis application using the Vision API.