Create an Annotator that uses kuromoji with NLP4J [007]


Use different morphological analysis modules

By default (in nlp4j-core), NLP4J performs morphological analysis with the Japanese morphological analysis API of the Yahoo! Developer Network.

Text analysis: Japanese morphological analysis - Yahoo! Developer Network https://developer.yahoo.co.jp/webapi/jlp/ma/v1/parse.html

The Yahoo! Developer Network API is convenient because it can be called over HTTP, but it has a weakness: the number of calls is limited. So I decided to create a library that uses kuromoji, which runs locally.

Creating an Annotator

This time, I created nlp4j-kuromoji as a submodule of the nlp4j project.

nlp4j-kuromoji https://github.com/oyahiroki/nlp4j/tree/master/nlp4j/nlp4j-kuromoji

I added the following Maven dependencies to use kuromoji.

<!-- https://mvnrepository.com/artifact/com.atilika.kuromoji/kuromoji -->
<dependency>
 <groupId>com.atilika.kuromoji</groupId>
 <artifactId>kuromoji</artifactId>
 <version>0.9.0</version>
 <type>pom</type>
</dependency>
<dependency>
 <groupId>com.atilika.kuromoji</groupId>
 <artifactId>kuromoji-ipadic</artifactId>
 <version>0.9.0</version>
</dependency>
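Before wiring kuromoji into NLP4J, a standalone check can confirm that the dependencies above resolve. The sketch below is only an illustration: the class name KuromojiSmokeTest and its surfaces helper are hypothetical, while Tokenizer, tokenize and the Token getters are the same kuromoji IPADIC calls the annotator uses later.

```java
import com.atilika.kuromoji.ipadic.Token;
import com.atilika.kuromoji.ipadic.Tokenizer;

import java.util.ArrayList;
import java.util.List;

// Hypothetical smoke test; not part of nlp4j-kuromoji.
public class KuromojiSmokeTest {

    // Returns the surface form of each token produced by kuromoji (IPADIC).
    public static List<String> surfaces(String text) {
        Tokenizer tokenizer = new Tokenizer(); // loads the IPADIC dictionary
        List<String> result = new ArrayList<>();
        for (Token token : tokenizer.tokenize(text)) {
            result.add(token.getSurface());
        }
        return result;
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new Tokenizer();
        for (Token token : tokenizer.tokenize("私は学校に行きました。")) {
            // surface, top-level part of speech, base form, reading, character offset
            System.out.println(token.getSurface() + "\t"
                    + token.getPartOfSpeechLevel1() + "\t"
                    + token.getBaseForm() + "\t"
                    + token.getReading() + "\t"
                    + token.getPosition());
        }
    }
}
```

For this sentence the surface forms join back into the original text, which is a quick way to see that every character was assigned to a token.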

Class Diagram

As a class diagram, it looks like this. Both annotators do the same job as morphological analysis engines, so they are siblings. Once the annotator is implemented, callers no longer need to be aware of the difference, so this is probably the only time you will need to think about how kuromoji works.


@startuml
nlp4j.DocumentAnnotator <|-- YJpMaAnnotator
nlp4j.DocumentAnnotator <|-- KuromojiAnnotator 
@enduml
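The point of the sibling relationship is that caller code depends only on the shared interface. A minimal sketch of that idea in plain Java follows; Annotator, WhitespaceAnnotator and CharAnnotator are hypothetical, simplified stand-ins (they take a String instead of an NLP4J Document), not NLP4J classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of the sibling design; not NLP4J's actual API.
public class AnnotatorSwapSketch {

    // Common contract shared by every engine (in NLP4J this role is played by DocumentAnnotator).
    interface Annotator {
        List<String> annotate(String text);
    }

    // Stand-in engine #1: splits on whitespace.
    static class WhitespaceAnnotator implements Annotator {
        @Override
        public List<String> annotate(String text) {
            List<String> tokens = new ArrayList<>();
            for (String s : text.trim().split("\\s+")) {
                if (!s.isEmpty()) {
                    tokens.add(s);
                }
            }
            return tokens;
        }
    }

    // Stand-in engine #2: one token per Unicode code point.
    static class CharAnnotator implements Annotator {
        @Override
        public List<String> annotate(String text) {
            List<String> tokens = new ArrayList<>();
            text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
            return tokens;
        }
    }

    // Caller code depends only on the interface, never on a concrete engine.
    static int countTokens(Annotator annotator, String text) {
        return annotator.annotate(text).size();
    }

    public static void main(String[] args) {
        Annotator annotator = new WhitespaceAnnotator(); // swap engines by changing only this line
        System.out.println(countTokens(annotator, "I went to school")); // prints 4
    }
}
```

Replacing new WhitespaceAnnotator() with new CharAnnotator() changes the engine without touching countTokens, which mirrors swapping YJpMaAnnotator for KuromojiAnnotator.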

Code

KuromojiAnnotator implements the nlp4j.DocumentAnnotator interface provided by NLP4J (extending AbstractDocumentAnnotator). Each token extracted by kuromoji is mapped to NLP4J's keyword class, DefaultKeyword.


package nlp4j.krmj.annotator;
import java.util.List;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import com.atilika.kuromoji.ipadic.Token;
import com.atilika.kuromoji.ipadic.Tokenizer;
import nlp4j.AbstractDocumentAnnotator;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.impl.DefaultKeyword;

/**
 * Kuromoji Annotator
 * @author Hiroki Oya
 * @since 1.2
 */
public class KuromojiAnnotator extends AbstractDocumentAnnotator implements DocumentAnnotator {
	static private final Logger logger = LogManager.getLogger(KuromojiAnnotator.class);
	@Override
	public void annotate(Document doc) throws Exception {
		Tokenizer tokenizer = new Tokenizer(); // kuromoji tokenizer (IPADIC dictionary)
		for (String target : targets) { // attribute names to analyze, e.g. set with setProperty("target", "text")
			Object obj = doc.getAttribute(target);
			if (!(obj instanceof String)) { // skip missing or non-String attributes
				continue;
			}
			String text = (String) obj;
			List<Token> tokens = tokenizer.tokenize(text);
			int sequence = 1;
			for (Token token : tokens) {
				logger.debug(token.getAllFeatures());
				DefaultKeyword kwd = new DefaultKeyword(); // map the kuromoji token to an NLP4J keyword
				kwd.setLex(token.getBaseForm()); // base (dictionary) form
				kwd.setStr(token.getSurface()); // surface form as it appears in the text
				kwd.setReading(token.getReading());
				kwd.setBegin(token.getPosition()); // character offset of the token in the text
				kwd.setEnd(token.getPosition() + token.getSurface().length());
				kwd.setFacet(token.getPartOfSpeechLevel1()); // top-level part of speech
				kwd.setSequence(sequence);
				doc.addKeyword(kwd);
				sequence++;
			}
		}
	}
}
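To make the mapping logic testable without loading a dictionary, here is a self-contained sketch of the same loop. It is not nlp4j-kuromoji code: Tok and Kw are hypothetical stand-ins for kuromoji's Token and NLP4J's DefaultKeyword, a whitespace tokenizer replaces kuromoji, and a Map replaces the Document. It keeps three details of the annotator above: non-String attributes are skipped, sequence restarts at 1 for each target, and end = position + surface length, so each keyword's [begin, end) range slices its surface form back out of the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the KuromojiAnnotator loop; not part of NLP4J or kuromoji.
public class AnnotateLoopSketch {

    // Stand-in for kuromoji's Token: surface form plus character position.
    static final class Tok {
        final String surface;
        final int position;
        Tok(String surface, int position) {
            this.surface = surface;
            this.position = position;
        }
    }

    // Stand-in for NLP4J's DefaultKeyword (only the fields used here).
    static final class Kw {
        final String str;
        final int begin;
        final int end;
        final int sequence;
        Kw(String str, int begin, int end, int sequence) {
            this.str = str;
            this.begin = begin;
            this.end = end;
            this.sequence = sequence;
        }
    }

    // Whitespace tokenizer standing in for kuromoji's Tokenizer.tokenize().
    static List<Tok> tokenize(String text) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new Tok(text.substring(start, i), start));
        }
        return tokens;
    }

    // Same structure as annotate(): for each target attribute, map tokens to keywords.
    static List<Kw> annotate(Map<String, Object> attributes, List<String> targets) {
        List<Kw> keywords = new ArrayList<>();
        for (String target : targets) {
            Object obj = attributes.get(target);
            if (!(obj instanceof String)) { // missing or non-String attributes are skipped
                continue;
            }
            String text = (String) obj;
            int sequence = 1; // restarts for every target attribute
            for (Tok t : tokenize(text)) {
                int end = t.position + t.surface.length(); // exclusive end offset
                keywords.add(new Kw(t.surface, t.position, end, sequence++));
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "I went to school.");
        doc.put("count", 42); // not a String, so it is ignored
        for (Kw k : annotate(doc, List.of("text", "count"))) {
            System.out.println(k.sequence + " " + k.str + " [" + k.begin + ", " + k.end + ")");
        }
    }
}
```

Running main prints 1 I [0, 1), 2 went [2, 6), 3 to [7, 9) and 4 school. [10, 17), one per line; the "count" attribute produces no keywords.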

Note that the same concept, the base (dictionary) form, is called baseForm in kuromoji and lex in NLP4J, so the terminology differs slightly.

How to use

Usage is the same as with the Yahoo! Developer Network annotator; only the Annotator class you instantiate changes. NLP4J wraps kuromoji and the Yahoo! Developer Network, two separate natural language processing engines, behind the same interface.

	public void testAnnotateDocument001() throws Exception {
		// Natural-language text: "I went to school." in Japanese
		String text = "私は学校に行きました。";
		Document doc = new DefaultDocument();
		doc.putAttribute("text", text);
		KuromojiAnnotator annotator = new KuromojiAnnotator(); // Modules can be swapped by changing only this line
		annotator.setProperty("target", "text"); // analyze the "text" attribute
		annotator.annotate(doc); // throws Exception
		System.err.println("Finished : annotation");
		for (Keyword kwd : doc.getKeywords()) {
			System.err.println(kwd);
		}
	}

Result

The result is as follows. I was able to use kuromoji without being aware of how the natural language processing library is implemented.

Finished : annotation
私[sequence=1, facet=名詞, lex=私, str=私, reading=ワタシ, count=-1, begin=0, end=1, correlation=0.0]
は[sequence=2, facet=助詞, lex=は, str=は, reading=ハ, count=-1, begin=1, end=2, correlation=0.0]
学校[sequence=3, facet=名詞, lex=学校, str=学校, reading=ガッコウ, count=-1, begin=2, end=4, correlation=0.0]
に[sequence=4, facet=助詞, lex=に, str=に, reading=ニ, count=-1, begin=4, end=5, correlation=0.0]
行く[sequence=5, facet=動詞, lex=行く, str=行き, reading=イキ, count=-1, begin=5, end=7, correlation=0.0]
ます[sequence=6, facet=助動詞, lex=ます, str=まし, reading=マシ, count=-1, begin=7, end=9, correlation=0.0]
た[sequence=7, facet=助動詞, lex=た, str=た, reading=タ, count=-1, begin=9, end=10, correlation=0.0]
。[sequence=8, facet=記号, lex=。, str=。, reading=。, count=-1, begin=10, end=11, correlation=0.0]

The facet values are kuromoji's top-level parts of speech: 名詞 (noun), 助詞 (particle), 動詞 (verb), 助動詞 (auxiliary verb) and 記号 (symbol).

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

