NLP4J [006-031] 100 language processing knocks with NLP4J # 31 verb

Let's give it a try.

31. Verb

Extract all surface forms of verbs.
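To make the task concrete without depending on any particular analyzer, here is a minimal plain-Java sketch. It assumes each token has already been reduced to a surface form plus a part-of-speech label, and that verbs carry the label "動詞", as Japanese morphological analyzers typically produce. The `Token` record and `extractVerbSurfaces` method are hypothetical names for illustration only; they are not part of NLP4J.

```java
import java.util.List;
import java.util.stream.Collectors;

public class VerbSurfaceSketch {

    // Hypothetical token type: the surface form as it appears in the text,
    // plus the part-of-speech label assigned by a morphological analyzer.
    record Token(String surface, String pos) {}

    // Keeps the surface form of every token labeled as a verb ("動詞"),
    // preserving text order and duplicates (the task asks for all occurrences).
    static List<String> extractVerbSurfaces(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> "動詞".equals(t.pos()))
                .map(Token::surface)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-made tokens for "生れた" and "泣いていた", for illustration only
        List<Token> tokens = List.of(
                new Token("生れ", "動詞"),
                new Token("た", "助動詞"),
                new Token("泣い", "動詞"),
                new Token("て", "助詞"),
                new Token("い", "動詞"),
                new Token("た", "助動詞"));
        System.out.println(extractVerbSurfaces(tokens)); // prints [生れ, 泣い, い]
    }
}
```

Because the filter keeps duplicates, a verb that occurs several times in the text is listed several times, which matches "all surface forms".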

Maven

This example uses the version currently under development (a SNAPSHOT build):

<dependency>
	<groupId>org.nlp4j</groupId>
	<artifactId>nlp4j-core</artifactId>
	<version>1.1.1.0-SNAPSHOT</version>
</dependency>

Text Data

The default morphological analyzer (the Yahoo! JAPAN Developer Network Japanese morphological analysis API) limits each request to 900KB and also limits the number of requests, so a small text file is used here.
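If a longer text has to go through the same API, one option is to split it into batches that each stay under the request-size limit. The sketch below greedily packs whole lines into chunks by their UTF-8 byte length. The 900KB figure comes from the paragraph above; the `chunkLines` name is hypothetical, and measuring only the raw text (ignoring request overhead such as URL encoding or other parameters) is a simplifying assumption, so a safety margin below the limit would be wise in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestChunker {

    // Limit taken from the article; real requests also carry parameters and
    // encoding overhead, so staying well below this value is advisable.
    static final int MAX_BYTES = 900 * 1024;

    // Greedily packs whole lines into chunks whose UTF-8 size stays <= maxBytes.
    // A single line longer than maxBytes is emitted as its own (oversized) chunk.
    static List<String> chunkLines(List<String> lines, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            // +1 for the newline that separates lines inside a chunk
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (currentBytes > 0 && currentBytes + lineBytes > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("吾輩は猫である。", "名前はまだ無い。");
        // Each line is 8 characters x 3 bytes = 24 bytes, plus 1 for the newline
        System.out.println(chunkLines(lines, 30).size()); // prints 2
        System.out.println(chunkLines(lines, MAX_BYTES).size()); // prints 1
    }
}
```

Packing whole lines keeps sentences intact, which matters for morphological analysis: cutting a chunk mid-word would produce spurious tokens at the boundary.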

Java Code

package nlp4j.nokku.chap4;

import java.util.List;
import java.util.stream.Collectors;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku31 {
	public static void main(String[] args) throws Exception {
		// Use NLP4J's crawler for line-separated text files
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");
		// Crawl the documents
		List<Document> docs = crawler.crawlDocuments();
		// Define the NLP pipeline (several annotators chained together)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that calls the Yahoo! JAPAN morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		// Run the annotators on all documents
		pipeline.annotate(docs);
		// Use a document index (SimpleDocumentIndex) to count keywords
		SimpleDocumentIndex index = new SimpleDocumentIndex();
		// Add the documents to the index
		index.addDocuments(docs);
		List<Keyword> kwds = index.getKeywords();
		kwds = kwds.stream() //
				.filter(o -> "動詞".equals(o.getFacet())) // part of speech is 動詞 (verb)
				.collect(Collectors.toList());
		// Print the surface form of each verb
		for (Keyword kwd : kwds) {
			System.err.println(kwd.getStr());
		}
	}
}
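The result lists every occurrence, so the same verb (for example "Say") appears several times. If a frequency table is more useful, the surface strings can be tallied with standard Java streams. This is a plain-Java sketch that works on a `List<String>` of surface forms, such as the values of `kwd.getStr()` collected in the loop above; the `countSurfaces` name is hypothetical and not part of NLP4J.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class VerbCounter {

    // Counts how often each surface form occurs, keeping first-seen order
    // (LinkedHashMap) so the table follows the order of the text.
    static Map<String, Long> countSurfaces(List<String> surfaces) {
        return surfaces.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // A few surface forms in the style of the result list, for illustration
        List<String> verbs = List.of("Say", "listen", "Say", "Meet", "Meet", "Say");
        System.out.println(countSurfaces(verbs)); // prints {Say=3, listen=1, Meet=2}
    }
}
```

A plain `HashMap` would also count correctly, but its iteration order is unspecified; `LinkedHashMap` keeps the output stable and easy to compare with the text.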

Result

Born
Tsuka
Shi
Crying
start
Say
You see
listen
Say
Say
Catch
Boiled
Eat
Say
Thoughts
Loading
Lift
Shi
Ah
Calm down
You see
Say
Thoughts
Remaining
Mot
Shi
Meet
Meet
Shi
Blow
Throat
Ku
Weak
to drink
Say
Know

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

