NLP4J [006-032] 100 language processing knocks with NLP4J #32 Base form of verbs

Return to Index

I'll try.

32. Base form of verbs

Extract the base forms of all verbs.

Maven

Use the version currently under development.

<dependency>
	<groupId>org.nlp4j</groupId>
	<artifactId>nlp4j-core</artifactId>
	<version>1.1.1.0-SNAPSHOT</version>
</dependency>

Text Data

The default morphological analyzer (the Yahoo! JAPAN Developer Network Japanese morphological analysis API) limits each request to 900KB and also limits the number of requests, so a small text file is used: the opening of Natsume Sōseki's I Am a Cat, shown here in English translation.
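Because the 900KB limit applies per request, one way to feed a larger file is to group its lines into batches whose UTF-8 size stays under a byte budget. This matters for Japanese text, where each kana or kanji character takes 3 bytes in UTF-8. The sketch below is a hypothetical helper, not part of NLP4J; it assumes "900KB" means 900 × 1024 bytes and does nothing about the separate limit on the number of requests.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestBatcher {
    // Assumption: the "900KB" limit is read as 900 * 1024 bytes of UTF-8 text.
    public static final int MAX_REQUEST_BYTES = 900 * 1024;

    /** Groups lines into '\n'-joined batches whose UTF-8 size stays within maxBytes. */
    public static List<String> batch(List<String> lines, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int linesInCurrent = 0;
        for (String line : lines) {
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length;
            if (lineBytes > maxBytes) {
                throw new IllegalArgumentException("A single line exceeds the request limit");
            }
            // Joining adds a 1-byte '\n' separator unless the batch is still empty.
            int added = (linesInCurrent == 0) ? lineBytes : lineBytes + 1;
            if (currentBytes + added > maxBytes) {
                // Close the current batch and start a new one with this line.
                batches.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
                linesInCurrent = 0;
                added = lineBytes;
            }
            if (linesInCurrent > 0) {
                current.append('\n');
            }
            current.append(line);
            currentBytes += added;
            linesInCurrent++;
        }
        if (linesInCurrent > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each of these characters takes 3 bytes in UTF-8, so each line is 24 bytes.
        List<String> lines = List.of("吾輩は猫である。", "名前はまだ無い。");
        System.out.println(batch(lines, MAX_REQUEST_BYTES).size()); // prints 1
        System.out.println(batch(lines, 30).size()); // prints 2
    }
}
```

Splitting only helps with the size limit; each batch is still one request, so the cap on the number of requests sets how much text can be analyzed in total.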

one

I am a cat.
There is no name yet.

I have no idea where I was born.
I remember only crying in a dim and damp place.
I saw human beings for the first time here.
Moreover, I heard later that it was the most evil race of human beings called Shosei.
This student is a story that sometimes catches us and boiled and eats.
However, I didn't think anything at that time, so I didn't think it was particularly scary.
It just felt fluffy when it was placed on his palm and lifted up.
It is probably the beginning of what is called a human being that he calmed down a little on his palm and saw the student's face.
The feeling that I thought was strange at this time still remains.
The face, which should be decorated with the first hair, is slippery and looks like a kettle.
After that, I met a cat a lot, but I have never met such a single wheel.
Not only that, the center of the face is too protruding.
Then I sometimes blow smoke from the hole.
It was so throaty that I was really weak.
It was around this time that I finally learned that this is a cigarette that humans drink.


Java Code

package nlp4j.nokku.chap4;

import java.util.List;
import java.util.stream.Collectors;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku32 {
	public static void main(String[] args) throws Exception {
		// Use the text file crawler provided by NLP4J
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");
		// Crawl the documents
		List<Document> docs = crawler.crawlDocuments();
		// Define the NLP pipeline (multiple processing steps chained together)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that calls the Yahoo! JAPAN morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		// Run the annotators
		pipeline.annotate(docs);
		// Collect the keywords with a SimpleDocumentIndex
		SimpleDocumentIndex index = new SimpleDocumentIndex();
		// Add the annotated documents to the index
		index.addDocuments(docs);
		List<Keyword> kwds = index.getKeywords();
		kwds = kwds.stream() //
				.filter(o -> o.getFacet().equals("動詞")) // part of speech (facet) is 動詞 (verb)
				.collect(Collectors.toList());
		for (Keyword kwd : kwds) {
			System.err.println(kwd.getLex()); // ← the only change from #31: print the base form (lex)
		}
	}
}
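The filter-and-print loop above boils down to two steps: keep the tokens whose part of speech is a verb, then print each token's base form instead of its surface form. The sketch below shows those two steps in plain Java so they can be tried without calling the Yahoo! API. The `Token` class and its fields are hypothetical stand-ins for NLP4J's `Keyword` (`getFacet()` for the part of speech, `getLex()` for the base form), and it assumes the analyzer labels verbs with the Japanese part-of-speech name 動詞.

```java
import java.util.List;
import java.util.stream.Collectors;

public class VerbBaseForms {
    /** Hypothetical token type: surface form, base form (lemma) and part of speech. Not an NLP4J class. */
    public static final class Token {
        final String surface;
        final String lemma;
        final String pos;

        public Token(String surface, String lemma, String pos) {
            this.surface = surface;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    /** Keeps only tokens whose part of speech is "動詞" (verb) and returns their base forms in order. */
    public static List<String> verbBaseForms(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> "動詞".equals(t.pos)) // literal first: a null POS is simply skipped
                .map(t -> t.lemma)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-written tokens for "どこで生れたか"; illustrative, not real analyzer output.
        List<Token> tokens = List.of(
                new Token("どこ", "どこ", "名詞"),
                new Token("で", "で", "助詞"),
                new Token("生れ", "生れる", "動詞"),
                new Token("た", "た", "助動詞"),
                new Token("か", "か", "助詞"));
        System.out.println(verbBaseForms(tokens)); // prints [生れる]
    }
}
```

Putting the literal first (`"動詞".equals(...)`) means a token without a part of speech is skipped instead of throwing a `NullPointerException`.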

Result

Be born
Tsukuri
To do
cry
start
Say
to see
listen
Say
Say
capture
Boil
Eat
Say
think
Put
lift
To do
is there
Calm down
to see
Say
think
Remain
Offal
To do
Meet
meet
To do
Blow
Can throat
Kuu
Weak
to drink
Say
know
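The list above repeats several base forms (Say and To do appear more than once) because every verb occurrence is printed. If only the distinct base forms, or how often each one occurs, are wanted, a small post-processing step is enough. This is a sketch on plain strings rather than NLP4J objects; the sample values are illustrative Japanese base forms, not the actual output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DistinctBaseForms {
    /** Removes duplicates while keeping the order in which each base form first appeared. */
    public static List<String> distinct(List<String> baseForms) {
        return new ArrayList<>(new LinkedHashSet<>(baseForms));
    }

    /** Counts occurrences of each base form; keys keep first-appearance order. */
    public static Map<String, Long> counts(List<String> baseForms) {
        return baseForms.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> forms = List.of("生れる", "する", "泣く", "する", "見る", "する");
        System.out.println(distinct(forms)); // prints [生れる, する, 泣く, 見る]
        System.out.println(counts(forms));   // prints {生れる=1, する=3, 泣く=1, 見る=1}
    }
}
```

On an ordered stream, `Stream.distinct()` also keeps the first occurrence of each element, so mapping keywords with `getLex()` and then calling `.distinct()` should be another way to get the same unique list.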

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

