Let's give it a try.
Extract the base (dictionary) forms of all the verbs.
Maven
Use the version currently under development.
<dependency>
<groupId>org.nlp4j</groupId>
<artifactId>nlp4j-core</artifactId>
<version>1.1.1.0-SNAPSHOT</version>
</dependency>
Text Data
The morphological analyzer used by default (the Japanese morphological analysis API from Yahoo! Japan Developer Network) caps the request size at 900KB and also limits the number of requests, so a small text file is used here.
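To check whether a text fits under a request size limit like this, measure it in UTF-8 bytes rather than characters, since common Japanese characters take three bytes each. The following is a minimal sketch that is not part of NLP4J: the class and method names are hypothetical, and reading 900KB as 900 * 1024 bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestSizeSketch {

    // Assumption: "900KB" is read as 900 * 1024 bytes; the API may count differently.
    public static final int MAX_REQUEST_BYTES = 900 * 1024;

    // Size of the text once encoded as UTF-8.
    public static int utf8Size(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Groups lines into batches whose UTF-8 size stays within maxBytes.
    // A single line larger than maxBytes still ends up alone in its own batch.
    public static List<String> batchLines(List<String> lines, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            int lineBytes = utf8Size(line) + 1; // +1 for the newline separator
            if (currentBytes > 0 && currentBytes + lineBytes > maxBytes) {
                batches.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("I am a cat.", "There is no name yet.");
        System.out.println(utf8Size(lines.get(0)) + " bytes in the first line");
        System.out.println(batchLines(lines, MAX_REQUEST_BYTES).size() + " batch(es)");
    }
}
```

Batching by size only addresses the request size limit; the limit on the number of requests still applies, which is why a short excerpt is the simpler choice here.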
one
I am a cat.
There is no name yet.
I have no idea where I was born.
I remember only crying in a dim and damp place.
I saw human beings for the first time here.
Moreover, I heard later that it was a shosei (student lodger), said to be the most ferocious race of human beings.
These shosei are said to sometimes catch us, boil us, and eat us.
However, I didn't think anything at that time, so I didn't think it was particularly scary.
I just had a fluffy, floating feeling when I was placed on his palm and lifted up.
Calming down a little on his palm and seeing the student's face was probably my first sight of what is called a human being.
The feeling that it was a strange thing still remains from that time.
First of all, the face, which should be decorated with hair, is slippery and looks just like a kettle.
After that, I met many cats, but I have never met such a deformity.
Not only that, the center of the face protrudes too much.
And from the holes in it, he sometimes blows out smoke.
It was so choking that I was really at a loss.
It was only around this time that I finally learned that this is the tobacco that humans "drink" (smoke).
Java Code
package nlp4j.nokku.chap4;

import java.util.List;
import java.util.stream.Collectors;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku31 {

    public static void main(String[] args) throws Exception {
        // Use the text file crawler provided by NLP4J
        Crawler crawler = new TextFileLineSeparatedCrawler();
        crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
        crawler.setProperty("encoding", "UTF-8");
        crawler.setProperty("target", "text");

        // Crawl the documents
        List<Document> docs = crawler.crawlDocuments();

        // Define the NLP pipeline (several processing steps chained together)
        DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
        {
            // Annotator that uses the Yahoo! Japan morphological analysis API
            DocumentAnnotator annotator = new YJpMaAnnotator();
            pipeline.add(annotator);
        }

        // Run the annotation
        pipeline.annotate(docs);

        // Use a document index to collect the keywords
        SimpleDocumentIndex index = new SimpleDocumentIndex();
        // Add the documents
        index.addDocuments(docs);

        List<Keyword> kwds = index.getKeywords();
        kwds = kwds.stream() //
                // Keep only keywords whose part of speech is verb.
                // Assumption: if the annotator emits Japanese labels, match "動詞" instead.
                .filter(o -> o.getFacet().equals("verb")) //
                .collect(Collectors.toList());

        for (Keyword kwd : kwds) {
            System.err.println(kwd.getLex()); // <- the only line changed: print the base form
        }
    }
}
Result
Be born
Tsukuri
To do
cry
start
Say
to see
listen
Say
Say
capture
Boil
Eat
Say
think
Put
lift
To do
is there
Calm down
to see
Say
think
Remain
Offal
To do
Meet
meet
To do
Blow
Can throat
Kuu
Weak
to drink
Say
know
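The output above contains repeated entries such as "Say" and "To do", because every verb occurrence is printed. If a frequency table is more useful, the same filter can feed a grouping step. The following is a minimal sketch using only the standard library: `Token` is a hypothetical stand-in for `nlp4j.Keyword` that carries just the part of speech and the base form, and the facet value "verb" follows the listing above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LemmaCountSketch {

    // Hypothetical stand-in for nlp4j.Keyword: part of speech (facet) and base form (lex) only.
    public static final class Token {
        final String facet;
        final String lex;

        public Token(String facet, String lex) {
            this.facet = facet;
            this.lex = lex;
        }
    }

    // Counts how often each verb base form occurs, keeping first-seen order.
    public static Map<String, Long> countVerbLemmas(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> "verb".equals(t.facet))
                .collect(Collectors.groupingBy(
                        t -> t.lex,
                        LinkedHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("noun", "cat"),
                new Token("verb", "say"),
                new Token("verb", "see"),
                new Token("verb", "say"));
        countVerbLemmas(tokens)
                .forEach((lex, count) -> System.out.println(lex + "\t" + count));
    }
}
```

With real NLP4J keywords, the same collector could presumably be applied to the filtered list using `getFacet()` and `getLex()`, the two accessors that appear in the listing above.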
With NLP4J, you can easily process natural language in Java!
https://www.nlp4j.org/