Let's give it a try.
Extract the surface forms of all verbs.
Maven
Use the snapshot version that is currently under development.
<dependency>
  <groupId>org.nlp4j</groupId>
  <artifactId>nlp4j-core</artifactId>
  <version>1.1.1.0-SNAPSHOT</version>
</dependency>
Text Data
The morphological analyzer used by default (the Japanese morphological analysis API of the Yahoo! Japan Developer Network) caps the request size at 900KB and also limits the number of requests, so a small text file is used here.
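For larger inputs, one way to stay under a request size limit is to group lines into batches by their UTF-8 byte size before sending them. The following is only a minimal sketch in plain Java: `RequestBatcher` and `batchBySize` are hypothetical names that are not part of NLP4J, and reading the 900KB figure as `900 * 1024` bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RequestBatcher {

    /**
     * Groups lines into batches whose total UTF-8 size stays at or below maxBytes.
     * A single line larger than maxBytes ends up in a batch of its own.
     * (Hypothetical helper, not part of NLP4J.)
     */
    public static List<List<String>> batchBySize(List<String> lines, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String line : lines) {
            int size = line.getBytes(StandardCharsets.UTF_8).length;
            // Start a new batch when adding this line would exceed the limit
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(line);
            currentBytes += size;
        }
        // Keep the last, partially filled batch
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Assumption: the 900KB limit is interpreted as 900 * 1024 bytes
        int limit = 900 * 1024;
        List<String> lines = Arrays.asList("吾輩は猫である。", "名前はまだ無い。");
        List<List<String>> batches = batchBySize(lines, limit);
        System.out.println(batches.size() + " batch(es)");
    }
}
```

Each batch could then be sent as a separate request. This does not help with the limit on the number of requests, which is why a small file remains the simplest choice for this exercise.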
Java Code
package nlp4j.nokku.chap4;

import java.util.List;
import java.util.stream.Collectors;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.DocumentIndex;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku31 {
    public static void main(String[] args) throws Exception {
        // Use the text file crawler provided by NLP4J
        Crawler crawler = new TextFileLineSeparatedCrawler();
        crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
        crawler.setProperty("encoding", "UTF-8");
        crawler.setProperty("target", "text");
        // Crawl the documents
        List<Document> docs = crawler.crawlDocuments();
        // Define the NLP pipeline (several processing steps chained together)
        DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
        {
            // Annotator that uses the Yahoo! Japan morphological analysis API
            DocumentAnnotator annotator = new YJpMaAnnotator();
            pipeline.add(annotator);
        }
        // Run the annotation
        pipeline.annotate(docs);
        // Use a document index to count keywords
        SimpleDocumentIndex index = new SimpleDocumentIndex();
        // Add the documents
        index.addDocuments(docs);
        List<Keyword> kwds = index.getKeywords();
        kwds = kwds.stream() //
                .filter(o -> o.getFacet().equals("verb")) // keep keywords whose part of speech is verb
                .collect(Collectors.toList());
        // Print the surface form of each verb
        for (Keyword kwd : kwds) {
            System.err.println(kwd.getStr());
        }
    }
}
Born
Tsuka
Shi
Crying
start
Say
You see
listen
Say
Say
Catch
Boiled
Eat
Say
Thoughts
Loading
Lift
Shi
Ah
Calm down
You see
Say
Thoughts
Remaining
Mot
Shi
Meet
Meet
Shi
Blow
Throat
Ku
Weak
to drink
Say
Know
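The list above contains the same surface form several times. If a count per surface form is more useful, the filter step can be followed by a grouping step. The sketch below uses only the Java standard library: `Token` is a hypothetical stand-in for an annotated keyword (a facet plus a surface string), not the NLP4J `Keyword` class, and the sample tokens are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class VerbCounter {

    /** Hypothetical stand-in for an annotated keyword: part of speech and surface form. */
    public static class Token {
        final String facet;
        final String str;

        public Token(String facet, String str) {
            this.facet = facet;
            this.str = str;
        }
    }

    /** Counts how often each verb surface form occurs, in order of first appearance. */
    public static Map<String, Long> countVerbSurfaces(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> "verb".equals(t.facet)) // keep verbs only
                .collect(Collectors.groupingBy(
                        t -> t.str,              // group by surface form
                        LinkedHashMap::new,      // preserve first-seen order
                        Collectors.counting())); // count occurrences per form
    }

    public static void main(String[] args) {
        // Made-up sample tokens, not real analyzer output
        List<Token> tokens = Arrays.asList(
                new Token("verb", "say"),
                new Token("noun", "cat"),
                new Token("verb", "see"),
                new Token("verb", "say"));
        System.out.println(countVerbSurfaces(tokens));
    }
}
```

Using a `LinkedHashMap` keeps the forms in the order they first appear in the text, which makes the result easy to compare with the raw list.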
With NLP4J, you can easily process natural language in Java!
https://www.nlp4j.org/