NLP4J [006-034] 100 language processing knocks with NLP4J # 34 "A B"

Let's give it a try.

34. "B of A"

Extract a noun phrase in which two nouns are connected by "no".
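
To make the task concrete before bringing in NLP4J, here is a minimal sketch of the pattern on an in-memory token list. The `Token` class, the romanized toy data, and the labels "noun" and "no" are assumptions for illustration only; a real Japanese analyzer returns its own surface forms and part-of-speech labels.

```java
import java.util.ArrayList;
import java.util.List;

public class ANoBWindow {

    /** Hypothetical token: surface form plus a part-of-speech label. */
    static final class Token {
        final String surface;
        final String pos;

        Token(String surface, String pos) {
            this.surface = surface;
            this.pos = pos;
        }
    }

    /** Collects every noun + "no" + noun triple, written as "A no B". */
    static List<String> extract(List<Token> tokens) {
        List<String> phrases = new ArrayList<>();
        // Slide a window of three tokens over the sequence.
        for (int i = 0; i + 2 < tokens.size(); i++) {
            Token a = tokens.get(i);
            Token particle = tokens.get(i + 1);
            Token b = tokens.get(i + 2);
            if (a.pos.equals("noun") && particle.surface.equals("no") && b.pos.equals("noun")) {
                phrases.add(a.surface + " " + particle.surface + " " + b.surface);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Romanized toy data for "shosei no kao wo mita" (I saw the student's face).
        List<Token> tokens = List.of(
                new Token("shosei", "noun"), new Token("no", "particle"),
                new Token("kao", "noun"), new Token("wo", "particle"),
                new Token("mita", "verb"));
        System.out.println(extract(tokens)); // prints [shosei no kao]
    }
}
```

Because the window advances one token at a time, a chain such as "A no B no C" yields both "A no B" and "B no C".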

Maven

Use the version currently under development.

<dependency>
	<groupId>org.nlp4j</groupId>
	<artifactId>nlp4j-core</artifactId>
	<version>1.1.1.0-SNAPSHOT</version>
</dependency>

Text Data

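The morphological analyzer used by default (the Japanese morphological analysis API of the Yahoo! Japan Developer Network) limits each request to 900 KB and also limits the number of requests, so a small text file is used here. The file contains the opening of Natsume Soseki's novel "I Am a Cat" (吾輩は猫である) in Japanese; an English rendering follows.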
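
If a larger file had to be processed under such a size limit, one option would be to group lines into batches that each stay below the limit. The sketch below is not part of NLP4J; the class name `RequestBatcher` is hypothetical, and it assumes the limit applies to the UTF-8 size of the text being sent, which the article does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestBatcher {

    /**
     * Groups lines into batches whose UTF-8 size (counting one newline per line)
     * stays at or below maxBytes. A single line larger than maxBytes still gets
     * a batch of its own, so the caller would have to handle that case separately.
     */
    static List<String> batch(List<String> lines, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            // Flush the current batch if this line would push it over the limit.
            if (currentBytes > 0 && currentBytes + lineBytes > maxBytes) {
                batches.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("aaaa", "bbbb", "cc");
        // With a 10-byte limit, "aaaa\n" and "bbbb\n" share a batch; "cc\n" starts a new one.
        System.out.println(batch(lines, 10).size()); // prints 2
    }
}
```

For the limit mentioned above, `maxBytes` would be somewhere below 900 * 1024, leaving room for the other request parameters.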

one

I am a cat.
There is no name yet.

I have no idea where I was born.
I remember only crying in a dim and damp place.
I saw human beings for the first time here.
Moreover, I heard later that it was the most evil race of human beings called Shosei.
This student is a story that sometimes catches us and boiled and eats.
However, I didn't think anything at that time, so I didn't think it was particularly scary.
It just felt fluffy when it was placed on his palm and lifted up.
It is probably the beginning of what is called a human being that he calms down a little on his palm and sees the student's face.
The feeling that I thought was strange at this time still remains.
The face, which should be decorated with the first hair, is slippery and looks like a kettle.
After that, I met a cat a lot, but I have never met such a single wheel.
Not only that, the center of the face is too protruding.
Then I sometimes blow smoke from the hole.
It was so throaty that I was really weak.
It was around this time that I finally learned that this is a cigarette that humans drink.


Java Code

package nlp4j.nokku.chap4;

import java.util.List;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.DocumentIndex;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku31 {
	public static void main(String[] args) throws Exception {
		// Use the text file crawler provided by NLP4J
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");
		// Crawl the documents
		List<Document> docs = crawler.crawlDocuments();
		// Define the NLP pipeline (several processing steps chained together)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that uses the Yahoo! Japan morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		// Run the annotation
		pipeline.annotate(docs);
		// Use a DocumentIndex to collect the keywords
		SimpleDocumentIndex index = new SimpleDocumentIndex();
		// Add the documents to the index
		index.addDocuments(docs);
		List<Keyword> kwds = index.getKeywordsWithoutCount();

		// Find "A no B": a noun, then the particle "の", then another noun
		String meishi_a = null; // first noun (A), once seen
		String no = null; // the particle "の", once seen after A

		for (Keyword kwd : kwds) {
			if (meishi_a == null && kwd.getFacet().equals("名詞")) {
				meishi_a = kwd.getLex();
			} //
			else if (meishi_a != null && no == null && kwd.getLex().equals("の")) {
				no = kwd.getLex();
			} //
			else if (meishi_a != null && no != null && kwd.getFacet().equals("名詞")) {
				System.err.println(meishi_a + no + kwd.getLex());
				meishi_a = null;
				no = null;
			} //
			else {
				meishi_a = null;
				no = null;
			}
		}
	}
}

Result

彼の掌 (his palm)
掌の上 (on the palm)
書生の顔 (the student's face)
はずの顔 (the face that should ...)
顔の真中 (the middle of the face)
穴の中 (inside the hole)
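
The loop in the Java code above clears its state after every match and whenever the pattern breaks, so a noun that ends one phrase is not reused as the start of the next. The sketch below reproduces that state machine on plain arrays, without NLP4J, and adds a hypothetical `carryOver` flag that keeps such a noun as the next candidate. The romanized toy data and the labels "noun" and "no" are illustrative assumptions, not analyzer output.

```java
import java.util.ArrayList;
import java.util.List;

public class ANoBStateMachine {

    /**
     * State machine over parallel arrays of surface forms and part-of-speech labels.
     * With carryOver == false it follows the same branches as the article's loop.
     * With carryOver == true, a noun that ends or breaks a match is kept as the
     * next candidate A (a variation, not the article's behavior).
     */
    static List<String> extract(String[] lex, String[] pos, boolean carryOver) {
        List<String> found = new ArrayList<>();
        String a = null;  // candidate first noun
        String no = null; // the connecting particle, once seen
        for (int i = 0; i < lex.length; i++) {
            boolean noun = pos[i].equals("noun");
            if (a == null && noun) {
                a = lex[i];
            } else if (a != null && no == null && lex[i].equals("no")) {
                no = lex[i];
            } else if (a != null && no != null && noun) {
                found.add(a + " " + no + " " + lex[i]);
                a = carryOver ? lex[i] : null;
                no = null;
            } else {
                a = (carryOver && noun) ? lex[i] : null;
                no = null;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Romanized toy chain "kare no te no ue" (on top of his hand).
        String[] lex = {"kare", "no", "te", "no", "ue"};
        String[] pos = {"noun", "particle", "noun", "particle", "noun"};
        System.out.println(extract(lex, pos, false)); // prints [kare no te]
        System.out.println(extract(lex, pos, true));  // prints [kare no te, te no ue]
    }
}
```

The same reset also affects two nouns in a row: without carrying over, the second noun clears the state instead of becoming the new candidate, so a following "no" plus noun is not reported.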

Continued

This article continues in: NLP4J [006-034b] Trying to make an Annotator for 100 Language Processing Knocks #34 "A no B" with NLP4J

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

