Ciao ... †
With the help of @ragion, we have finally released neologdn-java, a Java preprocessing module for NEologd!
https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja
"There is a limit to how many notation variants can be absorbed by adding redundant entries to the dictionary data. All of the normalization steps described below are applied when the dictionary data is generated, so applying the same normalization to the text being analyzed makes it easier to match the words in the dictionary."
As the page above explains, it is important to normalize (preprocess) text before parsing it with MeCab. That is why I created neologdn-java, a preprocessing module for NEologd written in Java.
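To give a rough idea of what this kind of preprocessing does, here is a minimal sketch that uses only the JDK. It is not how neologdn-java is implemented and it does not reproduce the rules on the wiki page. The class NormalizationSketch and its normalize method are hypothetical names made up for this illustration, and Unicode NFKC plus whitespace collapsing is just a simple stand-in for NEologd's actual normalization.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of neologdn-java.
class NormalizationSketch {

    // Assumption: NFKC plus whitespace collapsing stands in for the
    // NEologd rules, which are more detailed than this.
    static String normalize(String text) {
        // NFKC folds full-width alphanumerics to ASCII and
        // half-width katakana to full-width katakana.
        String folded = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Collapse runs of whitespace into one space and trim both ends.
        return folded.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Full-width letters and two ideographic spaces in the input.
        System.out.println(normalize("ＰＲＭＬ　　副読本"));
        // Half-width katakana in the input.
        System.out.println(normalize("ﾊﾝｶｸｶﾅ"));
    }
}
```

Without this step, the full-width and half-width spellings of the same word are different strings, so only one of them would match a dictionary entry.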
It is registered in Maven Central, so add the following to pom.xml:
<dependency>
    <groupId>io.github.ikegami-yukino</groupId>
    <artifactId>neologdn</artifactId>
    <version>0.0.1</version>
</dependency>
Then use it like this:
package yukinoi.neologdn_example;

import io.github.ikegamiyukino.neologdn.NeologdNormalizer;

/**
 * neologdn-example
 */
public class App {
    public static void main(String[] args) {
        NeologdNormalizer normalizer = new NeologdNormalizer();
        String text = "PRML supplementary reading book";
        String normalizedText = normalizer.normalize(text);
        System.out.println(normalizedText);
    }
}
That is the basic usage.
Development takes place in the following GitHub repository: https://github.com/ikegami-yukino/neologdn-java
Contributions are welcome!