I looked into how to use Janome, so I made a note of it.
Janome is a morphological analyzer written in pure Python with the dictionary built in. It aims to be a morphological analysis library with a simple API that can be installed easily, has no dependent libraries, and is easy to incorporate into applications.
I just wanted to give morphological analysis a quick try, so I decided to use Janome, which seems to be the easiest to use from Python. Compared to MeCab, setup is simpler: a plain pip install is all it takes. For other Japanese morphological analysis tools, see the summary here.
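For reference, installation is a single pip command (no external dictionary or compiler is needed, since the dictionary is bundled):

pip install janome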
Excerpt from the official website.
from janome.tokenizer import Tokenizer

t = Tokenizer()
# the sample sentence from the official docs: すもももももももものうち
for token in t.tokenize(u'すもももももももものうち'):
    print(token)
Printing each token returned by Tokenizer.tokenize gives output like this (one comma-separated analysis per surface form):

すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
According to here, the fields from left to right are: surface form, part of speech, part-of-speech subcategory 1, subcategory 2, subcategory 3, conjugation type, conjugation form, base form, reading, and pronunciation.
Each token returned by tokenize exposes these values as string properties (an access example follows the list):

- surface: the surface form (the word as it appears in the text)
- part_of_speech: part of speech plus subcategories 1, 2, and 3, as one comma-separated string
- infl_type: conjugation type
- infl_form: conjugation form
- base_form: base (dictionary) form
- reading: reading
- phonetic: pronunciation
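A minimal sketch of reading those properties, using the same sample sentence as above (splitting part_of_speech into its four fields is just for illustration):

from janome.tokenizer import Tokenizer

t = Tokenizer()
for token in t.tokenize(u'すもももももももものうち'):
    # part_of_speech is one comma-separated string; split it to get the four POS fields
    pos = token.part_of_speech.split(',')
    print(token.surface, pos[0], token.base_form, token.reading, token.phonetic)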