Using natto-py, a Python binding for MeCab

What is natto-py?

natto-py is a Python package that provides a binding to MeCab through a Foreign Function Interface (http://en.wikipedia.org/wiki/Foreign_function_interface) (FFI). It supports Python 2 and 3 and has the advantage of not requiring a compiler. It is available on *nix, OS X, and Windows.

Supported Python versions

natto-py works with both Python 2 and Python 3. The following versions have been tested.

Install MeCab

First, install MeCab 0.996.

Install natto-py

Install natto-py via pip as you would a regular Python package.

$ pip install natto-py

The cffi package is also required, but the above command will automatically install cffi if needed.
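To confirm that both packages are in place before going further, a quick check along these lines may help. This is only a minimal sketch using the standard library (Python 3.4 or later); the helper name `missing_packages` is made up for this article and is not part of natto-py.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import.

    `missing_packages` is a hypothetical helper for this article,
    not a function provided by natto-py.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "natto" is the module installed by natto-py; "cffi" is its dependency.
    missing = missing_packages(["natto", "cffi"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("natto-py and cffi are importable")
```

Note that this only checks the Python packages. It does not verify that the MeCab library itself is installed.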

Let's try it out

Import statement

Import the MeCab class from natto and create an instance.

from natto import MeCab

nm = MeCab()

print(nm)

<natto.mecab.MeCab model=<cdata 'mecab_model_t *' 0x802016640>,
    tagger=<cdata 'mecab_t *' 0x8020a44c0>, 
    lattice=<cdata 'mecab_lattice_t *' 0x802079600>, 
    libpath="/opt/mecab/lib/libmecab.so", 
    options={}, 
    dicts=[<natto.dictionary.DictionaryInfo 
        dictionary=<cdata 'mecab_dictionary_info_t *' 0x802079480>,
        filepath="/opt/mecab/lib/mecab/dic/ipadic/sys.dic", 
        charset=utf-8, 
        type=0>], 
 version=0.996>

Parsing to standard output

Parse a sentence and print the result to standard output as a string.

# "A hero always appears in a pinch."
text = "ピンチの時には必ずヒーローが現れる。"

print(nm.parse(text))

ピンチ	名詞,一般,*,*,*,*,ピンチ,ピンチ,ピンチ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
時	名詞,非自立,副詞可能,*,*,*,時,トキ,トキ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
必ず	副詞,助詞類接続,*,*,*,*,必ず,カナラズ,カナラズ
ヒーロー	名詞,一般,*,*,*,*,ヒーロー,ヒーロー,ヒーロー
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
現れる	動詞,自立,*,*,一段,基本形,現れる,アラワレル,アラワレル
。	記号,句点,*,*,*,*,。,。,。
EOS
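Because the result is a single string, any further processing has to split it up again. The following is a minimal sketch of that step, assuming MeCab's default IPADIC layout (surface form, a tab, then comma-separated features, with a final EOS line). The `Morpheme` type and the `parse_mecab_output` function are hypothetical names for this illustration and are not part of natto-py; the sample string stands in for what `nm.parse(text)` returns.

```python
from collections import namedtuple

# Hypothetical record type for this sketch, not a natto-py class.
Morpheme = namedtuple("Morpheme", ["surface", "features"])


def parse_mecab_output(output):
    """Split MeCab's default text output into Morpheme records.

    Assumes the default IPADIC layout: surface, a tab, then
    comma-separated features, followed by a final "EOS" line.
    """
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_text = line.partition("\t")
        morphemes.append(Morpheme(surface, feature_text.split(",")))
    return morphemes


# Sample text standing in for the string returned by nm.parse(text).
sample = (
    "ヒーロー\t名詞,一般,*,*,*,*,ヒーロー,ヒーロー,ヒーロー\n"
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
    "EOS"
)

for m in parse_mecab_output(sample):
    # features[0] is the part of speech, features[7] the reading
    print(m.surface, m.features[0], m.features[7])
```

In practice, splitting strings by hand like this is rarely necessary, since natto-py can return node objects directly.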

MeCabNode output

Get the parse result as MeCabNode objects and print more detailed information about each morpheme.

# Use the -F / --node-format option to specify the node output format
#
# %m    ... morpheme surface form
# %f[0] ... part of speech
# %h    ... part-of-speech ID (IPADIC)
# %f[8] ... pronunciation
#
with MeCab('-F%m,%f[0],%h,%f[8]') as nm:
    for n in nm.parse(text, as_nodes=True):
        print(n.feature)

ピンチ,名詞,38,ピンチ
の,助詞,24,ノ
時,名詞,66,トキ
に,助詞,13,ニ
は,助詞,16,ワ
必ず,副詞,35,カナラズ
ヒーロー,名詞,38,ヒーロー
が,助詞,13,ガ
現れる,動詞,31,アラワレル
。,記号,7,。
EOS
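Feature strings in this custom format are easy to turn into structured records. The sketch below shows one way to do that and to count morphemes by part of speech. It assumes the four-field layout produced by the format string '%m,%f[0],%h,%f[8]' and that no field contains a comma itself. `to_record` and `count_by_pos` are hypothetical helper names, and the sample list stands in for the `n.feature` values collected from the loop.

```python
from collections import Counter


def to_record(feature):
    """Turn one 'surface,pos,pos_id,pronunciation' string into a dict.

    The field order follows the format string '%m,%f[0],%h,%f[8]'.
    Assumes that no field contains a comma itself.
    """
    surface, pos, pos_id, pronunciation = feature.split(",")
    return {
        "surface": surface,
        "pos": pos,
        "pos_id": int(pos_id),
        "pronunciation": pronunciation,
    }


def count_by_pos(features):
    """Count morphemes per part of speech, ignoring the EOS marker."""
    records = [to_record(f) for f in features if f != "EOS"]
    return Counter(r["pos"] for r in records)


# Sample values standing in for the n.feature strings of each node.
sample_features = [
    "ピンチ,名詞,38,ピンチ",
    "の,助詞,24,ノ",
    "ヒーロー,名詞,38,ヒーロー",
    "EOS",
]

print(count_by_pos(sample_features))
```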

Using Python's with statement is recommended: whether the context exits normally or an exception is raised, the reference to the MeCab library is released automatically.
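The cleanup guarantee comes from Python's context-manager protocol, which can be illustrated without MeCab at all. The following is a minimal sketch, not natto-py's actual implementation: `FakeResource` is a made-up stand-in for an object that holds a reference to a native library.

```python
class FakeResource(object):
    """Stand-in for an object holding a native library reference.

    This is NOT natto-py's implementation; it only illustrates the
    context-manager protocol that the with statement relies on.
    """

    def __init__(self):
        self.released = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and also when an exception propagates.
        self.released = True
        return False  # do not swallow the exception


res = FakeResource()
try:
    with res:
        raise ValueError("simulated failure while parsing")
except ValueError:
    pass

# The cleanup ran even though the body raised an exception.
print(res.released)
```

Without the with statement, the same guarantee would need an explicit try/finally block around every use.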

That's all.
