Use MeCab constrained parsing (partial parsing) in Python through natto-py

Constrained parsing (partial parsing)

MeCab's constrained parsing (partial parsing) is useful when some of a sentence's morpheme information, or its morpheme boundaries, is already known. natto-py, a Python binding for MeCab, provides three ways to do constrained parsing.

  1. --partial / -p option
  2. Specifying boundary constraints
  3. Specifying feature constraints

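Partial parsing with the --partial option

Specify the --partial or -p option when creating the MeCab instance. The input text passed to parse then describes the constraints in the following format: a line of the form surface<TAB>feature fixes that morpheme and its feature, and any other line is a sentence fragment that MeCab parses as usual.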

from natto import MeCab

# Constrained morphemes are written as "surface\tfeature", one per line.
text = """にわ\tほげ
に
はにわ\tほげ
にわとり\tほげ
がいる。
"""

with MeCab("--partial") as nm:
    print(nm.parse(text))

Output:

にわ	ほげ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ	ほげ
にわとり	ほげ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる	動詞,自立,*,*,一段,基本形,いる,イル,イル
。	記号,句点,*,*,*,*,。,。,。
EOS
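
The constraint text can also be assembled programmatically. The following is a minimal sketch, not part of natto-py: build_partial_input is a hypothetical helper that takes a list of fragments, where each item is either a plain string (a sentence fragment left to MeCab) or a (surface, feature) pair (a fixed morpheme), and joins them into the tab-separated, newline-terminated layout used above.

```python
def build_partial_input(fragments):
    """Join fragments into the text layout used with --partial.

    Each item is either a plain string (sentence fragment, parsed normally)
    or a (surface, feature) pair (morpheme fixed to that feature).
    build_partial_input is a hypothetical helper, not a natto-py function.
    """
    lines = []
    for item in fragments:
        if isinstance(item, str):
            lines.append(item)
        else:
            surface, feature = item
            lines.append("{}\t{}".format(surface, feature))
    # One fragment per line, ending with a newline as in the example above.
    return "\n".join(lines) + "\n"


fragments = [("にわ", "ほげ"), "に", ("はにわ", "ほげ"), ("にわとり", "ほげ"), "がいる。"]
text = build_partial_input(fragments)
```

The resulting text has the same layout as the hand-written string above and can be passed to nm.parse in the same way.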
    

The example above simply prints the parse result as a string. For finer-grained control, use morpheme boundary constraints (boundary) or part-of-speech constraints (feature) instead.

Morpheme boundary constraints

If you know the morpheme boundaries in advance, you can pass them as a compiled regular expression or a pattern string with the boundary_constraints keyword argument. Each span of text that matches the pattern is treated as a single morpheme during parsing.

text = "にわにはにわにわとりがいる。"

patt = "にわとり|はにわ|にわ"

with MeCab() as nm:
    # Specify morpheme boundary constraints and get each MeCabNode
    for n in nm.parse(text, boundary_constraints=patt, as_nodes=True):
        if not (n.is_bos() or n.is_eos()):
            print("{}:\t{}".format(n.surface, n.feature))

Output (BOS/EOS nodes are omitted):

にわ:	名詞,一般,*,*,*,*,*
に:	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ:	名詞,一般,*,*,*,*,はにわ,ハニワ,ハニワ
にわとり:	名詞,一般,*,*,*,*,にわとり,ニワトリ,ニワトリ
が:	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる:	動詞,自立,*,*,一段,基本形,いる,イル,イル
。:	記号,句点,*,*,*,*,。,。,。
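
Because the pattern is an ordinary Python regular expression, you can preview which spans it marks before handing it to MeCab. The sketch below uses only the standard re module on the sample sentence にわにはにわにわとりがいる。; matched_spans is a hypothetical helper, and it only illustrates how re.finditer scans the text, not what natto-py does internally. It also suggests why the longer alternatives are listed first: in an alternation, the first alternative that matches at a position wins.

```python
import re


def matched_spans(pattern, text):
    """Return (start, end, matched_text) for each non-overlapping match.

    matched_spans is a hypothetical helper for illustration only.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


text = "にわにはにわにわとりがいる。"

# Longer alternatives first: にわとり is matched as a whole.
longest_first = matched_spans("にわとり|はにわ|にわ", text)
# → [(0, 2, 'にわ'), (3, 6, 'はにわ'), (6, 10, 'にわとり')]

# Shortest first: にわ wins at index 6, so にわとり is never matched whole.
shortest_first = matched_spans("にわ|にわとり|はにわ", text)
# → [(0, 2, 'にわ'), (3, 6, 'はにわ'), (6, 8, 'にわ')]
```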

For details on patterns, see the Python documentation: 6.2. re (Regular expression operations) and re.finditer (/re.html#re.finditer).

Feature constraints

The feature_constraints keyword argument lets you specify the feature (for example, the part of speech) for particular morphemes. Pair each morpheme with its feature in a tuple, collect those pairs in an outer tuple, and pass it to the parse method as follows:

text = "にわにはにわにわとりがいる。"

feat = (("にわとり", "ほげ"), ("はにわ", "ほげほげ"), ("にわ", "更にほげ"))

with MeCab() as nm:
    # Specify feature constraints for some morphemes and get each MeCabNode
    for n in nm.parse(text, feature_constraints=feat, as_nodes=True):
        if not (n.is_bos() or n.is_eos()):
            print("{}:\t{}".format(n.surface, n.feature))

Output (BOS/EOS nodes are omitted):

にわ:	更にほげ
に:	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ:	ほげほげ
にわとり:	ほげ
が:	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる:	動詞,自立,*,*,一段,基本形,いる,イル,イル
。:	記号,句点,*,*,*,*,。,。,。
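
If the morpheme-to-feature mapping is kept in a dict, it can be converted into the tuple-of-pairs shape shown above. This is a minimal sketch: to_feature_constraints is a hypothetical helper, not a natto-py function. It checks that every morpheme actually occurs in the text and orders the pairs longest surface first, mirroring the order of the example above. Whether natto-py needs that ordering is an assumption here, made because one surface (にわ) is contained in the others.

```python
def to_feature_constraints(mapping, text):
    """Turn {surface: feature} into a tuple of (surface, feature) pairs.

    to_feature_constraints is a hypothetical helper for illustration only.
    """
    missing = [surface for surface in mapping if surface not in text]
    if missing:
        raise ValueError("not found in text: {}".format(", ".join(missing)))
    # Longest surface first (assumption: ordering may matter when one
    # surface is a substring of another).
    pairs = sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True)
    return tuple(pairs)


text = "にわにはにわにわとりがいる。"
feat = to_feature_constraints(
    {"にわ": "更にほげ", "にわとり": "ほげ", "はにわ": "ほげほげ"}, text
)
# → (('にわとり', 'ほげ'), ('はにわ', 'ほげほげ'), ('にわ', '更にほげ'))
```

The resulting feat can then be passed as feature_constraints=feat, as in the example above.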

That's all.
