[Python] Morphological analysis with MeCab

Morphological analysis is often used to preprocess text data for natural language processing (NLP), so I've summarized how to run it with MeCab.

What is MeCab?

An open source Japanese morphological analysis engine.

Developed by Taku Kudo, currently a software engineer at Google and one of the developers of Google Japanese Input. The name comes from the developer's favorite food, mekabu (和布蕪).

Quoted from Wikipedia

Environment

Installation

Install MeCab itself.

$ brew install mecab

Install the IPA dictionary (IPADIC) for MeCab.

$ brew install mecab-ipadic

Check if MeCab is installed.

$ mecab --version
mecab of 0.996
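If you want to run the same check from a Python script, for example at the start of a preprocessing job that calls the mecab command, a minimal standard-library sketch could look like this. The function name mecab_version is hypothetical; it just runs mecab --version and returns what it prints, or None when mecab is not on PATH.

```python
import shutil
import subprocess
from typing import Optional


def mecab_version() -> Optional[str]:
    """Return the output of `mecab --version`, or None if mecab is not on PATH."""
    if shutil.which("mecab") is None:
        return None
    result = subprocess.run(
        ["mecab", "--version"], capture_output=True, text=True, check=False
    )
    # Read stdout, falling back to stderr in case the version goes there.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = mecab_version()
    print(version if version else "mecab not found on PATH")
```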

Let's try morphological analysis. Start mecab, type a Japanese sentence (here 試しに形態素解析をしてみる。, "Let's try morphological analysis as a test."), and press Enter. Each output line is one morpheme: the surface form, a tab, then comma-separated features (part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation). EOS marks the end of the sentence.

$ mecab
試しに形態素解析をしてみる。
試し	名詞,一般,*,*,*,*,試し,タメシ,タメシ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
形態素	名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析	名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS
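To use these fields in a script, a minimal sketch that splits one output line into named fields could look like this. Morpheme and parse_line are hypothetical helpers, not part of MeCab; the sketch assumes the IPADIC nine-field layout shown above and no ASCII commas inside the feature fields.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    """One line of MeCab output with the IPADIC feature layout."""
    surface: str        # token as it appears in the input
    pos: str            # part of speech (品詞)
    pos_detail1: str    # POS subcategory 1
    pos_detail2: str    # POS subcategory 2
    pos_detail3: str    # POS subcategory 3
    conj_type: str      # conjugation type (活用型)
    conj_form: str      # conjugation form (活用形)
    base_form: str      # base/dictionary form (原形)
    reading: str        # reading in katakana (読み)
    pronunciation: str  # pronunciation (発音)


def parse_line(line: str) -> Morpheme:
    """Split 'surface<TAB>f1,f2,...' into a Morpheme.

    Unknown words usually carry only 7 feature fields (no reading or
    pronunciation), so missing trailing fields are padded with '*'.
    """
    surface, features = line.split("\t", 1)
    fields = features.split(",")
    fields += ["*"] * (9 - len(fields))
    return Morpheme(surface, *fields[:9])


m = parse_line("解析\t名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ")
print(m.surface, m.pos, m.reading)  # 解析 名詞 カイセキ
```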

The word "morphological analysis" (形態素解析) has been split into "morpheme" (形態素) and "analysis" (解析). To fix this, install mecab-ipadic-NEologd, a dictionary that adds many new words and proper nouns collected from the web. First, clone the dictionary data from GitHub.

$ git clone --depth 1 git@github.com:neologd/mecab-ipadic-neologd.git

Move into the cloned repository, run the install script, and type yes at the confirmation prompt.

$ cd mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -n
yes

Specify the dictionary with the -d option and try morphological analysis again.

$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd/
試しに形態素解析をしてみる。
試しに	副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ
形態素解析	名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS

This time, "morphological analysis" (形態素解析) is recognized as a single word, tagged as a proper noun.
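Why does swapping the dictionary change the split? MeCab builds a lattice of every dictionary word that matches part of the input and uses the Viterbi algorithm to pick the path with the lowest total cost, where the cost combines each word's own cost with a connection cost between adjacent words. The toy sketch below is not MeCab's implementation: it keeps only per-word costs, ignores connection costs, and the dictionaries ipadic_like and neologd_like use made-up cost numbers. It only illustrates how adding one entry for 形態素解析 can make the single-word path the cheapest.

```python
from typing import Dict, List, Tuple


def segment(text: str, costs: Dict[str, int]) -> Tuple[List[str], float]:
    """Minimum-cost segmentation of `text` using only per-word costs.

    best[i] is the cheapest cost of covering text[:i]; back[i] is the
    start index of the last word in that covering.
    """
    n = len(text)
    inf = float("inf")
    best: List[float] = [inf] * (n + 1)
    back = [-1] * (n + 1)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in costs and best[start] + costs[word] < best[end]:
                best[end] = best[start] + costs[word]
                back[end] = start
    if best[n] == inf:
        raise ValueError("text cannot be covered by the dictionary")
    # Follow the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1], best[n]


# Hypothetical toy dictionaries with invented costs.
ipadic_like = {"形態素": 3000, "解析": 3000, "形態": 4000, "素": 5000}
neologd_like = {**ipadic_like, "形態素解析": 4000}

print(segment("形態素解析", ipadic_like))   # (['形態素', '解析'], 6000)
print(segment("形態素解析", neologd_like))  # (['形態素解析'], 4000)
```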

Using MeCab from Python

Install the Python binding for MeCab.

$ pip3 install mecab-python3
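The -d path used in this article is where the dictionary ends up with Homebrew on an Intel Mac; with Homebrew on Apple Silicon (prefix /opt/homebrew) or on Linux it will differ. mecab-config --dicdir prints MeCab's system dictionary directory, and the NEologd installer puts its dictionary in a mecab-ipadic-neologd subdirectory of it by default. A minimal sketch, assuming that default layout, that builds the path instead of hard-coding it (neologd_dicdir and DEFAULT_NEOLOGD_DIR are hypothetical names):

```python
import os
import shutil
import subprocess
from typing import Optional

# Fallback: the path used in this article (Homebrew on an Intel Mac).
DEFAULT_NEOLOGD_DIR = "/usr/local/lib/mecab/dic/mecab-ipadic-neologd"


def neologd_dicdir(dicdir: Optional[str] = None) -> str:
    """Return the expected mecab-ipadic-NEologd directory.

    If `dicdir` is not given, ask `mecab-config --dicdir`; if that
    command is not available, fall back to DEFAULT_NEOLOGD_DIR.
    """
    if dicdir is None:
        if shutil.which("mecab-config") is None:
            return DEFAULT_NEOLOGD_DIR
        result = subprocess.run(
            ["mecab-config", "--dicdir"], capture_output=True, text=True, check=True
        )
        dicdir = result.stdout.strip()
    return os.path.join(dicdir, "mecab-ipadic-neologd")


print(neologd_dicdir("/opt/homebrew/lib/mecab/dic"))
# /opt/homebrew/lib/mecab/dic/mecab-ipadic-neologd
```

The result could then be passed to the tagger, e.g. MeCab.Tagger('-d ' + neologd_dicdir()).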

Then write a short script and run it.

import MeCab

# Load the NEologd dictionary installed above
mecab = MeCab.Tagger('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
# parse() returns the same text that the mecab command prints
print(mecab.parse('試しに形態素解析をしてみる。'))


試しに	副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ
形態素解析	名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS
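For NLP preprocessing you usually want a plain list of words rather than the raw text. A minimal sketch, assuming MeCab's default output format with IPADIC-style features: it walks the string returned by parse() and keeps the base forms of nouns, verbs, and adjectives. content_words and CONTENT_POS are hypothetical names, and which parts of speech to keep is a choice you would tune for your task. To make the sketch runnable without MeCab, it uses the NEologd output above as a sample string.

```python
from typing import AbstractSet, List

# Parts of speech to keep (IPADIC labels): nouns, verbs, adjectives.
CONTENT_POS = frozenset({"名詞", "動詞", "形容詞"})


def content_words(parsed: str, keep: AbstractSet[str] = CONTENT_POS) -> List[str]:
    """Return base forms of content words from MeCab's default output.

    `parsed` is the string returned by Tagger.parse(): one
    'surface<TAB>features' line per morpheme, ending with 'EOS'.
    """
    words = []
    for line in parsed.splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        if fields[0] not in keep:
            continue
        # Field 7 is the base form; fall back to the surface form when
        # it is missing or '*' (as for some unknown words).
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        words.append(base)
    return words


sample = (
    "試しに\t副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ\n"
    "形態素解析\t名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "し\t動詞,自立,*,*,サ変・スル,連用形,する,シ,シ\n"
    "て\t助詞,接続助詞,*,*,*,*,て,テ,テ\n"
    "みる\t動詞,非自立,*,*,一段,基本形,みる,ミル,ミル\n"
    "。\t記号,句点,*,*,*,*,。,。,。\n"
    "EOS\n"
)
print(content_words(sample))  # ['形態素解析', 'する', 'みる']
```

With MeCab installed, the same function would be applied to mecab.parse(text). If you only need the words split by spaces, without any filtering, MeCab's -Owakati output format (MeCab.Tagger('-Owakati')) does that directly.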
