Text mining with Python ① Morphological analysis (re: Linux version)

Let's try text mining with Python (Python 3.x), in the following steps:

① Morphological analysis (this article)
② Visualization with Word Cloud (next time)

Last time I tried to use MeCab on Windows, but I got stuck installing the Python bindings and gave up. So I've switched to Linux and am starting over.



Install MeCab

(Review) Using MeCab from Python requires:
- installing MeCab itself
- installing a dictionary
- installing the Python bindings

On Windows, a dictionary was bundled with MeCab itself, but on Linux the dictionary has to be installed separately. Fortunately, it can be installed together with the MeCab package in a single command.

Install MeCab and dictionary

Just install with apt. For the dictionary, choose the UTF-8 version of the IPA dictionary (IPADIC), which is the recommended one.

sudo apt-get install mecab mecab-ipadic-utf8
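To confirm that MeCab actually picked up the UTF-8 dictionary, `mecab -D` prints information about the dictionary in use. The sketch below assumes that this output is a series of `key:<TAB>value` lines including a `charset` entry; the function names `parse_dict_info` and `mecab_dict_info` are hypothetical helpers, not part of MeCab.

```python
import shutil
import subprocess


def parse_dict_info(text):
    """Turn 'key:<TAB>value' lines (assumed `mecab -D` format) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            info[key.strip()] = value.strip()
    return info


def mecab_dict_info():
    """Run `mecab -D` and parse it; return None if mecab is not on PATH.

    If several dictionaries are loaded, later entries overwrite earlier ones
    in this simple sketch.
    """
    if shutil.which("mecab") is None:
        return None
    result = subprocess.run(["mecab", "-D"], capture_output=True,
                            text=True, check=True)
    return parse_dict_info(result.stdout)


if __name__ == "__main__":
    info = mecab_dict_info()
    if info is None:
        print("mecab is not on PATH")
    else:
        # Expected to be utf8 when mecab-ipadic-utf8 is the active dictionary.
        print(info.get("charset"))
```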

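As usual, check that it works with the classic test sentence "すもももももももものうち" (sumomo mo momo mo momo no uchi, "plums and peaches are both kinds of peach").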

$ mecab
すもももももももものうち
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
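In this default output, each line is the surface form, a tab, and then comma-separated features; with IPADIC these are part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading and pronunciation, and the sentence ends with `EOS`. As a small sketch (the field names are labels chosen here, not anything MeCab defines), that output can be split into records like this:

```python
from collections import namedtuple

# Labels for IPADIC's nine feature columns (names chosen for this sketch).
Token = namedtuple(
    "Token",
    "surface pos pos1 pos2 pos3 conj_type conj_form base reading pron",
)


def parse_mecab_output(text):
    """Parse MeCab's default output (surface<TAB>f1,...,f9) into Token records."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # blank line or end-of-sentence marker
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # Unknown words carry fewer features (no reading/pronunciation); pad with "*".
        fields += ["*"] * (9 - len(fields))
        tokens.append(Token(surface, *fields[:9]))
    return tokens


# First three tokens of the sample sentence, copied from the mecab output.
sample = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)

for tok in parse_mecab_output(sample):
    print(tok.surface, tok.pos, tok.reading)
```

Splitting on a plain comma is enough for IPADIC features; other dictionaries may quote fields that contain commas, which this sketch does not handle.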

Install MeCab Python bindings

Just install this with apt.

sudo apt-get install python-mecab

Now let's analyze the same sentence from Python.

mecab_sample.py


# coding: utf-8
import sys
import MeCab

mecab = MeCab.Tagger("-Ochasen")

print(mecab.parse("すもももももももものうち"))

$ python3 mecab_sample.py
Traceback (most recent call last):
  File "mecab_sample.py", line 3, in <module>
    import MeCab
ImportError: No module named 'MeCab'

It says there is no module named MeCab... Let's try running it with Python 2.x instead.

$ python mecab_sample.py
すもも	スモモ	すもも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
の	ノ	の	助詞-連体化
うち	ウチ	うち	名詞-非自立-副詞可能
EOS

This one works. Apparently the package installed with apt only supports the Python 2.x series. To use MeCab from Python 3, it seems you would have to fetch the source and build it with setup.py, as in the Windows version. But that source also assumes Python 2, so it needs a patch before it will run on Python 3. Not exactly straightforward.
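Because a binding is installed for a specific interpreter, one quick way to see which interpreters can actually import `MeCab` is to ask each of them. This is a minimal sketch using only the standard library; `has_mecab_binding` is a hypothetical helper, and the interpreter names `python2`/`python3` are assumptions about what is on your PATH.

```python
import shutil
import subprocess


def has_mecab_binding(interpreter):
    """True/False if `interpreter` can import MeCab; None if the interpreter is missing."""
    exe = shutil.which(interpreter)
    if exe is None:
        return None
    # Exit status 0 means `import MeCab` succeeded inside that interpreter.
    result = subprocess.run([exe, "-c", "import MeCab"],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("python2", "python3"):
        print(name, has_mecab_binding(name))
```

Running the check in a child process keeps it independent of the interpreter running the script, so the same file reports on both Python 2 and Python 3.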

Ugh, what a hassle... Then I found an article saying that you can simply install a library for Python 3 with pip, so let's try that.

$ pip3 install mecab-python3
Collecting mecab-python3
  Using cached mecab-python3-0.7.tar.gz
    Complete output from command python setup.py egg_info:
    /bin/sh: 1: mecab-config: not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 41, in <module>
        include_dirs=cmd2("mecab-config --inc-dir"),
      File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 21, in cmd2
        return cmd1(strings).split()
      File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 18, in cmd1
        return os.popen(strings).readlines()[0][:-1]
    IndexError: list index out of range
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-gsw8fi5f/mecab-python3/

As on Windows, the build fails because mecab-config cannot be found. On Debian/Ubuntu, mecab-config is provided by the libmecab-dev package, which I skipped when I first installed MeCab because I didn't need it then. Install it with apt.

sudo apt-get install libmecab-dev
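With libmecab-dev in place, `mecab-config` should now be on the PATH. The `IndexError` in the log above is a side effect of how setup.py calls it: it runs `mecab-config --inc-dir` through `os.popen` and takes the first line of output, and a missing command produces no lines at all. As a rough sketch (the helper name `mecab_config` is hypothetical and not part of mecab-python3), a pre-flight check that reports the real problem could look like this:

```python
import shutil
import subprocess


def mecab_config(*args):
    """Run mecab-config with the given flags and return its output split on whitespace.

    Raises a descriptive RuntimeError instead of the IndexError that setup.py
    produced when mecab-config (from libmecab-dev) is not installed.
    """
    if shutil.which("mecab-config") is None:
        raise RuntimeError("mecab-config not found; install libmecab-dev first")
    out = subprocess.run(["mecab-config", *args], capture_output=True,
                         text=True, check=True).stdout
    return out.split()


if __name__ == "__main__":
    try:
        # --inc-dir is the same flag the mecab-python3 setup.py asks for.
        print(mecab_config("--inc-dir"))
    except RuntimeError as err:
        print(err)
```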

Then use pip to install the bindings for Python 3.

sudo pip3 install mecab-python3

Then run the sample with Python 3.

$ python3 mecab_sample.py 
すもも	スモモ	すもも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
の	ノ	の	助詞-連体化
うち	ウチ	うち	名詞-非自立-副詞可能
EOS

Finally, it works.
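Looking ahead to the word-cloud step, the `-Ochasen` output is easy to post-process: its columns are surface form, reading, base form, hyphen-joined part of speech, conjugation type and conjugation form. A minimal sketch that keeps only nouns and counts them, assuming that column layout; the string is typed in by hand from the sample output so the code runs without MeCab, and in practice it would come from `MeCab.Tagger("-Ochasen").parse(...)`.

```python
from collections import Counter


def nouns_from_chasen(text):
    """Return surface forms whose POS column (4th in -Ochasen output) starts with 名詞 (noun)."""
    nouns = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # blank line or end-of-sentence marker
        cols = line.split("\t")
        # Columns: surface, reading, base form, POS, conj. type, conj. form
        if len(cols) >= 4 and cols[3].startswith("名詞"):
            nouns.append(cols[0])
    return nouns


# Hand-copied -Ochasen output for the sample sentence.
chasen_output = (
    "すもも\tスモモ\tすもも\t名詞-一般\t\t\n"
    "も\tモ\tも\t助詞-係助詞\t\t\n"
    "もも\tモモ\tもも\t名詞-一般\t\t\n"
    "も\tモ\tも\t助詞-係助詞\t\t\n"
    "もも\tモモ\tもも\t名詞-一般\t\t\n"
    "の\tノ\tの\t助詞-連体化\t\t\n"
    "うち\tウチ\tうち\t名詞-非自立-副詞可能\t\t\n"
    "EOS\n"
)

print(Counter(nouns_from_chasen(chasen_output)).most_common())
```

This keeps every noun, including non-independent ones such as うち; for a word cloud you may want to filter out POS values like 名詞-非自立 as well.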

Referenced sites

- Making the morphological analysis engine MeCab usable from Python 3 (March 2016 version)
- [Python][MeCab] How to install MeCab in an Ubuntu environment
- How to use MeCab on Ubuntu 14.04 and Python 3
