[PYTHON] Performing morphological analysis in a machine learning environment launched on GCE

What I wanted to do

In a machine learning environment (Ubuntu 16.04 LTS) launched on GCE, the first thing I did for natural language processing was install morphological analysis software. The installation took a lot of time, so I am leaving these notes as a memo.

Installed software libraries

Janome can be installed with pip install alone, so it is omitted here.

Install MeCab

Install MeCab and its dictionary (UTF-8 version)

sudo apt-get install mecab mecab-ipadic-utf8

Without the following packages, mecab-python3 will not install properly

sudo apt-get install libmecab-dev
sudo apt-get install build-essential

Finally, install the library for calling MeCab from Python 3.x

pip install mecab-python3

Install JUMAN++

JUMAN++ needs several prerequisite packages, and at first I could not get it to install properly. I had heard that its morphological analysis is more capable than MeCab's, so I was determined to install it. After a fair amount of research, the following procedure worked.

To use JUMAN++ on its own

Install the required packages. This takes quite a while.

sudo apt install checkinstall auto-apt ccache
sudo auto-apt update
sudo apt install google-perftools libgoogle-perftools-dev libboost-dev

Download and extract JUMAN++

wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
tar xJvf jumanpp-1.01.tar.xz

Then build and install JUMAN++

# Move into the extracted source directory before configuring
cd jumanpp-1.01
auto-apt run ./configure CC="ccache gcc" CFLAGS="-O3" CXX="ccache g++" CXXFLAGS="-O3"
make
sudo checkinstall

If the version is printed as follows, JUMAN++ has been installed successfully.

jumanpp -v

JUMAN++ 1.01
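
The same check can also be run from Python, which is convenient in a notebook. This is only a minimal sketch: the helper name `jumanpp_version` is my own, and it assumes the binary is called `jumanpp` and accepts `-v` as shown above.

```python
import shutil
import subprocess


def jumanpp_version(command="jumanpp"):
    """Return the output of `<command> -v`, or None if the command is not on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print(jumanpp_version() or "jumanpp not found on PATH")
```

A return value of None means the install step did not put `jumanpp` on the PATH of the current user.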

To use JUMAN++ from Python

The installation continues so that JUMAN++ can be used from Python.

Install JUMAN → KNP → PyKNP in that order, following "Use JUMAN++ from Python" (listed under the referenced sites).

However, it seems that these steps alone do not register pyknp as a Python library, so finish by running the following.

pip install ./pyknp-0.3
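
To confirm that the Python bindings really are registered, a quick importability check can help. The following is a minimal sketch rather than part of the original procedure: the helper name `installed_analyzers` is hypothetical, and it only checks that the modules can be found, not that the analyzers or dictionaries behind them work.

```python
import importlib.util

# Module names as imported in this article: MeCab (mecab-python3),
# pyknp (JUMAN++ wrapper) and janome.
ANALYZER_MODULES = ("MeCab", "pyknp", "janome")


def installed_analyzers(modules=ANALYZER_MODULES):
    """Return {module_name: bool} telling whether each module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in installed_analyzers().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If pyknp is reported as missing, the pip install step above has probably not been run in the same Python environment.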

Try morphological analysis

Analyze the phrase 外国人参政権 ("foreigners' right to vote") with MeCab, JUMAN++, and Janome.

For MeCab

import MeCab

# -Ochasen selects the ChaSen-compatible output format
mecab = MeCab.Tagger("-Ochasen")
# Input: 外国人参政権 ("foreigners' right to vote")
print(mecab.parse("外国人参政権"))

Output:

外国	ガイコク	外国	名詞-一般
人参	ニンジン	人参	名詞-一般
政権	セイケン	政権	名詞-一般
EOS

MeCab splits the phrase into 外国 (foreign country) / 人参 (carrot) / 政権 (regime).
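
The text returned by `parse` is easier to work with once it is split into fields. Below is a minimal sketch of such a parser. The function name `parse_chasen` is my own, the column order (surface, reading, base form, part of speech) is an assumption about the `-Ochasen` format, and the sample string is hand-written to mimic the output for 外国人参政権 so that the snippet runs without MeCab.

```python
def parse_chasen(output):
    """Turn -Ochasen style text into a list of (surface, reading, base, pos) tuples."""
    rows = []
    for line in output.splitlines():
        if not line or line == "EOS":
            # EOS marks the end of a sentence and carries no token.
            continue
        fields = line.split("\t")
        # Pad so that short lines do not raise, then keep the first four columns.
        surface, reading, base, pos = (fields + [""] * 4)[:4]
        rows.append((surface, reading, base, pos))
    return rows


# Hand-written sample (assumed format), not captured from a real run.
sample = (
    "外国\tガイコク\t外国\t名詞-一般\t\t\n"
    "人参\tニンジン\t人参\t名詞-一般\t\t\n"
    "政権\tセイケン\t政権\t名詞-一般\t\t\n"
    "EOS\n"
)

for row in parse_chasen(sample):
    print(row)
```

With a real tagger, `parse_chasen(mecab.parse(text))` would be the intended usage.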

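For JUMAN++

from pyknp import Jumanpp

jumanpp = Jumanpp()
# Input: 外国人参政権 ("foreigners' right to vote")
r = jumanpp.analysis("外国人参政権")
for m in r.mrph_list():
    # midasi is the surface form of each morpheme
    print(m.midasi)

Output:

外国
人
参政
権

JUMAN++ splits the phrase into 外国 (foreign country) / 人 (person) / 参政 (suffrage) / 権 (right).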

For Janome

from janome.tokenizer import Tokenizer

t = Tokenizer()
# Input: 外国人参政権 ("foreigners' right to vote")
tokens = t.tokenize('外国人参政権')
for token in tokens:
    print(token)

Output:

外国	名詞,一般,*,*,*,*,外国,ガイコク,ガイコク
人参	名詞,一般,*,*,*,*,人参,ニンジン,ニンジン
政権	名詞,一般,*,*,*,*,政権,セイケン,セイケン

Janome gives the same split as MeCab: 外国 (foreign country) / 人参 (carrot) / 政権 (regime).

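In the end, JUMAN++ gives the best result: it is the only one of the three that does not mistakenly extract the word for carrot.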
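
When comparing analyzers on more than one phrase, it may help to group them by the segmentation they produce. The following is a small sketch under stated assumptions: `compare_segmentations` is a hypothetical helper, and the token lists are typed in by hand from the results reported in this article rather than produced by calling the analyzers.

```python
def compare_segmentations(results):
    """Group analyzer names by the segmentation they produced.

    `results` maps an analyzer name to its list of surface forms.
    """
    groups = {}
    for name, tokens in results.items():
        groups.setdefault(" / ".join(tokens), []).append(name)
    return groups


# Surface forms for 外国人参政権, entered by hand from the results above.
results = {
    "MeCab": ["外国", "人参", "政権"],
    "JUMAN++": ["外国", "人", "参政", "権"],
    "Janome": ["外国", "人参", "政権"],
}

for segmentation, names in compare_segmentations(results).items():
    print(f"{segmentation}: {', '.join(names)}")
```

Analyzers that land in the same group agree on the split, so any disagreement is visible at a glance.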

Referenced sites

Text mining with Python ① Morphological analysis (re: Linux version)

[How to install JUMAN++ on Ubuntu 16.04 LTS](http://qiita.com/SUZUKI_Masaya/items/29c81d037cdf7d37b900)

[How to install software on Ubuntu using auto-apt, checkinstall, ccache](http://qiita.com/SUZUKI_Masaya/items/bd03f39e20a1a8f7f4f6#%E5%BF%85%E8%A6%81%E3%81%AA%E3%83%91%E3%83%83%E3%82%B1%E3%83%BC%E3%82%B8%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB)

Use JUMAN++ from Python
