Install MeCab and mecab-python3 on Ubuntu 14.04

Overview

MeCab is an open-source morphological analysis engine. It can be used to split Japanese sentences into words as a preprocessing step for machine learning. The goal of this article is to install MeCab and make it usable from Python.

Environment

Ubuntu 14.04, Python 3

Installation procedure

  1. MeCab

I referred to this article.

$ sudo apt-get install mecab libmecab-dev mecab-ipadic mecab-ipadic-utf8

(I'm not sure whether both mecab-ipadic and mecab-ipadic-utf8 are needed, but this works for now.)

Running the mecab command and typing a Japanese sentence prints the result of the morphological analysis. For example, entering 安倍晋三首相 ("Prime Minister Shinzo Abe") gives the following.

$ mecab
安倍晋三首相
安倍	名詞,固有名詞,人名,姓,*,*,安倍,アベ,アベ
晋	名詞,固有名詞,人名,名,*,*,晋,ススム,ススム
三	名詞,数,*,*,*,*,三,サン,サン
首相	名詞,一般,*,*,*,*,首相,シュショウ,シュショー
EOS

The given name 晋三 (Shinzo) is not analyzed correctly: it is split into 晋 and 三.

  2. mecab-ipadic-NEologd

The default IPA dictionary seems to be weak at parsing proper nouns, so install mecab-ipadic-NEologd, a dictionary that greatly extends the coverage of proper nouns and other new words.

$ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
$ cd mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -n -a

To make this the default dictionary, edit /etc/mecabrc so that it contains the following line.

dicdir = /usr/lib/mecab/dic/mecab-ipadic-neologd
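To check which dictionary a mecabrc file selects, the file can be read from Python. The following is only a minimal sketch: the function name `read_dicdir` is hypothetical, the sample text is written by hand rather than copied from a real system, and it assumes that comment lines start with ';' as in the stock mecabrc and that the first uncommented dicdir entry is the one that matters.

```python
def read_dicdir(mecabrc_text):
    """Return the first uncommented dicdir value in mecabrc-style text, or None."""
    for raw_line in mecabrc_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (assumed to start with ';').
        if not line or line.startswith(";"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dicdir":
            return value.strip()
    return None


# Hand-written sample; on a real system you would read /etc/mecabrc instead.
sample_mecabrc = """\
; Configuration file of MeCab
; dicdir = /var/lib/mecab/dic/debian
dicdir = /usr/lib/mecab/dic/mecab-ipadic-neologd
"""
print(read_dicdir(sample_mecabrc))
```

On a real machine the text would come from something like open("/etc/mecabrc").read().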

See the official documentation (https://github.com/neologd/mecab-ipadic-neologd/blob/master/README.ja.md) for more information.

Let's analyze 安倍晋三首相 ("Prime Minister Shinzo Abe") again, this time with the NEologd dictionary.

$ mecab -d
安倍晋三首相
安倍晋三首相	名詞,固有名詞,一般,*,*,*,安倍晋三首相,アベシンゾウシュショウ,アベシンゾーシュショー
EOS

This time the whole phrase is correctly recognized as a proper noun.
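With the IPA dictionary (and NEologd, which extends it), each line of MeCab's default output is the surface form, a tab, and a comma-separated feature list. As a rough sketch of how such a line could be handled in Python, the code below turns it into a dictionary. The function name `parse_mecab_line` and the English field labels are my own and not part of MeCab, the field order is assumed to be that of the IPA dictionary, and the sample line is written by hand.

```python
# Assumed order of the IPA dictionary's features (English labels are illustrative).
IPADIC_FIELDS = [
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conjugation_type", "conjugation_form",
    "base_form", "reading", "pronunciation",
]


def parse_mecab_line(line):
    """Parse one line of default MeCab output; return None for EOS or blank lines."""
    line = line.rstrip("\n")
    if not line or line == "EOS":
        return None
    surface, features = line.split("\t", 1)
    # zip() stops early if a word has fewer features than expected.
    record = dict(zip(IPADIC_FIELDS, features.split(",")))
    record["surface"] = surface
    return record


# Hand-written sample line in the default output format.
sample = "首相\t名詞,一般,*,*,*,*,首相,シュショウ,シュショー"
token = parse_mecab_line(sample)
print(token["surface"], token["pos"], token["reading"])
```

Returning None for EOS keeps the caller's loop simple: it can skip those lines without special-casing the end of each sentence.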

  3. mecab-python3

Install the MeCab bindings for Python 3.

$ pip install mecab-python3

That is all that is needed.

mecab-test.py

import MeCab

# Create a tagger that prints ChaSen-style, tab-separated output.
m = MeCab.Tagger("-Ochasen")
# The sentence means "Prime Minister Shinzo Abe delivered a policy speech at the Diet."
print(m.parse("安倍晋三首相は、国会で施政方針演説を行った。"))

Running it gives the following.

$ python mecab-test.py
安倍晋三首相	アベシンゾウシュショウ	安倍晋三首相	名詞-固有名詞-一般
は	ハ	は	助詞-係助詞
、	、	、	記号-読点
国会	コッカイ	国会	名詞-一般
で	デ	で	助詞-格助詞-一般
施政方針演説	シセイホウシンエンゼツ	施政方針演説	名詞-固有名詞-一般
を	ヲ	を	助詞-格助詞-一般
行っ	オコナッ	行う	動詞-自立	五段・ワ行促音便	連用タ接続
た	タ	た	助動詞	特殊・タ	基本形
。	。	。	記号-句点
EOS
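In the -Ochasen format each line holds tab-separated columns: surface form, reading, base form, and part of speech, followed by the conjugation type and form where they apply. A rough sketch of pulling the nouns out of such output is shown below. The function name `extract_nouns` is hypothetical, and the sample text is written by hand in that format rather than captured from a real run.

```python
def extract_nouns(chasen_output):
    """Return the surface forms whose part-of-speech column starts with 名詞 (noun)."""
    nouns = []
    for line in chasen_output.splitlines():
        if not line or line == "EOS":
            continue
        columns = line.split("\t")
        # Columns: surface, reading, base form, part of speech, ...
        if len(columns) >= 4 and columns[3].startswith("名詞"):
            nouns.append(columns[0])
    return nouns


# Hand-written sample in the -Ochasen layout.
sample = (
    "国会\tコッカイ\t国会\t名詞-一般\n"
    "で\tデ\tで\t助詞-格助詞-一般\n"
    "施政方針演説\tシセイホウシンエンゼツ\t施政方針演説\t名詞-固有名詞-一般\n"
    "行っ\tオコナッ\t行う\t動詞-自立\t五段・ワ行促音便\t連用タ接続\n"
    "EOS\n"
)
nouns = extract_nouns(sample)
print(nouns)
```

With a real tagger, the argument would be the string returned by m.parse(...).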

If you only want the sentence split into words, create the tagger with the -Owakati option instead.

m = MeCab.Tagger("-Owakati")

mecab-wakati-test.py

import MeCab

# -Owakati prints only the words, separated by spaces.
m = MeCab.Tagger("-Owakati")
items = m.parse("安倍晋三首相は、国会で施政方針演説を行った。")
print(items)
print(type(items))

Running it gives the following.

$ python mecab-wakati-test.py
安倍晋三首相 は 、 国会 で 施政方針演説 を 行っ た 。

<class 'str'>

The result is returned as a single string, so call split() on it if you want a list.
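As a small illustration of that last step, the sketch below splits a wakati-style string into a list. The helper name `wakati_to_tokens` is hypothetical, and the sample string is written by hand in the shape of -Owakati output (words separated by spaces, ending with a newline) so that the snippet runs without MeCab installed.

```python
def wakati_to_tokens(wakati_output):
    """Split -Owakati output on whitespace, dropping the trailing space and newline."""
    return wakati_output.split()


# Hand-written sample in the shape of -Owakati output.
sample = "安倍晋三首相 は 、 国会 で 施政方針演説 を 行っ た 。 \n"
tokens = wakati_to_tokens(sample)
print(tokens)
print(len(tokens))
```

Calling split() with no argument splits on any run of whitespace, so the trailing newline does not produce an empty element.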
