[PYTHON] MeCab installation notes

These notes install MeCab 0.994 on CentOS 6.4 and call it from Ruby and Python. The Ruby version is 2.1.2p95 and the Python version is 3.4.1.

Install MeCab
Install MeCab itself.
$ wget http://mecab.googlecode.com/files/mecab-0.994.tar.gz
$ tar zvxf mecab-0.994.tar.gz
$ cd mecab-0.994
$ sudo ./configure --enable-utf8-only
$ make
$ sudo make install
$ sudo ln -s /usr/local/bin/mecab-config /usr/bin/mecab-config
$ cd ~
$ sudo vi /etc/ld.so.conf
	/usr/local/lib ← add this line
$ sudo ldconfig
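
Before moving on to the dictionaries, it can help to confirm that the executables are on the PATH and that the dynamic linker can see libmecab after `ldconfig`. The following is only a minimal sketch using the Python standard library; the function name `check_mecab_install` is made up for this note and is not part of MeCab.

```python
import ctypes.util
import shutil


def check_mecab_install():
    """Report where (or whether) the MeCab tools and library are found."""
    return {
        # Full path of each executable found on PATH, or None if missing.
        "mecab": shutil.which("mecab"),
        "mecab-config": shutil.which("mecab-config"),
        # Library file name as resolved by the system, or None if not found.
        "libmecab": ctypes.util.find_library("mecab"),
    }


if __name__ == "__main__":
    for name, location in check_mecab_install().items():
        print("{}: {}".format(name, location or "NOT FOUND"))
```

If `libmecab` is reported as not found, the `/usr/local/lib` line in `/etc/ld.so.conf` or the `ldconfig` run is the first thing to recheck.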

Dictionary registration
Register the dictionaries used by MeCab.
$ wget http://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
$ tar zvxf mecab-ipadic-2.7.0-20070801.tar.gz
$ cd mecab-ipadic-2.7.0-20070801
$ ./configure --with-charset=utf8
$ make
$ sudo make install
$ cd ~

$ wget "http://sourceforge.jp/frs/redir.php?m=jaist&f=%2Fnaist-jdic%2F53500%2Fmecab-naist-jdic-0.6.3b-20111013.tar.gz" -O naistdic.tar.gz
$ tar zvxf naistdic.tar.gz
$ cd mecab-naist-jdic-0.6.3b-20111013/
$ sudo ./configure --with-charset=utf8
$ make
$ sudo make install
$ cd ~

MeCab test
Run MeCab and type in a sentence.
$ mecab
隣の客はよく柿食う客だ
隣	名詞,一般,*,*,*,*,隣,トナリ,トナリ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
客	名詞,一般,*,*,*,*,客,キャク,キャク
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
よく	副詞,一般,*,*,*,*,よく,ヨク,ヨク
柿	名詞,一般,*,*,*,*,柿,カキ,カキ
食う	動詞,自立,*,*,五段・ワ行促音便,基本形,食う,クウ,クウ
客	名詞,一般,*,*,*,*,客,キャク,キャク
だ	助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
EOS
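
In MeCab's default output, each line holds the surface form, a tab, and then comma-separated features, and the sentence ends with `EOS`. With IPAdic the features are part of speech, three subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. As a rough sketch of how that text could be turned into records, assuming the IPAdic layout (the names `Morpheme` and `parse_mecab_output` are made up for this example):

```python
from collections import namedtuple

# Hypothetical record type for this sketch; not part of MeCab.
Morpheme = namedtuple("Morpheme", "surface features")


def parse_mecab_output(text):
    """Split MeCab's default output into (surface, feature list) records."""
    morphemes = []
    for line in text.splitlines():
        if line == "EOS":  # end-of-sentence marker
            break
        if not line:
            continue
        surface, _, feature_csv = line.partition("\t")
        # Naive split: a feature that itself contains a comma would be cut.
        morphemes.append(Morpheme(surface, feature_csv.split(",")))
    return morphemes


sample = (
    "隣\t名詞,一般,*,*,*,*,隣,トナリ,トナリ\n"
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n"
    "EOS\n"
)
for m in parse_mecab_output(sample):
    # Surface form, part of speech, reading (IPAdic feature index 7).
    print(m.surface, m.features[0], m.features[7])
```

The plain `split(",")` is good enough for a quick check, but a real parser would need to handle quoted fields such as the comma symbol itself.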

Ruby binding
Make MeCab callable from Ruby.
$ wget http://mecab.googlecode.com/files/mecab-ruby-0.994.tar.gz
$ tar zvxf mecab-ruby-0.994.tar.gz
$ cd mecab-ruby-0.994
$ /opt/ruby/current/bin/ruby extconf.rb
$ make
$ sudo make install
$ sudo ldconfig

Test from Ruby
A test script is included, so run it as is.
$ /opt/ruby/current/bin/ruby test.rb
$ cd ~

Python binding
Make it callable from Python as well.
$ wget http://mecab.googlecode.com/files/mecab-python-0.994.tar.gz
$ tar zvxf mecab-python-0.994.tar.gz
$ cd mecab-python-0.994
$ sudo vi setup.py
	return cmd1(str).split() ← replace the body of def cmd2(str): with this (1 place)
	/usr/local/bin/mecab-config ← replace each mecab-config with this full path (4 places)
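
The reason for the `cmd2` change is that the module-level function `string.split()` from Python 2 no longer exists in Python 3, while the `str.split()` method does. The sketch below shows Python 3 friendly versions of the two helpers. It is an illustration rather than the exact contents of the shipped setup.py, and `echo` stands in for `/usr/local/bin/mecab-config` so that it runs without MeCab installed.

```python
import os


def cmd1(command):
    """Run a shell command and return the first line of its output."""
    with os.popen(command) as pipe:
        return pipe.readline().rstrip("\n")


def cmd2(command):
    """Run a shell command and return its first output line split on whitespace."""
    # str.split() replaces the removed Python 2 function string.split().
    return cmd1(command).split()


# Stand-in commands; setup.py would call /usr/local/bin/mecab-config here.
print(cmd1("echo 0.994"))
print(cmd2("echo mecab stdc++"))
```

In setup.py, `cmd1` supplies the version string and `cmd2` supplies the include directories, library directories, and library names as lists.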
$ sudo /opt/python/current/bin/python setup.py build
$ sudo /opt/python/current/bin/python setup.py install
$ sudo ldconfig

Test from Python
A test script is included, but it raises an error under Python 3, so do a quick check in Python's interactive mode instead.
$ /opt/python/current/bin/python
Python 3.4.1 (default, Aug  7 2014, 15:45:41)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import MeCab
>>> test = MeCab.Tagger("-Ochasen")
>>> hoge = test.parse("Call Mecab from Python")
>>> print(hoge)
Python Python Python noun-Proper noun-Organization
From Kara to particles-Case particles-General
Mecab Mecab Mecab noun-General
Wo Wo particle-Case particles-General
Call Yobidashi Call verb-Independent five-stage, continuous form
Masu Masu Auxiliary verb Special / Masu Basic form
EOS

>>>

Finished.
