Getting MeCab (and CaboCha) working from Python

Environment

Mac OS X 10.9.4, Python 2.7

Install CaboCha

CaboCha requires MeCab and CRF++, so install those first.

Install CRF++

The latest version at the time of writing is 0.58: http://crfpp.googlecode.com/svn/trunk/doc/index.html#download

Download the archive, unpack it, then build and install.

$ cd CRF++-0.58
$ ./configure
$ make
$ make install

$ cd python
$ sudo python setup.py install

Install MeCab

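The latest version at the time of writing is 0.996: https://code.google.com/p/mecab/

From the Downloads page, get

- mecab-0.996.tar.gz (MeCab itself)

along with the mecab-python-0.996 and mecab-ipadic-2.7.0-20070801 packages used in the steps below, and unpack them.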

$ cd mecab-0.996
$ ./configure
$ make
$ sudo make install

$ cd ..
$ cd mecab-python-0.996
$ sudo python setup.py install

$ cd ..
$ cd mecab-ipadic-2.7.0-20070801
$ ./configure
$ make
$ sudo make install

If a step fails on Ubuntu with a "No such file or directory" error, installing the Python development package with `sudo apt-get install python2.7-dev` seems to fix it.

Try running it.

```bash
$ mecab
坂本ですが
Sakamoto?	????,????,*,*,*,*,*
??	̾??,??ͭ̾??,?ȿ?,*,*,*,*
??But????,????,*,*,*,*,*
EOS
```

The output is garbled. It seems the dictionary is not built as UTF-8 when its character encoding is left at the default.

Go to the mecab-ipadic directory, run make clean, and reconfigure it for UTF-8.

$ make clean
$ ./configure --with-charset=utf8
$ make 
$ sudo make install
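
The garbling seen earlier is what you would expect when bytes in one encoding are displayed as another. Here is a minimal Python 3 sketch of that effect. It assumes the dictionary's default charset is EUC-JP and the terminal is UTF-8; the original text does not state the default, so treat that as an assumption.

```python
# Sketch (Python 3): how a dictionary/terminal charset mismatch garbles output.
# Assumption: the dictionary emits EUC-JP while the terminal expects UTF-8.
text = "名詞,固有名詞"

euc_bytes = text.encode("euc_jp")  # bytes an EUC-JP dictionary would emit
garbled = euc_bytes.decode("utf-8", errors="replace")  # shown by a UTF-8 terminal
restored = euc_bytes.decode("euc_jp")  # decoded with the matching charset

print(repr(garbled))
print(restored)
```

Rebuilding the dictionary with --with-charset=utf8 removes the mismatch at the source, so no decoding workaround is needed.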

On Ubuntu, if you get the error

libmecab.so.2: cannot open shared object file: No such file or directory

running `sudo ldconfig` seems to fix it.

Try it again.

```bash
$ mecab
坂本ですが
坂本	名詞,固有名詞,人名,姓,*,*,坂本,サカモト,サカモト
です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
が	助詞,接続助詞,*,*,*,*,が,ガ,ガ
EOS
```

Fixed.

Incidentally, MeCab's settings live in mecabrc.

$ sudo find / -name "mecabrc"
/usr/local/etc/mecabrc
$ sudo emacs /usr/local/etc/mecabrc

By default it looked like this:

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir =  /usr/local/lib/mecab/dic/ipadic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n

dicdir appears to be the directory that holds the dictionary data.
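
To check which settings are actually active, the file can be read as simple key = value lines. The sketch below works on an inline copy of the default file. The names parse_mecabrc and SAMPLE_MECABRC are made up for illustration, and treating every line that starts with ";" as a comment is an assumption based on the listing above.

```python
# Sketch: read key = value settings from mecabrc-style text.
# Assumption: lines starting with ';' are comments, as in the default file.
SAMPLE_MECABRC = """\
;
; Configuration file of MeCab
;
dicdir =  /usr/local/lib/mecab/dic/ipadic

; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
"""


def parse_mecabrc(text):
    """Return a dict of the active (non-commented) settings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


settings = parse_mecabrc(SAMPLE_MECABRC)
print(settings["dicdir"])
```

To use it on a real installation, pass the contents of /usr/local/etc/mecabrc instead of the sample string.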

Install CaboCha

The latest version at the time of writing is 0.68: https://code.google.com/p/cabocha/

Download cabocha-0.68.tar.bz2 from the Downloads page and unpack it.

$ cd cabocha-0.68
$ ./configure
$ make
$ sudo make install

$ cd python
$ sudo python setup.py install
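
With three bindings installed, a quick import check shows whether each setup.py step took effect. This is only a sketch: check_bindings is a made-up helper, and the module names CRFPP, MeCab and CaboCha are what these packages are assumed to install.

```python
import importlib


def check_bindings(names=("CRFPP", "MeCab", "CaboCha")):
    """Return {module_name: True/False} depending on whether it imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Also raised when the shared library behind the module is missing.
            status[name] = False
    return status


for name, ok in sorted(check_bindings().items()):
    print("%-8s %s" % (name, "ok" if ok else "NOT importable"))
```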

Try morphological analysis with MeCab

Check that it works from Python

import MeCab
# -Ochasen selects ChaSen-style output
mt = MeCab.Tagger("-Ochasen")
print mt.parse("坂本ですが")

坂本	サカモト	坂本	名詞-固有名詞-人名-姓
です	デス	です	助動詞	特殊・デス	基本形
が	ガ	が	助詞-接続助詞
EOS
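
The ChaSen-style lines can be turned into named columns by splitting on tabs. This Python 3 sketch runs on a hard-coded sample line rather than live MeCab output. The column order (surface, reading, base form, part of speech, conjugation type, conjugation form) is an assumption read off the output above, and parse_chasen_line is a made-up name.

```python
# Sketch (Python 3): split one line of -Ochasen output into named columns.
# Assumption: tab-separated columns in this order.
CHASEN_COLUMNS = ("surface", "reading", "base", "pos", "conj_type", "conj_form")


def parse_chasen_line(line):
    """Return a dict for one token line, or None for EOS or an empty line."""
    line = line.rstrip("\n")
    if line == "EOS" or not line:
        return None
    fields = line.split("\t")
    # Pad short lines so every column name gets a value.
    fields += [""] * (len(CHASEN_COLUMNS) - len(fields))
    return dict(zip(CHASEN_COLUMNS, fields))


sample = "坂本\tサカモト\t坂本\t名詞-固有名詞-人名-姓\t\t\n"
token = parse_chasen_line(sample)
print(token["surface"], token["pos"].split("-"))
```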

Look at the part of speech

It is a nuisance that nothing works unless you are careful about the character encoding.

# coding: utf-8
import MeCab

mt = MeCab.Tagger("mecabrc")
res = mt.parseToNode("坂本ですが")

# Walk the linked list of nodes, printing each surface form and its features
while res:
    print res.surface
    print res.feature
    res = res.next

BOS/EOS,*,*,*,*,*,*,*,*
坂本
名詞,固有名詞,人名,姓,*,*,坂本,サカモト,サカモト
です
助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
が
助詞,接続助詞,*,*,*,*,が,ガ,ガ

BOS/EOS,*,*,*,*,*,*,*,*

Many implementations I found split res.feature on ",". I wondered whether that is really the only way, but it doesn't seem to cause problems, so I'll do the same.

# coding: utf-8
import MeCab

mt = MeCab.Tagger("mecabrc")
res = mt.parseToNode("坂本ですが")

while res:
    print res.surface
    # The first comma-separated field of the feature string is the part of speech
    arr = res.feature.split(",")
    print "Part of speech: " + arr[0]
    res = res.next

Part of speech: BOS/EOS
坂本
Part of speech: 名詞
です
Part of speech: 助動詞
が
Part of speech: 助詞

Part of speech: BOS/EOS
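
Indexing into the split list gets hard to read once you need more than the first field. One option is to give the fields names. This Python 3 sketch runs on literal feature strings, not a live tagger. The English field names are made up for illustration, and the nine-field layout is an assumption based on the output above. Some lines carry fewer fields (the garbled output earlier had seven), so the helper pads with "*".

```python
from collections import namedtuple

# Assumption: nine comma-separated feature fields; the names are illustrative.
Feature = namedtuple("Feature", [
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
])


def parse_feature(feature):
    """Split a feature string, padding with '*' when fields are missing."""
    parts = feature.split(",")
    parts += ["*"] * (len(Feature._fields) - len(parts))
    return Feature(*parts[:len(Feature._fields)])


f = parse_feature("名詞,固有名詞,人名,姓,*,*,坂本,サカモト,サカモト")
print(f.pos, f.base_form, f.reading)
```

In the loop above, parse_feature(res.feature).pos would then take the place of arr[0].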

Note that you have to write res = res.next, not just res.next. Otherwise res never advances and the loop naturally runs forever. This is an easy trap if you are used to Java.
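
One way to avoid forgetting the reassignment is to wrap the traversal in a generator once and reuse it. The sketch below uses FakeNode, a made-up stand-in with only .surface and .next, so it runs without MeCab. It assumes the last node's .next is None.

```python
class FakeNode(object):
    """Hypothetical stand-in for a MeCab node: has .surface and .next."""

    def __init__(self, surface, next_node=None):
        self.surface = surface
        self.next = next_node


def iter_nodes(node):
    """Yield each node in the chain, advancing with node = node.next."""
    while node is not None:
        yield node
        node = node.next  # reassign; reading .next alone does not advance


# Build BOS -> 坂本 -> です -> が -> EOS (BOS and EOS have an empty surface).
chain = FakeNode("", FakeNode("坂本", FakeNode("です", FakeNode("が", FakeNode("")))))
surfaces = [n.surface for n in iter_nodes(chain) if n.surface]
print(surfaces)
```

Filtering on a non-empty surface also drops the BOS/EOS nodes that showed up as blank lines in the output above.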

Try using CaboCha

I'll write a follow-up once I've tried it.
