Use mecab-ipadic-neologd from Python

What I want to do

- Use MeCab for morphological analysis - http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
- Use a new-word dictionary - https://github.com/neologd/mecab-ipadic-neologd/
- Use them in combination with other modules in Python scripts

Environment

Python 2.7, managed with Conda.

$ conda create -n py27con python=2.7 anaconda
$ conda info -e
$ source ~/.pyenv/versions/miniconda3-3.16.0/envs/py27con/bin/activate py27con

mecab-ipadic

I will use mecab-ipadic-neologd later, so I build mecab-ipadic with the UTF-8 charset.

$ cd ~/path/to/mecab-ipadic-2.7.0-20070801/
$ make clean
$ ./configure --with-charset=utf8
$ make
$ make install

mecab-ipadic-neologd

$ cd ~/path/to/mecab-ipadic-neologd/
$ bin/install-mecab-ipadic-neologd 
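
Before moving on, it can help to confirm that the dictionary directory really exists. The sketch below is only an illustration, not part of the original setup: `tagger_args` is a hypothetical helper, and the default path is simply the one used in test.py later in this post. The install script reports the actual location on your machine, which may differ.

```python
import os

# Path used later in test.py; your install location may differ.
DEFAULT_NEOLOGD_DIR = "/usr/local/lib/mecab/dic/mecab-ipadic-neologd"


def tagger_args(dicdir=DEFAULT_NEOLOGD_DIR):
    """Return the '-d <dir>' option string to pass to MeCab.Tagger.

    Hypothetical helper: raises IOError if the directory is missing,
    so a wrong path fails here with a readable message.
    """
    if not os.path.isdir(dicdir):
        raise IOError("dictionary directory not found: %s" % dicdir)
    return "-d " + dicdir


if __name__ == "__main__":
    try:
        print(tagger_args())
    except IOError as err:
        print(err)
```

The returned string can be passed straight to `MeCab.Tagger(...)`, which is what test.py does with a hard-coded path.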

mecab-python

Python bindings for MeCab

$ pip install https://mecab.googlecode.com/files/mecab-python-0.996.tar.gz

Operation check

test.py


# -*- coding: utf-8 -*-
import MeCab
m = MeCab.Tagger(' -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')

text = '''
"The Idolmaster Cinderella Girls" (THE IDOLM@STER CINDERELLA GIRLS) is a social game for mobile terminals, developed and operated by BANDAI NAMCO Entertainment (formerly BANDAI NAMCO Games) and Cygames, with the world of "THE IDOLM@STER" as its motif.
'''
print(m.parse(text))

The text is from [Wikipedia](https://ja.wikipedia.org/wiki/%E3%82%A2%E3%82%A4%E3%83%89%E3%83%AB%E3%83%9E%E3%82%B9%E3%82%BF%E3%83%BC_%E3%82%B7%E3%83%B3%E3%83%87%E3%83%AC%E3%83%A9%E3%82%AC%E3%83%BC%E3%83%AB%E3%82%BA).

$ python test.py
『	Symbol,Open bracket,*,*,*,*,『,『,『
The Idolmaster Cinderella Girls	Noun,Proper noun,General,*,*,*,Idolmaster Cinderella Girls,Idolmaster Cinderella Girls,Idolmaster Cinderella Girls
』	Symbol,Close bracket,*,*,*,*,』,』,』
(	Symbol,Open bracket,*,*,*,*,(,(,(
THE IDOLM@STER CINDERELLA GIRLS	Noun,Proper noun,General,*,*,*,THE IDOLM@STER CINDERELLA GIRLS,Idolmaster Cinderella Girls,Idolmaster Cinderella Girls
)	Symbol,Close bracket,*,*,*,*,),),)
wa	Particle,Binding particle,*,*,*,*,wa,Ha,Wa
、	Symbol,Comma,*,*,*,*,、,、,、
BANDAI NAMCO Entertainment	Noun,Proper noun,General,*,*,*,BANDAI NAMCO Entertainment,BANDAI NAMCO Entertainment,BANDAI NAMCO Entertainment
(	Symbol,Open bracket,*,*,*,*,(,(,(
Former	Prefix,Noun connection,*,*,*,*,Former,Kyu,Kyu
BANDAI NAMCO Games	Noun,Proper noun,General,*,*,*,BANDAI NAMCO Games,Bandai Namco Games,Bandai Namco Games
)	Symbol,Close bracket,*,*,*,*,),),)
to	Particle,Coordinating particle,*,*,*,*,to,To,To
Cygames	Noun,Proper noun,General,*,*,*,Cygames,Cygames,Cygames
ga	Particle,Case particle,General,*,*,*,ga,Ga,Ga
Development	Noun,Sahen connection,*,*,*,*,Development,Kaihatsu,Kaihatsu
・	Symbol,General,*,*,*,*,・,・,・
Operation	Noun,Sahen connection,*,*,*,*,Operation,Unei,Unei
To do	Verb,Independent,*,*,Sahen Suru,Base form,To do,Suru,Suru
『	Symbol,Open bracket,*,*,*,*,『,『,『
THE IDOLM@STER	Noun,Proper noun,General,*,*,*,THE IDOLM@STER,Idol Master,Idol Master
』	Symbol,Close bracket,*,*,*,*,』,』,』
no	Particle,Adnominalizer,*,*,*,*,no,No,No
Worldview	Noun,Proper noun,General,*,*,*,Worldview,Sekaikan,Sekaikan
wo	Particle,Case particle,General,*,*,*,wo,Wo,Wo
Motif	Noun,General,*,*,*,*,Motif,Motif,Motif
to	Particle,Case particle,General,*,*,*,to,To,To
To do	Verb,Independent,*,*,Sahen Suru,Base form,To do,Suru,Suru
Mobile terminal	Noun,Proper noun,General,*,*,*,Mobile terminal,Keitaitanmatsu,Keitaitanmatsu
Dedicated	Noun,Sahen connection,*,*,*,*,Dedicated,Senyo,Senyo
no	Particle,Adnominalizer,*,*,*,*,no,No,No
Social game	Noun,Proper noun,General,*,*,*,Social game,Social game,Social game
。	Symbol,Period,*,*,*,*,。,。,。
EOS
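
To use this output together with other modules, it usually has to be split into fields first. MeCab's default output puts one token per line, with the surface form and a comma-separated feature list separated by a tab, and ends the sentence with `EOS`. The following is a minimal sketch under that assumption. `parse_mecab_output` is a name made up for this example, and the naive `split(",")` does not handle features that themselves contain commas.

```python
def parse_mecab_output(output):
    """Split MeCab's default output into (surface, features) pairs.

    Each token line is 'surface<TAB>feature1,feature2,...'; the
    sentence ends with a line containing only 'EOS'.
    """
    tokens = []
    for line in output.splitlines():
        if line == "EOS":
            break
        if "\t" not in line:
            continue  # skip blank or malformed lines
        surface, feature_str = line.split("\t", 1)
        tokens.append((surface, feature_str.split(",")))
    return tokens


# Sample string made up for illustration, shaped like real output.
sample = "Cygames\tNoun,Proper noun,General,*,*,*,Cygames,Cygames,Cygames\nEOS\n"
for surface, features in parse_mecab_output(sample):
    print("%s %s %s" % (surface, features[0], features[1]))
```

In a real script, the string returned by `m.parse(text)` would be passed in place of `sample`.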

By the way, if you omit `-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd` and compare the two outputs, you can see that the new-word dictionary is doing its job (mainly for proper nouns).
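
One way to make that comparison less manual is to diff the surface forms produced with each dictionary. This is a rough sketch, not something from the original setup: `surfaces`, `diff_tokens` and `compare` are hypothetical helper names, and `compare` assumes that `MeCab.Tagger()` with no arguments uses the system default dictionary.

```python
import difflib


def surfaces(parsed):
    """Extract the surface forms from MeCab's default output text."""
    result = []
    for line in parsed.splitlines():
        if line == "EOS":
            break
        if "\t" in line:
            result.append(line.split("\t", 1)[0])
    return result


def diff_tokens(default_out, neologd_out):
    """Return unified-diff lines between two tokenizations."""
    return list(difflib.unified_diff(
        surfaces(default_out), surfaces(neologd_out),
        fromfile="ipadic", tofile="neologd", lineterm=""))


def compare(text, neologd_dir):
    """Parse `text` with the default dictionary and with NEologd."""
    import MeCab  # imported here so the helpers above work without MeCab
    default_out = MeCab.Tagger().parse(text)
    neologd_out = MeCab.Tagger("-d " + neologd_dir).parse(text)
    return diff_tokens(default_out, neologd_out)
```

Calling `compare(text, "/usr/local/lib/mecab/dic/mecab-ipadic-neologd")` and printing each returned line should show where several short tokens were merged into one proper noun.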

Where I got stuck

List of frequent problems:

- The output is garbled
  - You probably just need to use the UTF-8 dictionary properly
- Differences / conflicts between Conda Python and system Python
  - Example: the shell crashes when running `source activate python`
  - This can be avoided by specifying the full path to `activate`
- Work to make the setup script and sample script of the downloaded Python binding compatible with Python 3.5
- Work to make the binding itself compatible with SWIG 3.5
- Even then, I still get Unicode-related errors
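
The garbled-output item is typically a charset mismatch: the dictionary emits bytes in one encoding while the script and terminal expect another. The sketch below only illustrates the symptom and does not touch MeCab at all. It assumes the non-UTF-8 dictionary was built as EUC-JP (which I believe is the mecab-ipadic default), and `looks_like_utf8` is a hypothetical helper name.

```python
# -*- coding: utf-8 -*-


def looks_like_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = u"形態素解析"                 # "morphological analysis"
as_eucjp = word.encode("euc_jp")     # bytes an EUC-JP dictionary would emit
as_utf8 = word.encode("utf-8")       # bytes a UTF-8 dictionary would emit

print(looks_like_utf8(as_utf8))      # UTF-8 bytes decode cleanly
print(looks_like_utf8(as_eucjp))     # EUC-JP bytes are not valid UTF-8
```

If the bytes coming back from MeCab fail a check like this, rebuilding the dictionary with `--with-charset=utf8` as shown earlier is the first thing to try.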

I wanted to do this with Python 3.5 if possible, but I got stuck and could not find a way out, so for the time being I did it with 2.7.
