MeCab from Python

Task

Method

Premise

Contents

1. Set up the build environment

2. Create libmecab.lib and libmecab.dll (build MeCab)

:feature_index.cpp.patch


--- mecab-0.996.org/src/feature_index.cpp	Sun Nov 25 14:35:33 2012
+++ mecab-0.996/src/feature_index.cpp	Sat Mar  1 11:19:20 2014
@@ -353,7 +353,7 @@
               if (!r) goto NEXT;
               os_ << r;
             } break;
-            case 't':  os_ << (size_t)path->rnode->char_type;     break;
+            case 't':  os_ << (unsigned int)path->rnode->char_type;     break;
             case 'u':  os_ << ufeature; break;
             case 'w':
               if (path->rnode->stat == MECAB_NOR_NODE) {

:writer.cpp.patch


--- mecab-0.996.org/src/writer.cpp	Sun Sep 30 01:44:27 2012
+++ mecab-0.996/src/writer.cpp	Sat Mar  1 11:20:32 2014
@@ -257,7 +257,7 @@
             // input sentence
           case 'S': os->write(lattice->sentence(), lattice->size()); break;
             // sentence length
-          case 'L': *os << lattice->size(); break;
+          case 'L': *os << (unsigned int)lattice->size(); break;
             // morph
           case 'm': os->write(node->surface, node->length); break;
           case 'M': os->write(reinterpret_cast<const char *>

:Makefile.msvc.in.patch


--- mecab-0.996.org/src/Makefile.msvc.in	Sun Sep 30 01:44:27 2012
+++ mecab-0.996/src/Makefile.msvc.in	Thu Mar  6 02:36:41 2014
@@ -3,7 +3,7 @@
 LINK=link.exe
 
 CFLAGS = /EHsc /O2 /GL /GA /Ob2 /nologo /W3 /MT /Zi /wd4800 /wd4305 /wd4244
-LDFLAGS = /nologo /OPT:REF /OPT:ICF /LTCG /NXCOMPAT /DYNAMICBASE /MACHINE:X86 ADVAPI32.LIB
+LDFLAGS = /nologo /OPT:REF /OPT:ICF /LTCG /NXCOMPAT /DYNAMICBASE /MACHINE:X64 ADVAPI32.LIB
 DEFS =  -D_CRT_SECURE_NO_DEPRECATE -DMECAB_USE_THREAD \
         -DDLL_EXPORT -DHAVE_GETENV -DHAVE_WINDOWS_H -DDIC_VERSION=@DIC_VERSION@ \
         -DVERSION="\"@VERSION@\"" -DPACKAGE="\"mecab\"" \

> call "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\vcvarsall.bat" X86_amd64
> nmake -f Makefile.msvc.in
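
Because the build targets X86_amd64, the Python interpreter that loads the binding has to be a 64-bit build as well; a 32-bit process cannot load a 64-bit DLL. A minimal sketch for checking this with the standard library only (the helper name `interpreter_bits` is an illustrative assumption, not part of MeCab):

```python
import struct


def interpreter_bits():
    # The size of a C pointer in bytes, times 8, is the interpreter's word size.
    return struct.calcsize("P") * 8


print("This Python is %d-bit" % interpreter_bits())
```

If this reports 32, the 64-bit libmecab.dll built here will not load in that interpreter.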

3. Create the installer (build the Python binding)

setup.py


#!/usr/bin/env python

from distutils.core import setup, Extension

# Build the SWIG-generated wrapper (MeCab_wrap.cxx) and link it against libmecab.
# Both paths point at the MeCab SDK directory (mecab.h and libmecab.lib).
setup(name="mecab-python",
      version="0.996",
      py_modules=["MeCab"],
      ext_modules=[
          Extension("_MeCab",
                    ["MeCab_wrap.cxx"],
                    include_dirs=['C:/Program Files (x86)/MeCab/sdk'],
                    library_dirs=['C:/Program Files (x86)/MeCab/sdk'],
                    libraries=["libmecab"])
      ])

> python setup.py bdist_wininst

4. Installation (running the installer)
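
Once the installer has run, a quick way to confirm that the binding is importable before running the full test is a guarded import. This is only a sketch; the helper name `mecab_available` is an assumption introduced for illustration:

```python
def mecab_available():
    # A missing module and a DLL that fails to load (for example a
    # 32-bit/64-bit mismatch) both surface as ImportError.
    try:
        import MeCab  # noqa: F401
    except ImportError:
        return False
    return True


print("MeCab importable:", mecab_available())
```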

5. Operation check

> python test.py
0.996
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
二	名詞,数,*,*,*,*,二,ニ,ニ
郎	名詞,一般,*,*,*,*,郎,ロウ,ロー
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
EOS

  	BOS/EOS,*,*,*,*,*,*,*,*
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
二	名詞,数,*,*,*,*,二,ニ,ニ
郎	名詞,一般,*,*,*,*,郎,ロウ,ロー
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
  	BOS/EOS,*,*,*,*,*,*,*,*
EOS
EOS
filename: C:\Program Files (x86)\MeCab\etc\..\dic\ipadic\sys.dic
charset: UTF-8
size: 392126
type: 0
lsize: 1316
rsize: 1316
version: 102
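
MeCab's default output puts one token per line: the surface form, a tab, then comma-separated features, with a line reading EOS at the end of the sentence. A minimal Python 3 sketch for splitting that text into tokens, assuming the default ipadic-style format (the function name `parse_mecab_output` and the two-line sample are illustrative assumptions and do not call MeCab itself):

```python
def parse_mecab_output(text):
    """Split MeCab's default text output into (surface, features) pairs."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS":
            # End of the sentence: stop reading.
            break
        if not line.strip():
            continue
        # Surface and features are separated by the first tab on the line.
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens


# Hypothetical sample in the default format, not captured from a live run.
sample = (
    "太郎\t名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "EOS\n"
)

tokens = parse_mecab_output(sample)
for surface, features in tokens:
    # features[0] is the top-level part of speech.
    print(surface, features[0])
```

In real use, the string returned by `Tagger.parse` would take the place of `sample`.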

reference
