Install the MeCab binding for Python with pip on Windows, macOS, and Linux

Introduction

Ciao ... †

Personally, I've been getting more and more opportunities to use MeCab with Python on Windows these days. However, installing MeCab's Python wrapper on Windows means downloading the source, rewriting setup.py, and installing a compiler, which is very troublesome.

So I have released a package that makes it easy to install MeCab's Python wrapper with pip on Windows, macOS, and Ubuntu! https://pypi.org/project/mecab/

What is this?

It is a MeCab wrapper that supports multiple operating systems in a single package by changing its behavior at install time depending on the OS. On Windows, for example, mecab-python is built in advance with Microsoft Visual Studio's C++ compiler and provided in wheel format. On macOS and Linux, on the other hand, the C++ binding code is compiled during installation, so the package cannot be installed unless the target computer has a C++ compiler.
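
The install-time branching described above can be pictured with a small sketch. This is not the package's actual setup.py: the function name install_strategy and the strings it returns are made up for illustration, and only platform.system() from the standard library is used.

```python
import platform


def install_strategy(system=None):
    """Return a rough description of how the binding would be obtained.

    Hypothetical helper: it mirrors the explanation in this article,
    not the real logic in the package's setup.py.
    """
    system = system or platform.system()
    if system == "Windows":
        # A prebuilt wheel is used, so no compiler is needed.
        return "prebuilt wheel"
    # macOS ("Darwin") and Linux compile the C++ binding code locally.
    return "compile from source (C++ compiler required)"


print(install_strategy())
```

Passing a name explicitly, such as install_strategy("Windows"), shows what the other branch would report without needing that OS.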

It currently supports Python 2.7, 3.6, 3.7, 3.8. All versions support both 32-bit and 64-bit. It has been tested on Windows 10, macOS 10.14 and Ubuntu 18.04.

However, the 64-bit version of Python for Windows assumes that the following unofficial 64-bit build of MeCab is installed. https://github.com/ikegami-yukino/mecab/releases

Also, the Windows version of Cabocha is distributed only as 32-bit binaries, so if you want to use this package together with Cabocha on Windows, please use the 32-bit version of Python. (Sorry for the complication.)
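
If you are not sure whether your Python is 32-bit or 64-bit, a quick check like the following may help. It is only a sketch using the standard library; the helper name python_bits is my own and not part of the mecab package.

```python
import struct
import sys


def python_bits():
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


print(sys.version)
print("This Python is %d-bit" % python_bits())
```

Run it with the same interpreter you use for pip, because several Python installations with different bit widths can coexist on one Windows machine.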

Benefits of mecab

- It is distributed on PyPI in wheel format, so it can be used on Windows without a C++ compiler.
- It works the same way on Windows, macOS, Linux, and so on.
- The interface is exactly the same as the official Python binding, so existing code does not need to be rewritten.
- It is essentially the same as the official Python binding, so processing is fast.
- It fixes a bug in the official Python binding.
- There is no need to rewrite setup.py (the official Python binding requires rewriting setup.py to support Python 3).
- It supports all MeCab dictionaries.
- It does not bundle extras such as SWIG or MeCab dictionaries.

Installation

You can install it with

$ pip install mecab

or

$ python -m pip install mecab

If you have an old Python 2.7 installation without pip, download get-pip.py and run it with Python to install pip.

If you get an error like "MeCab_wrap.cxx:178:11: fatal error: 'Python.h' file not found", please try the following:

$ CPLUS_INCLUDE_PATH=`python-config --prefix`/Headers:$CPLUS_INCLUDE_PATH pip install mecab
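
To see where your interpreter expects Python.h to be, a small check like this may be useful before retrying the install. It is a sketch using only the standard sysconfig module; the helper name python_header_path is hypothetical.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where Python.h is expected for this interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


header = python_header_path()
# Report whether the header is actually present on this machine.
print(header, "(found)" if os.path.exists(header) else "(missing)")
```

If the file is reported as missing, the Python development headers are probably not installed for that interpreter.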

How to use

Basic


>>> import MeCab
>>> t = MeCab.Tagger()
>>> sentence = "太郎はこの本を女性に渡した。"  # "Taro gave this book to a woman."
>>> print(t.parse(sentence))
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
EOS
>>> n = t.parseToNode(sentence)
>>> while n:
...     print(n.surface, "\t", n.feature)
...     n = n.next
...
 	 BOS/EOS,*,*,*,*,*,*,*,*
太郎 	 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 	 助詞,係助詞,*,*,*,*,は,ハ,ワ
この 	 連体詞,*,*,*,*,*,この,コノ,コノ
本 	 名詞,一般,*,*,*,*,本,ホン,ホン
を 	 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
女性 	 名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に 	 助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し 	 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 	 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 	 記号,句点,*,*,*,*,。,。,。
 	 BOS/EOS,*,*,*,*,*,*,*,*
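
If you would rather work with the text returned by parse() than walk the nodes, the default output format (surface, a TAB, then comma-separated features, with a final EOS line) can be split by hand. The following is a minimal sketch under that assumption; parse_mecab_output is a name I made up, and the sample string is typed in by hand so that the snippet runs even without MeCab installed.

```python
def parse_mecab_output(text):
    """Split MeCab's default output into (surface, feature_list) pairs."""
    tokens = []
    for line in text.splitlines():
        # Skip the end-of-sentence marker and any blank lines.
        if line == "EOS" or not line:
            continue
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens


# Hand-written sample in the default format: surface, TAB, features.
sample = (
    "太郎\t名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "EOS\n"
)
for surface, features in parse_mecab_output(sample):
    print(surface, features[0])
```

With a real tagger you would pass t.parse(sentence) instead of the sample string. This only applies to the default output format, not to custom -F or -O formats.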

Advanced usage

The following code examples assume the IPA dictionary or the mecab-ipadic-neologd dictionary.

Specifying a dictionary

# To use a dictionary such as NEologd, specify the dictionary directory with "-d"
t = MeCab.Tagger("-d /path/to/dic/mecab-ipadic-neologd")

Word segmentation (wakati-gaki)

t = MeCab.Tagger("-O wakati")
print(t.parse(sentence).rstrip())
#=> 太郎 は この 本 を 女性 に 渡し た 。

Word segmentation that handles proper nouns containing half-width spaces

NEologd is recommended as a dictionary because it has abundant proper nouns.

t = MeCab.Tagger("-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd -F%m\\t -E\\n")
print(t.parse("DIR EN GREYのライブに行きたい").rstrip().split("\t"))  # "I want to go to a DIR EN GREY concert"
#=> ['DIR EN GREY', 'の', 'ライブ', 'に', '行き', 'たい']

Get reading


# Use "-O yomi" to get the reading
t = MeCab.Tagger("-O yomi")
print(t.parse(sentence).rstrip())
#=> タロウハコノホンヲジョセイニワタシタ。

Get word-by-word reading

t = MeCab.Tagger("-F%f[7]\\t -E\\n -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
print(t.parse(sentence).rstrip().split("\t"))
#=> ['タロウ', 'ハ', 'コノ', 'ホン', 'ヲ', 'ジョセイ', 'ニ', 'ワタシ', 'タ', '。']

Extraction of content words and function words

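# The IPA dictionary emits part-of-speech tags in Japanese, so match on those.
CONTENT_WORD_POS = ("名詞", "動詞", "形容詞", "副詞")  # noun, verb, adjective, adverb
IGNORE = ("接尾", "非自立", "代名詞")  # suffix, non-independent, pronoun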


def is_content_word(feature):
    return feature.startswith(CONTENT_WORD_POS) and all(f not in IGNORE for f in feature.split(",")[:6])

t = MeCab.Tagger()
n = t.parseToNode(sentence)
content_words = []
function_words = []
while n:
    if is_content_word(n.feature):
        content_words.append((n.surface, n.feature))
    elif not n.feature.startswith("BOS/EOS,"):
        function_words.append((n.surface, n.feature))
    n = n.next

print(content_words)  # Content words
#=> [('太郎', '名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー'), ('本', '名詞,一般,*,*,*,*,本,ホン,ホン'), ('女性', '名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ'), ('渡し', '動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ')]

print(function_words)  # Function words
#=> [('は', '助詞,係助詞,*,*,*,*,は,ハ,ワ'), ('この', '連体詞,*,*,*,*,*,この,コノ,コノ'), ('を', '助詞,格助詞,一般,*,*,*,を,ヲ,ヲ'), ('に', '助詞,格助詞,一般,*,*,*,に,ニ,ニ'), ('た', '助動詞,*,*,*,特殊・タ,基本形,た,タ,タ'), ('。', '記号,句点,*,*,*,*,。,。,。')]

Convert words to their base form (lemmatization)

t = MeCab.Tagger()
n = t.parseToNode("I handed over a very good book")
lemma = []
while n:
    if not n.feature.startswith("BOS/EOS,"):
        lemma.append(n.feature.split(",")[6])
    n = n.next
print(lemma)
#=> ['Wow', 'Good', 'Book', 'To', 'hand over', 'Ta']
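
One thing to watch for when taking index 6 of the feature string: with the IPA dictionary, words that are not in the dictionary usually come back with fewer fields and "*" in the base form position. A defensive variant might look like the sketch below; lemma_or_surface is a hypothetical helper, and the field layout (base form at index 6) is an assumption that holds for the IPA dictionary and NEologd but not necessarily for other dictionaries.

```python
def lemma_or_surface(surface, feature):
    """Return the base form from an IPA-dictionary feature string.

    Falls back to the surface form when the base form field is missing
    or is "*", which is what unknown words typically have.
    """
    fields = feature.split(",")
    if len(fields) > 6 and fields[6] != "*":
        return fields[6]
    return surface


# Feature strings written by hand so the snippet runs without MeCab.
print(lemma_or_surface("渡し", "動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ"))
print(lemma_or_surface("MeCab", "名詞,一般,*,*,*,*,*"))
```

In the loop above, you would call lemma_or_surface(n.surface, n.feature) in place of n.feature.split(",")[6].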

For constrained parsing, see my earlier article. https://qiita.com/yukinoi/items/4e7afb5e72b3a46da0f2

A big request

If you like it, I'd appreciate it if you could star mecab's GitHub repository. A single click gives my development motivation a real boost.
