[PYTHON] [Caution] Specify system dictionary or user dictionary with mecab [Windows]

I ran into trouble specifying dictionaries when using MeCab from Python, so I am leaving this as a memo. The background is that I needed to switch user dictionaries within a single process.

The official MeCab page is https://taku910.github.io/mecab/. Most of what you need is written there, but not in much detail, so some separate investigation is needed. My impression is that MeCab was not designed with Windows in mind, and searching turned up few documents for Windows.

Conclusion

  1. Separate paths with /, not with \ (shown as ¥ in some Japanese environments)
  2. Do not include spaces in the path to the dictionary

Commentary

This section covers specifying a dictionary from Python code.

When running from the command line, by the way, a path containing spaces is not a problem. However, since a space is treated as an argument delimiter, enclose the entire path in " ".

#System dictionary specification
mecab -d "C:\Program Files (x86)\MeCab\dic\ipadic"

#User dictionary specification
mecab -u "C:\Program Files (x86)\MeCab\dic\ipadic\user.dic"

Specify dictionary from python code

To specify a dictionary, pass it as an argument when creating the Tagger.

import MeCab

tagger = MeCab.Tagger("-d [Path to system dictionary]")
tagger = MeCab.Tagger("-u [Path to user dictionary]")

On Windows, the dictionary is probably in C:\Program Files (x86)\MeCab\dic\ipadic. (The folder may not have the (x86) suffix.)

That path goes in the [Path to ... dictionary] placeholder above, but there are two points to note.

  1. Paths must be separated by /
  2. Do not put spaces

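The first comes from how Python string literals work, not from MeCab. The second appears to come from the option string being split on spaces before the path is read.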

Paths must be separated by /

If you use \ (shown as ¥ in some Japanese environments) inside the double quotation marks " " of a string, Python treats it as an escape character. So do not write the following:

import MeCab

tagger = MeCab.Tagger("-d C:\Program Files (x86)\MeCab\dic\ipadic")
tagger = MeCab.Tagger("-u C:\Program Files (x86)\MeCab\dic\ipadic\user.dic")

Instead, write:

tagger = MeCab.Tagger("-d C:/Program Files (x86)/MeCab/dic/ipadic")
tagger = MeCab.Tagger("-u C:/Program Files (x86)/MeCab/dic/ipadic/user.dic")

An editor such as VS Code flags the backslash version as an error, so you will notice it right away.
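If the path comes to you in Windows form (for example, copied from Explorer), it can be converted rather than retyped. The following is a minimal sketch, not part of the original setup: a raw string (r"...") keeps Python from interpreting the backslashes, and PureWindowsPath.as_posix() from the standard library rewrites them as /. The helper name to_forward_slashes is made up for this example.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Return the given Windows path with / as the separator."""
    # PureWindowsPath only manipulates the text of the path,
    # so this works even when the path does not exist on this machine.
    return PureWindowsPath(windows_path).as_posix()


# The r prefix makes this a raw string, so \P, \M, \d and \i stay as written.
system_dic = to_forward_slashes(r"C:\Program Files (x86)\MeCab\dic\ipadic")
print(system_dic)  # C:/Program Files (x86)/MeCab/dic/ipadic
```

Note that this only fixes the separator. The result above still contains spaces, which is the second point.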

Do not put spaces

By default the dictionary is probably in C:\Program Files (x86)\MeCab\dic\ipadic, but because Program Files (x86) contains spaces, specifying that path causes an error.

To specify a dictionary, copy it to another location whose path contains no spaces and specify that copy. For example, create a folder called mecab directly under C:, place the dictionary there, and specify it as follows.

import MeCab

tagger = MeCab.Tagger("-d C:/mecab/ipadic")
tagger = MeCab.Tagger("-u C:/mecab/ipadic/user.dic")
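Both points can be checked before the Tagger is created. The sketch below is one possible way to do that, with hypothetical helper names (build_tagger_args, make_tagger) that are not part of MeCab. It assumes that -d and -u can be given together in one option string, and it imports MeCab only inside make_tagger so that the check itself can be run without MeCab installed.

```python
def build_tagger_args(system_dic=None, user_dic=None):
    """Build the option string for MeCab.Tagger from forward-slash paths."""
    options = []
    for flag, path in (("-d", system_dic), ("-u", user_dic)):
        if path is None:
            continue
        # Point 2: an unquoted path must not contain spaces.
        if any(ch.isspace() for ch in path):
            raise ValueError(f"path contains whitespace: {path!r}")
        # Point 1: the separator must be /.
        if "\\" in path:
            raise ValueError(f"use / instead of \\ in: {path!r}")
        options.append(f"{flag} {path}")
    return " ".join(options)


def make_tagger(system_dic=None, user_dic=None):
    """Create a Tagger after validating the dictionary paths."""
    import MeCab  # imported here so build_tagger_args works without MeCab

    return MeCab.Tagger(build_tagger_args(system_dic, user_dic))


print(build_tagger_args("C:/mecab/ipadic", "C:/mecab/ipadic/user.dic"))
# -d C:/mecab/ipadic -u C:/mecab/ipadic/user.dic
```

To switch user dictionaries within one process, one option is to call make_tagger once per user dictionary and keep the resulting Tagger objects side by side.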

Addition: With mecab-python3 (pip install mecab-python3), a path containing spaces works normally if it is enclosed in quotation marks. Thanks to @palm23 for pointing this out.
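As an illustration of the quoting described in the addition, the option string can use single quotes on the outside so that the double quotes around the path need no escaping. This is a sketch based only on the note above. The helper name quoted_option is hypothetical, and the Tagger call is left as a comment because it requires mecab-python3 and the dictionary to be present.

```python
def quoted_option(flag: str, path: str) -> str:
    """Return an option such as -d "path", with the path in double quotes."""
    return f'{flag} "{path}"'


args = quoted_option("-d", "C:/Program Files (x86)/MeCab/dic/ipadic")
print(args)  # -d "C:/Program Files (x86)/MeCab/dic/ipadic"

# With mecab-python3 installed, the string would be passed like this:
# import MeCab
# tagger = MeCab.Tagger(args)
```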

Summary

This is not limited to MeCab: when specifying a path from Python, separate it with / and avoid spaces.
