Add words to MeCab's user dictionary on Ubuntu for use in Python

Introduction

Recently, I started using Python and MeCab for text analysis in my research, but I had a hard time adding words to the user dictionary, so I summarized the procedure here for my own reference.

Environment

1. Prepare a dictionary

Create the dictionary as a CSV file. Each line holds one entry, with the fields in this order: surface form, left context ID, right context ID, cost, part of speech, part-of-speech subcategory 1, part-of-speech subcategory 2, part-of-speech subcategory 3, conjugation type, conjugation form, base form, reading, pronunciation.

vim add_term.csv
アナと雪の女王,,,1,名詞,一般,*,*,*,*,アナと雪の女王,アナトユキノジョオウ,アナトユキノジョオー

If you leave the left and right context IDs blank, they are filled in automatically when the dictionary is built. The cost indicates how likely the word is to appear: the smaller the value, the more readily the word is chosen. There seem to be methods for estimating the cost, but this time I simply set it to 1. Fields you do not need can be filled with "*".
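As a rough illustration of the field order described above, here is a minimal Python sketch that builds one entry and renders it as a CSV line. The names make_entry and to_csv are hypothetical helpers made up for this example, not part of MeCab, and the part-of-speech tags assume an IPAdic-style dictionary.

```python
import csv
import io

# Field order of one user-dictionary entry (13 columns).
FIELDS = [
    "surface", "left_id", "right_id", "cost",
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
]


def make_entry(surface, reading, pronunciation, cost=1,
               pos="名詞", pos_sub1="一般"):
    """Build one dictionary row (hypothetical helper, not a MeCab API).

    The context IDs are left empty so they can be assigned automatically,
    and fields that are not needed are filled with "*".
    """
    return [surface, "", "", str(cost), pos, pos_sub1,
            "*", "*", "*", "*", surface, reading, pronunciation]


def to_csv(rows):
    """Render rows as CSV text, one dictionary entry per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


row = make_entry("アナと雪の女王", "アナトユキノジョオウ", "アナトユキノジョオー")
print(to_csv([row]), end="")
```

Writing the returned text to add_term.csv with encoding="utf-8" gives a file in the same shape as the hand-written one.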

2. Create a user dictionary

Build the user dictionary from the CSV file you created. To do this, use mecab-dict-index, which is installed together with MeCab.

# Create the directory where the user dictionary will be saved
sudo mkdir -p /usr/local/lib/mecab/dic/userdic

# Build the user dictionary
# (adjust the -d path to wherever your system dictionary is installed)
sudo /usr/lib/mecab/mecab-dict-index \
-d /usr/local/mecab/dic/ipadic \
-u /usr/local/lib/mecab/dic/userdic/add.dic \
-f utf-8 \
-t utf-8 \
add_term.csv

The options are:

-d  Directory containing the system dictionary
-u  Path where the user dictionary is saved
-f  Character encoding of the CSV file
-t  Character encoding of the user dictionary

Run mecab-dict-index by its full path, and specify UTF-8 for both character encodings.

reading add_term.csv ... 1
emitting double-array: 100% |###########################################|

done!

If output like the above is displayed, the dictionary was built successfully.
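If you would rather drive the build from Python, the same command can be assembled and run with subprocess. This is only a sketch: build_dict_index_command and compile_user_dictionary are hypothetical helper names, and the paths are simply the ones used in this article, so adjust them to your environment.

```python
import subprocess


def build_dict_index_command(csv_path, user_dic, system_dic,
                             indexer="/usr/lib/mecab/mecab-dict-index",
                             charset="utf-8"):
    """Return the mecab-dict-index argument list (hypothetical helper)."""
    return [
        indexer,
        "-d", system_dic,  # directory containing the system dictionary
        "-u", user_dic,    # where the compiled user dictionary is saved
        "-f", charset,     # character encoding of the CSV file
        "-t", charset,     # character encoding of the user dictionary
        csv_path,
    ]


def compile_user_dictionary(csv_path, user_dic, system_dic):
    """Run mecab-dict-index; raises CalledProcessError if it fails."""
    cmd = build_dict_index_command(csv_path, user_dic, system_dic)
    subprocess.run(cmd, check=True)


# Paths below are the ones from this article (assumptions for other systems).
cmd = build_dict_index_command(
    "add_term.csv",
    "/usr/local/lib/mecab/dic/userdic/add.dic",
    "/usr/local/mecab/dic/ipadic",
)
print(" ".join(cmd))
```

Only the command is printed here. Calling compile_user_dictionary actually runs the tool, and writing under /usr/local/lib will usually require root privileges.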

3. Add the created user dictionary to the MeCab configuration file

Add the following statement to the configuration file.

sudo vim /etc/mecabrc
userdic = /usr/local/lib/mecab/dic/userdic/add.dic

The official website says to add this line to either /usr/local/lib/mecab/dic/ipadic/dicrc or /usr/local/etc/mecabrc, but that did not work in my environment. My mecabrc was in the location shown above (/etc/mecabrc), and adding the line there worked correctly. To register multiple dictionaries, separate them with commas:

userdic = AAA.dic,BBB.dic

With this setting, I was able to register both dictionaries.
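The comma-separated userdic setting can also be edited programmatically. The following is a minimal sketch, assuming mecabrc uses simple "key = value" lines and that lines starting with ";" or "#" are comments. add_userdic is a hypothetical helper written for this example, not something MeCab provides.

```python
def add_userdic(mecabrc_text, dic_path):
    """Return mecabrc text with dic_path registered under `userdic`.

    An existing `userdic` line is extended with a comma;
    otherwise a new line is appended at the end.
    """
    lines = mecabrc_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments (assumed to start with ";" or "#") and non-settings.
        if stripped.startswith((";", "#")) or "=" not in stripped:
            continue
        key, value = (part.strip() for part in stripped.split("=", 1))
        if key == "userdic":
            paths = [p.strip() for p in value.split(",") if p.strip()]
            if dic_path not in paths:
                paths.append(dic_path)
            lines[i] = "userdic = " + ",".join(paths)
            break
    else:
        # No active userdic line was found, so add one.
        lines.append("userdic = " + dic_path)
    return "\n".join(lines) + "\n"


example = "; userdic = /home/foo/bar/user.dic\n"
print(add_userdic(example, "/usr/local/lib/mecab/dic/userdic/add.dic"), end="")
```

The function only transforms text. Reading and writing /etc/mecabrc itself is left to the caller, since that file normally needs root privileges to modify.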

Operation check

- Check from the command line

# Before adding the entry
mecab
アナと雪の女王
アナ	名詞,一般,*,*,*,*,アナ,アナ,アナ
と	助詞,並立助詞,*,*,*,*,と,ト,ト
雪	名詞,一般,*,*,*,*,雪,ユキ,ユキ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
女王	名詞,一般,*,*,*,*,女王,ジョオウ,ジョオー
EOS

# After adding the entry
アナと雪の女王
アナと雪の女王	名詞,一般,*,*,*,*,アナと雪の女王,アナトユキノジョオウ,アナトユキノジョオー
EOS
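To check this kind of output mechanically rather than by eye, the text can be split into tokens. The sketch below assumes MeCab's default output format, where each line is the surface form, a tab, and a comma-separated feature list, and the sentence ends with EOS. parse_mecab_output is a hypothetical helper, and the sample text is hard-coded so that the example runs without MeCab installed.

```python
def parse_mecab_output(text):
    """Split default-format MeCab output into (surface, features) pairs."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens


# Hard-coded sample in the default output format (no MeCab call is made).
sample = (
    "アナと雪の女王\t名詞,一般,*,*,*,*,"
    "アナと雪の女王,アナトユキノジョオウ,アナトユキノジョオー\n"
    "EOS\n"
)

tokens = parse_mecab_output(sample)
# A single token means the whole title was treated as one word.
print(len(tokens), tokens[0][0], tokens[0][1][0])
```

If the user dictionary is applied, the title comes back as a single token. Without it, the same input is split into several tokens.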

- Use with MeCab in Python

python3


>>> import MeCab
>>> m_t = MeCab.Tagger('-Ochasen \
                        -u /usr/local/lib/mecab/dic/userdic/add.dic')
>>> txt = 'アナと雪の女王を見に行こう。'
>>> print(m_t.parse(txt))
Let's go see Anna and the Snow Queen.

To use the user dictionary together with an installed mecab-ipadic-neologd:

python3


>>> import MeCab
>>> m_t = MeCab.Tagger('-Ochasen \
                        -d /usr/lib/mecab/dic/mecab-ipadic-neologd \
                        -u /usr/local/lib/mecab/dic/userdic/add.dic')

With these options, the neologd system dictionary and the user dictionary are loaded together.
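The option strings passed to MeCab.Tagger above can also be assembled by a small function, which makes it easier to switch the system dictionary on and off. This is a sketch only: build_tagger_args and make_tagger are hypothetical names, and MeCab is imported lazily inside make_tagger, so the rest of the example runs even where the MeCab binding is not installed.

```python
def build_tagger_args(user_dic, system_dic=None, output_format="chasen"):
    """Build the option string for MeCab.Tagger (hypothetical helper).

    -O selects the output format, -d the system dictionary directory,
    and -u the user dictionary. Paths containing spaces are not handled.
    """
    args = ["-O" + output_format]
    if system_dic:
        args += ["-d", system_dic]
    args += ["-u", user_dic]
    return " ".join(args)


def make_tagger(user_dic, system_dic=None):
    """Create a Tagger; requires the MeCab Python binding to be installed."""
    import MeCab  # imported here so the module loads without MeCab
    return MeCab.Tagger(build_tagger_args(user_dic, system_dic))


# Paths below are the ones used in this article.
args = build_tagger_args(
    "/usr/local/lib/mecab/dic/userdic/add.dic",
    system_dic="/usr/lib/mecab/dic/mecab-ipadic-neologd",
)
print(args)
```

Omitting system_dic gives the same options as the first Tagger example, so one helper covers both cases.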

Conclusion

After some trial and error, I confirmed that the user dictionary works from Python. I would appreciate it if you could point out any mistakes.

Reference site

- How to add words
- Adding words to MeCab user dictionary
