[Python] How to use MeCab and ipadic-NEologd on Colab

Introduction

What is ipadic-neologd?

mecab-ipadic-NEologd (Neologism dictionary for MeCab) is one of the dictionaries available for MeCab. It is updated more than twice a week, so it can handle new words and named entities.

Example


# Without ipadic-neologd (default dictionary)
import MeCab

m = MeCab.Tagger()
# Japanese for "COVID-19 caused an overshoot."
print(m.parse("COVID-19によりオーバーシュートが起こった。"))

# Output (tokens and tags are rendered in English or romaji):
COVID	COVID	COVID	noun-proper noun-organization
-	-	-	noun-suru-verb connection
19	19	19	noun-number
ni yori	niyori	ni yori	particle-case particle-compound
over	over	over	noun-suru-verb connection
shoot	shoot	shoot	noun-suru-verb connection
ga	ga	ga	particle-case particle-general
okot	okot	okoru	verb-independent	godan ra-row	continuative ta-connection
ta	ta	ta	auxiliary verb	special ta	base form
。	。	。	symbol-period
EOS

# With ipadic-neologd
m = MeCab.Tagger("-d {Dictionary path}")
print(m.parse("COVID-19によりオーバーシュートが起こった。"))

# Output (tokens and tags are rendered in English or romaji):
COVID-19	noun,proper noun,general,*,*,*,COVID-19,Covid Nine Teen,Covid Nine Teen
ni yori	particle,case particle,compound,*,*,*,ni yori,niyori,niyori
overshoot	noun,proper noun,general,*,*,*,overshoot,overshoot,overshoot
ga	particle,case particle,general,*,*,*,ga,ga,ga
okot	verb,independent,*,*,godan ra-row,continuative ta-connection,okoru,okot,okot
ta	auxiliary verb,*,*,*,special ta,base form,ta,ta,ta
。	symbol,period,*,*,*,*,。,。,。
EOS

With the default dictionary, "COVID-19" is split into three tokens and "overshoot" into two. With ipadic-neologd, each is kept as a single proper noun.

How to install on Colab

The following article was very easy to follow: "mecab ipadic-NEologd を Google Colaboratory で使う" (Using mecab ipadic-NEologd on Google Colaboratory). If you get an error, running the following command fixed it for me:

!sudo cp /etc/mecabrc /usr/local/etc/
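Once the dictionary is installed, you still need its directory for the -d option. The sketch below is one possible way to look it up, assuming the `mecab-config` command is on the PATH and that NEologd was installed under MeCab's dictionary directory with its default folder name `mecab-ipadic-neologd`. The helper names `neologd_dir` and `find_neologd_dir` are made up for this illustration.

```python
import shutil
import subprocess


def neologd_dir(dicdir: str) -> str:
    """Append the assumed default NEologd folder name to MeCab's dicdir."""
    return dicdir.rstrip("/") + "/mecab-ipadic-neologd"


def find_neologd_dir():
    """Return the assumed NEologd path, or None if mecab-config is missing."""
    if shutil.which("mecab-config") is None:
        return None
    # `mecab-config --dicdir` prints the directory that holds MeCab dictionaries.
    result = subprocess.run(
        ["mecab-config", "--dicdir"],
        capture_output=True,
        text=True,
        check=True,
    )
    return neologd_dir(result.stdout.strip())


# Prints the candidate path, or None when MeCab is not installed.
print(find_neologd_dir())
```

If your install used a custom location, the printed path will not exist and you should use the directory reported by the installer instead.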

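How to use


import MeCab

# Options: an output format (see below), then -d with the ipadic-neologd path
m = MeCab.Tagger("{Output format(See below)} -d {ipadic-neologd path}")
# Japanese for "Let's keep our social distance"
print(m.parse("ソーシャルディスタンスを保とう"))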
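As a rough sketch of how the option string can be assembled, the helper below joins an output format and a dictionary path. `build_tagger_args` is a hypothetical helper and `NEOLOGD_DIR` is a placeholder path, not something from the original post. The tagger is only created when both the MeCab module and that directory are actually present.

```python
import os


def build_tagger_args(dic_dir=None, output_format=None):
    """Build the option string for MeCab.Tagger.

    output_format: "wakati", "chasen", "yomi", "dump", or None for the default.
    dic_dir: directory passed to -d, or None to use the default dictionary.
    """
    parts = []
    if output_format:
        parts.append(f"-O{output_format}")
    if dic_dir:
        parts.append(f"-d {dic_dir}")
    return " ".join(parts)


# Placeholder path (an assumption); replace it with your own install location.
NEOLOGD_DIR = "/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd"

args = build_tagger_args(NEOLOGD_DIR, "wakati")
print(args)

try:
    import MeCab
except ImportError:
    MeCab = None

# Only build the tagger when MeCab and the dictionary directory both exist.
if MeCab is not None and os.path.isdir(NEOLOGD_DIR):
    tagger = MeCab.Tagger(args)
    print(tagger.parse("ソーシャルディスタンスを保とう"))
```

This simple join does not quote the path, so a directory name containing spaces would need extra handling.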

Output format

The samples below show the analysis of the sentence above. Surface forms and tags are rendered in English or romaji.

1. mecabrc: no arguments


Social distance	noun,proper noun,general,*,*,*,Social distance,Social distance,Social distance
wo	particle,case particle,general,*,*,*,wo,wo,wo
tamoto	verb,independent,*,*,godan ta-row,irrealis u-connection,tamotsu,tamoto,tamoto
u	auxiliary verb,*,*,*,invariant,base form,u,u,u
EOS

Each line consists of the surface form, a tab, and then comma-separated features in the following order:

Surface form: the text as split into morphemes
Part of speech: noun, verb, particle, auxiliary verb, etc.
Part-of-speech subcategory 1: e.g. noun → proper noun, verb → independent, particle → case particle
Part-of-speech subcategory 2: e.g. general, quotation
Part-of-speech subcategory 3
Conjugation type: e.g. verb → godan (five-grade), ta-row
Conjugation form: e.g. irrealis u-connection
Base form
Reading
Pronunciation
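The field order above can be turned into a small parser. This is a minimal sketch, not part of the original post: the key names in `FIELDS` are my own labels, the sample lines are hand-written in the layout MeCab prints for Japanese text, and feature values containing quoted commas are not handled. Unknown words may come with fewer features, so missing ones are padded with "*".

```python
from typing import Dict, List

# Labels for the comma-separated features (names chosen for this sketch).
FIELDS = [
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base", "reading", "pronunciation",
]


def parse_default_output(text: str) -> List[Dict[str, str]]:
    """Turn default-format MeCab output into a list of field dictionaries."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS" or not line.strip():
            continue
        # The surface form and the feature string are separated by one tab.
        surface, _, feature = line.partition("\t")
        values = feature.split(",")
        # Pad short feature lists (e.g. unknown words) with "*".
        values += ["*"] * (len(FIELDS) - len(values))
        token = {"surface": surface}
        token.update(zip(FIELDS, values))
        tokens.append(token)
    return tokens


# Hand-written sample lines in the default layout.
sample = (
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "保と\t動詞,自立,*,*,五段・タ行,未然ウ接続,保つ,タモト,タモト\n"
    "EOS\n"
)
for token in parse_default_output(sample):
    print(token["surface"], token["pos"], token["base"])
```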

2. -Ochasen: ChaSen-compatible format


Social distance	Social distance	Social distance	noun-proper noun-general
wo	wo	wo	particle-case particle-general
tamoto	tamoto	tamotsu	verb-independent	godan ta-row	irrealis u-connection
u	u	u	auxiliary verb	invariant	base form
EOS
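ChaSen-style lines are tab-separated: surface form, reading, base form, hyphen-joined part of speech, conjugation type, conjugation form. The sketch below splits such lines into dictionaries. The key names in `CHASEN_FIELDS` are my own, and the sample lines are hand-written in that layout rather than captured from a real run.

```python
from typing import Dict, List

# Column labels for the ChaSen-compatible layout (names chosen for this sketch).
CHASEN_FIELDS = ["surface", "reading", "base", "pos", "conj_type", "conj_form"]


def parse_chasen_output(text: str) -> List[Dict[str, str]]:
    """Split ChaSen-format output into one dictionary per token."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS" or not line.strip():
            continue
        columns = line.split("\t")
        # Pad missing trailing columns with empty strings.
        columns += [""] * (len(CHASEN_FIELDS) - len(columns))
        tokens.append(dict(zip(CHASEN_FIELDS, columns)))
    return tokens


# Hand-written sample lines in the ChaSen-compatible layout.
sample = (
    "を\tヲ\tを\t助詞-格助詞-一般\t\t\n"
    "保と\tタモト\t保つ\t動詞-自立\t五段・タ行\t未然ウ接続\n"
    "EOS\n"
)
for token in parse_chasen_output(sample):
    # The part of speech is hyphen-joined; its first element is the top level.
    print(token["surface"], token["pos"].split("-")[0])
```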

3. -Owakati: Word segmentation only (tokens separated by spaces)


Social distance wo tamoto u
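Because the word-segmented output is just space-separated tokens, it can be split with `str.split()` and fed straight into something like a frequency count. A minimal sketch, using a hand-written sample string in place of a real `m.parse(...)` result; `token_counts` is a made-up helper name:

```python
from collections import Counter


def token_counts(wakati_output: str) -> Counter:
    """Count tokens in space-separated (wakati) output."""
    # split() with no arguments also drops the trailing space and newline.
    return Counter(wakati_output.split())


# Hand-written sample in the wakati layout.
sample = "ソーシャルディスタンス を 保と う \n"
counts = token_counts(sample)
print(counts.most_common())
```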

4. -Oyomi: Readings only (printed without spaces, shown here in romaji)

sōsharudisutansuwotamotou

5. -Odump: Output all information

0 BOS BOS/EOS,*,*,*,*,*,*,*,* 0 0 0 0 0 0 2 1 0.000000 0.000000 0.000000 0
6 Social distance noun,proper noun,general,*,*,*,Social distance,Social distance,Social distance 0 33 1288 1288 41 7 0 1 0.000000 0.000000 0.000000 -1987
213 wo particle,case particle,general,*,*,*,wo,wo,wo 33 36 156 156 13 6 0 1 0.000000 0.000000 0.000000 -1613
218 tamoto verb,independent,*,*,godan ta-row,irrealis u-connection,tamotsu,tamoto,tamoto 36 42 739 739 31 2 0 1 0.000000 0.000000 0.000000 3067
234 u auxiliary verb,*,*,*,invariant,base form,u,u,u 42 45 506 506 25 6 0 1 0.000000 0.000000 0.000000 3215
236 EOS BOS/EOS,*,*,*,*,*,*,*,* 45 45 0 0 0 0 3 1 0.000000 0.000000 0.000000 1300

