From installing JUMAN++ to Japanese morphological analysis with Python

Introduction

Morphological analysis is almost always needed when doing anything related to natural language processing. Well-known morphological analyzers for Japanese include "MeCab" and "[JUMAN++](http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN++)". This time, we will install JUMAN++ and perform morphological analysis with it.

The contents of this article are as follows.

What is natural language processing?

Natural language processing (NLP) is a set of technologies that let a computer process the natural language humans use every day, and it is a field of artificial intelligence and linguistics. [Natural language processing | Wikipedia](https://ja.wikipedia.org/wiki/自然言語処理)

**In a nutshell**: Technology for processing, on a computer, the language that humans normally use.

What is morphological analysis?

Morphological analysis is the task of splitting natural-language text data (sentences) that carries no grammatical annotation into a sequence of morphemes (roughly speaking, the smallest units that have meaning in the language) and determining the part of speech of each morpheme, based on information such as the grammar of the target language and the part-of-speech information for words, known as a dictionary. [Morphological analysis | Wikipedia](https://ja.wikipedia.org/wiki/形態素解析)

**In a nutshell**: The process of dividing a given sentence into its smallest meaningful units and attaching information such as part of speech to each.
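To make the idea concrete, here is a toy sketch of dictionary-based segmentation with part-of-speech labels. It uses greedy longest-match lookup against a tiny hand-made dictionary; this is an illustration only and is not how JUMAN++ or MeCab work internally. The dictionary entries and the function name `segment` are assumptions made up for this example.

```python
# Toy dictionary: surface form -> part of speech (hypothetical entries for illustration).
TOY_DICT = {
    "形態素": "noun",
    "解析": "noun",
    "を": "particle",
    "始める": "verb",
}


def segment(text, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that matches.
    Characters not covered by the dictionary become single 'unknown' tokens.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary word starts here: emit one unknown character.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens


for surface, pos in segment("形態素解析を始める", TOY_DICT):
    print(surface, pos)
```

Greedy matching breaks down quickly on real text, where several segmentations are possible for the same string, which is why practical analyzers score competing candidates instead of taking the first match.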

What is JUMAN++?

JUMAN++ is a high-performance morphological analysis system developed by the Kurohashi-Kawahara Laboratory at Kyoto University. By using an RNNLM as its language model, it takes the semantic naturalness of word sequences into account during analysis. Its basic accuracy is about the same as other analyzers, but because it also considers how well words connect, it reportedly achieves higher accuracy than MeCab in some respects. However, it appears to be slower than the alternatives, so if you need real-time performance, MeCab may be the better choice.

**In a nutshell**: A high-performance morphological analyzer for Japanese that may be more accurate than MeCab.

Operating environment

Installing JUMAN++

Now let's install JUMAN++. This time, we will install it on Linux.

Mac users, please refer to here.

These are the two sites I referred to.

First, install the two prerequisite packages for JUMAN++.

Next, install JUMAN++ itself.

$ wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
$ tar xJvf jumanpp-1.01.tar.xz
$ cd jumanpp-1.01
$ ./configure
$ make
$ make install

JUMAN++ is now installed! By default it is installed under /usr/local/, so if you want to choose the installation destination, add the --prefix=/path option to ./configure.

Let's try it right away.

$ jumanpp
I started studying morphological analysis

Form Keitai Form Noun 6 Appellative 1* 0 * 0 "Representative notation:form/Keitai category:Shape / pattern"
Elementary noun 6 Appellative 1* 0 * 0 "Representative notation:Elementary/So kanji reading:Sound category:Abstract"
Analysis Kaiseki Analysis Noun 6 Sahen Noun 2* 0 * 0 "Representative notation:analysis/Kaiseki category:Abstract domain:Education / learning;Science and technology"
Nono particle 9 Conjunctive particle 3* 0 * 0 NIL
Study Benkyo Study Noun 6 Sahen Noun 2* 0 * 0 "Representative notation:study/Benkyo category:Abstract domain:Education / learning"
To the particle 9 case particle 1* 0 * 0 NIL
Begin Begin Begin Verb 2*0 Vowel verb 1 Basic continuous form 8"Representative notation:start/Beginning Attached verb candidate (basic) Self-transitive verb:Self:Start/Rebellion that begins:verb:Finish/Yeah"
Suffix 14 Verb Suffix 7 Verb Suffix Type 31 Ta Form 7"Representative notation:Masu/Masu"
.. .. .. Special 1 Kuten 1* 0 * 0 NIL
EOS

The JUMAN++ executable is jumanpp. In my environment, it was in bin/ under the installation directory. Morphological analysis with JUMAN++ succeeded!
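Each output line holds space-separated fields: surface form, reading, base form, part of speech and its ID, sub-category and its ID, conjugation type and its ID, conjugation form and its ID, and finally a quoted block of semantic information (or NIL). The following is a minimal sketch of parsing that layout in Python. The names `JumanMorpheme` and `parse_juman_output` are made up for this example, the sample lines are hand-written in the JUMAN layout rather than captured from a real run, and edge cases such as escaped spaces in the surface form are not handled.

```python
from typing import NamedTuple


class JumanMorpheme(NamedTuple):
    surface: str       # form as it appears in the text
    reading: str
    base_form: str
    pos: str           # part of speech
    pos_id: int
    sub_pos: str       # part-of-speech sub-category
    sub_pos_id: int
    conj_type: str     # "*" when the word does not conjugate
    conj_type_id: int
    conj_form: str
    conj_form_id: int
    info: str          # semantic information, or "NIL"


def parse_juman_output(output):
    """Parse JUMAN-format text into a list of JumanMorpheme (first sentence only)."""
    morphemes = []
    for line in output.splitlines():
        if line == "EOS":
            break  # end of sentence
        if not line or line.startswith("@ "):
            continue  # skip blank lines and alternative candidates
        # The last field may itself contain spaces, so split at most 11 times.
        f = line.split(" ", 11)
        if len(f) != 12:
            raise ValueError("unexpected line: %r" % line)
        morphemes.append(JumanMorpheme(
            f[0], f[1], f[2],
            f[3], int(f[4]),
            f[5], int(f[6]),
            f[7], int(f[8]),
            f[9], int(f[10]),
            f[11].strip('"'),
        ))
    return morphemes


# Hand-written sample in the JUMAN layout (an assumption, not real captured output).
SAMPLE = (
    '解析 かいせき 解析 名詞 6 サ変名詞 2 * 0 * 0 "代表表記:解析/かいせき カテゴリ:抽象物"\n'
    'の の の 助詞 9 接続助詞 3 * 0 * 0 NIL\n'
    'EOS\n'
)

for m in parse_juman_output(SAMPLE):
    print(m.surface, m.base_form, m.pos, m.sub_pos)
```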

Use JUMAN++ from Python

Next, we will use JUMAN++ from Python.

JUMAN++ can be used from Python through PyKNP. To use PyKNP, you also need to install JUMAN and KNP if they are not already present in your environment.

I referred to the following site: Use JUMAN++ from Python

Please see the reference site for how to install the three items above.

Finally, let's call JUMAN++ from Python!

python_jumanpp.py


# -*- coding: utf-8 -*-
# Written for Python 3.
from pyknp import Jumanpp

# Use Juman++ in subprocess mode
jumanpp = Jumanpp()
result = jumanpp.analysis("I started natural language processing.")

# Print the surface form (midasi) of each morpheme
for mrph in result.mrph_list():
    print("Heading:%s" % mrph.midasi)

$ python python_jumanpp.py
Heading:Nature
Heading:language
Heading:processing
Heading:start
Heading:Was
Heading:。

You have successfully used JUMAN++ from Python!
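Besides `midasi`, the morpheme objects returned by PyKNP carry the other fields of the analysis, such as the reading (`yomi`), base form (`genkei`), and part of speech (`hinsi`). The following is a minimal sketch, assuming the same `Jumanpp` class used in the script above and a working JUMAN++ installation; the helper names `describe` and `analyze` are made up for this example.

```python
def describe(mrph):
    """Format one PyKNP morpheme: surface form, reading, base form, part of speech."""
    return "%s\t%s\t%s\t%s" % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi)


def analyze(text):
    """Run JUMAN++ on text through PyKNP and return one formatted line per morpheme."""
    # Imported here so that describe() can be used even where PyKNP is not installed.
    from pyknp import Jumanpp

    jumanpp = Jumanpp()
    result = jumanpp.analysis(text)
    return [describe(mrph) for mrph in result.mrph_list()]
```

Calling `analyze` with a Japanese sentence and printing each returned line gives a tab-separated view that is easier to scan than the raw jumanpp output.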

That's all.
