[PYTHON] Run Polyglot on Raspberry Pi to perform morphological analysis in English

While preparing material for an Advent calendar article, I needed to run morphological analysis on English text and got stuck along the way, so I am leaving a record of how I got it working.

What is Polyglot

For Japanese, MeCab is the usual choice for morphological analysis, but there are far fewer write-ups about morphological analysis of English. Polyglot is a natural language processing library that offers many functions, such as language detection and morphological analysis of English sentences.

Language support scope

As described in the official documentation, the set of supported languages differs from function to function.

| Function | Supported languages | Description |
|---|---|---|
| Tokenization | 165 | Splits a string into the smallest units (tokens) handled in natural language processing |
| Language detection | 196 | Identifies the language of the string being analyzed |
| Named entity recognition | 40 | Extracts named entities from the string; Polyglot distinguishes three types: location, organization, and person |
| Part-of-speech tagging | 16 | Adds a part-of-speech tag to each token of the string |
| Sentiment analysis | 136 | Classifies polarity into three types: negative, neutral, and positive |
| Distributed representation | 137 | Maps words into a d-dimensional vector space |
| Morphological analysis | 135 | Divides the string into its smallest meaningful units (morphemes) |
| Transliteration | 69 | Converts the input string into the script of a specified language |

As you can see from the table above, it supports many languages.

Installation

Let's set up Polyglot so that it actually works.

Install Polyglot

$ sudo pip3 install -U polyglot

Polyglot itself can be installed with the single command above. However, to actually analyze text with Polyglot, you also need to download the dictionary (model) for the language you want to analyze, and if ICU is not installed when you do so, an error is raised. So before downloading, run the following commands to install the required libraries.

$ sudo apt-get -y install libicu-dev
$ sudo pip3 install -U pyicu
$ sudo pip3 install -U morfessor

In addition, pycld2 is required to download the model. In a normal Linux environment, you can install it simply by running $ sudo pip install pycld2. However, when I ran that command on the Raspberry Pi, the following error appeared.

arm-linux-gnueabihf-gcc: error: unrecognized command line option ‘-m64’
  error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pycld2

This error occurs because the GCC compiler for the ARM architecture does not support the -m64 option, so compilation fails. As things stand, pycld2 cannot be installed, which means Polyglot cannot run on the Raspberry Pi. That was a problem...

Install pycld2 on Raspberry Pi

Since pycld2 cannot be installed as is, you need to remove the -m64 compile option specified in pycld2's setup.py and then run the installation. Clone the repository below and edit setup.py: aboSamoor/pycld2 - GitHub

$ git clone https://github.com/aboSamoor/pycld2.git
$ cd pycld2/

Move into the cloned pycld2 directory, delete -m64 from the list of compile options on line 78 of setup.py (located directly in that directory), and save the file.

Before the change


    language="c++",
    # TODO: -m64 may break 32 bit builds
    extra_compile_args=["-w", "-O2", "-m64", "-fPIC"],

After the change


    language="c++",
    # TODO: -m64 may break 32 bit builds
    extra_compile_args=["-w", "-O2", "-fPIC"],
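If you would rather not edit setup.py by hand (for example, when setting up several Raspberry Pis), the same edit can be scripted. The following is a minimal sketch, not part of pycld2 or Polyglot: the helper names strip_compile_flag and patch_setup_py are hypothetical, and it assumes the flag appears as a quoted string inside a list literal, as in the excerpt above. Only quoted occurrences are removed, so the TODO comment that mentions -m64 is left alone.

```python
import re
from pathlib import Path


def strip_compile_flag(source: str, flag: str = "-m64") -> str:
    """Remove a quoted compiler flag (e.g. "-m64") from list literals in source text.

    Only quoted occurrences are removed, so a comment such as
    "# TODO: -m64 may break 32 bit builds" is left untouched.
    """
    # Matches the flag wrapped in either single or double quotes.
    quoted = r"""(['"])""" + re.escape(flag) + r"\1"
    # Flag followed by a comma: first or middle element of the list.
    source = re.sub(quoted + r"\s*,\s*", "", source)
    # Flag as the last element: drop it together with the preceding comma.
    source = re.sub(r"\s*,\s*" + quoted, "", source)
    return source


def patch_setup_py(path: str, flag: str = "-m64") -> bool:
    """Rewrite the file at `path` without `flag`; return True if anything changed."""
    setup_py = Path(path)
    original = setup_py.read_text(encoding="utf-8")
    patched = strip_compile_flag(original, flag)
    if patched != original:
        setup_py.write_text(patched, encoding="utf-8")
    return patched != original
```

For example, calling patch_setup_py("pycld2/setup.py") from the directory where you cloned the repository should return True the first time and False on later runs, since there is nothing left to remove.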

After making the change, run the following command (hogehoge stands for the directory into which you cloned pycld2).

$ sudo pip3 install hogehoge/pycld2/
Successfully built pycld2
Installing collected packages: pycld2
Successfully installed pycld2-0.42

If the output includes "Successfully installed", the installation succeeded.
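Before downloading a model, it can help to confirm that Python 3 can locate every dependency installed so far. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the entries are import names rather than pip package names (PyICU, for instance, is imported as icu). Note that find_spec only checks that a module can be located, not that it imports cleanly.

```python
import importlib.util
from typing import Dict, Iterable, List

# Import names (not pip package names) of the dependencies installed above.
# PyICU is imported as "icu"; the others match their package names.
REQUIRED_MODULES = ("icu", "morfessor", "pycld2", "polyglot")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names of modules that could not be located."""
    return [name for name, found in check_modules(names).items() if not found]
```

Once everything above has been installed, missing_modules() should return an empty list.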

Download the model

You can download a model by running the following command. Since we will analyze English sentences this time, download the English morphology model.

$ polyglot download morph2.en
[polyglot_data] Downloading package morph2.en to
[polyglot_data]     /home/pi/polyglot_data...
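If you need to repeat this step, for example when provisioning several Raspberry Pis or fetching other models later, the command above can be wrapped in a small script. This is a minimal sketch using subprocess. download_models is a hypothetical helper that simply runs polyglot download once per package id, and with dry_run=True it only returns the commands it would run.

```python
import subprocess
from typing import Iterable, List


def download_models(packages: Iterable[str], dry_run: bool = False) -> List[List[str]]:
    """Run `polyglot download <package>` once per package id (e.g. "morph2.en").

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [["polyglot", "download", package] for package in packages]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a download fails.
            subprocess.run(command, check=True)
    return commands
```

Calling download_models(["morph2.en"]) runs the same download as the command above.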

Performing morphological analysis

All you have to do now is run the sample code below.

morph.py


from polyglot.text import Text

sample_text = "One Hamburger and a Medium Coffee please."
tokens = Text(sample_text)
print(tokens.morphemes)

Running the script produces output like the following.

$ python3 morph.py 
['One', ' ', 'Ham', 'burg', 'er and a Medium Coffee p', 'lease', '.']
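In the output above, segments such as 'er and a Medium Coffee p' span several words. One way to keep each morpheme inside a single word is to split the sentence on whitespace first and then segment each word separately. The sketch below does this. It is based on the Word example in Polyglot's morphology documentation (Word(text, language="en").morphemes), the helper names are hypothetical, and the call should be checked against your installed Polyglot version. The segmenter is passed in as a function so that the splitting logic can be tried without Polyglot installed. Punctuation stays attached to its word (for example, "please.").

```python
from typing import Callable, List, Tuple


def polyglot_morphemes(word: str) -> List[str]:
    """Segment one English word with Polyglot (requires the morph2.en model)."""
    # Imported lazily so the splitting logic below works without Polyglot.
    from polyglot.text import Word

    return list(Word(word, language="en").morphemes)


def morphemes_per_word(
    sentence: str, segment: Callable[[str], List[str]] = polyglot_morphemes
) -> List[Tuple[str, List[str]]]:
    """Split on whitespace first, then segment each word separately,
    so that no morpheme can span a word boundary."""
    return [(word, segment(word)) for word in sentence.split()]
```

On the Raspberry Pi, you could loop over morphemes_per_word("One Hamburger and a Medium Coffee please.") and print each word next to its morphemes.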

In conclusion

This was my first time using Polyglot, which I used to build a certain program. Because Polyglot can detect the language of a text, it could be combined with the Twitter API so that Japanese text is handled by MeCab and everything else is left to Polyglot. I don't expect to use English natural language processing at work, but I am leaving this as a memo so that it stays in my repertoire.
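The routing idea above (Japanese to MeCab, everything else to Polyglot) can be sketched as a small dispatch function. This is a minimal sketch under assumptions: it uses Polyglot's Detector and assumes that Detector(text).language.code returns a language code such as "ja" or "en". choose_analyzer and polyglot_language_code are hypothetical names, and the Twitter API and MeCab calls themselves are left out. The detector is passed in as a function so the dispatch logic can be tested without Polyglot.

```python
from typing import Callable


def polyglot_language_code(text: str) -> str:
    """Detect the language code of `text` (e.g. "ja", "en") with Polyglot."""
    # Imported lazily so choose_analyzer can be used with another detector.
    from polyglot.detect import Detector

    return Detector(text).language.code


def choose_analyzer(
    text: str, detect: Callable[[str], str] = polyglot_language_code
) -> str:
    """Return which tool should analyze `text`: MeCab for Japanese, Polyglot otherwise."""
    return "mecab" if detect(text) == "ja" else "polyglot"
```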
