Morphological analysis in Python with polyglot: from setup to part-of-speech tagging

Preparation

This article uses polyglot (see its documentation).

The following was confirmed to work with Python 3.8.5. First, install the packages in this order:

pip install numpy
pip install polyglot
pip install six
pip install pycld2
pip install morfessor
pip install pyicu

However, if a ModuleNotFoundError tells you to install icu, run

pip install pyicu

and not

pip install icu

If you install icu and try to use it, you should get the error cannot import name xxx. Note that icu and pyicu are different packages.

If that doesn't work, see "Error installing pip pyicu".
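
Before moving on, it can help to check that each package can actually be found by Python. The following is only a minimal sketch and not part of polyglot: the helper name find_missing_modules is hypothetical, and the mapping assumes the usual import names (in particular, that the pyicu package is imported as icu).

```python
import importlib.util

# pip package name -> module name used in "import ..." (assumed mapping)
REQUIRED = {
    "numpy": "numpy",
    "polyglot": "polyglot",
    "six": "six",
    "pycld2": "pycld2",
    "morfessor": "morfessor",
    "pyicu": "icu",
}


def find_missing_modules(required):
    """Return the pip names whose module cannot be located."""
    missing = []
    for pip_name, module_name in required.items():
        # find_spec returns None when no top-level module of that name exists
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages can be found.")
```

This only checks that a module named icu exists, so it cannot tell pyicu apart from the unrelated icu package; the cannot import name xxx error is still the sign that the wrong one is installed.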

Analyze

Following the official Part of Speech Tagging documentation, look up the part of speech of each word.

from polyglot.text import Text

blob = "You never fail until you stop trying."
tokens = Text(blob)
print(tokens.pos_tags)

This should give you the part of speech of every word in the sentence, but at this point you get an error instead:

ValueError: This resource is available in the index but not downloaded, yet. Try to run

polyglot download embeddings2.en

So, first run

git clone https://github.com/web64/nlpserver.git

Then, in nlpserver.py, after line 14,

app.config['JSON_AS_ASCII'] = False

insert the following:

polyglot download embeddings2.en
polyglot download pos2.en

This step is described in "Not able to pull polyglot files".
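
If you would rather run the two download commands from a script than type them by hand, something like the following may work. It is a rough sketch under assumptions: the helper names download_commands and run_downloads are hypothetical, and it assumes the polyglot command is on your PATH after pip install polyglot.

```python
import shutil
import subprocess

# Resources named in the error message and in the commands above
PACKAGES = ["embeddings2.en", "pos2.en"]


def download_commands(packages):
    """Build one 'polyglot download <name>' command per resource."""
    return [["polyglot", "download", name] for name in packages]


def run_downloads(packages):
    """Run the commands; return False if the polyglot CLI is not found."""
    if shutil.which("polyglot") is None:
        return False
    for command in download_commands(packages):
        # check=True raises CalledProcessError if a download fails
        subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Only print the commands here; call run_downloads(PACKAGES) to execute them.
    for command in download_commands(PACKAGES):
        print(" ".join(command))
```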

Now that English can be analyzed, the previous code works:

from polyglot.text import Text

blob = "You never fail until you stop trying."
tokens = Text(blob)
print(tokens.pos_tags)

The result is:

[('You', 'PRON'), ('never', 'ADV'), ('fail', 'VERB'), ('until', 'SCONJ'), ('you', 'PRON'), ('stop', 'VERB'), ('trying', 'VERB'), ('.', 'PUNCT')]

The result is hard to read on a single line, so you can use pprint for the last line:

import pprint
pprint.pprint(tokens.pos_tags)

The output then looks like this:

[('You', 'PRON'),
 ('never', 'ADV'),
 ('fail', 'VERB'),
 ('until', 'SCONJ'),
 ('you', 'PRON'),
 ('stop', 'VERB'),
 ('trying', 'VERB'),
 ('.', 'PUNCT')]

Small adjustments like this are worth making. The part-of-speech names are listed below. The abbreviations and English explanations are taken from the Part of Speech Tagging documentation.

Abbreviation  Explanation (English)      Explanation (translated from Japanese)
ADJ           adjective                  adjective
ADP           adposition                 preposition
ADV           adverb                     adverb
AUX           auxiliary verb             auxiliary verb
CONJ          coordinating conjunction   coordinating conjunction
DET           determiner                 determiner
INTJ          interjection               interjection
NOUN          noun                       noun
NUM           numeral                    numeral
PART          particle                   particle
PRON          pronoun                    pronoun
PROPN         proper noun                proper noun
PUNCT         punctuation                punctuation
SCONJ         subordinating conjunction  subordinating conjunction
SYM           symbol                     symbol
VERB          verb                       verb
X             other                      other
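
To make the tagger output easier to read, the abbreviations can be looked up in a small dictionary. The following is a minimal sketch: the names POS_DESCRIPTIONS and describe_tags are hypothetical, and the sample list is the output shown earlier, hard-coded so that the snippet runs without polyglot installed.

```python
# Abbreviation -> English explanation, copied from the table above
POS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_tags(pos_tags):
    """Attach the English explanation to each (word, tag) pair."""
    return [
        (word, tag, POS_DESCRIPTIONS.get(tag, "unknown"))
        for word, tag in pos_tags
    ]


# Same structure as tokens.pos_tags in the example above
sample = [
    ("You", "PRON"), ("never", "ADV"), ("fail", "VERB"),
    ("until", "SCONJ"), ("you", "PRON"), ("stop", "VERB"),
    ("trying", "VERB"), (".", "PUNCT"),
]

for word, tag, description in describe_tags(sample):
    print(f"{word:<8}{tag:<7}{description}")
```

With polyglot installed, tokens.pos_tags could be passed to describe_tags in place of the hard-coded sample.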

References

Installation reference: https://qiita.com/sawada/items/528da0b22546045122b2

Reference on the features of polyglot: http://lab.astamuse.co.jp/entry/try-polyglot
