Call Python library for text normalization from MATLAB

Introduction

Sometimes I want to use an existing text analysis function that is written in another language, so I tried it. Here I call neologdn, a Python-based text normalization tool, from MATLAB. I'm new to Python, so apologies for any mistakes.

Environment

MATLAB R2020a

Python 3.6

Procedure

There is an official documentation page called "Calling Python Library Functions", so I set things up by following it. Both MATLAB and Python must be installed. There are several Python distributions, but using the one that supports calling from MATLAB seemed easiest, so I installed it as described on the official page.
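Before wiring Python into MATLAB it can help to confirm which interpreter is actually installed. The following is a minimal sketch using only the standard library; `describe_interpreter` is a hypothetical helper name, and which versions and bitness a given MATLAB release accepts is something to check against the official page rather than this script.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect facts about this interpreter to compare with MATLAB's requirements."""
    return {
        "executable": sys.executable,                       # path to the running interpreter
        "version": platform.python_version(),               # e.g. "3.6.8"
        "implementation": platform.python_implementation(), # e.g. "CPython"
        "bits": struct.calcsize("P") * 8,                   # pointer size: 32 or 64
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it with the same launcher you later use for pip (for example `py` on Windows) so that the package is installed into the interpreter you just inspected.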

As a trial, enter the following in MATLAB.

MATLAB


py.os.listdir('.')

This displayed the list of files in the current folder, using os.listdir on the Python side.
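For comparison, this is roughly what the same call looks like when written directly in Python. It is only a sketch to show the naming pattern: the MATLAB expression `py.os.listdir('.')` drops the `py.` prefix and otherwise keeps the module and function names.

```python
import os

# Python-side equivalent of the MATLAB expression py.os.listdir('.'):
# same module (os), same function (listdir), same argument.
entries = os.listdir(".")

# Print the entries in a stable order; os.listdir itself returns them unsorted.
for name in sorted(entries):
    print(name)
```

The same pattern applies to any importable module, which is why installing a package with pip is enough to make it reachable from MATLAB.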

Next, set up neologdn, a tool that normalizes Japanese text.

neologdn is a Japanese text normalizer for mecab-neologd. The normalization is based on the neologd's rules: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja
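To give a feel for what "normalization" means here, the sketch below folds character widths with Unicode NFKC from Python's standard library. This is not how neologdn is implemented and it does not follow the neologd rules linked above; `width_normalize` is a hypothetical helper that only illustrates the width-folding part (half-width kana to full-width, full-width Latin letters and symbols to ASCII).

```python
import unicodedata


def width_normalize(text):
    """Hypothetical helper: fold character widths with NFKC (not neologdn's code)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Half-width katakana, full-width Latin letters, full-width symbols.
    samples = ["ﾊﾝｶｸｶﾅ", "ＰＲＭＬ", "！？＠＃"]
    for sample in samples:
        print(repr(sample), "->", repr(width_normalize(sample)))
```

neologdn goes further than this, for example by shortening long vowel marks and unifying hyphen variants, which NFKC alone does not do.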

Install neologdn.

command prompt


py -m pip install neologdn

You are now ready.

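Let's run the example sentences from the neologdn README in MATLAB. They cover half-width kana, full-width symbols, long vowels, tildes, hyphens, extra spaces, and repeated substrings.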

MATLAB


>> py.neologdn.normalize("ﾊﾝｶｸｶﾅ")

ans = 

Python str with no properties.

ハンカクカナ

>> py.neologdn.normalize("全角記号！？＠＃")

ans = 

Python str with no properties.

全角記号!?@#

>> py.neologdn.normalize("全角記号例外「・」")

ans = 

Python str with no properties.

全角記号例外「・」

>> py.neologdn.normalize("長音短縮ウェーーーーイ")

ans = 

Python str with no properties.

長音短縮ウェーイ

>> py.neologdn.normalize("チルダ削除ウェ~∼∾〜〰～イ")

ans = 

Python str with no properties.

チルダ削除ウェイ

>> py.neologdn.normalize("いろんなハイフン˗֊‐‑‒–⁃⁻₋−")

ans = 

Python str with no properties.

いろんなハイフン-

>> py.neologdn.normalize("　　　ＰＲＭＬ　　副　読　本　　　")

ans = 

Python str with no properties.

PRML副読本

>> py.neologdn.normalize(" Natural Language Processing ")

ans = 

Python str with no properties.

Natural Language Processing

>> py.neologdn.normalize("かわいいいいいいいいい", pyargs('repeat',6))

ans = 

Python str with no properties.

かわいいいいいい

>> py.neologdn.normalize("無駄無駄無駄無駄ァ", pyargs('repeat',1))

ans = 

Python str with no properties.

無駄ァ

>> 

The results match the README. Note that the result is returned as a Python str (py.str), not a MATLAB string, so wrap the call in string(...) when you need a MATLAB string.
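The `repeat` keyword passed through `pyargs` in the last two calls limits how many times a repeated substring may appear in a row. The sketch below shows one way to get that behaviour with a regular expression. It is an illustration under my own assumptions, not neologdn's actual implementation, and `shorten_repeat` here is a hypothetical standalone function.

```python
import re


def shorten_repeat(text, repeat):
    """Hypothetical helper: keep at most `repeat` consecutive copies of a substring."""
    if repeat < 1:
        return text  # treat non-positive limits as "leave the text alone"

    def collapse(match):
        unit = match.group(1)                      # the repeating substring
        count = len(match.group(0)) // len(unit)   # how many copies were found
        return unit * min(count, repeat)

    # (.+?) finds the shortest repeating unit, \1+ consumes its extra copies.
    return re.sub(r"(.+?)\1+", collapse, text)


if __name__ == "__main__":
    print(shorten_repeat("かわ" + "い" * 9, 6))
    print(shorten_repeat("無駄" * 4 + "ァ", 1))
```

A regex like this treats every repeated run the same way, including digits such as "1000", so for real text the library's own rules are the safer choice.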

Being able to normalize text like this before splitting it into tokens with Text Analytics Toolbox is convenient.
