Comparison of Japanese conversion modules in Python 3

Ciao ... †

Preprocessing is an unavoidable part of natural language processing, and it can never be too fast. So I compared Japanese conversion modules for Python 3.

Comparison items

The comparison covers full-width / half-width conversion and hiragana → katakana conversion. Each conversion is measured on both a short and a long target string.
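To make the two conversions concrete, here is a minimal sketch using only the standard library. It is not the API of any module compared in this post. The names `half_to_full` and `hira_to_kata` are hypothetical stand-ins, and the sketch only handles ASCII and plain hiragana. Half-width katakana is out of scope here, though the real modules typically handle it.

```python
# Stand-in converters built on str.translate (hypothetical helpers,
# not taken from any of the modules compared in this post).

# Half-width ASCII '!'..'~' (U+0021-U+007E) maps to full-width
# U+FF01-U+FF5E, a fixed offset of 0xFEE0. The space is a special
# case and maps to the ideographic space U+3000.
H2Z_ASCII = {cp: cp + 0xFEE0 for cp in range(0x21, 0x7F)}
H2Z_ASCII[0x20] = 0x3000

# Hiragana U+3041-U+3096 maps to katakana U+30A1-U+30F6,
# a fixed offset of 0x60.
HIRA2KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def half_to_full(text: str) -> str:
    """Convert half-width ASCII characters to their full-width forms."""
    return text.translate(H2Z_ASCII)


def hira_to_kata(text: str) -> str:
    """Convert hiragana to katakana; other characters pass through."""
    return text.translate(HIRA2KATA)


print(half_to_full("abc 123"))   # ａｂｃ　１２３
print(hira_to_kata("ひらがな"))    # ヒラガナ
```

A "long" target string can be produced for this kind of test by repeating a short one many times, for example `"abc 123" * 10000`.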

Modules compared

- jaconv (a module I made; recently renamed from jctconv)
- cnvk
- mojimoji
- zenhan
- rfZenHan
- Mohayonao's code
- nkf

Comparison result


                                     jaconv    cnvk      mojimoji  zenhan    rfZenHan  mohayonao  nkf
Short text, half-width → full-width  27.1 µs   96.4 µs   5.04 µs   75.8 µs   222 µs    -          23 µs
Long text, half-width → full-width   89.9 ms   38.6 ms   23.1 ms   360 ms    237 ms    -          95.4 ms
Short text, hiragana → katakana      18.1 µs   79.1 µs   -         -         -         25.4 µs    23.2 µs
Long text, hiragana → katakana       51.6 ms   41.8 ms   -         -         -         246 ms     98.6 ms

(- : no measurement for that module)

As you would expect from a module that uses Cython, mojimoji is fast. Among the pure-Python modules, jaconv performs well on short strings, and cnvk seems to be better on long strings.
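For readers who want to take similar measurements, the following is a rough sketch of a timing harness built on the standard `timeit` module. The post does not show how the original numbers were measured, so the harness, the `time_per_call` helper, the stand-in `hira_to_kata` converter, and the input strings are all assumptions. To time a real module, swap its conversion function in for the stand-in.

```python
import timeit

# Stand-in converter (hypothetical): hiragana U+3041-U+3096 shifted by
# 0x60 to katakana. Replace it with a real module's function to time that.
HIRA2KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hira_to_kata(text):
    return text.translate(HIRA2KATA)


def time_per_call(func, text, number):
    """Return the best average time per call, in seconds, over 3 repeats."""
    timer = timeit.Timer(lambda: func(text))
    return min(timer.repeat(repeat=3, number=number)) / number


# Assumed inputs: a short string, and a long one made by repetition.
short_text = "にほんごのぶんしょう"
long_text = short_text * 10000

short_time = time_per_call(hira_to_kata, short_text, number=1000)
long_time = time_per_call(hira_to_kata, long_text, number=5)

print(f"short: {short_time * 1e6:.2f} µs per call")
print(f"long:  {long_time * 1e3:.2f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute numbers depend on the machine, so only the relative ordering between modules is meaningful.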
