Perform half-width / full-width conversion at high speed with Python

This post introduces "mojimoji", a Python library that converts Japanese text between half-width and full-width characters at high speed.

mojimoji applies a previously introduced technique: the conversion is performed internally with Cython and a C++ unordered_map, which makes it considerably faster than conventional implementations.
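The table-lookup idea can be sketched in pure Python. This is only an illustration and not mojimoji's actual code: the names `build_zen_to_han_table` and `zen_to_han_sketch` are hypothetical, a plain `dict` stands in for the C++ unordered_map, and only the full-width ASCII range (U+FF01 to U+FF5E, which sits at a fixed offset of 0xFEE0 from its half-width counterpart) is covered.

```python
# Illustrative sketch only; this is NOT mojimoji's implementation.
# A dict plays the role that unordered_map plays on the C++ side.


def build_zen_to_han_table():
    """Map full-width ASCII (U+FF01..U+FF5E) to half-width (U+0021..U+007E)."""
    table = {chr(code): chr(code - 0xFEE0) for code in range(0xFF01, 0xFF5F)}
    table["\u3000"] = " "  # ideographic space -> ASCII space (an assumption of this sketch)
    return table


ZEN_TO_HAN = build_zen_to_han_table()


def zen_to_han_sketch(text):
    """Convert each character with one dict lookup; unmapped characters pass through."""
    return "".join(ZEN_TO_HAN.get(ch, ch) for ch in text)


print(zen_to_han_sketch("\uff41\uff42\uff43\uff10\uff11\uff12"))  # full-width "abc012"
```

Because each character costs a single hash lookup, the work is linear in the length of the string; a compiled implementation removes the per-character interpreter overhead that this pure Python loop still pays.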

Installation

% pip install mojimoji

Overview

mojimoji provides two functions, han_to_zen and zen_to_han. Each accepts the keyword arguments kana, digit, and ascii; setting one of them to False disables conversion of katakana, digits, or ASCII characters, respectively.

Conversion from full-width to half-width

>>> import mojimoji
>>> print mojimoji.zen_to_han(u'Aiu abc012')
Aiuu abc012
>>> print mojimoji.zen_to_han(u'Aiu abc012', kana=False)
Aiu abc012
>>> print mojimoji.zen_to_han(u'Aiu abc012', digit=False)
Aiuu abc012
>>> print mojimoji.zen_to_han(u'Aiu abc012', ascii=False)
Iwabc012

Conversion from half-width to full-width

>>> import mojimoji
>>> print mojimoji.han_to_zen(u'Aiuu abc012')
Aiu abc012
>>> print mojimoji.han_to_zen(u'Aiuu abc012', kana=False)
Iwabc012
>>> print mojimoji.han_to_zen(u'Aiuu abc012', digit=False)
Aiu abc012
>>> print mojimoji.han_to_zen(u'Aiuu abc012', ascii=False)
Aiu abc012

Performance

Let's compare mojimoji's speed with zenhan and jctconv, two other Python libraries that perform half-width / full-width conversion.

% pip install zenhan
% pip install jctconv
% ipython
In [1]: import mojimoji
In [2]: import zenhan
In [3]: import jctconv
In [4]: s = u'Io Eo 012345' * 10
In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.90 s, sys: 0.03 s, total: 3.93 s
Wall time: 3.97 s
In [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 71.05 s, sys: 0.16 s, total: 71.22 s
Wall time: 71.45 s
In [7]: %time for n in range(1000000): jctconv.z2h(s)
CPU times: user 19.75 s, sys: 0.06 s, total: 19.81 s
Wall time: 19.86 s

You can see that mojimoji is about 18 times faster than zenhan, which is implemented in pure Python, and about 5 times faster than jctconv.
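The ratios follow directly from the wall times reported in the session. A minimal check, with the three timings copied from the output above (the variable names are only for illustration):

```python
# Wall times in seconds, copied from the %time output above.
wall_seconds = {"mojimoji": 3.97, "zenhan": 71.45, "jctconv": 19.86}

baseline = wall_seconds["mojimoji"]

# How many times longer each library took compared with mojimoji.
speedup = {
    name: seconds / baseline
    for name, seconds in wall_seconds.items()
    if name != "mojimoji"
}

for name, ratio in sorted(speedup.items()):
    print("%s took %.1f times as long as mojimoji" % (name, ratio))
```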
