Correct half-width and full-width notation inconsistencies in Python

Purpose of this article

Create a function that normalizes inconsistent use of half-width and full-width characters in Japanese text.


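First, prepare strings holding the characters before and after conversion. Each half-width/full-width pair must have the same length and the same order, because the character at position i in one string is mapped to the character at position i in the other.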

abc_half = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
abc_full = "ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ"

digit_half = "0123456789"
digit_full = "０１２３４５６７８９"

katakana_half = "ｱｲｳｴｵｶｷｸｹｺｻｼｽｾｿﾀﾁﾂﾃﾄﾅﾆﾇﾈﾉﾊﾋﾌﾍﾎﾏﾐﾑﾒﾓﾔﾕﾖﾗﾘﾙﾚﾛﾜｦﾝ"
katakana_full = "アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲン"

punc_half = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
punc_full = "！＂＃＄％＆＇（）＊＋，－．／：；＜＝＞？＠［＼］＾＿｀｛｜｝～"
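The full-width forms of the ASCII letters, digits, and punctuation occupy U+FF01 to U+FF5E, exactly 0xFEE0 above their ASCII counterparts, so these paired strings can also be generated instead of typed by hand. The following is a minimal sketch under that assumption; the helper name to_full_width is hypothetical and not part of the original code. The katakana strings still have to be written out, because half-width and full-width katakana do not sit at a fixed offset from each other.

```python
import string

# Full-width ASCII variants (U+FF01..U+FF5E) are 0xFEE0 above ASCII (U+0021..U+007E).
FULLWIDTH_OFFSET = 0xFEE0


def to_full_width(half):
    """Return the full-width form of a string of printable ASCII (space excluded).

    Hypothetical helper for illustration only.
    """
    return "".join(chr(ord(ch) + FULLWIDTH_OFFSET) for ch in half)


abc_half = string.ascii_letters    # a-z followed by A-Z
digit_half = string.digits         # 0-9
punc_half = string.punctuation     # the 32 ASCII punctuation characters

abc_full = to_full_width(abc_half)
digit_full = to_full_width(digit_half)
punc_full = to_full_width(punc_half)

# str.maketrans(x, y) raises ValueError unless both strings have equal length,
# so check every pair up front.
for half, full in [(abc_half, abc_full), (digit_half, digit_full), (punc_half, punc_full)]:
    assert len(half) == len(full)
```

Generating the strings this way avoids the easiest mistake with hand-typed tables, which is a pair whose lengths or ordering silently drift apart.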

Half-width katakana writes a voiced (dakuten) or semi-voiced (handakuten) sound as two characters, the base kana followed by a separate mark, so build the conversion table for these separately from the others.

tmp01 = "ｶﾞｷﾞｸﾞｹﾞｺﾞｻﾞｼﾞｽﾞｾﾞｿﾞﾀﾞﾁﾞﾂﾞﾃﾞﾄﾞﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞﾊﾟﾋﾟﾌﾟﾍﾟﾎﾟ"
tmp02 = "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ"

transtable02 = {}
for i in range(len(tmp02)):
    be = tmp01[i*2:i*2+2]
    af = tmp02[i]
    transtable02[be] = af
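Note that str.translate looks up each character by its single code point, so a dict with two-character string keys never matches anything when passed to it. The sketch below builds the same kind of table with a dict comprehension and applies it with str.replace instead; the names voiced_table and replace_voiced are assumptions made for illustration.

```python
# Half-width voiced/semi-voiced kana (two characters each) and their
# full-width single-character counterparts, in the same order.
tmp01 = "ｶﾞｷﾞｸﾞｹﾞｺﾞｻﾞｼﾞｽﾞｾﾞｿﾞﾀﾞﾁﾞﾂﾞﾃﾞﾄﾞﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞﾊﾟﾋﾟﾌﾟﾍﾟﾎﾟ"
tmp02 = "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ"

# Map each two-character slice of tmp01 to the matching character of tmp02.
voiced_table = {tmp01[i * 2:i * 2 + 2]: ch for i, ch in enumerate(tmp02)}


def replace_voiced(text):
    """Replace every two-character half-width voiced kana in text (hypothetical helper)."""
    for before, after in voiced_table.items():
        text = text.replace(before, after)
    return text


sample = "ｶﾞｷﾞ"
# translate() leaves the text untouched because the keys are not code points.
unchanged = sample.translate(voiced_table)
# replace() handles the multi-character keys.
converted = replace_voiced(sample)
```

Running the two-character replacements before any single-character katakana conversion matters: once the base kana has been converted to full-width, the half-width pair no longer exists in the text to be matched.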

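In the function clean_text, the two-character voiced kana are replaced first with str.replace, because str.translate only maps single characters and would never match the two-character keys of transtable02. After that, transtable01 = str.maketrans(before, after) creates a translation table for the one-to-one characters, and text.translate(transtable01) applies it.

def clean_text(text):
    text = str(text).replace("\u3000", " ")  # Full-width space to half-width

    # Voiced kana first: their keys are two characters long, so use replace()
    for be, af in transtable02.items():
        text = text.replace(be, af)

    before = abc_full + digit_full + katakana_half + punc_full
    after = abc_half + digit_half + katakana_full + punc_half

    # One-to-one mapping; before and after must have the same length
    transtable01 = str.maketrans(before, after)
    text = text.translate(transtable01)

    return text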
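For comparison, the standard library's unicodedata.normalize with the NFKC form performs a similar conversion without hand-written tables: full-width ASCII becomes half-width, half-width katakana becomes full-width (including the two-character voiced forms), and the full-width space becomes an ordinary space. This is a different technique from the table-based function in this article, shown only as a sketch; the function name clean_text_nfkc is hypothetical. NFKC also rewrites other compatibility characters (for example, a circled digit becomes a plain digit), so it may change more than intended.

```python
import unicodedata


def clean_text_nfkc(text):
    """Normalize width variants using Unicode NFKC (hypothetical alternative)."""
    return unicodedata.normalize("NFKC", str(text))


sample = "ＡＢＣ１２３　ｶﾞｷﾞ"
result = clean_text_nfkc(sample)
print(result)  # ABC123 ガギ
```

The table-based approach remains useful when only a specific, known set of characters should be touched.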

How to use

text = "ﾒﾓ　ﾔﾕﾖﾗﾘﾙﾚ＋，－．／：；ｑｒ　ｹﾞｺﾞｻﾞｼﾞｽﾞｾﾞｿﾞﾀﾞﾁﾞ"
print(clean_text(text))

>>>メモ ヤユヨラリルレ+,-./:;qr ゲゴザジズゼゾダヂ

That's all!


Japanese has other kinds of notation inconsistencies, such as okurigana and kanji numerals, so I hope to add support for more of them.


[Full-width ⇔ half-width] Recommended library for adjusting Japanese notation inconsistencies in Python
[python] Create a list of various character types
