Full-width and half-width processing of CSV data in Python

How to normalize data in which katakana, symbols, letters, and numbers appear in a mix of full-width and half-width forms.

Reference:

https://qiita.com/shakechi/items/d12641d6cad01479785f

Doing this by hand is tedious, so I wrote a function that converts full-width characters to half-width one column at a time in a CSV that has been loaded with pandas. Just put the names of the columns to process in the columns = [] list.

What it does: converts katakana, symbols (spaces, etc.), and numbers to half-width. Because ascii=True is passed, full-width letters are converted as well.


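# Install jaconv beforehand by running "pip install jaconv" in a terminal.
import jaconv

# df is assumed to be a pandas DataFrame that already holds the CSV data
# (for example, one loaded with pandas.read_csv).


def shori(column):
    """Convert the full-width characters in df[column] to half-width."""
    values = df[column].values.tolist()
    new_values = []

    for value in values:
        # jaconv.z2h only accepts strings, so leave NaN and numeric cells unchanged.
        if isinstance(value, str):
            value = jaconv.z2h(value, digit=True, ascii=True, kana=True)
        new_values.append(value)

    df[column] = new_values

    return df[column]


# Put the names of the columns you want to process in this list.
columns = []

# Process each listed column in turn.
for column in columns:
    shori(column)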
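
To make the conversion less of a black box, here is a minimal standard-library sketch of the ASCII and digit part of the job. It is not how jaconv is implemented and it does not handle katakana; it only relies on the fact that the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their half-width counterparts U+0021 to U+007E, and that the full-width space is U+3000. The names Z2H_ASCII_TABLE and z2h_ascii are hypothetical and exist only for this illustration.

```python
# Distance between a full-width ASCII character and its half-width form.
FULLWIDTH_OFFSET = 0xFEE0

# Translation table (hypothetical name): full-width code point -> half-width code point.
Z2H_ASCII_TABLE = {
    code: code - FULLWIDTH_OFFSET for code in range(0xFF01, 0xFF5F)
}
Z2H_ASCII_TABLE[0x3000] = 0x0020  # full-width (ideographic) space -> ASCII space


def z2h_ascii(text):
    """Convert full-width letters, digits, symbols and spaces to half-width.

    Katakana is left untouched, and non-string values (such as NaN cells)
    are returned unchanged.
    """
    if not isinstance(text, str):
        return text
    return text.translate(Z2H_ASCII_TABLE)


print(z2h_ascii("ＡＢＣ　１２３！"))  # prints: ABC 123!
```

A table built once and applied with str.translate avoids a per-character loop in Python code. For katakana, where one full-width character can become two half-width ones (a base character plus a voicing mark), a library such as jaconv remains the simpler choice.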
