[PYTHON] Power Automate Desktop Half-width and full-width conversion of katakana and alphanumeric characters

Overview

When automating from data collection to input, correcting half-width and full-width notation fluctuations is one of the issues. Among them, half-width to full-width conversion of katakana alphanumeric characters, which is considered difficult to correct with regular expressions, is performed.

I exchanged data with python using the "clipboard action" of Power Automate Desktop (PAD), and thought about how to correct half-width and full-width notation using the library. ezgif.com-gif-maker (8).gif In the demo, the half-width and full-width notation of the test data below is converting the address-like data that is messed up to full-width.

Prerequisites

Windows10pro 20H2 Power Automate Desktop 2.2.20339.22608 .py file can be executed

Python 3.8.5 pandas 1.1.4 mojimoji 0.0.11

As a Python library, we use a library "mojimoji" that can convert Japanese character strings into half-width and full-width characters at high speed and "Pandas".

Python uses the https://www.python.org/ installer instead of the distribution. The libraries are installed separately. Information for January 2021.

Flow image

image.png Data is exchanged using the clipboard action of PAD and the function that can input and output Pandas data frames with the clipboard.

  1. Copy the data of the column you want to convert to the clipboard
  2. Read from clipboard to Pandas data frame
  3. Half-width to full-width conversion processing with mojimoji
  4. Store in clipboard
  5. Paste into Excel

Preparation

Prepare the test data.

number Full name Street address
1 Yamada Ai Minato-ku, Tokyo Ah 1-2 X Building 3F
2 Eo Tanaka 3 Ii-ku, Saitama City, Saitama Prefecture-4 Wybil 1st floor
3 Kakiku Nakamura 102 Set House, 6-3, Uu-ku, Chiba-shi, Chiba
4 Keiko Sato 306, 3-3-3, Ee-cho, Maebashi-shi, Gunma
5 Suzuki Sashisu 2-3, Oomachi, Utsunomiya City, Tochigi Prefecture-4 AAA Building 204C

Paste the above table data into empty Excel and save it as dummytest.xlsx on your desktop.

You need to install mojimoji and pandas in a python runtime environment that is not venv.

% py -m pip install pandas
% py -m pip install mojimoji

If the clipboard history is turned on, the PAD "Clear Clipboard Contents" action doesn't seem to work, so leave it off. Settings> System> Clipboard.

スクリーンショット 2021-01-11 130753.jpg

Flow creation

  1. Get a special folder

  2. Clear the contents of the clipboard

  3. Excel startup path is `` `% SpecialFolderPath%/dummytest.xlsx``` image.png The test data prepared in preparation is specified.

  4. Get the first empty row in a column from an Excel worksheet image.png

  5. Read from Excel Worksheet image.png

  6. Set clipboard text image.png

  7. Write the text to a file image.png I'm writing a Python script here, but the current PAD specification doesn't allow multi-line text to fill in "text to write". Article I wrote before, but I would appreciate it if you could refer to it.

henkan.py


import mojimoji
import pandas as pd
df = pd.read_clipboard()
def zenkaku(x):
    return mojimoji.han_to_zen(x)
df["Full-width conversion"] = df.Street address.map(zenkaku)
df.Full-width conversion.to_clipboard(index = None)
  1. Execution of DOS command image.png Execute the python script and store the data frame (table) that has undergone half-width and full-width conversion processing in the clipboard.

  2. Paste the cell into an Excel worksheet image.png The action name is "Paste cell in Excel worksheet", but you can paste the contents of the clipboard into the cell.

  3. Delete files image.png Delete the python script.

  4. Clear the contents of the clipboard

Overall flow

image.png Only this (;'∀')

PowerAutomateDesktopRobin


Folder.GetSpecialFolder SpecialFolder: Folder.SpecialFolder.DesktopDirectory SpecialFolderPath=> SpecialFolderPath
Clipboard.Clear _
Excel.LaunchAndOpen Path: $'''%SpecialFolderPath%/dummytest.xlsx''' Visible: True ReadOnly: False LoadAddInsAndMacros: False Instance=> ExcelInstance
Excel.Advanced.GetFirstFreeRowOnColumn Instance: ExcelInstance Column: 3 FirstFreeRowOnColumn=> FirstFreeRowOnColumn
Excel.ReadCells Instance: ExcelInstance StartColumn: 3 StartRow: 1 EndColumn: 3 EndRow: FirstFreeRowOnColumn - 1 ReadAsText: False FirstLineIsHeader: False RangeValue=> ExcelData
Clipboard.SetText Text: ExcelData
File.WriteText File: $'''%SpecialFolderPath%\\henkan.py''' TextToWrite: $'''import mojimoji
import pandas as pd
df = pd.read_clipboard()
def zenkaku(x):
    return mojimoji.han_to_zen(x)
df[\"Full-width conversion\"] = df.Street address.map(zenkaku)
df.Full-width conversion.to_clipboard(index = None)''' AppendNewLine: True IfFileExists: File.IfFileExists.Overwrite Encoding: File.FileEncoding.UTF8
System.RunDOSCommand DOSCommandOrApplication: $'''%SpecialFolderPath%\\henkan.py''' WorkingDirectory: $'''C:\\Users\\Aphrodite\\Desktop''' StandardOutput=> CommandOutput StandardError=> CommandErrorOutput ExitCode=> CommandExitCode
Excel.Advanced.PasteAt Instance: ExcelInstance Column: 4 Row: 1
File.Delete Files: $'''%SpecialFolderPath%\\henkan.py'''
Clipboard.Clear _

Summary

Data can be transmitted using the clipboard between PAD and Pandas. One of the challenges (for me) has been solved thanks to the great library of Python. There are many other great libraries, so it seems that python integration can be done in various ways. For that reason, I would like you to modify the text action so that it can use multiple lines in the usual way. (Winautomation is possible) A similar method would be possible for RPA tools that have clipboard-related actions.

If you want to convert manually, you can also use the JIS function of Excel. I chose this method because I don't want to use a format that uses a lot of functions as much as possible, and I thought about realizing it only within PAD using regular expressions, but it didn't work.

I think it's best to be forced at the input stage. (;'∀')

reference

Perform half-width / full-width conversion at high speed with Python

Postscript

I also tried unicodedata.normalize from Python2 implemented in PAD, but I gave up because the number of characters is strange. Reference Character code hell secret story Episode 3: Unicode normalization with no backtracking image.png

Recommended Posts

Power Automate Desktop Half-width and full-width conversion of katakana and alphanumeric characters
Power Automate Desktop Half-width and full-width conversion of katakana and alphanumeric characters