[PYTHON] IQ Bot Custom Logic: Correction of garbled characters with fixed patterns, blank filling, format conversion, noise exclusion (replacement processing)

I introduced the outline of IQ Bot custom logic in the article here the other day.

Custom logic is a programming language itself called Python, If you don't have programming knowledge, you can't use it.

In this article, the code itself is simple, but it accounts for 80 to 90% of what is actually used in business. I will introduce the code and usage example of the "replacement" process.

As introduced in here The way of thinking is slightly different between field items and table items, so I will introduce each one.

Replacement process for field items

First of all, from the field items that are more basic.

The replacement logic is very simple, but 80 to 90% of the custom logic actually used in business is this.

Basic code

Processing of replacements for field items


field_value = field_value.replace("Character string before replacement","Character string after replacement")

It is possible to define multiple replacement processes for the same item by starting a new line as shown below.

Multiple definitions are OK if you start a new line


field_value = field_value.replace("Character string before replacement 1","Character string after replacement 1")
field_value = field_value.replace("Character string before replacement 2","Character string after replacement 2")

This is just the code, but it can be applied in various ways as follows.

Various uses of replacement processing

--Fill in the blanks

You can fill in the blanks with field_value.replace ("", "") .

As an aside, when IQ Bot reads a value with line breaks, the line breaks are converted to blanks as a specification.

--Convert characters read as "O" to zero (correction of garbled characters with a fixed pattern)

You can convert O to zero with field_value.replace ("O "," 0 ").

In the same way

Replace l (L), I (Ai) and | (Pipeline) with 1 (Ichi), Replace the thorns with Tokyo, Replace the minus sign with ▲ with a minus sign, etc.

You can do various things.

--Format conversion

If you want to change "April 6, 2020" to a format like "April 6, 2020", you can replace it.

Example of date format conversion


field_value = field_value.replace("Year","/") 
field_value = field_value.replace("Month","/") 
field_value = field_value.replace("Day","") 

--Noise exclusion

When processing forms with OCR, stains on paper and marks checked by people with a ballpoint pen appear. Periods, commas, colons, semicolons, slashes, quotation marks, double quotation marks, etc. There are many cases where it is read as noise.

If you don't need such a symbol, you can use field_value.replace ("Noise you want to exclude", "") to exclude noise.

The character string before replacement and the character string after replacement are It's okay if you are surrounded by either double quotation marks (") or single quotation marks (').

If the noise you want to exclude contains double or single quotes, You can define it as follows.

Enclose in quotation marks that are not the noise you want to exclude



#If you want to exclude single quotes, enclose them in double quotes
field_value = field_value.replace("'","") 

#If you want to exclude double quotes, enclose them in single quotes
field_value = field_value.replace('"',"") 

Replacement process for table items

For table items, it's a little different, but it's still easy.

[Here](https://qiita.com/IQBotter/items/67694b1b0d1376ede7e7#%E3%83%86%E3%83%BC%E3%83%96%E3%83%AB%E9%A0%85%E7 % 9B% AE% E3% 81% AE% E3% 82% AB% E3% 82% B9% E3% 82% BF% E3% 83% A0% E3% 83% AD% E3% 82% B8% E3% 83 During the "magic code" explained in% 83% E3% 82% AF% E3% 81% AF% E3% 81% A9% E3% 81% 86% E6% 9B% B8% E3% 81% 8F) ・ ・

Write ↓ like this.

Processing replacements for table items


df['Column name'] = df['Column name'].str.replace('Character string before replacement', 'Character string after replacement')

For the column name, use the column name of here ↓ defined when setting the form with IQ Bot as it is.

Qiita用.jpg

It works normally even with Japanese column names, If the column name contains spaces, replace the spaces with underscores (_).

For example, if you want to exclude the unit "yen" from the "amount" and "amount including tax" columns above, process as follows.

Processing of replacements for field items


df['Amount of money'] = df['Amount of money'].str.replace('Circle', '')
df['Amount of money_tax included'] = df['Amount of money_tax included'].str.replace('Circle', '')

Summary

--The replacement process is easy, but most of the custom logic you use in your business is this. ――Using the replacement processing mechanism, you can do various things such as garbled character correction with a fixed pattern, blank filling, noise exclusion, etc. --For tables, specify the column name --If the table column name contains spaces, replace it with Ansco (_).

How was it?

Proper use of the replacement process can easily improve the quality of your input data and make it easier to automate, so give it a try!

Recommended Posts

IQ Bot Custom Logic: Correction of garbled characters with fixed patterns, blank filling, format conversion, noise exclusion (replacement processing)
IQ Bot Custom Logic: Fixed Value Assignment
IQ Bot Custom Logic Related Processing Summary