[PYTHON] IQ Bot Custom Logic: Correcting common OCR misreadings in dates

There is a relatively common OCR reading habit pattern when reading dates with OCR.

The "day" part of "YYYY MM month DD day" is The pattern that is "B" (alphabet bee) and It is a pattern that is "0" (zero).

I haven't seen many cases where "year" and "month" are read differently. For some reason, only "day" is B or 0 in various patterns of forms.

IQ Bot can correct such OCR reading habits and make them beautiful, so here's how to do it.

Handling the pattern where the day is read as B

In this case, the simple replacement process introduced in a separate article can be used.

Handling the pattern where the day character of a date item is read as B (field item)


# Replace a misread "B" with the day character "日"
field_value = field_value.replace("B", "日")

A correct date value is unlikely to contain "B", so a simple replacement is safe here.
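As a quick check, the replacement can be wrapped in a small function and tried on sample values. This is only a sketch: the function name `fix_day_read_as_b` and the sample dates are made up for illustration, and "日" is the day character that ends a Japanese date. In IQ Bot itself the value arrives in `field_value`.

```python
def fix_day_read_as_b(value):
    """Replace a misread "B" with the day character "日" (hypothetical helper)."""
    return value.replace("B", "日")


# Sample OCR outputs (made-up values for illustration)
print(fix_day_read_as_b("2021年4月1B"))   # 2021年4月1日
print(fix_day_read_as_b("2021年4月1日"))  # already correct, returned unchanged
```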

For table items, see here (https://qiita.com/IQBotter/items/b1e7a75439fede2171e6#%E3%83%86%E3%83%BC%E3%83%96%E3%83%AB%E9%A0%85%E7%9B%AE%E3%81%AB%E5%AF%BE%E3%81%99%E3%82%8B%E7%BD%AE%E6%8F%9B%E5%87%A6%E7%90%86).

Handling the pattern where the day is read as 0 (zero)

In this case, a simple replacement does not work, because 0 can be part of a correct value, as in the 10th, 20th, or 30th.
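To see why, here is a small sketch of what a blanket replacement would do. The sample value is made up: it is what "2020年10月10日" looks like when the final day character "日" is read as "0".

```python
# Made-up OCR output: "2020年10月10日" with the final "日" misread as "0"
ocr_value = "2020年10月100"

# A blanket replace also rewrites the zeros that belong to the year, month and day
naive = ocr_value.replace("0", "日")
print(naive)  # 2日2日年1日月1日日
```

Only the last character should be touched, which is what the code in the next section does.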

Solution (cheat sheet: field edition)

To give the answer first, the following code fixes the case where the day character is read as zero.

Handling the pattern where the day character of a date item is read as zero (field item)


if(field_value[-1:]=="0"):
    field_value = field_value[:-1]
    field_value = field_value + "日"
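The check above assumes the field always holds a date that should end in the day character. If a value in another format could reach this logic, for example "2020/10/10", its legitimate trailing zero would be replaced as well. One possible guard, which is my own variation and not part of the original logic, is to touch the trailing zero only when it follows the month character "月" and one or two day digits. The function name and sample values are hypothetical.

```python
import re

# Trailing "0" preceded by "月" and a 1-2 digit day, e.g. "...10月100"
_DAY_READ_AS_ZERO = re.compile(r"(月\d{1,2})0$")


def fix_day_read_as_zero(value):
    """Swap a trailing misread "0" for "日" only in a 年/月 style date."""
    return _DAY_READ_AS_ZERO.sub(r"\g<1>日", value)


print(fix_day_read_as_zero("2020年10月100"))  # 2020年10月10日
print(fix_day_read_as_zero("2020/10/10"))     # left unchanged
```

Like the simpler version, this cannot tell a misread "日" from a day character that is missing altogether, so it is only worth using on forms where the misreading is known to occur.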

Solution (cheat sheet: table edition)

For tables, you can handle it by writing the following code between the lines of the boilerplate "magic code" (explained [here](https://qiita.com/IQBotter/items/67694b1b0d1376ede7e7#%E3%83%86%E3%83%BC%E3%83%96%E3%83%AB%E9%A0%85%E7%9B%AE%E3%81%AE%E3%82%AB%E3%82%B9%E3%82%BF%E3%83%A0%E3%83%AD%E3%82%B8%E3%83%83%E3%82%AF%E3%81%AF%E3%81%A9%E3%81%86%E6%9B%B8%E3%81%8F)).

Handling the pattern where the day character of a date item is read as zero (table item)



# Function that replaces a trailing zero with the day character "日"
def dayreplace(ymd):
    x = str(ymd)
    if(x[-1:] == "0"):
        x = x[:-1]
        x = x + "日"
    return x

# Apply the replacement to every cell in the column
df['Column name for which you want to correct the date'] = df['Column name for which you want to correct the date'].apply(dayreplace)
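Assuming `df` is a pandas DataFrame, as the `.apply` call suggests, `apply` calls `dayreplace` once per cell in the column. The sketch below imitates that with a plain Python list so it runs without pandas; the cell values are made up. It also shows one thing to watch for: because of `str(ymd)`, a cell that is not a string is converted to text first, so an empty cell such as `None` comes back as the string "None".

```python
def dayreplace(ymd):
    """Same logic as above: swap a trailing "0" for the day character "日"."""
    x = str(ymd)
    if x[-1:] == "0":
        x = x[:-1] + "日"
    return x


# Made-up values standing in for one table column
column = ["2020年10月100", "2020年10月11日", None]

# Rough equivalent of df[column_name].apply(dayreplace): one call per cell
fixed = [dayreplace(cell) for cell in column]
print(fixed)  # ['2020年10月10日', '2020年10月11日', 'None']
```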

Explanation of the mechanism

There are three points that are common to the field and table editions.

(1) if statement (conditional branch)
(2) Slicing
(3) + operator for concatenating strings

In addition to the above, the table edition uses a mechanism called a function.

For each of these, I will point to an explanation that I found easy to understand.

-① if statement (= conditional branch): In the code above, the first line is the condition that decides whether any processing happens, and the second and third lines are the processing performed when that condition is met. For the if statement, the article here was easy to understand.

-② Slicing: The field_value[-1:] on the first line and the field_value[:-1] on the second line of the code above use a mechanism called slicing. Slicing extracts part of a string, from one character position to another. For slicing, the article here was easy to understand.

-③ + operator for concatenating strings: At first glance the third line may look like addition, but it simply joins one string to another. Knowing that this is possible is enough, but for reference, I'll put a link here.

- Table edition: functions. For functions, the explanation here was easy to understand.
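The building blocks can also be tried out on their own. This is a small standalone sketch with a made-up sample string. It also shows why `[-1:]` is a convenient way to test the last character: slicing an empty string returns an empty string, while indexing with `[-1]` raises an error.

```python
s = "2020年10月100"  # made-up sample value

last = s[-1:]  # slice: the last character
rest = s[:-1]  # slice: everything except the last character
print(last)    # 0
print(rest)    # 2020年10月10

# + joins two strings; no arithmetic is involved
print(rest + "日")  # 2020年10月10日

# Slicing is safe on an empty string, indexing is not
empty = ""
print(repr(empty[-1:]))  # ''
try:
    empty[-1]
except IndexError:
    print("IndexError")
```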

In place of a summary

Based on the above, here is the field code again with my own comments added.

Handling the pattern where the day character of a date item is read as zero (field item)


if(field_value[-1:]=="0"):              #field_The last character of value is"0"Then, do the following processing
    field_value = field_value[:-1]      #field_Exclude the last character from value Example: Change YYYY year MM month DD0 to YYYY year MM month DD
    field_value = field_value + "Day"    #↑ processed field_to value,"Day"Stick together

Strictly speaking, the comment on the second line is a simplification.

To be precise, it should say "take the substring of field_value that starts at the first character and stops before the last character, and assign it back to field_value", but that is hard to follow. The result is the same, so I went with the simpler explanation.

When converting paper forms into data, dates are an indispensable item on almost every form. Please make use of the code introduced here!
