IQ Bot lets you choose which OCR engine processes your documents.
The characteristics and quirks of the available OCR engines are summarized in this article. Depending on the engine, the "/" (slash) in a date written in "YYYY/MM/DD" format is sometimes read as "1".
Example: a date written as 2020/11/11 is read as 2020111111.
In such cases the usual approach is to detect the error by setting the field type to date, but it is more efficient to first repair the format as far as possible with custom logic and then apply error detection.
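As a rough illustration of the "repair the format first, then detect errors" idea, the sketch below checks a value with Python's standard `datetime.strptime`. The helper name `is_valid_date` is hypothetical and not part of IQ Bot; inside IQ Bot the date-type validation would normally play this role. It only shows why a misread value fails the check while a repaired one passes.

```python
from datetime import datetime

def is_valid_date(text, fmt="%Y/%m/%d"):
    """Return True if text parses as a real calendar date in the given format.

    Hypothetical helper for illustration; it stands in for IQ Bot's own
    date validation.
    """
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

print(is_valid_date("2020111111"))  # misread value has no slashes, so it is rejected
print(is_valid_date("2020/11/11"))  # repaired value is accepted
print(is_valid_date("2020/13/01"))  # a wrong month is still rejected after repair
```

Because the check still rejects impossible dates, the custom logic can afford to leave doubtful values untouched and let validation flag them.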
That is why I am publishing the custom logic I wrote for the problem described above.
Covering the various cases made the code fairly long. If you can think of a simpler way, please leave a comment.
Restoring slashes in dates that were read as 1
# Replace the character at the given position with a slash if it is a "1"
def slashConvert(x, index):
    if x[index] == "1":
        x = x[:index] + "/" + x[index + 1:]
    return x
# Work out where the slashes belong and apply the conversion
# x = original value, y = number of digits in the year (4 for YYYY/MM/DD, 2 for YY/MM/DD)
def date1toSlash(x, y):
    # If the value already contains two or more slashes, return it unchanged
    if x.count("/") >= 2:
        return x
    # After the year there must be at least 4 characters (/M/D)
    # If there are fewer, return the value unchanged (validation will catch it)
    if len(x) < y + 4:
        return x
    # Put a slash between the year and the month
    x = slashConvert(x, y)
    # Decide which position holds the slash between the month and the day
    # If that cannot be determined, return the value as it is; it is not a complete date,
    # so validation on the IQ Bot side will catch it
    if len(x) == y + 6:  # 6 characters after the year = /MM/DD, so the slash is third from the end
        mdIndex = -3
    elif len(x) == y + 5:  # 5 characters after the year = /M/DD or /MM/D; work out which one
        if x[y + 1] in "23456789":  # Month starts with 2-9 = single-digit month (February to September), so M/DD
            mdIndex = -3
        elif x[y + 1] == "0":  # Month starts with zero = MM/D? A zero-padded month with an unpadded day is suspicious, so return as is
            return x
        elif x[-3:-1] == "11":  # Third and second characters from the end are "11" (the last four are ?11?)
            if x[-4:] == "1110":  # 1110 can only be 1/10
                mdIndex = -3
            else:  # Anything else (1111 etc.) could be 1/11 or 11/1, so return as is
                return x
        elif (x[-3] == "1") and (x[-2] != "1"):  # e.g. 1120 can only be 1/20
            mdIndex = -3
        elif (x[-2] == "1") and (x[-3] != "1"):  # e.g. 1211 can only be 12/1
            mdIndex = -2
        else:  # No other patterns are expected
            return x
    elif len(x) == y + 4:  # 4 characters after the year = /M/D, so the slash is second from the end
        mdIndex = -2
    else:  # Longer than any expected date pattern, so return as is
        return x
    x = slashConvert(x, mdIndex)
    return x
# Applying to a field item when the year has 4 digits
field_value = date1toSlash(field_value, 4)
# Applying to a table column when the year has 2 digits
df['Column name'] = df['Column name'].apply(date1toSlash, y=2)
The code for the table case above is assumed to be sandwiched between the Magic Code lines.
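One possible shorter formulation, offered only as a sketch and not as the author's method: instead of branching on each digit pattern, try every position where the second slash could be, validate each candidate with `datetime.strptime`, and accept the result only when exactly one candidate is a real date. The function name `restore_slashes` and its `year_digits` parameter are hypothetical. It also assumes that both slashes were misread (the value consists of digits only), which is narrower than the logic above.

```python
from datetime import datetime

def restore_slashes(text, year_digits=4):
    """Turn two misread '1' characters back into slashes when unambiguous.

    Hypothetical alternative sketch. Returns the input unchanged when it is
    not all digits, or when zero or several valid dates are possible, so that
    validation can flag it.
    """
    if not text.isdigit():
        return text
    fmt = "%Y/%m/%d" if year_digits == 4 else "%y/%m/%d"
    first = year_digits  # the first slash always follows the year
    if len(text) <= first or text[first] != "1":
        return text
    candidates = set()
    # The month has 1 or 2 digits, so the second slash is 2 or 3 places on.
    for second in (first + 2, first + 3):
        # Skip positions that leave no room for a day or that are not a "1".
        if second >= len(text) - 1 or text[second] != "1":
            continue
        candidate = (text[:first] + "/" + text[first + 1:second]
                     + "/" + text[second + 1:])
        try:
            datetime.strptime(candidate, fmt)  # rejects impossible dates
        except ValueError:
            continue
        candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else text

print(restore_slashes("2020111111"))  # 2020/11/11
print(restore_slashes("202011110"))   # 2020/1/10
print(restore_slashes("202011111"))   # 1/11 or 11/1 is ambiguous, so unchanged
```

The trade-off is that the rules are implied by date validity rather than spelled out, so a stricter rule such as rejecting a zero-padded month with an unpadded day would have to be added separately.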