I tried to make a regular expression for "amount" using Python

Conclusion

Here is the regular expression for "amount" (a monetary amount) in Python.

The version for amounts ending in 円 (yen) is below.

pattern = r'^(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円'

# OK
# 0円
# 1,000円
# 100円
# 12345円
# 2000円
# 1234円
# 1000円

# NG
# 0,000円
# 000円
# ,円
# 10,00円
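
As a quick check of the lists above, the following sketch runs the pattern against every OK and NG string. It assumes the suffix is the character 円 (yen); the names `ok_cases`, `ng_cases`, `ok_results` and `ng_results` are only illustrative.

```python
import re

# Final pattern for amounts ending in 円, as given above.
pattern = r'^(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円'
prog = re.compile(pattern)

ok_cases = ['0円', '1,000円', '100円', '12345円', '2000円', '1234円', '1000円']
ng_cases = ['0,000円', '000円', ',円', '10,00円']

# prog.match returns a match object on success and None on failure.
ok_results = [prog.match(s) is not None for s in ok_cases]
ng_results = [prog.match(s) is not None for s in ng_cases]

print(ok_results)  # every entry should be True
print(ng_results)  # every entry should be False
```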

The version for amounts starting with ¥ (yen sign) is as follows.

pattern = r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)$'

# OK
# ¥0
# ¥1,000
# ¥100
# ¥12345
# ¥2000
# ¥1234
# ¥1000

# NG
# ¥0,000
# ¥000
# ¥,
# ¥10,00
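
The ¥ version can be checked the same way. This sketch also reads the numeric part through capture group 1; the helper name `amount_digits` is hypothetical and not part of the original code.

```python
import re

# Final pattern for amounts starting with ¥, as given above.
pattern = r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)$'
prog = re.compile(pattern)


def amount_digits(text):
    """Return the numeric part of a ¥ amount, or None if it does not match."""
    result = prog.match(text)
    return result.group(1) if result else None


ok_cases = ['¥0', '¥1,000', '¥100', '¥12345', '¥2000', '¥1234', '¥1000']
ng_cases = ['¥0,000', '¥000', '¥,', '¥10,00']

ok_digits = [amount_digits(s) for s in ok_cases]
ng_digits = [amount_digits(s) for s in ng_cases]

print(ok_digits)  # the part after ¥ for each OK string
print(ng_digits)  # None for each NG string
```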

Preparation

The environment is Google Colaboratory. The Python version is shown below.

import platform
print("python " + platform.python_version())
# python 3.6.9

The regular expression checker I used is https://regex101.com/. I build each regular expression while checking it there, then put it into the code.

For Python regular expressions in general, this article is easy to follow: https://qiita.com/luohao0404/items/7135b2b96f9b0b196bf3

Let's make a regular expression for amounts

Version ending in 円

Let's write the code right away. First, import the library for regular expressions.

import re

First, let's create a regular expression that matches the string "1000円".

pattern = r'1000円'

This is an exact match, so of course it matches. Let's check with the code.

pattern = r'1000円'
string = r'1000円'
prog = re.compile(pattern)
result = prog.match(string)
if result:
    print(result.group())
# 1000円

The matched string is displayed. From here on, for simplicity, only the regular expression pattern is shown.

Besides "1000円", there are amounts such as "2000円" and "1234円". A regular expression that matches these is as follows.

pattern = r'\d\d\d\d円'

The regular expression used is:

Pattern   Description
\d        Any digit

Example    Matches
\d\d\d\d   1000, 2000, 1234

The regular expression above can be written more concisely.

pattern = r'\d{4}円'

The newly used regular expressions are:

Pattern   Description
{m}       Repeat the preceding element exactly m times

Example   Matches
\d{4}     1000, 2000, 1234
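
A small sketch comparing the two spellings, assuming 円 as the yen suffix. The sample list is my own; it includes a three-digit and a five-digit amount to show what a fixed count of four digits accepts.

```python
import re

samples = ['1000円', '2000円', '1234円', '100円', '12345円']

long_form = re.compile(r'\d\d\d\d円')   # four digits written out
short_form = re.compile(r'\d{4}円')     # the same thing with {m}

long_hits = [s for s in samples if long_form.match(s)]
short_hits = [s for s in samples if short_form.match(s)]

print(long_hits)   # ['1000円', '2000円', '1234円']
print(short_hits)  # ['1000円', '2000円', '1234円']
```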

However, this only matches four-digit amounts, so it cannot match "100円" or "12345円". Let's handle any number of digits.

The modified regular expression is as follows.

pattern = r'\d+円'

The newly used regular expressions are:

Pattern   Description
+         One or more repetitions of the preceding element

Example   Matches
\d+       1000, 100, 12345
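
To see the effect of `+`, this sketch (again assuming 円 as the suffix, with sample strings of my own choosing) tries amounts of several lengths, plus one with a comma.

```python
import re

prog = re.compile(r'\d+円')

samples = ['1000円', '100円', '12345円', '1,000円']

# prog.match only matches at the start of the string, so '1,000円'
# fails: after the leading '1' comes ',' rather than '円'.
hits = [s for s in samples if prog.match(s)]

print(hits)  # ['1000円', '100円', '12345円']
```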

However, this cannot match a string containing a comma (,), such as "1,000円". Let's allow commas as well as digits.

The modified regular expression is as follows.

pattern = r'[\d,]+円'

The newly used regular expressions are:

Pattern   Description
[abc]     Any one of the characters a, b, or c

Example   Matches
[\d,]     A digit or a comma (,)

I also used the following regular expression again:

Pattern   Description
+         One or more repetitions of the preceding element

Example   Matches
[\d,]+    One or more digits or commas (,)

Now both digits and commas (,) are handled.
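
The character class accepts commas anywhere, which the following sketch makes visible (assuming 円 as the suffix; the sample strings are illustrative).

```python
import re

prog = re.compile(r'[\d,]+円')

samples = ['1,000円', '1000円', ',円', '10,00円']

# Every sample consists only of digits and commas before 円,
# so all of them match, including the malformed ones.
hits = [s for s in samples if prog.match(s)]

print(hits)  # ['1,000円', '1000円', ',円', '10,00円']
```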

However, this also matches strings with misplaced commas, such as ",円" and "10,00円". Let's require a comma every three digits, as in "1,000円" or "1,000,000円".

The modified regular expression is as follows.

pattern = r'\d{1,3}(,\d{3})+円'

The newly used regular expressions are:

Pattern   Description
{m,n}     Repeat the preceding element at least m and at most n times

Example   Matches
\d{1,3}   One to three digits

I also used the following regular expression:

Pattern   Description
(abc)     Treat the string abc as one group

Example    Matches
(,\d{3})   A comma (,) followed by three digits, such as ",000", treated as one group
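
A sketch of how the grouped pattern behaves, assuming 円 as the suffix. The samples are my own and mix well-formed and malformed comma placements with one comma-free amount.

```python
import re

prog = re.compile(r'\d{1,3}(,\d{3})+円')

samples = ['1,000円', '1,000,000円', '10,00円', ',円', '1000円']

# Only strings made of 1 to 3 digits followed by one or more
# ",ddd" groups match; '1000円' has no comma group at all.
hits = [s for s in samples if prog.match(s)]

print(hits)  # ['1,000円', '1,000,000円']
```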

With this change, however, the comma-free strings that matched before no longer match. Let's modify the pattern so that digit-only amounts match as well.

pattern = r'(\d+|\d{1,3}(,\d{3})+)円'

The newly used regular expressions are:

Pattern     Description
(abc|efg)   Either the string abc or the string efg

Example                  Matches
(\d+|\d{1,3}(,\d{3})+)   1000, 1,000
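
The alternation can be checked with a sketch like this one (assuming 円 as the suffix; the samples are illustrative).

```python
import re

prog = re.compile(r'(\d+|\d{1,3}(,\d{3})+)円')

samples = ['1000円', '1,000円', '10,00円']

# '1000円' matches through the first alternative (\d+),
# '1,000円' through the second one, and '10,00円' through neither.
hits = [s for s in samples if prog.match(s)]

print(hits)  # ['1000円', '1,000円']
```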

However, this also matches strings that start with 0, such as "0,000円" and "000円".

The modified regular expression is as follows.

pattern = r'([1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円'

The newly used regular expressions are:

Pattern   Description
[a-c]     Any one character in the range a to c

Example   Matches
[1-9]     1 to 9 (a digit other than 0)

I also used the following regular expression:

Pattern   Description
*         Zero or more repetitions of the preceding element

Example   Matches
\d*       Zero or more digits
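
This sketch shows the effect of requiring a first digit from 1 to 9 (assuming 円 as the suffix; the samples are my own). Note that a plain "0円" is rejected too at this stage.

```python
import re

prog = re.compile(r'([1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円')

samples = ['1000円', '1,000円', '0,000円', '000円', '0円']

# Both alternatives start with [1-9], so nothing that begins
# with 0 can match from the start of the string.
hits = [s for s in samples if prog.match(s)]

print(hits)  # ['1000円', '1,000円']
```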

Strings starting with 0 are now excluded. However, "0円" on its own must still be allowed, so let's add it.

pattern = r'^(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円'

The newly used regular expressions are:

Pattern   Description
^         The beginning of the string

Without "^" (caret), a search would pick up the trailing "0円" of "0,000円" as a partial match. (re.match already matches only at the start of the string, but the "^" matters when searching, as regex101 or re.search does.)
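
The partial match is easiest to see with `re.search`, which scans the whole string. A minimal sketch, assuming 円 as the suffix:

```python
import re

without_caret = re.compile(r'(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円')
with_caret = re.compile(r'^(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)円')

text = '0,000円'

# search() tries every position, so without ^ it finds the
# last "0" followed by 円 inside the malformed string.
partial = without_caret.search(text)
anchored = with_caret.search(text)

print(partial.group())  # 0円
print(anchored)         # None
```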

Version starting with ¥

Some amounts start with ¥ (yen sign) instead of ending in 円, so let's create a regular expression for these as well. In the regular expression above, delete the trailing "円" and add "¥" at the beginning.

pattern = r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)'

However, in this case the "¥1" of "¥1,000" is taken as a partial match. The modified version is as follows.

pattern = r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)$'

The newly used regular expressions are:

Pattern   Description
$         The end of the string

Adding $ at the end prevents the partial match.
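
The difference can be sketched as follows. The last lines use `re.fullmatch`, which is not what the article uses; it is shown only as an alternative way to require that the whole string matches.

```python
import re

no_dollar = re.compile(r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)')
with_dollar = re.compile(r'^¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)$')

text = '¥1,000'

# Without $, the second alternative [1-9]\d* is satisfied by "1"
# and the match stops there.
partial = no_dollar.match(text).group()
whole = with_dollar.match(text).group()

print(partial)  # ¥1
print(whole)    # ¥1,000

# Alternative (not from the article): fullmatch needs neither ^ nor $.
full = re.fullmatch(r'¥(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})+)', text)
print(full.group())  # ¥1,000
```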

Summary

This time, I used Python to create a regular expression for "amount".

Strings that follow a fixed pattern, such as dates, times, and amounts, are well suited to regular expressions. Try extracting various kinds of strings with regular expressions.
