This article is the day 18 entry of the Takumi Akashiro Alone Advent Calendar 2020.
Do you use regular expressions?!
The last time I used one was about a month ago. I use them whenever I need them.
Regular expressions being what they are, if you use them well you can pull out any set of substrings you like, quite painlessly.
TL;DR
Use named groups to retrieve strings!
>>> text = "environ/house-food/apple-pie02.fbx"
>>> import re
>>> reg_text = r'(?P<main>(chara|environ))/(?P<sub>[^-/]*)-?(?P<sub_sub>[^/]*)/(?P<filling>[^-]*)-pie'
>>> match = re.search(reg_text, text)
>>> print(match.groupdict())
{'main': 'environ', 'sub': 'house', 'sub_sub': 'food', 'filling': 'apple'}
First, the basics of regular expressions: let's just try a simple match.
#! python3
import re


def main():
    # NOTE: Unrelated, but apple pie shows up a lot in regular expression samples.
    text = "environ/house-food/apple-pie02.fbx"
    # re.search returns a match object if the pattern is found anywhere, else None.
    match_obj = re.search(r'pie', text)
    if match_obj:
        print("Matched!")
    else:
        print("No match!")


if __name__ == '__main__':
    main()
That's all there is to it. Super easy.
If you need speed and only have to match at the start of the string, use re.match; if you want to replace text, use re.sub.
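Here is a quick sketch of how the three functions differ, using the same sample path as above. The replacement string "cake" is just something I made up for illustration.

```python
import re

text = "environ/house-food/apple-pie02.fbx"

# re.search scans the whole string, so it finds "pie" in the middle.
found_anywhere = re.search(r'pie', text)

# re.match only tries at the very start of the string, so "pie" is not found...
match_at_start = re.match(r'pie', text)
# ...but "environ" is, because the string begins with it.
match_environ = re.match(r'environ', text)

# re.sub returns a new string with every match replaced
# ("cake" is an arbitrary example replacement).
replaced = re.sub(r'pie', 'cake', text)

print(found_anywhere is not None)   # True
print(match_at_start is not None)   # False
print(match_environ.group())        # environ
print(replaced)                     # environ/house-food/apple-cake02.fbx
```

Note that re.sub does not modify text in place; strings are immutable, so you get a new string back.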
For now, it's fine to just use regular expressions like this. [^1]
[^1]: Addendum: If you care about speed and use the same regular expression inside a for loop, it is a little faster to compile it once with re.compile outside the loop and reuse the compiled pattern.
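As a minimal sketch of that addendum: compile the pattern once before the loop, then call the compiled object's search method inside it. The list of paths and the variable names here are hypothetical, made up only for this example.

```python
import re

# Hypothetical list of file paths, made up for this example.
paths = [
    "environ/house-food/apple-pie02.fbx",
    "environ/house-food/cherry-pie01.fbx",
    "chara/hero/body.fbx",
]

# Compile once, outside the loop...
pie_pattern = re.compile(r'pie')

pie_paths = []
for path in paths:
    # ...and reuse the compiled pattern object inside the loop.
    if pie_pattern.search(path):
        pie_paths.append(path)

print(pie_paths)
```

The compiled pattern has the same search, match, and sub methods as the module-level functions, just without the pattern argument.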
Now, how would you extract the string apple that comes before pie?
You would probably get at it by trimming away the surrounding characters bit by bit.
But what if there are several things you want to grab? For example, what if you want environ, house, food and apple all at once?
Groups can be used in such cases. Let's read the official documentation for a moment.
Regular expression syntax
(Omitted) (...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group. The contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].
re --- Regular expression operations (Python 3.9.1 documentation)
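To make the two less obvious points in that quote concrete, here is a small sketch of a \number backreference and of escaped parentheses. The sample strings are hypothetical, invented just for this demonstration.

```python
import re

# \1 refers back to whatever group 1 captured, so this finds a word
# that is repeated on both sides of a hyphen.
doubled = re.search(r'(\w+)-\1', "apple-apple-pie")
print(doubled.group(0))  # apple-apple
print(doubled.group(1))  # apple

# \( and \) match literal parentheses and do not create a group;
# the unescaped pair in the middle is the only capturing group here.
literal = re.search(r'\((\d+)\)', "apple-pie(02).fbx")
print(literal.group(0))  # (02)
print(literal.group(1))  # 02
```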
...That doesn't really tell you how to use it, so here is a sample.
#! python3
import re


def main():
    text = "environ/house-food/apple-pie02.fbx"
    # The parentheses capture whatever comes right before "-pie".
    match_obj = re.search(r'([^/-]*)-?pie', text)
    if match_obj:
        print(match_obj.groups())


if __name__ == '__main__':
    main()
Calling groups() on the match object returns a tuple of the strings captured by each group in the regular expression.
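For reference, here is a rough sketch of the other ways to read captured groups from a match object. The second group, (\d+), is my own addition to the sample pattern so that there is more than one group to look at.

```python
import re

text = "environ/house-food/apple-pie02.fbx"
# (\d+) is an extra group added for this example; it captures the digits after "pie".
match_obj = re.search(r'([^/-]*)-?pie(\d+)', text)

print(match_obj.groups())   # ('apple', '02')  all groups, as a tuple
print(match_obj.group(0))   # apple-pie02      the whole match
print(match_obj.group(1))   # apple            the first group
print(match_obj.group(2))   # 02               the second group
```

Group numbers follow the order of the opening parentheses, counting from 1; group 0 is always the entire matched text.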
So, to extract environ, house, food and apple from the text above, write it like this:
#! python3
import re


def main():
    text = "environ/house-food/apple-pie02.fbx"
    match = re.search(r'([^/-]*)/([^/-]*)-?([^/-]*)/([^/-]*)-?pie', text)
    if match:
        print(match.groups())


if __name__ == '__main__':
    main()
Run it and……
all four come out without a hitch!
So convenient!
I want a dict instead of a tuple.
Parentheses get used a lot in regular expressions for other purposes too, so you will often want to pick out only the parts you actually need.
That is where "named groups" come in!
As usual, read the official docs.
Regular expression syntax
(Omitted) (?P<name>...) Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.
Well, you'll get it once you use it, so let's just write some code.
#! python3
import re


def main():
    text = "environ/house-food/apple-pie02.fbx"
    match = re.search(r'(?P<main>[^/-]*)/(?P<sub>[^/-]*)-?(?P<sub_sub>[^/-]*)/(?P<filling>[^/-]*)-?pie', text)
    if match:
        print(match.groupdict())


if __name__ == '__main__':
    main()
Calling groupdict() on the match object returns a dictionary of the named groups, keyed by group name.
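A small sketch of what this means in practice: a named group can be read by name or by number, and groups without a name are left out of groupdict(). The pattern below is a shortened variant I put together for this example, not one taken from the samples above.

```python
import re

text = "environ/house-food/apple-pie02.fbx"
# Shortened pattern for this example: the inner (chara|environ) group has no name.
match = re.search(r'(?P<main>(chara|environ))/.*/(?P<filling>[^/-]*)-?pie', text)

print(match.group('filling'))  # apple    access by name
print(match['filling'])        # apple    subscript form (Python 3.6+)
print(match.group(3))          # apple    named groups are numbered too
print(match.groups())          # ('environ', 'environ', 'apple')
print(match.groupdict())       # {'main': 'environ', 'filling': 'apple'}
```

The unnamed inner group still shows up in groups(), but groupdict() contains only the named ones, which is why it is handy when a pattern needs extra parentheses.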
Everything comes out nice and tidy!
~~I can't think of anything else to say……~~
It's convenient!