How to use regular expressions in Python

Python's regular expression syntax (metacharacters, patterns, and the matching rules themselves) seems to be almost the same as in Perl and PHP.

However, the functions you call to run a regular expression are quite different, so I am summarizing them here for my own study and reference.

This article does not cover the regular expression syntax itself (metacharacters and so on).

Setup

Regular expressions are provided by the standard library module re. Import it as follows.

Import


import re

There are two ways to use regular expressions. The first is to compile the pattern in advance. When you search for the same pattern many times, you compile it once and reuse the compiled object instead of specifying the pattern every time. (The module-level functions also cache recently used patterns, so the speed difference is often small.) http://docs.python.jp/3/howto/regex.html#compiling-regular-expressions

It is also recommended to put r in front of the pattern string (a raw string literal). Patterns often work without it, but in a raw string a backslash stays a literal backslash, so patterns that contain backslashes are easier to write and read.

http://docs.python.jp/3/howto/regex.html#the-backslash-plague
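As a small illustration of why the r prefix matters, here is a Python 3 sketch using the word-boundary escape \b. The sample text "abs ca" is made up for this example.

```python
import re

# In a normal string literal "\b" is the backspace character (U+0008),
# so the regex engine never sees the word-boundary escape \b.
print("\b" == "\x08")   # True

# In a raw string the backslash is kept: r"\b" is the two characters \ and b,
# the same as writing "\\b" with an escaped backslash.
print(r"\b" == "\\b")   # True

text = "abs ca"
print(re.search("\bca", text))           # None (the pattern starts with a backspace)
print(re.search(r"\bca", text).group())  # ca (\b is a word boundary here)
```

Without the r prefix the search silently finds nothing, which is exactly the kind of bug the raw string avoids.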

compile


pattern = r"ca"
text = "caabsacasca"
repatter = re.compile(pattern)
matchOB = repatter.match(text)

The other way is to pass the pattern directly to the search function without compiling it. Use this when you do not need to reuse the pattern.

NoCompile


pattern = r"ca"
text = "caabsacasca"
matchOB = re.match(pattern , text)
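Both styles give the same answers. The following Python 3 sketch checks one pattern against several strings (the extra sample strings are made up) with a compiled object and with the module-level function.

```python
import re

pattern = r"ca"
texts = ["caabsacasca", "abca", "cat"]

# Compile once, then reuse the same pattern object for every text.
repatter = re.compile(pattern)
compiled_results = [bool(repatter.match(t)) for t in texts]

# Passing the pattern string on every call gives the same results.
direct_results = [bool(re.match(pattern, t)) for t in texts]

print(compiled_results)  # [True, False, True]
print(direct_results)    # [True, False, True]
```

The choice is mostly about readability: a compiled object gives a reused pattern a name.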

Search methods

There are four main search methods. http://docs.python.jp/3/howto/regex.html#performing-matches

Method/attribute Purpose
match(pattern, string) Determines whether the regular expression matches at the beginning of the string.
search(pattern, string) Scans through the string, looking for any location where the regular expression matches.
findall(pattern, string) Finds all substrings where the regular expression matches and returns them as a list.
finditer(pattern, string) Finds all substrings where the regular expression matches and returns them as an iterator.
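To compare the four functions side by side, here is a Python 3 sketch that runs each of them on one made-up string that does not start with the pattern.

```python
import re

text = "abcaXca"

print(re.match(r"ca", text))          # None: text does not start with "ca"
print(re.search(r"ca", text).span())  # (2, 4): first occurrence anywhere
print(re.findall(r"ca", text))        # ['ca', 'ca']: every match, as strings
print([m.span() for m in re.finditer(r"ca", text)])  # [(2, 4), (5, 7)]
```

Compiled pattern objects have methods with the same names (repatter.match(text), repatter.search(text), and so on).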

Other functions (reference)

http://docs.python.jp/2.7/library/re.html#re.split

Method/attribute Purpose
split(pattern, string) Splits the string at every match of the regular expression and returns the pieces as a list.
sub(pattern, repl, string) Replaces every match of the regular expression with repl and returns the new string.
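A minimal Python 3 sketch of split() and sub(). The input strings and the replacement "**" are arbitrary examples.

```python
import re

# split(): cut the string at every run of commas and/or whitespace.
print(re.split(r"[,\s]+", "apple, banana  cherry,date"))
# ['apple', 'banana', 'cherry', 'date']

# sub(): replace every match of the pattern with repl.
print(re.sub(r"ca", "**", "caabsacasca"))
# **absa**s**
```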

Let's look at the search methods one by one.

match() function

This function determines whether the pattern matches at the beginning of the string. It returns a match object (stored in matchOB below), or None if there is no match. Use the .group() method to extract the matched part as a string (str). Match objects have other methods for extracting information besides group(); they are described later.

match


pattern = r"ca"
text = "caabsacasca"
matchOB = re.match(pattern , text)
if matchOB:
    print(matchOB.group())  # ca

Methods for retrieving information from a match object

Method/attribute Purpose
group() Returns the string matched by the regular expression.
start() Returns the start position of the match.
end() Returns the end position of the match.
span() Returns a tuple (start, end) with the position of the match.
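To see these methods together, and what happens when match() fails, here is a Python 3 sketch. The helper name describe_match is hypothetical, made up for this example.

```python
import re

def describe_match(pattern, text):
    """Return (group, start, end, span) for re.match, or None if it fails."""
    matchOB = re.match(pattern, text)
    if matchOB is None:
        # match() only succeeds when the pattern is at the very beginning
        return None
    return matchOB.group(), matchOB.start(), matchOB.end(), matchOB.span()

print(describe_match(r"ca", "caabsacasca"))  # ('ca', 0, 2, (0, 2))
print(describe_match(r"ca", "abca"))         # None
```

Because a failed match returns None, calling .group() directly on the result would raise AttributeError; that is why the examples check `if matchOB:` first.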

search() function

This function looks for any part of the string that matches the pattern. Unlike match(), it also finds matches that are not at the beginning of the string. However, even if there are several matches, only the first one is returned.

search



pattern = r"ca"
text = "caabsacasca"
matchOB = re.search(pattern , text)

if matchOB:
    print(matchOB)          # the match object itself
    print(matchOB.group())  # matched string: ca
    print(matchOB.start())  # start position of the match: 0
    print(matchOB.end())    # end position of the match: 2
    print(matchOB.span())   # (start, end) tuple: (0, 2)
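In the example above the pattern happens to be at the start of the text, so match() would give the same result. A Python 3 sketch with a made-up text where the first "ca" is further in shows what search() adds:

```python
import re

text = "abscaXca"
matchOB = re.search(r"ca", text)
if matchOB:
    # search() skips ahead to the first place the pattern occurs
    print(matchOB.group(), matchOB.span())  # ca (3, 5)

print(re.match(r"ca", text))  # None: match() only looks at position 0
```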

findall() function

This function returns all the parts of the string that match the pattern as a list. Unlike search(), you get every match. However, the return value is a plain list of strings, not match objects, so methods such as group() cannot be used.

findall


pattern = r"ca"
text = "caabsacasca"
# Return everything that matches the pattern as a list
matchedList = re.findall(pattern, text)
if matchedList:
    print(matchedList)  # ['ca', 'ca', 'ca']
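One detail worth knowing: when the pattern contains parentheses (capturing groups), findall() returns the group contents instead of the whole match. A Python 3 sketch with a made-up input string:

```python
import re

text = "id=12, id=345, name=ab"

# No groups: the whole match for each occurrence.
print(re.findall(r"id=\d+", text))       # ['id=12', 'id=345']

# One group: only the text captured by the group.
print(re.findall(r"id=(\d+)", text))     # ['12', '345']

# Several groups: one tuple per match.
print(re.findall(r"(\w+)=(\w+)", text))  # [('id', '12'), ('id', '345'), ('name', 'ab')]
```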

finditer() function

This function returns the parts of the string that match the pattern as an iterator. Looping over the return value (for example with a for loop) gives you every match, just like findall(). The difference is that findall() returns a list of strings, whereas finditer() yields a match object on each iteration, so group(), start(), end(), and so on are available.

finditer


pattern = r"ca"
text = "caabsacasca"
# Return every match of the pattern as an iterator of match objects
iterator = re.finditer(pattern, text)
for match in iterator:
    print(match.group())  # 1st: ca      2nd: ca      3rd: ca
    print(match.start())  # 1st: 0       2nd: 6       3rd: 9
    print(match.end())    # 1st: 2       2nd: 8       3rd: 11
    print(match.span())   # 1st: (0, 2)  2nd: (6, 8)  3rd: (9, 11)
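Two other common ways to consume the iterator, sketched in Python 3: collecting every span with a list comprehension, or pulling matches one at a time with next().

```python
import re

text = "caabsacasca"

# Collect the position of every match.
spans = [m.span() for m in re.finditer(r"ca", text)]
print(spans)  # [(0, 2), (6, 8), (9, 11)]

# The iterator is lazy: next() produces one match object at a time.
it = re.finditer(r"ca", text)
first = next(it)
print(first.group(), first.start())  # ca 0
```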

Flags (properties)

In Perl, regular expression options such as /pattern/s (. also matches newlines) and /pattern/i (case-insensitive matching) are written after the pattern. In Python, you pass them as flags instead:

match


pattern = r"avSCSA"
text = "AVscsa"

# With a compiled pattern
repatter = re.compile(pattern, re.IGNORECASE)  # case-insensitive
matchOB = repatter.match(text)

# Without compiling
matchOB = re.match(pattern, text, re.IGNORECASE)  # case-insensitive

if matchOB:
    print(matchOB.group())  # AVscsa
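Flags are bit flags, so several can be combined with the | operator. A Python 3 sketch with a made-up two-line text:

```python
import re

text = "first line\nSecond Line"

# IGNORECASE and MULTILINE together: ^ and $ work per line, case is ignored.
pattern = re.compile(r"^second.*$", re.IGNORECASE | re.MULTILINE)
matchOB = pattern.search(text)
print(matchOB.group())  # Second Line

# Without MULTILINE, ^ only matches at the start of the whole string.
print(re.search(r"^second.*$", text, re.IGNORECASE))  # None
```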

Prefix each of the following flags with re., for example re.DOTALL or re.L.

Flag Meaning
ASCII, A Makes \w, \b, \s, \d, and similar escapes match only ASCII characters with the respective property (Python 3).
DOTALL, S Makes . match any character, including newlines.
IGNORECASE, I Performs case-insensitive matching.
LOCALE, L Performs a locale-aware match.
MULTILINE, M Multi-line matching: ^ and $ also match at the start and end of each line.
VERBOSE, X (for "extended") Enables verbose regular expressions, which can be laid out more cleanly and understandably.
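A short Python 3 sketch of VERBOSE. The date pattern and the input string are made-up examples. Whitespace inside the pattern is ignored and # starts a comment, so the pattern can be spread over several lines.

```python
import re

date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

matchOB = date_pattern.search("released on 2020-01-31")
print(matchOB.groups())  # ('2020', '01', '31')
```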

How to handle Japanese

Finally, here is how to handle Japanese text, which always confuses Japanese Pythonistas. The examples in this section use Python 2, where str is a byte string and unicode is a separate type.

With the ordinary string (str) type, a simple match may work:

あ


matchOB = re.match("あ","あ")
print matchOB.group()
#あ

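However, the following does not work. In Python 2 (with the source file saved as UTF-8), a str literal is a sequence of UTF-8 bytes, so the character class [ぁ-ゞ] is built from individual bytes and only the first byte of the text matches, which prints as a garbled character:

[ぁ-ゞ]


matchOB = re.match("[ぁ-ゞ]","もし")
#? (a single byte is matched)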

It works if you use unicode strings:

u[ぁ-ゞ]


matchOB = re.match(u"[ぁ-ゞ]",u"もし")
#も

So when working with Japanese, convert strings to the unicode type first.
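For comparison, a Python 3 sketch (not part of the original Python 2 examples; the sample text "もし" is an arbitrary choice). In Python 3, str is already a Unicode string, so the hiragana range works directly. The Python 2 problem only comes back if you explicitly match UTF-8 bytes.

```python
import re

# str is Unicode in Python 3: the class compares whole characters.
print(re.match("[ぁ-ゞ]", "もし").group())  # も

# Matching UTF-8 *bytes* rebuilds the class from single bytes,
# so only the first byte of the text is matched.
print(re.match("[ぁ-ゞ]".encode("utf-8"), "もし".encode("utf-8")).group())  # b'\xe3'
```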

reference

str→unicode


u = "日本語".decode("utf-8")

print type(u)
#<type 'unicode'>
print u
#日本語

unicode→str


s = u"日本語".encode("utf-8")

print type(s)
#<type 'str'>
print s
#日本語
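In Python 3 the same conversions go between str (text) and bytes (encoded data). A minimal sketch:

```python
text = "日本語"

data = text.encode("utf-8")      # str -> bytes
print(type(data), len(data))     # <class 'bytes'> 9 (three bytes per character)

back = data.decode("utf-8")      # bytes -> str
print(type(back), back == text)  # <class 'str'> True
```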

Question

Also, some sources say to add re.U as an option when handling unicode, but I seem to get the same result with or without it. Can anyone explain why? http://pepper.is.sci.toho-u.ac.jp/index.php?%A5%CE%A1%BC%A5%C8%2FPython%2F%B4%C1%BB%FA%A4%CE%C0%B5%B5%AC%C9%BD%B8%BD

re.U


s = u"It's nice 《weather》 today"
r = re.compile(u'《[^》]*》', re.U)  # flags go to compile(); the 3rd positional argument of sub() is count
news = r.sub('*', s)
print s, '>', news
#It's nice 《weather》 today > It's nice * today

not(re.U)


s = u"It's nice 《weather》 today"
r = re.compile(u'《[^》]*》')
news = r.sub('*', s)
print s, '>', news
#It's nice 《weather》 today > It's nice * today
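Two facts explain why the results are identical. First, the signature is sub(repl, string, count=0), so a flag passed as the third positional argument is read as a count: re.U has the value 32, meaning "replace at most 32 times". Second, re.U only changes how \w, \b, \s, \d and their uppercase forms behave, and the pattern 《[^》]*》 uses none of them. The Python 3 sketch below (sample strings made up) shows both points. In Python 3, str patterns are Unicode-aware by default and re.ASCII turns that off.

```python
import re

pattern = re.compile("a")
# A third positional argument to sub() is the replacement count.
print(pattern.sub("*", "aaa", 1))  # *aa
print(int(re.UNICODE))             # 32

# The Unicode setting matters for escapes such as \w.
text = "天気 is good"
print(re.findall(r"\w+", text))            # ['天気', 'is', 'good']
print(re.findall(r"\w+", text, re.ASCII))  # ['is', 'good']
```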

reference

This page was used as a reference:

Regular expression HOWTO (Python 3.4.1 documentation): http://docs.python.jp/3/howto/regex.html
A brief summary of regular expressions in Python (minus9d's diary): http://minus9d.hatenablog.com/entry/20120713/1342188160
7.2. re: Regular expression operations (Python 2.7ja1 documentation): http://docs.python.jp/2.7/library/re.html
Handling of Japanese in the Python regular expression module (Notes on Linux during trial operation): http://d.hatena.ne.jp/kakurasan/20090424/p1
Handle Japanese with Python regular expressions (taichino.com): http://taichino.com/programming/1272
