The regular expression syntax in Python (operators, patterns, and matching rules) appears to be almost the same as in Perl and PHP.

How the regular expression functions are used, however, is completely different, so I am writing this up to study and organize it for myself.

This article does not cover the regular expression operators themselves.

Setup

Regular expressions become available once you import the following module.

`import`


import re

There are two ways to use regular expressions. One is to compile the pattern in advance. A compiled pattern can be reused, so when you search for the same pattern many times you do not have to specify it each time and the searches run faster. http://docs.python.jp/3/howto/regex.html#compiling-regular-expressions

It is also recommended to prefix the pattern with r. Patterns usually work without it, but in a raw string a backslash is kept as a literal backslash, so the pattern is easier to write and to read.

http://docs.python.jp/3/howto/regex.html#the-backslash-plague
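To make the effect of the r prefix concrete, here is a minimal sketch. The sample text and variable names are my own, chosen only for illustration; it shows that the raw string and the doubled-backslash string describe the same pattern.

```python
import re

# Without the r prefix, every backslash in the pattern has to be doubled.
plain = "\\d+\\.\\d+"
# With the r prefix, the backslashes are written exactly as the regex needs them.
raw = r"\d+\.\d+"

# Both spellings produce the same pattern string.
same_pattern = (plain == raw)

# Hypothetical sample text, not taken from the referenced pages.
sample = "version 3.14 released"
number = re.search(raw, sample).group()

print(same_pattern)
print(number)
```

The raw form is simply easier to compare against a pattern written in Perl or PHP.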

`compile`


pattern = r"ca"
text = "caabsacasca"
repatter = re.compile(pattern)
matchOB = repatter.match(text)

The other way is to pass the pattern directly to the search function without compiling it. Use this when you do not need to reuse the pattern.

`NoCompile`


pattern = r"ca"
text = "caabsacasca"
matchOB = re.match(pattern , text)
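As a rough illustration of when compiling is convenient, the sketch below compiles the pattern once and reuses it on several strings. The list `texts` is a made-up example, and the code uses print() so that it also runs on Python 3.

```python
import re

# Compile once, then reuse the same pattern object for every string.
repatter = re.compile(r"ca")

# Hypothetical inputs, chosen only for illustration.
texts = ["caabsacasca", "abc", "cat", "scan"]

# match() succeeds only when the pattern is found at the start of the string.
compiled_hits = [t for t in texts if repatter.match(t)]

# The module-level function gives the same answer without an explicit compile step.
direct_hits = [t for t in texts if re.match(r"ca", t)]

print(compiled_hits)
print(direct_hits)
```

Both lists contain the same strings; the difference is only in how the pattern is supplied.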

Search methods

There are four main search methods. http://docs.python.jp/3/howto/regex.html#performing-matches

Method/attribute	Purpose
match(pattern, string)	Checks whether the regular expression matches at the beginning of the string.
search(pattern, string)	Scans the string for the first location where the regular expression matches.
findall(pattern, string)	Finds all substrings that match the regular expression and returns them as a list.
finditer(pattern, string)	Finds all substrings that match the regular expression and returns them as an iterator.
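The four functions in the table can be compared side by side on one string. This is only a sketch: the text below is a variant of the article's sample with two extra characters in front (my own assumption) so that match() and search() give different results.

```python
import re

pattern = r"ca"
# Hypothetical text: the article's sample with "xx" prepended.
text = "xxcaabsacasca"

# match() looks only at the beginning, so nothing is found here.
at_start = re.match(pattern, text)

# search() scans forward and stops at the first hit.
first = re.search(pattern, text)

# findall() returns every matching substring as a list of strings.
all_strings = re.findall(pattern, text)

# finditer() yields match objects, so positions are available too.
all_spans = [m.span() for m in re.finditer(pattern, text)]

print(at_start)
print(first.span())
print(all_strings)
print(all_spans)
```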

Other functions (reference)

http://docs.python.jp/2.7/library/re.html#re.split

Method/attribute	Purpose
split(pattern, string)	Splits the string at every part that matches the regular expression.
sub(pattern, repl, string)	Replaces every part that matches the regular expression with repl.
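Since split() and sub() are not covered further in this article, here is a short sketch of both on the same sample text used elsewhere. The replacement string "X" is just a placeholder I picked.

```python
import re

pattern = r"ca"
text = "caabsacasca"

# split() cuts the string at every match; a match at either end
# leaves an empty string in the result.
parts = re.split(pattern, text)

# sub() replaces every match with the replacement string.
replaced_all = re.sub(pattern, "X", text)

# The optional count argument limits how many matches are replaced.
replaced_once = re.sub(pattern, "X", text, count=1)

print(parts)
print(replaced_all)
print(replaced_once)
```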

Let's look at the search methods one by one.

match() function

This function checks whether the pattern matches at the beginning of the string. It returns a match object (stored in matchOB below), or None if there is no match. Use the group() method to extract the matched part from this object as a str. There are other methods besides group() for getting information out of the object; they are described below.

`match`


pattern = r"ca"
text = "caabsacasca"
matchOB = re.match(pattern, text)
if matchOB:
    print(matchOB.group())  # ca

Methods for retrieving information from a match object

Method/attribute	Purpose
group()	Returns the string that matched the regular expression.
start()	Returns the start position of the match.
end()	Returns the end position of the match.
span()	Returns a tuple containing the (start, end) positions of the match.
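These methods also accept a group number when the pattern contains parentheses. The sketch below assumes a simple year-month pattern and sample string of my own; neither comes from the referenced pages.

```python
import re

# Hypothetical pattern with two capture groups: year and month.
m = re.match(r"(\d{4})-(\d{2})", "2014-07 report")

whole = m.group()      # the entire match
year = m.group(1)      # first parenthesised group
month = m.group(2)     # second parenthesised group
both = m.groups()      # all groups as a tuple
month_span = m.span(2) # (start, end) of the second group

print(whole)
print(year)
print(month)
print(both)
print(month_span)
```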

search() function

A function that checks whether any part of the string matches the pattern. Unlike match(), it succeeds even if the pattern is not at the beginning of the string. However, even if there are multiple matches, only the first one is returned.

`search`



pattern = r"ca"
text = "caabsacasca"
matchOB = re.search(pattern, text)

if matchOB:
    print(matchOB)
    print(matchOB.group())  # the matched string: ca
    print(matchOB.start())  # start position of the match: 0
    print(matchOB.end())    # end position of the match: 2
    print(matchOB.span())   # (start, end) tuple: (0, 2)

findall() function

A function that returns all parts of the string that match the pattern as a list. Unlike search(), you get every matching part. However, the return value is not a match object but a plain list of strings, so group() and the other match object methods cannot be used.

`findall`


pattern = r"ca"
text = "caabsacasca"
# Returns everything that matches the pattern as a list
matchedList = re.findall(pattern, text)
if matchedList:
    print(matchedList)  # ['ca', 'ca', 'ca']
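One detail worth knowing is that findall() changes what it returns when the pattern contains parentheses. The following is a small sketch with made-up inputs; the key=value string is purely an assumption for illustration.

```python
import re

# No groups: the list holds the whole matched substrings.
no_groups = re.findall(r"ca", "caabsacasca")

# One group: the list holds only what the group captured.
one_group = re.findall(r"c(a)", "caabsacasca")

# Several groups: the list holds one tuple per match.
# Hypothetical key=value text, not from the article.
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")

print(no_groups)
print(one_group)
print(pairs)
```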

finditer() function

A function that returns the parts of the string that match the pattern as an iterator. By looping over the return value with a for loop, you get every matching part, just as with findall(). The difference is that findall() returns a list of strings, while finditer() yields a match object on each iteration, so group(), start(), end(), and so on are available.

`finditer`


pattern = r"ca"
text = "caabsacasca"
# Returns everything that matches the pattern as an iterator
iterator = re.finditer(pattern, text)
for match in iterator:
    print(match.group())  # 1st: ca      2nd: ca      3rd: ca
    print(match.start())  # 1st: 0       2nd: 6       3rd: 9
    print(match.end())    # 1st: 2       2nd: 8       3rd: 11
    print(match.span())   # 1st: (0, 2)  2nd: (6, 8)  3rd: (9, 11)
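Because finditer() gives positions as well as text, it is handy when you need to know where the matches are. As a sketch, the code below draws a caret under every matched character of the sample text; the marker-line idea is my own illustration.

```python
import re

pattern = r"ca"
text = "caabsacasca"

# Start with a line of spaces as long as the text.
markers = [" "] * len(text)

for m in re.finditer(pattern, text):
    # m.start() and m.end() delimit the matched slice text[start:end].
    for i in range(m.start(), m.end()):
        markers[i] = "^"

marker_line = "".join(markers)

print(text)
print(marker_line)
```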

Flags

In Perl and similar languages you set flags at the end of the pattern, such as `/pattern/s` (`.` also matches newlines) or `/pattern/i` (case-insensitive). In Python you do the following:

`match`


pattern = r"avSCSA"
text = "AVscsa"

# With a compiled pattern
repatter = re.compile(pattern, re.IGNORECASE)  # case-insensitive
matchOB = repatter.match(text)

# Without compiling
matchOB = re.match(pattern, text, re.IGNORECASE)  # case-insensitive

if matchOB:
    print(matchOB.group())  # AVscsa

Be sure to prefix the following flags with `re.`, as in `re.DOTALL` or `re.L`.

Flag	Meaning
ASCII, A	Makes \w, \b, \s, \d, etc. match only ASCII characters with the respective property.
DOTALL, S	Makes . match any character, including newlines.
IGNORECASE, I	Performs case-insensitive matching.
LOCALE, L	Makes matching depend on the current locale.
MULTILINE, M	Makes ^ and $ match at the start and end of each line.
VERBOSE, X (for 'extended')	Lets you write more readable regular expressions: whitespace in the pattern is ignored and comments are allowed.
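Here is a minimal sketch of a few of these flags in action, including how to combine two of them with the | operator. The sample text and the decimal-number pattern are assumptions of mine, used only to show the behaviour.

```python
import re

# Hypothetical two-line text.
text = "first line\nSecond LINE"

# Without DOTALL, "." stops at the newline, so this finds nothing.
no_dotall = re.search(r"line.Second", text)
# With DOTALL, "." also matches the newline.
dotall = re.search(r"line.Second", text, re.DOTALL)

# MULTILINE makes ^ match at the start of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Flags are combined with the | operator.
line_ends = re.findall(r"line$", text, re.MULTILINE | re.IGNORECASE)

# VERBOSE allows whitespace and comments inside the pattern.
decimal = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d+   # fractional part
""", re.VERBOSE)
found = decimal.search("pi is about 3.14").group()

print(no_dotall)
print(dotall.group() == "line\nSecond")
print(line_starts)
print(line_ends)
print(found)
```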

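Handling Japanese

Finally, I will explain how to handle Japanese, which always trips up Japanese Pythonistas. The examples in this section are for Python 2, where str is a byte string and unicode is a separate type.

It seems that the normal string (str) type may be fine in simple cases: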

`あ`


matchOB = re.match("あ", "あ")
print(matchOB.group())
# あ

But what about the following? The result seems to come out garbled (?):

`[あ-ゞ]`


matchOB = re.match("[あ-ゞ]", "も")
# ?

It works if you use unicode:

`u[あ-ゞ]`


matchOB = re.match(u"[あ-ゞ]", u"も")
# も

So when dealing with Japanese, convert it to the unicode type first.
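For comparison, here is a sketch of the same idea on Python 3, where str is already Unicode text and bytes plays the role of the old byte string. The hiragana range and the one-character sample are my own choices for illustration. It suggests why the byte-string version misbehaves: the character class is interpreted byte by byte.

```python
import re

# Character class covering a range of hiragana (illustrative choice).
hiragana = "[あ-ゞ]"
text = "も"

# str pattern on str text: the class is a range of code points,
# so the whole character is matched.
text_match = re.match(hiragana, text)

# bytes pattern on bytes text: the class becomes a set of single bytes,
# so only the first byte of the UTF-8 sequence is matched.
byte_match = re.match(hiragana.encode("utf-8"), text.encode("utf-8"))

print(text_match.group())
print(byte_match.group())
```

The second result is a single byte that is not a complete character, which would explain the garbled output seen with plain str on Python 2.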

For reference

`str→unicode`


u = "日本語".decode("utf-8")

print(type(u))
# <type 'unicode'>
print(u)
# 日本語

`unicode→str`


s = u"日本語".encode("utf-8")

print(type(s))
# <type 'str'>
print(s)
# 日本語

Question

Also, I have read that you should add re.U as an option when dealing with unicode, but I seem to get the same result with or without it. Can anyone tell me why? http://pepper.is.sci.toho-u.ac.jp/index.php?%A5%CE%A1%BC%A5%C8%2FPython%2F%B4%C1%BB%FA%A4%CE%C0%B5%B5%AC%C9%BD%B8%BD

`re.U`


s = u"It's 《nice》 weather today"
# Flags belong in compile(); the third argument of r.sub() is count, not flags
r = re.compile(u'《[^》]*》', re.U)
news = r.sub(u'*', s)
print(s + u' > ' + news)
# It's 《nice》 weather today > It's * weather today

`not(re.U)`


s = u"It's 《nice》 weather today"
r = re.compile(u'《[^》]*》')
news = r.sub(u'*', s)
print(s + u' > ' + news)
# It's 《nice》 weather today > It's * weather today
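A possible explanation, offered as a sketch rather than a definitive answer: re.U mainly changes what character classes such as \w, \d, \s and \b match, so a pattern that uses none of them behaves the same either way. The code below assumes Python 3, where Unicode matching is the default for str patterns and re.ASCII switches it off, and uses sample strings of my own. It also shows that the third positional argument of a compiled pattern's sub() is count, so a flag passed in that position is not treated as a flag.

```python
import re

# Hypothetical mixed text.
text = "天気 weather"

# Default on Python 3: \w is Unicode-aware for str patterns.
unicode_words = re.findall(r"\w+", text)
# re.ASCII restricts \w to ASCII letters, digits and underscore.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# A pattern without \w, \d, \s or \b gives the same result with or without re.U.
sample = "It's 《nice》 weather"
with_flag = re.compile("《[^》]*》", re.U).sub("*", sample)
without_flag = re.compile("《[^》]*》").sub("*", sample)

# The third positional argument of Pattern.sub() is count.
limited = re.compile(r"a").sub("*", "aaaa", 2)

print(unicode_words)
print(ascii_words)
print(with_flag == without_flag)
print(limited)
```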

References

I used the following pages as references.

Regular expression HOWTO - Python 3.4.1 documentation - http://docs.python.jp/3/howto/regex.html
A brief summary of regular expressions in Python - minus9d's diary - http://minus9d.hatenablog.com/entry/20120713/1342188160
7.2. re - Regular expression operations - Python 2.7ja1 documentation - http://docs.python.jp/2.7/library/re.html
Handling of Japanese in Python regular expression module - Notes on Linux during trial operation - http://d.hatena.ne.jp/kakurasan/20090424/p1
Handle Japanese with python regular expressions | taichino.com - http://taichino.com/programming/1272
