Use regular expressions in Python

regex.spl


| makeresults 
| eval text="THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now." 
| rex field=text "(?ix)((?P<big_japan>(?P<japan>Japan).*?(?P=japan))) #From Japan to japan"

I was able to use group matching in Splunk as well. I have posted about that elsewhere, but since I am not very comfortable with Python's re module, I practiced it here.
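For comparison, the same pattern can be tried with Python's re module, which also supports named groups and the (?P=name) backreference. This is only a sketch: it assumes Splunk's regex engine and Python's re behave the same way for this simple pattern, and the variable names are chosen for illustration.

```python
import re

text = "THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now."

# Same pattern as the rex command: (?i) ignores case, (?x) allows the trailing comment.
pattern = re.compile(r"(?ix)((?P<big_japan>(?P<japan>Japan).*?(?P=japan))) #From Japan to japan")

m = pattern.search(text)
print(m.group("japan"))      # text captured by the inner group
print(m.group("big_japan"))  # from the first "japan" to the next one
```

Because the match is case-insensitive, the inner group captures the start of JAPANESE, and the backreference then accepts the differently cased Japan that follows.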

The official documentation for re is very easy to understand.

re

sample.txt


"THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now. The content of education is reduced and students come to have free time more. Furthermore, 'total education time' is taken in all Japanese junior high school. I think this change is bad and Japanese government must change it to original form rapidly for the following reasons. Firstly, many young people this time cannot read or write basic words (Japanese 'kanji.') And, they cannot calculate, too. These things are need in daily life, even if they don't go to college or university. Originally, Japanese student got better score in reading and calculation than any other country's student few decades ago. For, reading, writing, and calculation were very important in Japanese society. Now, however, this good value in old Japan is being reduced. This is very large problem in Japan. Secondly, there is deep gap between the level of high school education and university education. Many students who don't learn the content of high school education cannot catch up with the class in universities. Furthermore, for example, I am medical student, but I don't learn biology in high school. And there are many students like me. In addition, the care of university to us is nearly nothing. So, the level of the study in technology, medicine and so is going down. This is very large problem in Japan, too. Thirdly, as the content of school education is reduced, at the same time, the curiosity of students seems reduced. The new idea and new device are coming from the curiosity, I think. So, the reduction of it means the down of possibility that the evolutional change in various field will happen. This is very large problem in Japan. In conclusion, there are problems like these in Japan, because of the reduction of basic education. Luckily, the Japanese government is planning to change the education system. I hope this change will be going back to old Japanese school education system. \n"

Quoted from https://www.f.waseda.jp/yusukekondo/TALL19/TALL_Spring03.html

search

Use search, because match only matches at the beginning of the string (as if the pattern started with ^).
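The difference between match and search can be seen in a short sketch. The text here is just the first sentence of sample.txt, and the pattern is picked only for illustration.

```python
import re

text = "THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now."

pattern = re.compile(r"Japan")

print(pattern.match(text))   # None: the text does not start with "Japan"
print(pattern.search(text))  # a match object for the first "Japan" found

# match succeeds only when the pattern fits at position 0.
print(re.match(r"THE", text))
```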

search.py


import re

# text is the string shown in sample.txt above

m=re.compile(r"""
\b(?P<sentence>.*?[.]) #Try to extract with sentences
""",re.X)

result=m.search(text)

print(result)

Since the text is in English, try splitting it into sentences at the period (.).

result

<_sre.SRE_Match object; span=(0, 78), match='THE JAPANESE SCHOOL EDUCATION In Japan, education>

Since len('THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now.') is 78, the span (0, 78) covers exactly the first sentence.
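The span can also be checked against the text directly. This is a small sketch that uses only the first two sentences of sample.txt as text; the helper variable first_sentence is introduced for the comparison.

```python
import re

text = ("THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now. "
        "The content of education is reduced and students come to have free time more.")

pattern = re.compile(r"""
\b(?P<sentence>.*?[.]) # shortest run of characters ending in a period
""", re.X)

result = pattern.search(text)
start, end = result.span()

first_sentence = "THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now."
print(result.span())                      # start and end offsets of the match
print(text[start:end] == first_sentence)  # slicing with the span gives the sentence
print(end == len(first_sentence))         # the end offset equals the sentence length
```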

Why doesn't a description of this Match object appear in the Python 3 documentation? I wonder.

SRE_Match object

getattr.py


import re

m=re.compile(r"""
\b(?P<sentence>.*?[.]) #Try to extract with sentences
""",re.X)

result=m.search(text)

for i in dir(result):
    if not i.startswith('__'):
        print(f'{i}: {getattr(result,i)}')

I'm not sure what the Match object is, so let's check its attributes and methods.

result

end: <built-in method end of _sre.SRE_Match object at 0x7fe65d3ba198>
endpos: 1969
expand: <built-in method expand of _sre.SRE_Match object at 0x7fe65d3ba198>
group: <built-in method group of _sre.SRE_Match object at 0x7fe65d3ba198>
groupdict: <built-in method groupdict of _sre.SRE_Match object at 0x7fe65d3ba198>
groups: <built-in method groups of _sre.SRE_Match object at 0x7fe65d3ba198>
lastgroup: sentence
lastindex: 1
pos: 0
re: re.compile('\n\\b(?P<sentence>.*?[.]) #Try to extract with sentences\n', re.VERBOSE)
regs: ((0, 78), (0, 78))
span: <built-in method span of _sre.SRE_Match object at 0x7fe65d3ba198>
start: <built-in method start of _sre.SRE_Match object at 0x7fe65d3ba198>
string: THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now. The (... abridged)

This agrees with the Match Objects section of the Python 2.7 documentation. Where is the Python 3 equivalent?
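The loop above only prints the bound methods themselves. Calling them shows what each one returns. The following is a minimal sketch with a shortened text (the text is an assumption for illustration). Note that on Python 3.7 and later the object is displayed as re.Match rather than _sre.SRE_Match.

```python
import re

text = "In Japan, education system is changing fast now. The content is reduced."

pattern = re.compile(r"""
\b(?P<sentence>.*?[.]) # shortest run of characters ending in a period
""", re.X)

result = pattern.search(text)

print(result.group())            # whole match
print(result.group("sentence"))  # text of the named group
print(result.groups())           # tuple of all groups
print(result.groupdict())        # {group name: captured text}
print(result.start(), result.end(), result.span())
print(result.expand(r"[\g<sentence>]"))  # template filled with group values
```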

findall

findall.py


import re

m=re.compile(r"""
\b(?P<sentence>.*?[.]) #Try to extract with sentences
""",re.X)

result=m.findall(text)  # search returns only the first match; findall returns all of them

print(type(result))
print('-'*10)
for i in dir(result):
    if not i.startswith('__'):
        print(f'{i}: {getattr(result,i)}')
print('-'*10)
for i in result:
    print(i)  # the result is a list, so print the items one by one

If you want to get all the matches, use findall.

result

<class 'list'>
----------
append: <built-in method append of list object at 0x7fe65d2dca48>
clear: <built-in method clear of list object at 0x7fe65d2dca48>
copy: <built-in method copy of list object at 0x7fe65d2dca48>
count: <built-in method count of list object at 0x7fe65d2dca48>
extend: <built-in method extend of list object at 0x7fe65d2dca48>
index: <built-in method index of list object at 0x7fe65d2dca48>
insert: <built-in method insert of list object at 0x7fe65d2dca48>
pop: <built-in method pop of list object at 0x7fe65d2dca48>
remove: <built-in method remove of list object at 0x7fe65d2dca48>
reverse: <built-in method reverse of list object at 0x7fe65d2dca48>
sort: <built-in method sort of list object at 0x7fe65d2dca48>
----------
THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now.
The content of education is reduced and students come to have free time more.
Furthermore, 'total education time' is taken in all Japanese junior high school.
I think this change is bad and Japanese government must change it to original form rapidly for the following reasons.
Firstly, many young people this time cannot read or write basic words (Japanese 'kanji.
And, they cannot calculate, too.
These things are need in daily life, even if they don't go to college or university.
Originally, Japanese student got better score in reading and calculation than any other country's student few decades ago.
For, reading, writing, and calculation were very important in Japanese society.
Now, however, this good value in old Japan is being reduced.
This is very large problem in Japan.
Secondly, there is deep gap between the level of high school education and university education.
Many students who don't learn the content of high school education cannot catch up with the class in universities.
Furthermore, for example, I am medical student, but I don't learn biology in high school.
And there are many students like me.
In addition, the care of university to us is nearly nothing.
So, the level of the study in technology, medicine and so is going down.
This is very large problem in Japan, too.
Thirdly, as the content of school education is reduced, at the same time, the curiosity of students seems reduced.
The new idea and new device are coming from the curiosity, I think.
So, the reduction of it means the down of possibility that the evolutional change in various field will happen.
This is very large problem in Japan.
In conclusion, there are problems like these in Japan, because of the reduction of basic education.
Luckily, the Japanese government is planning to change the education system.
I hope this change will be going back to old Japanese school education system.

The result is a list of strings, one per matched sentence.
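What findall puts in the list depends on the number of groups in the pattern: with no group it returns the whole matches, with one group it returns that group's text, and with several groups it returns tuples. A short sketch on a made-up string:

```python
import re

text = "Japan is changing. Japanese schools are changing too."

# No group: each item is the whole match.
no_group = re.findall(r"Japan\w*", text)
print(no_group)

# One group: each item is the text of that group only.
one_group = re.findall(r"(Japan)\w*", text)
print(one_group)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(Japan)(\w*)", text)
print(two_groups)
```

This is why the findall example above prints whole sentences: the pattern has a single group, and that group spans the entire match.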

split

split.py


import re

result1=re.split(r'(?<=\.)\s',text)  # split on the whitespace after each period, so the period is kept


print(type(result1))
print('-'*10)

m2=re.compile(r"""
(?P<japan_txt>japan.*?)\b #test
""",re.VERBOSE|re.IGNORECASE)

{i:[v,m2.search(v).group()] for i,v in enumerate(result1) if m2.search(v)}

I thought that split() would be enough to separate the sentences. I wanted to keep the period (.) that ends each sentence, so I split on the whitespace that follows it, using a lookbehind.

result

<class 'list'>
----------
{0: ['THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now.',
  'JAPANESE'],
 2: ["Furthermore, 'total education time' is taken in all Japanese junior high school.",
  'Japanese'],
 3: ['I think this change is bad and Japanese government must change it to original form rapidly for the following reasons.',
  'Japanese'],
 4: ["Firstly, many young people this time cannot read or write basic words (Japanese 'kanji.') And, they cannot calculate, too.",
  'Japanese'],
 6: ["Originally, Japanese student got better score in reading and calculation than any other country's student few decades ago.",
  'Japanese'],
 7: ['For, reading, writing, and calculation were very important in Japanese society.',
  'Japanese'],
 8: ['Now, however, this good value in old Japan is being reduced.', 'Japan'],
 9: ['This is very large problem in Japan.', 'Japan'],
 16: ['This is very large problem in Japan, too.', 'Japan'],
 20: ['This is very large problem in Japan.', 'Japan'],
 21: ['In conclusion, there are problems like these in Japan, because of the reduction of basic education.',
  'Japan'],
 22: ['Luckily, the Japanese government is planning to change the education system.',
  'Japanese'],
 23: ['I hope this change will be going back to old Japanese school education system.',
  'Japanese']}

The result of split is a list.

After that, japan is searched for regardless of case (re.IGNORECASE), and each sentence that contains it is output as a dictionary of the form index: [sentence, matched text].
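The difference between splitting on the period itself and splitting on the whitespace after it can be sketched as follows. The text is a shortened, made-up sample, and the variable names are only for illustration.

```python
import re

text = "This is large problem. Secondly, there is deep gap. And there are many students."

# Splitting on the period plus whitespace drops the period from each sentence
# (the last sentence keeps its period because no whitespace follows it).
without_period = re.split(r"\.\s", text)
print(without_period)

# A lookbehind matches only the whitespace, so every period stays in place.
with_period = re.split(r"(?<=\.)\s", text)
print(with_period)
```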

finditer

m2=re.compile(r"""
(?P<japan_txt>japan.*?)\b #test
""",re.VERBOSE|re.IGNORECASE)

result=re.finditer(m2,text)

print(result)

print('-'*10)

for i in result:
  print(i)

finditer returns the results as an iterator (https://docs.python.org/ja/3/library/stdtypes.html#typeiter).

result

<callable_iterator object at 0x7fe65d2e5ba8>
----------
<_sre.SRE_Match object; span=(4, 12), match='JAPANESE'>
<_sre.SRE_Match object; span=(33, 38), match='Japan'>
<_sre.SRE_Match object; span=(209, 217), match='Japanese'>
<_sre.SRE_Match object; span=(269, 277), match='Japanese'>
<_sre.SRE_Match object; span=(427, 435), match='Japanese'>
<_sre.SRE_Match object; span=(576, 584), match='Japanese'>
<_sre.SRE_Match object; span=(749, 757), match='Japanese'>
<_sre.SRE_Match object; span=(804, 809), match='Japan'>
<_sre.SRE_Match object; span=(858, 863), match='Japan'>
<_sre.SRE_Match object; span=(1368, 1373), match='Japan'>
<_sre.SRE_Match object; span=(1705, 1710), match='Japan'>
<_sre.SRE_Match object; span=(1760, 1765), match='Japan'>
<_sre.SRE_Match object; span=(1825, 1833), match='Japanese'>
<_sre.SRE_Match object; span=(1934, 1942), match='Japanese'>

Each match object reports the position (span) and the matched text.
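Because finditer gives back an iterator, it can be consumed only once. A minimal sketch, using just the first sentence of sample.txt and the same pattern written on one line:

```python
import re

text = "THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now."

pattern = re.compile(r"(?P<japan_txt>japan.*?)\b", re.IGNORECASE)

matches = pattern.finditer(text)

# Pull the start offset and the captured text out of each match object.
positions = [(m.start(), m.group("japan_txt")) for m in matches]
print(positions)

# The iterator is now exhausted, so a second pass yields nothing.
print(list(matches))
```

If the matches are needed more than once, storing them in a list first (for example list(pattern.finditer(text))) avoids the empty second pass.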

groupdict

groupdict.py


m2=re.compile(r"""
(?P<japan_txt>japan.*?)\b #test
""",re.VERBOSE|re.IGNORECASE)

result=re.finditer(m2,text)


[i.groupdict() for i in result]

result

[{'japan_txt': 'JAPANESE'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japan'},
 {'japan_txt': 'Japanese'},
 {'japan_txt': 'Japanese'}]

For each match, a dictionary that maps the group name to the captured text is returned.
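One possible use of these dictionaries is counting how often each captured spelling appears. This is a sketch under the assumption that collections.Counter is acceptable here; the text is a shortened sample and counts is a name introduced for illustration.

```python
import re
from collections import Counter

text = ("THE JAPANESE SCHOOL EDUCATION In Japan, education system is changing fast now. "
        "This is very large problem in Japan.")

pattern = re.compile(r"(?P<japan_txt>japan.*?)\b", re.IGNORECASE)

# Count how often each captured spelling appears.
counts = Counter(m.groupdict()["japan_txt"] for m in pattern.finditer(text))
print(counts)
```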

Summary

I tried various things for now. There is still more to cover, but I will stop here for the time being.
