[Python] A function that searches the entire string with a regular expression and retrieves all matching strings.

If you only need the matched strings, re.findall() is enough.

python


import re

target_text = 'So far, seven types of "coronavirus" that infect humans have been found, and one of them is the so-called "new coronavirus (SARS-CoV2)" that has been a problem since December last year. Of these, four types of viruses account for 10 to 15% of common colds (35% during the epidemic), and most are mild. The remaining two viruses are "Severe Acute Respiratory Syndrome (SARS)" that occurred in 2002 and "Middle East Respiratory Syndrome (MERS)" that has occurred since 2012. Coronaviruses infect all animals, but rarely infect other animals of different species. In addition, it is known that the infectivity is lost by alcohol disinfection (70%).'
keyword = "([0-9]+)"

results = re.findall(keyword, target_text)

# ['2', '10', '15', '35', '2002', '2012', '70']
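
One detail worth knowing about re.findall() is that the shape of its result depends on how many groups the pattern contains. The short sketch below uses a made-up sample sentence (not the text above) to show the three cases.

```python
import re

text = "10 to 15% in 2002"

# No group: each result is the whole match.
no_group = re.findall(r"[0-9]+", text)

# One group: each result is the text captured by that group.
one_group = re.findall(r"([0-9]+)%", text)

# Two groups: each result is a tuple with one item per group.
two_groups = re.findall(r"([0-9]+) to ([0-9]+)", text)

print(no_group)    # ['10', '15', '2002']
print(one_group)   # ['15']
print(two_groups)  # [('10', '15')]
```

In every case the results are plain strings (or tuples of strings), so position information such as the start and end index of each match is not available.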

However, re.findall() returns only strings, not the match objects that re.search() returns. The standard library does provide re.finditer() for getting match objects, but as an exercise, here is a function that recursively searches the entire string and collects every match object.

python


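import re

def search_all(regrex, target, search_start_index=0, matches=None):
  # Create the result list per call; a mutable default would be shared between calls.
  if matches is None:
    matches = []

  # Stop once the start index has moved past the end of the string.
  if search_start_index > len(target):
    return matches

  # Searching from a position (instead of slicing the string) keeps
  # match.start() and match.end() relative to the original string.
  match = re.compile(regrex).search(target, search_start_index)

  if match is None:
    return matches

  matches.append(match)

  # Continue right after this match. After an empty match, step one extra
  # character so that the search always advances.
  next_index = match.end() if match.end() > match.start() else match.end() + 1

  return search_all(regrex, target, next_index, matches)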
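
A recursive search adds one stack frame per match, and CPython limits recursion depth (about 1000 frames by default), so a text with very many matches can raise RecursionError. As a rough sketch of one way around that, the same idea can be written with a loop. The name search_all_iterative is a hypothetical helper introduced here for illustration, not part of the original article.

```python
import re

def search_all_iterative(regex, target):
    """Collect every non-overlapping match object using a loop (hypothetical helper)."""
    pattern = re.compile(regex)
    matches = []
    pos = 0
    while pos <= len(target):
        # Searching from pos keeps match positions relative to the whole string.
        match = pattern.search(target, pos)
        if match is None:
            break
        matches.append(match)
        # Resume after the match; step one extra character after an empty
        # match so the loop always moves forward.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return matches

# 5000 numbers separated by spaces: more matches than the default recursion limit.
many_numbers = " ".join(str(n) for n in range(5000))
found = search_all_iterative(r"[0-9]+", many_numbers)

print(len(found))         # 5000
print(found[-1].group())  # 4999
```

The loop keeps the same return type (a list of match objects), so it can be swapped in wherever the recursive version is used.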

python


target_text = 'So far, seven types of "coronavirus" that infect humans have been found, and one of them is the so-called "new coronavirus (SARS-CoV2)" that has been a problem since December last year. Of these, four types of viruses account for 10 to 15% of common colds (35% during the epidemic), and most are mild. The remaining two viruses are "Severe Acute Respiratory Syndrome (SARS)" that occurred in 2002 and "Middle East Respiratory Syndrome (MERS)" that has occurred since 2012. Coronaviruses infect all animals, but rarely infect other animals of different species. In addition, it is known that the infectivity is lost by alcohol disinfection (70%).'
keyword = "([0-9]+)"

search_result_groups = search_all(keyword, target_text)

# Each item is a match object, so group() returns the matched string.
for item in search_result_groups:
  print(item.group())

# 2
# 10
# 15
# 35
# 2002
# 2012
# 70
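
For comparison, the standard library's re.finditer() also yields a match object for every non-overlapping match, so it can serve the same purpose as search_all(). The sketch below uses a short made-up sentence to show that both the matched text and its position are available.

```python
import re

text = "SARS appeared in 2002 and MERS in 2012."

for match in re.finditer(r"[0-9]+", text):
    # Each item is a match object, so positions are available as well as text.
    print(match.group(), match.span())

# 2002 (17, 21)
# 2012 (34, 38)

# The spans index into the original string.
spans = [m.span() for m in re.finditer(r"[0-9]+", text)]
numbers = [text[start:end] for start, end in spans]
```

Because re.finditer() returns an iterator, wrap it in list() if the matches need to be counted or reused.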

Postscript

- Following a suggestion in the comments, I added the re.findall() example. Thank you very much.
- It was pointed out in the comments that writing the default value as matches = [] is an anti-pattern, so I corrected it. Thank you very much.
