Overlapping regular expressions in Python and Java

reference: https://stackoverflow.com/questions/17971466/java-regex-overlapping-matches

Normally, when a regular expression is matched repeatedly, the part of the string consumed by one match is not reused by another match. For example, matching /_[^_]+_/ against the string "_apple_banana_cherry_" yields only two results, _apple_ and _cherry_, because the "_" on either side of banana is already consumed by the neighboring matches. If you want the matches to share the "_" characters and get all three of _apple_, _banana_, and _cherry_, you have to ask for overlapping matches explicitly.

For Python

An easy way is to use the third-party regular expression package regex (not the standard re module) and match with the option `overlapped=True`.

{.python}


>>> import regex as re
>>> re.findall("_[^_]+_", "_apple_banana_cherry_")
['_apple_', '_cherry_']
>>> re.findall("_[^_]+_", "_apple_banana_cherry_", overlapped=True)
['_apple_', '_banana_', '_cherry_']
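
If installing regex is not an option, a similar result can usually be obtained with the standard re module by wrapping the pattern in a lookahead. This is a different technique from `overlapped=True`, shown here only as a minimal sketch; the helper name `findall_overlapping` is made up for this example, and it reports at most one match per starting position.

```python
import re

def findall_overlapping(pattern, text):
    """Return matches of `pattern` in `text`, including overlapping ones."""
    # A lookahead consumes no characters, so the scanner moves forward one
    # position at a time. The outer capture group (group 1) holds the text
    # that the original pattern would have matched at that position.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [m.group(1) for m in lookahead.finditer(text)]

print(findall_overlapping(r"_[^_]+_", "_apple_banana_cherry_"))
# ['_apple_', '_banana_', '_cherry_']
```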

For Java

In Java, a simple approach is to use the standard regular expression package (java.util.regex) and shift the start index of each search.

{.java}


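import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the trailing "_" so that the next search can start there.
Matcher m = Pattern.compile("_[^_]+(_)").matcher("_apple_banana_cherry_");
if (m.find()) {
    do {
        System.out.println(m.group());  // _apple_, then _banana_, then _cherry_
    } while (m.find(m.start(1)));       // search again from the trailing "_"
}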

Commentary: In /_[^_]+(_)/, the second _ is enclosed in () so that it can be retrieved as group 1. m.start(1) returns the start index of group 1. m.find(N) resets the matcher and searches again starting at index N. So m.find(m.start(1)) restarts the search at the start index of group 1. In other words, from the second iteration onward, each search starts at the _ that ended the previous match.
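
For comparison, the same index-shifting loop can be sketched in Python with the standard re module, where a compiled pattern's `search(text, pos)` plays the role of `m.find(N)`. The function name `find_overlapping_from_group` and the forward-progress guard are additions for this example and are not part of the Java code above.

```python
import re

def find_overlapping_from_group(pattern, text, group=1):
    """Collect matches, restarting each search at the start of `group`."""
    compiled = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        matches.append(m.group())
        # Resume at the start of the chosen group, like m.find(m.start(1)).
        next_pos = m.start(group)
        # Guard (added for this sketch): if the group starts at or before the
        # match start, or did not participate (-1), step one character
        # forward so the loop cannot repeat the same match forever.
        if next_pos <= m.start():
            next_pos = m.start() + 1
        pos = next_pos
    return matches

print(find_overlapping_from_group(r"_[^_]+(_)", "_apple_banana_cherry_"))
# ['_apple_', '_banana_', '_cherry_']
```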

note: Using a named group may make it easier to specify which group marks the start index of the next match; Matcher.start(String) accepts the group name (Java 8 and later).

{.example}


(?<name>PATTERN)

For more complex patterns, you may also need a non-capturing group, which groups part of the pattern without adding a numbered group.

{.example}


(?:PATTERN)
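
As a rough illustration of combining the two constructs, here is a Python sketch. Note that Python's standard re module spells a named group `(?P<name>PATTERN)` rather than `(?<name>PATTERN)`. The group name `next`, the fixed list of fruit names, and the function `named_overlaps` are assumptions made up for this example.

```python
import re

# (?:...) groups the alternatives without creating a numbered group;
# (?P<next>...) names the trailing "_" where the next search should start.
PATTERN = re.compile(r"_(?:apple|banana|cherry)(?P<next>_)")

def named_overlaps(text):
    """Collect matches, restarting each search at the group named 'next'."""
    matches = []
    pos = 0
    while True:
        m = PATTERN.search(text, pos)
        if m is None:
            return matches
        matches.append(m.group())
        # The named group always starts after the match start here,
        # so the search position strictly advances.
        pos = m.start("next")

print(named_overlaps("_apple_banana_cherry_"))
# ['_apple_', '_banana_', '_cherry_']
```

Because the alternation is non-capturing, the pattern has exactly one group, so the named group could also be addressed as group 1.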
