[PYTHON] Why regular expressions (RE) fail to parse .tex source

nested commands

Example .tex source with nested commands.

\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}

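In this case, an RE cannot find the closing brace that pairs with the third brace (the { that opens the denominator of the outermost \frac). Finding it requires counting how deeply the braces are nested, and a regular expression cannot count.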
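The limitation is easy to see in Python. The pattern \{[^{}]*\}, which forbids braces inside a group, only finds the innermost groups. A longer pattern that allows one level of nesting still cannot reach the outermost denominator. Every extra level of nesting needs a longer pattern, so no single fixed pattern covers arbitrary depth. This is a minimal sketch, and the names FLAT and ONE_LEVEL are only illustrative.

```python
import re

a = r"\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}"

# A brace group containing no braces at all: only innermost groups can match.
FLAT = r"\{[^{}]*\}"
innermost = re.findall(FLAT, a)

# A brace group that may contain flat brace groups: handles one level of nesting.
ONE_LEVEL = r"\{(?:[^{}]|\{[^{}]*\})*\}"
one_level = re.findall(ONE_LEVEL, a)

print(innermost)  # ['{1}', '{1}', '{1}', '{1+x}']
print(one_level)  # ['{1}', '{1}', '{1+\\frac{1}{1+x}}']
```

Neither result contains the outer denominator 1+\frac{1}{1+\frac{1}{1+x}}, because its braces are nested one level deeper than ONE_LEVEL allows.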

(non-)greedy RE

Let's try!

  1. Use a greedy match
import re
a = r"\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}"
m = re.search(r"\\frac\{.*\}\{.*\}", a)  # greedy: each .* takes as much as it can
m.group()

This matches everything from the first \frac to the last } in the string:

'\\frac{1}{1+\\frac{1}{1+\\frac{1}{1+x}}}'

  2. Use a non-greedy match
import re
a = r"\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}"
m = re.search(r"\\frac\{.*?\}\{.*?\}", a)  # non-greedy: each .*? takes as little as it can
m.group()

This stops at the first } after the second opening brace, so the match ends inside the nested \frac:

'\\frac{1}{1+\\frac{1}'
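Adding capture groups to the two patterns above makes the failure concrete. The correct split of the outer \frac is the numerator 1 and the denominator 1+\frac{1}{1+\frac{1}{1+x}}, but neither quantifier produces it. This is a small sketch that only adds parentheses to the original patterns.

```python
import re

a = r"\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}"

# Greedy: the first .* backtracks only as far as the LAST "}{" (inside the innermost \frac).
greedy = re.search(r"\\frac\{(.*)\}\{(.*)\}", a).groups()

# Non-greedy: the second .*? stops at the FIRST "}" it reaches (inside the nested \frac).
lazy = re.search(r"\\frac\{(.*?)\}\{(.*?)\}", a).groups()

print(greedy)  # ('1}{1+\\frac{1}{1+\\frac{1', '1+x}}')
print(lazy)    # ('1', '1+\\frac{1')
```

The greedy version overshoots the numerator and the non-greedy version cuts the denominator short. Both quantifiers only decide how much text to consume, not which } matches which {.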

conclusion

Of course, you may be able to solve this problem with a more complicated RE. However, nobody likes writing such intricate patterns. I think the only real solution is to parse the string one character at a time. Do you have any other solutions?
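The character-by-character approach can be sketched like this. The function walks the string once, keeps a depth counter, and returns the index of the } that closes a given {. The sketch assumes well-formed input (every { has a matching }), ignores escaped braces such as \{, and assumes each \frac is written as \frac{...}{...} with no spaces. The helper names match_brace and frac_args are hypothetical, written only for this example.

```python
from __future__ import annotations


def match_brace(s: str, open_idx: int) -> int:
    """Return the index of the '}' that closes the '{' at s[open_idx]."""
    if s[open_idx] != "{":
        raise ValueError(f"no '{{' at index {open_idx}")
    depth = 0
    for i in range(open_idx, len(s)):
        if s[i] == "{":
            depth += 1  # one level deeper
        elif s[i] == "}":
            depth -= 1  # one level shallower
            if depth == 0:
                return i  # back to the starting level: this brace closes ours
    raise ValueError("unbalanced braces")


def frac_args(s: str, start: int = 0) -> tuple[str, str]:
    """Return (numerator, denominator) of the first \\frac at or after `start`."""
    i = s.index(r"\frac", start) + len(r"\frac")  # assumes '{' follows \frac directly
    num_end = match_brace(s, i)
    den_start = num_end + 1  # assumes the second '{' follows the first '}' directly
    den_end = match_brace(s, den_start)
    return s[i + 1 : num_end], s[den_start + 1 : den_end]


a = r"\frac{1}{1+\frac{1}{1+\frac{1}{1+x}}}"
num, den = frac_args(a)
print(num)  # 1
print(den)  # 1+\frac{1}{1+\frac{1}{1+x}}
```

Because the denominator is returned as a plain string, calling frac_args(den) again peels off the next level, so the nesting is handled by the counter instead of by the pattern.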
