Python lexical / parsing library (2014.11 first survey, 2019.10 partial addition)

2019-10-06 postscript

I can't handle left recursion, but PEG parsers are quite popular these days. For Python, there are https://github.com/erikrose/parsimonious and https://github.com/KuramitsuLab/pegpy.

Motivation for investigation

I wanted to parse the formula used in the theorem proving support system called Coq https://coq.inria.fr/. So I took a quick look at some Python lexical / parsing tools.

Evaluation criteria

The grammar of Coq expressions is quite complicated (according to the grammar definition, it is a left-recursive grammar [^ 1] and requires lookahead). Therefore, the following points were used when selecting. If you want to deal with other grammars, of course, it will be a different criterion.

--Do you want to handle left recursion grammar? -Is it a complicated grammar that requires look-ahead? --Whether to write the grammar in DSL or define it as a parser combinator

Below are the survey results.

[^ 1]: ʻexpr :: = expr A grammar with rules like "+" term`. It is theoretically possible to rewrite it into a grammar that does not have left recursion, but I don't want to do it with a huge grammar like Coq, and I don't want to make human mistakes.

New one

pyparsing (2.2.0) 2017/03 Parsing library. A parser-based approach that creates parsing rules by combining classes such as ʻOneOf, ʻOptional, and Group. It is very easy to define "nested comments", which is troublesome in grammar design of programming languages. It is very easy to specify the language class, but the left recursion grammar cannot be handled as it is, so some ingenuity is required. How to use Pyparsing: http://masato.github.io/2014/07/01/python27-etl-pyparsing-syntactic-analysis/

PLY (Python Lex-Yacc) (3.11) 2018/02 Python version of Lex / Yacc, a well-known C lexical / parsing tool. This means that similar grammar definition syntax can be used. LALR (1) parsing is basically used for the analysis. Headquarters: http://www.dabeaz.com/ply/ Reference: http://blog.livedoor.jp/shf0811/archives/7346881.html

parse (1.6.6; 2014/11) A pattern matching library that works the opposite of String.format (). Reference: http://coreblog.org/ats/python-parse/

Old / no longer

Yapps (2.2.0) 2014/06
Extended LL (1) parser. Apparently there is no look ahead, so it is not suitable for parsing in complicated languages. Reference: https://github.com/mk-fg/yapps
jupyLR (0.3) 2012/06
Generalized LR parser. Multiple syntax trees can be output when the syntax is ambiguous by breadth-first search. This makes it possible to handle reduce / reduce collisions that cannot be handled by the LR parser. Headquarters: http://bl0b.github.io/jupyLR/
plex
Flex-like lexical analysis tool. Development seems to have stopped at 2.0.0 dev: https://pythonhosted.org/plex/
SimpleParse
Not updated since 2011: http://simpleparse.sourceforge.net/
SPARK
Update status unknown. Confusing with other libraries: http://pages.cpsc.ucalgary.ca/~aycock/spark/
PyLR
It looks like a famous library that existed before 2000, but it doesn't exist now.
FlexModule, BisonModule
These also don't seem to exist now. This seems to have been a rapper for C's Flex and Bison.
re.Scanner (built-in)
It was a regular expression lexical analysis class, but it seems that it has disappeared from Python 3.4.

Summary,

--Parse if simple pattern matching is enough --PLY if you want to code left recursive grammar / write rules as an independent document and don't need complicated look-ahead --Pyparsing if you want to define rules by coding

Seems to be good.


Is it the strongest of ANTLR after all?

In addition, PLY only looks ahead for one token. If you want to write a complicated grammar, it is almost an option of ANTLR (Java tool) that uses LL (*) parsing. For example, ANTLR can easily parse the grammar of Coq, which is a support system for theorem proving. ANTLR v4 has the ability to generate a parser for Python 2/3. Also, ANTLR's Python wrapper seems to be on PyPI.

Recommended Posts

Python lexical / parsing library (2014.11 first survey, 2019.10 partial addition)
Python standard library: First half (Python learning memo ⑧)
Python 3.6 email library
First time python
Python ast library
First Python 3 ~ First comparison ~
First time python
Python Library notes
First Python ~ Coding 2 ~
First python [O'REILLY]
Use MeCab constrained parsing (partial parsing) in Python through natto-py