** "Looks like JavaScript, brain (contents) is Ruby, (stability is AC / DC)" ** Scripting language Kinx ). This time is also a parser combinator.
Not long ago, I ran it with Kinx Library-Parser Combinator (Part 2). What is this time?
Yes, I would like to introduce the interface. I'm sorry, I forgot to do it last time.
Parsek
Instances of the Parsek
class have the following methods: The $
in the statement indicates the actual Parsek instance that was instantiated.
The following predefined parsers already exist in the instance of the Parsek
class.
Predefined parser | Contents |
---|---|
$.eof | A parser that succeeds if it is EOF. |
$.any | A parser that succeeds with any single character. |
$.all | A parser that processes everything until the end and succeeds. |
$.index | Returns the current position ({ offset, line, column} ). offset is 0 starting point, line/column is one starting point. |
$.letter | $.regex(/[a-zA-Z]/) Same as. |
$.letters | $.regex(/[a-zA-Z]*/) Same as. |
$.digit | $.regex(/[0-9]/) Same as. |
$.digits | $.regex(/[0-9]*/) Same as. |
$.whitespace | $.regex(/\s+/) Same as. |
$.optWhitespace | $.regex(/\s*/) Same as. |
$.cr | $.string("\r") Same as. |
$.lf | $.string("\n") Same as. |
$.crlf | $.string("\r\n") Same as. |
$.newline | $.alt($.crlf, $.lf, $.cr) Same as. |
$.end | $.alt($.newline, $.eof) Same as. |
An instance of the Parsek
class has the ability to generate a parser with the following methods:
Parser generation function | Contents |
---|---|
$.string(str) | str Creates a parser that succeeds if it matches the string specified in. |
$.regex(re, groupIndex) | re Creates a parser that succeeds if it matches the regular expression specified in. value isgroupIndex Contains the capture result of the group specified in. |
$.succeed(result) | result To create a parser that always succeeds. |
$.fail(message) | message Create a parser that fails by leaving a message. |
$.oneOf(chars) | chars Create a parser that succeeds if it matches one character contained in. |
$.noneOf(chars) | chars Create a parser that succeeds if it does not match one character contained in. |
$.sepBy(content, separator) | content A string that can be parsed withseparator Create a parser that returns the result divided by the result that matches. |
$.sepBy1(content, separator) | sepBy Is allowed 0,sepBy1 Fails if at least one does not match. |
$.lazy(description, f) | f Specifies a function that returns a parser to, and creates a parser when it is first used. |
$.alt(...parsers) | Creates a parser that succeeds if it matches any of the specified parsers. The application is done in the order specified. |
$.takeWhile(predicate) | The next character ispredicate Create a parser that will continue to succeed while it is true. |
$.seq(...parsers) | Creates a parser that succeeds when multiple specified parsers are matched in sequence. Results are returned as an array of all results. |
$.seqMap(...parsers, mapf) | Creates a parser that succeeds when multiple specified parsers are matched in sequence. The result is an array of all the results, but it returns the value returned by the function specified by mapf. |
ParsekParser
The parser generated by the method of the Parsek
class is an instance of the ParsekParser
class. This class defines methods for chaining parsers.
Once all the parsers are ready, run the parse with the following method.
Method | Contents |
---|---|
parser.parseAll(target) | target Againstparser To parse using. |
Parser generation function | Contents |
---|---|
parser.desc(description) | parser Set the message to be displayed when parsing by is unsuccessful. |
parser.or(nextParser) | parser If parsing bynextParser Create a parser that attempts to. |
parser.and(nextParser) | parser If you succeed in parsing bynextParser Create a parser to execute. The result is returned as an array. |
parser.then(nextParser) | parser If you succeed in parsing bynextParser Create a parser to execute. Result isnextParser Return the oneparser The result of is ignored. |
parser.skip(nextParser) | parser If you succeed in parsing bynextParser Create a parser to execute. Result isparser Return the onenextParser The result of is ignored. |
parser.many() | parser Create a parser that parses the occurrence of 0 or more times. |
parser.many1() | parser Create a parser that parses that appears more than once. |
parser.times(min, max) | parser Butmin More than oncemax Create a parser that parses the appearance repeatedly no more than once. ..max If is omitted, justmin Success in case of number of times. |
parser.sepBy(separator) | parser A string that can be parsed withseparator Create a parser that returns the result divided by the result that matches. |
parser.sepBy1(separator) | sepBy Is allowed 0,sepBy1 Fails if at least one does not match. |
parser.map(f) | parser Use the result of the above converted by the function f. |
parser.chain(f) | parser Pass the result of to the function f, and connect the parsers to use the new parser returned by f as the next parser. |
parser./(nextParser) | parser.or(nextParser) Another name for. |
parser.+(nextParser) | nextParser Without specifyingparser.+() When used asparser.many1() Another name for.nextParser If you specifyparser.and(nextParser) Another name for. |
parser.*() | parser.many() Another name for. |
Packrat Parsing
Supports Packrat Parsing.
Parser combinators usually proceed with analysis in the backtrack. So, for example, in the case of $ .alt (a, b, c)
, if the parser of ʻa fails, you can proceed with
b`, but the same parser may be repeated many times along the way.
Therefore, when backtracking, performance improvement can be expected by caching some results and reusing them by using the property that ** if analyzed in the same place and under the same conditions, the same result will be obtained **. .. It's memoization.
Parsek
supports this feature. By the way, it can be disabled by specifying {disablePackratParsing: true}
in the constructor when creating a Parsek
instance.
Let's see its performance with a benchmark. Refer to here to handle the grammar that causes a tremendous number of backtracks. The grammar is as follows.
S <- A
A <- P "+" A / P "-" A / P
P <- "(" A ")" / "1"
This will parse the following string:
(((((((((((((1)))))))))))))
The explanation is easy to understand, so I will quote it as it is.
The essence of this rule and string combination is that it must go through all A and P combinations before being finally parsed. In other words, the worst case of backtrack can be almost reproduced. So if you frankly backtrack, the processing time should increase exponentially with the number of parentheses correspondence.
No, it's amazing.
Now, the benchmark source code. Since the parser itself is the same, the same test is done with the $ D
parser of{disablePackratParsing: true}
and the $ E
parser enabled.
using Parsek;
var $E = new Parsek();
var $D = new Parsek({ disablePackratParsing: true });
var tmr = new SystemTimer();
// rule:
// S <- A
// A <- P "+" A / P "-" A / P
// P <- "(" A ")" / "1"
// input:
// (((((((((((((1)))))))))))))
function generateParser($) {
var S, A, P;
S = $.lazy(&() => A);
A = $.lazy(&() => $.alt($.seq(P, $.string('+'), A), $.seq(P, $.string('-'), A), P));
P = $.alt($.seq($.string('('), A, $.string(')')), $.string('1'));
return S;
}
function test($) {
for (var i = 0; i <= 10; ++i) {
tmr.restart();
var r = generateParser($).parseAll(('(' * i) + '1' + (')' * i));
var elapsed = tmr.elapsed();
System.println(["%8.5f" % elapsed, r]);
}
}
test($E);
test($D);
Let's do it!
If valid
[" 0.00090", {"position":1,"status":1,"value":"1"}]
[" 0.00041", {"position":3,"status":1,"value":["(","1",")"]}]
[" 0.00044", {"position":5,"status":1,"value":["(",["(","1",")"],")"]}]
[" 0.00061", {"position":7,"status":1,"value":["(",["(",["(","1",")"],")"],")"]}]
[" 0.00083", {"position":9,"status":1,"value":["(",["(",["(",["(","1",")"],")"],")"],")"]}]
[" 0.00055", {"position":11,"status":1,"value":["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"]}]
[" 0.00061", {"position":13,"status":1,"value":["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"]}]
[" 0.00071", {"position":15,"status":1,"value":["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"]}]
[" 0.00200", {"position":17,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"]}]
[" 0.00101", {"position":19,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"],")"]}]
[" 0.00196", {"position":21,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"],")"],")"]}]
When disabled
[" 0.00022", {"position":1,"status":1,"value":"1"}]
[" 0.00034", {"position":3,"status":1,"value":["(","1",")"]}]
[" 0.00097", {"position":5,"status":1,"value":["(",["(","1",")"],")"]}]
[" 0.00353", {"position":7,"status":1,"value":["(",["(",["(","1",")"],")"],")"]}]
[" 0.01142", {"position":9,"status":1,"value":["(",["(",["(",["(","1",")"],")"],")"],")"]}]
[" 0.02686", {"position":11,"status":1,"value":["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"]}]
[" 0.09525", {"position":13,"status":1,"value":["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"]}]
[" 0.26821", {"position":15,"status":1,"value":["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"]}]
[" 0.72229", {"position":17,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"]}]
[" 2.44629", {"position":19,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"],")"]}]
[" 7.48334", {"position":21,"status":1,"value":["(",["(",["(",["(",["(",["(",["(",["(",["(",["(","1",")"],")"],")"],")"],")"],")"],")"],")"],")"],")"]}]
If Packrat Parsing is enabled, 10 parentheses will complete within 2 milliseconds. On the other hand, when disabled, it takes 7.5 seconds when the last 10 parentheses are reached.
Parser combinator, it's convenient to try this. I haven't used it much, but I think I'll use it a little more in the future. You can easily parse strings that cannot be expressed by regular expressions.
You can use it to create a language processing system. Kinx uses yacc for parsing and handwriting for lexical analysis.
Originally, I like parsing itself and handwritten recursive descent parser, but I use yacc exclusively because it is difficult to change the grammar definition. However, the parser combinator can also perform lexical analysis, which is convenient.
see you!