** "looks JavaScript, brain (contents) Ruby, (stability is AC / DC)" ** Scripting language Kinx ). This time is also a parser combinator.
Last time, I made an AST with Kinx Library-Parser Combinator (Part 1).
This time I will interpret this and execute it. Finally, JIT compiles and runs.
Let's reconfirm the last final program.
using Parsek;
function makeAST(first, rest) {
var expr = first;
for (var i = 0, l = rest.length(); i < l; ++i) {
expr = { lhs: expr, op: rest[i][0], rhs: rest[i][1] };
}
return expr;
}
var $ = new Parsek();
var number = $.regex(/[1-9][0-9]*|[0-9]/).map(Integer.parseInt);
var addsub = $.oneOf("+-");
var muldiv = $.oneOf("*/%");
var lbr = $.string("(");
var rbr = $.string(")");
var expression;
var factor = $.lazy(&() => $.alt(number, $.seq(lbr, expression, rbr).map(&(value) => value[1])));
var term = $.seqMap(factor, $.seq(muldiv, factor).many(), makeAST);
expression = $.seqMap(term, $.seq(addsub, term).many(), makeAST);
// test
System.println(expression.parseAll("1+2*3+2*(14-2)").value.toJsonString(true));
This. Let's fix just one problem with this. It's not a problem, though.
Actually, this parser will fail if there are whitespace characters. Of course, there is no place to interpret the space, so the character is not supposed to encounter the space, so he confidently answers, "There is no grammar!".
In ordinary programming languages, token breaks are any number of whitespace characters, and whitespace is generally something to skip. Therefore, let's skip the whitespace characters. There are two things to use:
$ .optWhitespace
is a parser that interprets zero or more arbitrary whitespace characters. Specifically, it is the same as $ .regex (/ \ s * /)
. Therefore, it matches whitespace, newlines (CR / LF), and tabs. There is also $ .whitespace
, which is$ .regex (/ \ s + /)
to match one or more whitespace characters.parser.skip (anotherParser)
to match parser
, then try to match ʻanotherParser, and if so, return the result of
parser. That is, it skips the match result of ʻanotherParser
.Using these two, you can use .skip ($ .optWhitespace)
to skip whitespace after a particular parser matching.
Here, prepare a function called lexeme ()
as shown below, and add a process to skip whitespace characters. Define ʻignore` so that you can add it here when there are more things to ignore, and you need to skip whitespace even at the first start.
var ignore = $.optWhitespace;
var lexeme = &(p) => p.skip(ignore);
This lexeme ()
function only needs to be at the token break, so it should be in the parser of the smallest unit of number
, ʻaddsub,
muldiv,
lbr,
rbr`. If you rewrite the first program, it will be as follows.
using Parsek;
function makeAST(first, rest) {
var expr = first;
for (var i = 0, l = rest.length(); i < l; ++i) {
expr = { lhs: expr, op: rest[i][0], rhs: rest[i][1] };
}
return expr;
}
var $ = new Parsek();
var ignore = $.optWhitespace;
var lexeme = &(p) => p.skip(ignore);
var number = lexeme($.regex(/[1-9][0-9]*|[0-9]/)).map(Integer.parseInt);
var addsub = lexeme($.oneOf("+-"));
var muldiv = lexeme($.oneOf("*/%"));
var lbr = lexeme($.string("("));
var rbr = lexeme($.string(")"));
var expression;
var factor = $.lazy(&() => $.alt(number, $.seq(lbr, expression, rbr).map(&(value) => value[1])));
var term = $.seqMap(factor, $.seq(muldiv, factor).many(), makeAST);
expression = $.seqMap(term, $.seq(addsub, term).many(), makeAST);
// test
System.println(expression.parseAll("1+2*3+2*(14-2)").value.toJsonString(true));
System.println(ignore.then(expression).parseAll(" 1 + 2 * 3 + 2 * ( 14 - 2 ) ").value.toJsonString(true));
/* Both results are the same.
{
"lhs": {
"lhs": 1,
"op": "+",
"rhs": {
"lhs": 2,
"op": "*",
"rhs": 3
}
},
"op": "+",
"rhs": {
"lhs": 2,
"op": "*",
"rhs": {
"lhs": 14,
"op": "-",
"rhs": 2
}
}
}
*/
Both return the same result.
The first ʻignore.then ()is the part that ignores leading whitespace.
parser.then (anotherParser) returns ʻanotherParser
, unlike .skip ()
. This will cause the parser
to be ignored.
Now let's get into the main subject.
First, let's implement an interpreter before JIT. This is easier. The interpreter runs sequentially while traversing the AST. Evaluate the left side, evaluate the right side, and calculate with the result. The ʻInterpreter` class looks like this:
class Interpreter(opts_) {
private sequence(r, op, lhs, rhs) {
return if (!opts_.enableSequence);
System.println("%d %s %d -> %d" % lhs % op % rhs % r);
}
public eval(ast) {
var lhs = ast.lhs.isObject ? eval(ast.lhs) : ast.lhs;
var rhs = ast.rhs.isObject ? eval(ast.rhs) : ast.rhs;
var r = 0;
switch (ast.op) {
case '+':
r = lhs + rhs;
break;
case '-':
r = lhs - rhs;
break;
case '*':
r = lhs * rhs;
break;
case '/':
r = lhs / rhs;
break;
case '%':
r = lhs % rhs;
break;
default:
throw RuntimeException('Invalid operator');
}
sequence(r, ast.op, lhs, rhs);
return r;
}
}
I also tried to make the calculation process visible (passing {enableSequence: true}
to the constructor will show the calculation process). Let's run it like this.
System.println(
"Result = ",
new Interpreter({ enableSequence: true })
.eval(ignore.then(expression)
.parseAll(" 1 + 2 * 3 + 2 * ( 14 - 2 ) ").value)
);
/*
2 * 3 -> 6
1 + 6 -> 7
14 - 2 -> 12
2 * 12 -> 24
7 + 24 -> 31
Result = 31
*/
It is calculated according to the priority! Not surprisingly, the AST is built according to priority. Let's try another formula.
System.println(
"Result = ",
new Interpreter({ enableSequence: true })
.eval(ignore.then(expression)
.parseAll("(( 123 ) * 2 * 4 - 3 * ( 12 + 10 )) % 100").value)
);
/*
123 * 2 -> 246
246 * 4 -> 984
12 + 10 -> 22
3 * 22 -> 66
984 - 66 -> 918
918 % 100 -> 18
Result = 18
*/
There is.
This is the long-awaited JIT. However, although the JIT technology itself is wonderful, honestly, with this level of computational complexity (no loops or conditional branching), the above interpreter is honestly faster. Since I am compiling with the same root as the interpreter and the interpreter itself is not doing much, the JIT compilation time is like the interpreter time (+ α). Hmmm.
It's just a technology base. I should have made an example that makes a difference in speed. Would you like to rewrite Brainf * ck as a subject, referring to the article here?
By the way, in ʻInterpreter, the calculation formula was executed directly, but it is OK if you rewrite it so that the equivalent machine language code is output instead of executing the calculation. Looking at the [JIT Library](https://qiita.com/Kray-G/items/a237f195630f3f0a91df), the
Compilerclass to convert to machine language code will be as follows. It's a little long, but I'll put it as it is. Again, you can output the intermediate format compile code by setting
{enableListing: true}` in the constructor.
class Compiler(opts_) {
var regs_, regsLen_;
enum { MOV, BOP, DIVMOD, RET }
private initialize() {
regs_ = [
// Jit.R0 and Jit.R1 is used as a work register when it is division.
{ reg: Jit.R2, used: false, name: "R2" },
{ reg: Jit.R3, used: false, name: "R3" },
{ reg: Jit.R4, used: false, name: "R4" },
{ reg: Jit.R5, used: false, name: "R5" },
{ reg: Jit.S0, used: false, name: "S0" },
{ reg: Jit.S1, used: false, name: "S1" },
{ reg: Jit.S2, used: false, name: "S2" },
{ reg: Jit.S3, used: false, name: "S3" },
{ reg: Jit.S4, used: false, name: "S4" },
{ reg: Jit.S5, used: false, name: "S5" },
];
regsLen_ = regs_.length();
}
private listing(type, op, dst, op1, op2) {
return if (!opts_.enableListing);
switch (type) {
case MOV:
System.println("%s <- %s" % dst % op1);
break;
case BOP:
System.println("%s <- %s %s %s" % dst % op1 % op % op2);
break;
case DIVMOD:
System.println("R0 <- %s" % op1);
System.println("R1 <- %s" % op2);
System.println("%s <- R0 %s R1" % dst % op);
break;
case RET:
System.println("ret %s" % dst);
break;
}
}
private getReg() {
for (var i = 0; i < regsLen_; ++i) {
if (!regs_[i].used) {
regs_[i].used = true;
return i;
}
}
throw RuntimeException("Not enough register");
}
private releaseReg(i) {
regs_[i].used = false;
}
private compileLeaf(c, leaf) {
var r = getReg();
c.mov(regs_[r].reg, Jit.IMM(leaf));
listing(MOV, null, regs_[r].name, leaf);
return r;
}
private compileNode(c, ast) {
var rl = ast.lhs.isObject ? compileNode(c, ast.lhs) : compileLeaf(c, ast.lhs);
var rr = ast.rhs.isObject ? compileNode(c, ast.rhs) : compileLeaf(c, ast.rhs);
var r = getReg();
switch (ast.op) {
case '+':
c.add(regs_[r].reg, regs_[rl].reg, regs_[rr].reg);
listing(BOP, ast.op, regs_[r].name, regs_[rl].name, regs_[rr].name);
break;
case '-':
c.sub(regs_[r].reg, regs_[rl].reg, regs_[rr].reg);
listing(BOP, ast.op, regs_[r].name, regs_[rl].name, regs_[rr].name);
break;
case '*':
c.mul(regs_[r].reg, regs_[rl].reg, regs_[rr].reg);
listing(BOP, ast.op, regs_[r].name, regs_[rl].name, regs_[rr].name);
break;
case '/':
c.mov(Jit.R0, regs_[rl].reg);
c.mov(Jit.R1, regs_[rr].reg);
c.sdiv();
c.mov(regs_[r].reg, Jit.R0);
listing(DIVMOD, ast.op, regs_[r].name, regs_[rl].name, regs_[rr].name);
break;
case '%':
c.mov(Jit.R0, regs_[rl].reg);
c.mov(Jit.R1, regs_[rr].reg);
c.sdivmod();
c.mov(regs_[r].reg, Jit.R1);
listing(DIVMOD, ast.op, regs_[r].name, regs_[rl].name, regs_[rr].name);
break;
default:
throw RuntimeException('Invalid operator');
}
releaseReg(rl);
releaseReg(rr);
return r;
}
public compile(ast) {
var c = new Jit.Compiler();
c.enter();
var r = compileNode(c, ast);
c.ret(regs_[r].reg);
listing(RET, null, regs_[r].name);
return c.generate();
}
public run(ast) {
var code = compile(ast);
return code.run();
}
}
Registers are available from R2
to R5
and S0
to S5
. Save R0
and R1
for division as they need to be used for division. The register allocation method is "Reserve when using and return when finished".
Note that the R
type registers may be destroyed after the function is called, but this time the function is not called, so it can be treated in the same way as the S
type. An error will occur if the available registers are used up, but many of the intermediate values are short-lived, so this should not be a problem. If the number of values to be saved increases, issue a code to save it to memory. Also, Kinx's JIT library can have a local variable area on the stack, so it's easiest to allocate it when you run out.
Considering register allocation in earnest, it is quite profound, but in the case of JIT, ** optimization does not take much time **, so this is enough. Because it's a waste of optimization time. In the case of AOT Compilation, the compilation time is not a concern, and the faster the execution time, the better the optimization, but in JIT, the compilation time is also within the execution time, so that is not the case.
Let's calculate it now. Don't forget ʻusing Jit;`.
using Parsek;
using Jit;
/*abridgement*/
System.println(
"Result = ",
new Compiler({ enableListing: true })
.run(ignore.then(expression)
.parseAll(" 1 + 2 * 3 + 2 * ( 14 - 2 ) ").value)
);
/*
R2 <- 1
R3 <- 2
R4 <- 3
R5 <- R3 * R4
R3 <- R2 + R5
R2 <- 2
R4 <- 14
R5 <- 2
S0 <- R4 - R5
R4 <- R2 * S0
R2 <- R3 + R4
ret R2
Result = 31
*/
There is. Let's try another example.
System.println(
"Result = ",
new Compiler({ enableListing: true })
.run(ignore.then(expression)
.parseAll("(( 123 ) * 2 * 4 - 3 * ( 12 + 10 )) % 100").value)
);
/*
R2 <- 123
R3 <- 2
R4 <- R2 * R3
R2 <- 4
R3 <- R4 * R2
R2 <- 3
R4 <- 12
R5 <- 10
S0 <- R4 + R5
R4 <- R2 * S0
R2 <- R3 - R4
R3 <- 100
R0 <- R2
R1 <- R3
R4 <- R0 % R1
ret R4
Result = 18
*/
There is also this.
You can also dump the actual machine language code, so let's try it.
var code = new Compiler()
.compile(ignore.then(expression)
.parseAll(" 1 + 2 * 3 + 2 * ( 14 - 2 ) ").value);
code.dump();
System.println(code.run());
Output on Linux.
0: 53 push rbx
1: 41 57 push r15
3: 41 56 push r14
5: 48 8b df mov rbx, rdi
8: 4c 8b fe mov r15, rsi
b: 4c 8b f2 mov r14, rdx
e: 48 83 ec 10 sub rsp, 0x10
12: 48 c7 c7 01 00 00 00 mov rdi, 0x1
19: 48 c7 c1 02 00 00 00 mov rcx, 0x2
20: 49 c7 c0 03 00 00 00 mov r8, 0x3
27: 49 89 cb mov r11, rcx
2a: 4d 0f af d8 imul r11, r8
2e: 4a 8d 0c 1f lea rcx, [rdi+r11]
32: 48 c7 c7 02 00 00 00 mov rdi, 0x2
39: 49 c7 c0 0e 00 00 00 mov r8, 0xe
40: 49 c7 c3 02 00 00 00 mov r11, 0x2
47: 4c 89 c3 mov rbx, r8
4a: 49 2b db sub rbx, r11
4d: 49 89 f8 mov r8, rdi
50: 4c 0f af c3 imul r8, rbx
54: 4a 8d 3c 01 lea rdi, [rcx+r8]
58: 48 89 f8 mov rax, rdi
5b: 48 83 c4 10 add rsp, 0x10
5f: 41 5e pop r14
61: 41 5f pop r15
63: 5b pop rbx
64: c3 ret
31
Let's read it roughly.
rdi
register on line 0x12rcx
register on line 0x19r8
register on line 0x20rcx * r8
into the r11
register on line 0x27-0x2ardi + r11
to the rcx
register on line 0x2erdi
register on line 0x32r8
register on line 0x39r8 --r11
into the rbx
register on line 0x40-0x4ardi * rbx
into the r8
register on line 0x4d-0x50rcx + r8
to the rdi
register on line 0x54rdi
to the rax
register on line 0x58The return value of the function is determined to be the rax
register on x64, so this will return the result.
Then in another example.
var code = new Compiler()
.compile(ignore.then(expression)
.parseAll("(( 123 ) * 2 * 4 - 3 * ( 12 + 10 )) % 100").value);
code.dump();
System.println(code.run());
This is also the output on Linux.
0: 53 push rbx
1: 41 57 push r15
3: 41 56 push r14
5: 48 8b df mov rbx, rdi
8: 4c 8b fe mov r15, rsi
b: 4c 8b f2 mov r14, rdx
e: 48 83 ec 10 sub rsp, 0x10
12: 48 c7 c7 7b 00 00 00 mov rdi, 0x7b
19: 48 c7 c1 02 00 00 00 mov rcx, 0x2
20: 49 89 f8 mov r8, rdi
23: 4c 0f af c1 imul r8, rcx
27: 48 c7 c7 04 00 00 00 mov rdi, 0x4
2e: 4c 89 c1 mov rcx, r8
31: 48 0f af cf imul rcx, rdi
35: 48 c7 c7 03 00 00 00 mov rdi, 0x3
3c: 49 c7 c0 0c 00 00 00 mov r8, 0xc
43: 49 c7 c3 0a 00 00 00 mov r11, 0xa
4a: 4b 8d 1c 18 lea rbx, [r8+r11]
4e: 49 89 f8 mov r8, rdi
51: 4c 0f af c3 imul r8, rbx
55: 48 89 cf mov rdi, rcx
58: 49 2b f8 sub rdi, r8
5b: 48 c7 c1 64 00 00 00 mov rcx, 0x64
62: 48 89 f8 mov rax, rdi
65: 48 89 ce mov rsi, rcx
68: 48 99 cqo
6a: 48 f7 fe idiv rsi
6d: 48 89 d6 mov rsi, rdx
70: 49 89 f0 mov r8, rsi
73: 4c 89 c0 mov rax, r8
76: 48 83 c4 10 add rsp, 0x10
7a: 41 5e pop r14
7c: 41 5f pop r15
7e: 5b pop rbx
7f: c3 ret
18
There are also calculation results!
Normally, scripting languages convert AST to its own intermediate format code (bytecode) and execute it. So is Kinx. However, what you are doing is not much different from the above. It's more like doing something similar to Compiler
and running it in your own ʻInterpreter. ʻIntrpreter
is not needed if it can be converted to native machine code. Normally, it deals with multiple length operations and class objects, and dynamically typed languages do not know the type until runtime, so it is only troublesome to compile it into machine language. In particular, if you want to support x64 (Windows / Linux) / ARM and so on, you have to make a compiler for each. So bytecoding is a best practice.
That said, JIT is doing its best for serious products. Kinx can also be partially JIT-enabled with native functions. Moreover, since it is realized within the range that can be expressed by the abstract assembler (SLJIT), it should be highly portable to various platforms.
In any case, JIT has always been a hot topic in language processing, so I thought that the JIT library of Kinx would be fun as a little toy. I praise myself. Moreover, since I worked hard this time to implement a parser combinator, I think I can easily create my own DSL. I think it's quite fun, but how about it?
The catchphrase for the Kinx JIT library is ** "feel free to JIT" **, isn't it?
So, next time.