Happy New Year. We look forward to working with you this year.
By the way, the scripting language Kinx delivered in ** "looks like JavaScript, brain (contents) is Ruby, (stability is AC/DC)" **, but please continue to favor this year as well. I'm happy.
This time is the first series of making language processing systems.
I made a parser combinator and a JIT library, so I want to make a language processing system. This may be the first and last time. I made a simple calculator before, but this is a series to try something a little more advanced.
I've hidden it until now, but I'm actually a shogi fan. Hanyu is a god. I also went to listen to the lecture.
A long time ago, Modan Shogi became popular. By the way, the term "modern" here does not mean "modern", but it is ** modern-yaki **, so be careful. Modern-yaki modern means ** "a lot" **. The spelling is also different.
Actually, I made it a long time ago and put it in the examples
folder. You can see the entire source below.
And, as I wrote below, I noticed in this article that I forgot to support real numbers in the compiled version. Maybe I'll fix it soon. Please note.
This time, I will explain this from the top, and I also use it as a memorandum of my own.
By the way, in order to achieve this, it is a secret that the JIT library is crafted so that putc
and putn
can be called as described below. In the future, I'd like to make it general-purpose so that C functions can be called, but that's the connection up to that point.
See here for specifications.
I will quote the sample from the above. In addition, I will explain the whole article in case the original article is gone.
Hello World by ModanShogi.
▲ 9 eight silver △ 9 two balls ▲ 6 four steps △ 6 six silver ▲ 6 steps △ same ball
▲ 6 seven steps △ 6 one ball ▲ 6 eight balls △ 6 three steps ▲ 6 nine balls △ 7 seven silver
▲ 7 Gokin △ 7 One ball ▲ 4 Eight silver △ 4 Two balls ▲ 9 Six and △ 9 Eight steps
▲ Same ball △ 6 Seven balls ▲ 9 Five gold △ 9 One ball ▲ 6 Three gold △ Same ball
▲ 6 eight gold △ 6 two balls ▲ 4 steps △ 4 three balls ▲ 5 five steps △ 5 three balls
[PLAYER] [COL] [ROW] [PIECE]
(Example: ▲ 2 Shikoku)putn
.* N
(eg * 1
).The ModanShogi program runs on a virtual machine like this:
register
There are 9 registers, and these registers are numbered from 1 to 9.
Each register can hold one real number (not only integers but also decimals) .
At the beginning of the program, each register has the same initial value as its register number (containing an integer from 1 to 9) .
Stack
One stack.
Real numbers can be pushed and popped onto this stack.
As an instruction, there are the following.
PIECE | Also known as COL ROW | meaning |
---|---|---|
When | mov X Y | X = Y |
Ayumu | add X Y | X += Y |
Money | sub X Y | X -= Y |
Silver | mul X Y | X *= Y |
Katsura | div X Y | X /= Y |
Incense | mod X Y | X %= Y |
Dragon | push X _ | Push X |
Horse | pop X _ | Pop to X |
ball | putc X _ | Output character with character code X |
king | putn X _ | Output X as a number |
Fly | jump_if X Y | If X is non-zero, jump to the label indicated by register Y |
Horn | jump_ifp X Y | If X is 0 or more, jump to the label indicated by register Y |
Well, here is the implementation. In general, the operation of the language processing system is as follows.
It's rough. In detail, there are a lot of keywords such as lexical analysis, semantic analysis, optimization, etc., but this time it is not necessary, so I will follow the above flow.
Now, let's first make a parser to meet the above specifications. Basically, I will explain in the order of ** Source **.
First, load the required libraries. There are two:
using Parsek;
using Jit;
Next, let's define constants such as each instruction of ModanShogi. It is also defined as a character string for code output (dump) (although it is not used).
enum {
label = 0,
mov,
add,
sub,
mul,
div,
mod,
push,
pop,
putc,
putn,
jump_if,
jump_ifp,
}
var opname = [
"label",
"mov",
"add",
"sub",
"mul",
"div",
"mod",
"push",
"pop",
"putc",
"putn",
"jump_if",
"jump_ifp",
];
Next is the converter. What is converted is that the value obtained by grammatical interpretation is converted into an internal representation. Make a combination of opcode and arguments.
class Converter {
var map_ = [
{ '▲': 0, '△': 1, '☗': 0, '☖': 1 },
{ '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, 'same': 0 },
{ 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'Five': 5, 'Six': 6, 'Seven': 7, 'Eight': 8, 'Nine': 9, ' ': 0 },
{ 'When': mov, 'Ayumu': add, 'Money': sub, 'Silver': mul, 'Katsura': div, 'Incense': mod,
'Dragon': push, 'Horse': pop, 'ball': putc, 'king': putn, 'Fly': jump_if, 'Horn': jump_ifp },
];
var prev_;
public convert(stmt) {
# System.println(stmt);
var r = stmt.map { => map_[_2][_1] };
if (r[1] == 0) r[1] = prev_[1];
if (r[2] == 0) r[2] = prev_[2];
prev_ = r;
return {
r1: r[1],
r2: r[2],
op: r[3],
};
}
}
In order to handle same
, the previous state (prev_
) is retained.
Well, this is the long-awaited grammar definition. The parser combinator is easy because it includes the lexical analysis part in a sense.
Only two things are important, the others are skipped.
[PLAYER] [COL] [ROW] [PIECE]
... Operation* N
... labelFirst, from the point of skipping. Basically, the elements up to the first character are skipped, so you can skip anything other than "* ▲ △ ☗☖
".
var conv = new Converter();
var $ = new Parsek();
var ignore = $.noneOf("*▲△☗☖").many();
var lexeme = &(p) => p.skip(ignore);
Next, let's write the syntax that represents the operation. It's like this. Although it was not written in the explanation, it seems that it is necessary to be able to accept expressions such as "same money left" and "upper right left straight" (it was written somewhere, but I could not find it ...).
# [PLAYER][COL][ROW][PIECE]
# (Example: ▲ 2 Shikoku)
var player = $.oneOf('▲△☗☖');
var col = $.oneOf('123456789 Same as above');
var row = $.oneOf('one two three four five six seven eight nine');
var piece = $.oneOf('Ayaka Katsura Silver Gold Ball King Hikaku Ryoma');
var addc = $.oneOf("Top right left straight");
Next is the label. The label has label
as the opcode and keeps the specified number as address
.
It's hard to understand now, but address
is the label number.
var number = $.regex(/[1-9][0-9]*|[0-9]/).map(Integer.parseInt);
var labelp = lexeme($.seq($.string('*'), number)).map { => { op: label, address: _1[1] } };
Since it is only necessary that either the operation or the label appears, if it is a program, the program can be expressed as follows.
var stmt = lexeme($.seq(player, col, row, piece)).map { => conv.convert(_1) };
var program = $.alt(stmt, labelp).many();
The above program
expresses the following.
stmt
) is handed over to the converter created earlier and converted to an internal representation.stmt
or labelp
) is repeated 0 or more times.Now, let's create an interpreter. Represent the VM according to the previous rule. However, only the jump destination address is resolved first. It is as follows.
stack
) stack.reg
).class Interpreter(opts_) {
private setJumpAddress(code) {
var address = [];
for (var i = 0, l = code.length(); i < l; ++i) {
if (code[i].op == label) {
address[code[i].address] = i;
}
}
return address;
}
public eval(code) {
var stack = [];
var reg = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var address = setJumpAddress(code);
for (var i = 0, l = code.length(); i < l; ++i) {
var { op, r1, r2 } = code[i];
# System.println([opname[op], r1, r2]);
switch (op) {
case label: # nop
break;
case mov:
reg[r1] = reg[r2];
break;
case add:
reg[r1] += reg[r2];
break;
case sub:
reg[r1] -= reg[r2];
break;
case mul:
reg[r1] *= reg[r2];
break;
case div:
reg[r1] /= reg[r2];
break;
case mod:
reg[r1] %= reg[r2];
break;
case push:
stack.push(reg[r1]);
break;
case pop:
reg[r1] = stack.pop();
break;
case putc:
if (!opts_.disableDisplay) {
System.print(*Integer.parseInt(reg[r1]));
}
break;
case putn:
if (!opts_.disableDisplay) {
System.print(reg[r1]);
}
break;
case jump_if:
if (reg[r1]) {
i = address[reg[r2]];
}
break;
case jump_ifp:
# System.println("check = ", reg[r1]);
if (reg[r1] >= 0) {
i = address[reg[r2]];
}
break;
}
}
}
}
I'll repost the rules I mentioned earlier, so if you check that they match your implementation, you'll understand (although PIECE
has already disappeared on the interpreter).
PIECE | Also known as COL ROW | meaning |
---|---|---|
When | mov X Y | X = Y |
Ayumu | add X Y | X += Y |
Money | sub X Y | X -= Y |
Silver | mul X Y | X *= Y |
Katsura | div X Y | X /= Y |
Incense | mod X Y | X %= Y |
Dragon | push X _ | Push X |
Horse | pop X _ | Pop to X |
ball | putc X _ | Output character with character code X |
king | putn X _ | Output X as a number |
Fly | jump_if X Y | If X is non-zero, jump to the label indicated by register Y |
Horn | jump_ifp X Y | If X is 0 or more, jump to the label indicated by register Y |
Next is the implementation of the compiler.
As I wrote this article, I noticed that it doesn't support real numbers. Let's fix it later ...
The compiler compiles to native code in the Jit library. In the same way, the address resolution of the jump destination is done first. It's more complicated than an interpreter. Especially the jump address solution is a bit tricky. This is because the jump destination is not known at the time of compilation because it is an indirect jump based on the register value according to the language specifications. The jump destination according to the label number is stored in advance in the form of a table, and the table is referenced and jumped at runtime.
Compiler
) and create a function entrance (enter
).Jit.VAR (n)
is a variable area.makeConst
returns an object that holds the address for the nth variable area indicated byJit.VAR (n)
.setLabel (label)
is used to insert the address of the label itself (already resolved after code generation) into that address.reg
). Since the S5 register is used as a stack pointer, the registers that can be used are S0 to S4. That's not enough, so I try to use some memory (variable area).max + 5
) in the S5 register of the JIT library (the term is duplicated and confusing).max
is the jump destination storage area.max + 4
is the register area for the VM that ModanShogi lacked.max + 5
, so add another 1000.class Compiler(opts_) {
private setJumpAddress(code, c) {
var address = [], max = 0;
for (var i = 0, l = code.length(); i < l; ++i) {
if (code[i].op == label) {
address[i] = c.makeConst(Jit.VAR(code[i].address));
max = code[i].address if (max < code[i].address);
}
}
return [address, max];
}
public compile(code) {
var c = new Jit.Compiler();
c.enter();
var jt;
var labelTarget = [];
var jumpTarget = [];
var [address, max] = setJumpAddress(code, c);
var reg = [-1, Jit.VAR(max+1), Jit.VAR(max+2), Jit.VAR(max+3), Jit.VAR(max+4), Jit.S0, Jit.S1, Jit.S2, Jit.S3, Jit.S4];
for (var i = 1, l = reg.length(); i < l; ++i) {
c.mov(reg[i], Jit.IMM(i));
}
c.mov(Jit.S5, Jit.IMM(max+5));
c.mov(Jit.VAR(max+1005), Jit.IMM(0));
for (var i = 0, l = code.length(); i < l; ++i) {
var { op, r1, r2 } = code[i];
# System.println([opname[op], r1, r2]);
switch (op) {
case label:
labelTarget[i] = c.label();
break;
case mov:
c.mov(reg[r1], reg[r2]);
break;
case add:
c.add(reg[r1], reg[r1], reg[r2]);
break;
case sub:
c.sub(reg[r1], reg[r1], reg[r2]);
break;
case mul:
c.mul(reg[r1], reg[r1], reg[r2]);
break;
case div:
c.mov(Jit.R0, reg[r1]);
c.mov(Jit.R1, reg[r2]);
c.div();
c.mov(reg[r1], Jit.R0);
break;
case mod:
c.mov(Jit.R0, reg[r1]);
c.mov(Jit.R1, reg[r2]);
c.divmod();
c.mov(reg[r1], Jit.R1);
break;
case push:
c.mov(Jit.R1, Jit.S5);
c.localp(Jit.R0, Jit.R1);
c.mov(Jit.MEM1(Jit.R0), reg[r1]);
c.add(Jit.S5, Jit.S5, Jit.IMM(1));
break;
case pop:
c.sub(Jit.S5, Jit.S5, Jit.IMM(1));
c.mov(Jit.R1, Jit.S5);
c.localp(Jit.R0, Jit.R1);
c.mov(reg[r1], Jit.MEM1(Jit.R0));
break;
case putc:
if (!opts_.disableDisplay) {
c.mov(Jit.R0, reg[r1]);
c.icall(Jit.IMM(Jit.Clib.putc));
}
break;
case putn:
if (!opts_.disableDisplay) {
c.mov(Jit.R0, reg[r1]);
c.icall(Jit.IMM(Jit.Clib.putn));
}
break;
case jump_if:
jt = c.eq(reg[r1], Jit.IMM(0));
c.mov(Jit.R1, reg[r2]);
c.localp(Jit.R0, Jit.R1);
c.mov(Jit.R0, Jit.MEM1(Jit.R0));
c.ijmp(Jit.R0);
jt.setLabel(c.label());
break;
case jump_ifp:
jt = c.slt(reg[r1], Jit.IMM(0));
c.mov(Jit.R1, reg[r2]);
c.localp(Jit.R0, Jit.R1);
c.mov(Jit.R0, Jit.MEM1(Jit.R0));
c.ijmp(Jit.R0);
jt.setLabel(c.label());
break;
}
}
c.ret(Jit.R0);
var code = c.generate();
address.each {
if (_1) _1.setLabel(labelTarget[_2]);
};
return code;
}
}
Now that we have the interpreter and compiler, let's write the part that actually parses and executes the source code. Ignore the comment out of the sample and load the following game record.
var kifu =
# Hello, world!
%{
▲ 9 eight silver △ 9 two balls ▲ 6 four steps △ 6 six silver ▲ 6 steps △ same ball
▲ 6 seven steps △ 6 one ball ▲ 6 eight balls △ 6 three steps ▲ 6 nine balls △ 7 seven silver
▲ 7 Gokin △ 7 One ball ▲ 4 Eight silver △ 4 Two balls ▲ 9 Six and △ 9 Eight steps
▲ Same ball △ 6 Seven balls ▲ 9 Five gold △ 9 One ball ▲ 6 Three gold △ Same ball
▲ 6 eight gold △ 6 two balls ▲ 4 steps △ 4 three balls ▲ 5 five steps △ 5 three balls
};
Below is the measurement of the execution time using the timer and the display of the result.
var t, tmr = new SystemTimer();
First, parsing.
# Parsing the code
tmr.restart();
var r = ignore.then(program).parseAll(kifu);
t.parse = tmr.elapsed();
Then run it in the interpreter.
# Interpret it
var interp = new Interpreter();
tmr.restart();
interp.eval(r.value);
t.interpret.total = tmr.elapsed();
Finally run with the compiler. It is divided into a compile phase and an execution phase.
# Compile it
var compiler = new Compiler();
tmr.restart();
var code = compiler.compile(r.value);
t.jit.compile = tmr.elapsed();
# Execute it
tmr.restart();
code.run();
t.jit.execute = tmr.elapsed();
t.jit.total = t.jit.compile + t.jit.execute;
And the result display.
System.println("Result = ", t.toJsonString(true));
Benchmark results. For now, it's Linux.
Hello, world!
Hello, world!
Result = {
"interpret": {
"total": 0.001682
},
"jit": {
"compile": 0.000799,
"execute": 0.00014,
"total": 0.000939
},
"parse": 0.008555
}
You've got Hello, world!
Twice!
What the JSON result represents is as follows.
In other words, since the parsing process is common, when comparing the interpreter and the compiler, the compiler has faster execution time as shown below!
Execution form | Parsing process | Run | Overall |
---|---|---|---|
Interpreter | 8.5 ms | 1.7 ms | 10.2 ms |
compiler | 8.5 ms | (0.8+0.1) 0.9 ms | 9.4 ms |
The subject is that, but as I wrote earlier
I think we can somehow grasp the flow.
Let's enjoy the fun language implementation life!
I need to be able to handle real numbers in the compiler.
see you.
Recommended Posts