This is Kinx, the scripting language that **"looks like JavaScript, has a brain (contents) of Ruby, and (stability of AC/DC)"**. I made a library for JIT compilation.
I want to do JIT. This time I took SLJIT, which Kinx native also uses, and turned it into a library to make it easier to use. SLJIT itself has little documentation and is mostly learned by deciphering the source, so I considered writing up how to use SLJIT itself as a memo, but I am putting that off for now. I may do it somewhere later.
That said, this library is of course easier to use than raw SLJIT, so I think this is the better route. **The host language is a script too**, so you can try it out casually.
First, here is a sample of what a program looks like. Otherwise the details would go on and on and we would never get to it...
using Jit;
var c = new Jit.Compiler();
var entry1 = c.enter();
var jump0 = c.ge(Jit.S0, Jit.IMM(3));
c.ret(Jit.S0);
var l1 = c.label();
c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
c.call(entry1);
c.mov(Jit.S1, Jit.R0);
c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
c.call(entry1);
c.add(Jit.R0, Jit.R0, Jit.S1);
c.ret(Jit.R0);
jump0.setLabel(l1);
var code = c.generate();
for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}
Create a `Jit.Compiler` object, create a function entry with `enter`, and write code that manipulates registers and ends with `ret`. To execute it, call `generate()` and then `run()`. You can also view the assembly listing by calling `generate()` and then `dump()`.
If you want to skip the details, jump straight to the sample. The sample also includes a benchmark against Ruby, Python, and PyPy.
SLJIT
What is SLJIT in the first place?
In a nutshell, it is an **abstracted assembler**: a library that solves the problem that assembly differs for each CPU and has to be rewritten, by letting a single way of writing support multiple environments. The platforms that are currently supported are as follows.
However, please note that the Kinx JIT library introduced here supports 64-bit only, and I have only verified (built) it on x64 Windows and x64 Linux.
As far as I know, the following is the only helpful documentation I could find.
It is a useful reference.
The GitHub repository is below.
Jit
Now for the JIT library as a Kinx library. It is more convenient than using SLJIT from C. Of course, using the C library directly gives you finer control, but this library can still do a reasonable amount.
using Jit
The Jit library is not built in, so load it explicitly with the `using` directive.
using Jit;
The Jit object defines the methods for building parameters and the compiler class.
There are three kinds of Jit parameters: immediate values, registers, and memory accesses. They are used in the following forms.
Immediate values and memory accesses are written with the following methods. `Jit.VAR()` is a special method for using the local variable area. A local variable area is automatically allocated on the stack, and that area is used.
Method | Remarks |
---|---|
`Jit.IMM(v)` | Written the same way for both 64-bit integers and floating-point numbers. The type is matched to the destination register. |
`Jit.VAR(n)` | Local variable area. Each variable is fixed at 8 bytes. |
`Jit.MEM0(address)` | Takes an immediate value as the address. It currently cannot be used from a script, because a script has no way to specify a real address. |
`Jit.MEM1(r1, offset)` | Treats the register given as `r1` as an address and refers to the memory at `offset` bytes from it. |
`Jit.MEM2(r1, r2, shift)` | Refers to the memory at `r1 + r2 * (bytes indicated by shift)`, where `shift` is 0 for 1 byte, 1 for 2 bytes, 2 for 4 bytes, and 3 for 8 bytes. |
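To make the addressing rules in the table concrete, here is a small Python sketch of the address arithmetic only. It is not part of the Kinx API; the function names `mem1_address` and `mem2_address` are made up for illustration, and the registers are modeled as plain integers.

```python
def mem1_address(r1, offset):
    """Address referred to by a MEM1-style operand: base register plus byte offset."""
    return r1 + offset


def mem2_address(r1, r2, shift):
    """Address referred to by a MEM2-style operand: r1 + r2 * (1 << shift)."""
    if shift not in (0, 1, 2, 3):
        raise ValueError("shift must be 0, 1, 2, or 3")
    # shift 0/1/2/3 scales the index by 1/2/4/8 bytes.
    return r1 + (r2 << shift)


base = 0x1000
print(hex(mem1_address(base, 16)))    # third 8-byte slot from base
print(hex(mem2_address(base, 5, 3)))  # index 5 into an array of 8-byte elements
```

With `shift` set to 3, `MEM2` behaves like indexing an array of 8-byte values, which is the same slot size that `Jit.VAR(n)` uses.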
The following registers can be used. The number of registers used in a function is calculated automatically and varies per function (the range delimited by `enter()`).
Register | Use |
---|---|
`Jit.R0` ~ `Jit.R5` | General-purpose registers for temporary use. They may be destroyed by a call to another function. |
`Jit.S0` ~ `Jit.S5` | General-purpose registers. They are guaranteed to be preserved across a call to another function. |
`Jit.FR0` ~ `Jit.FR5` | Floating-point registers for temporary use. They may be destroyed by a call to another function. |
`Jit.FS0` ~ `Jit.FS5` | Floating-point registers. They are guaranteed to be preserved across a call to another function. |
Floating-point registers are limited to 6 in total across `FR` and `FS`, so if you use registers up to `FR4`, only `FS0` is available, and if you use registers up to `FR5`, no `FS*` register is available at all. The possible combinations are as follows.
FR* registers | FS* registers |
---|---|
(Not available) | `FS0`, `FS1`, `FS2`, `FS3`, `FS4`, `FS5` |
`FR0` | `FS0`, `FS1`, `FS2`, `FS3`, `FS4` |
`FR0`, `FR1` | `FS0`, `FS1`, `FS2`, `FS3` |
`FR0`, `FR1`, `FR2` | `FS0`, `FS1`, `FS2` |
`FR0`, `FR1`, `FR2`, `FR3` | `FS0`, `FS1` |
`FR0`, `FR1`, `FR2`, `FR3`, `FR4` | `FS0` |
`FR0`, `FR1`, `FR2`, `FR3`, `FR4`, `FR5` | (Not available) |
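The table follows from a single rule: `FR*` and `FS*` share a pool of 6 registers. A minimal Python sketch of that rule is shown below. The helper name `available_fs` is hypothetical and is not something the Jit library provides.

```python
MAX_FLOAT_REGS = 6  # total shared by FR* and FS*, as stated in the text


def available_fs(fr_used):
    """Return the FS* registers still usable when FR0..FR(fr_used - 1) are in use."""
    if not 0 <= fr_used <= MAX_FLOAT_REGS:
        raise ValueError("fr_used must be between 0 and 6")
    return ["FS%d" % i for i in range(MAX_FLOAT_REGS - fr_used)]


# One row of the table per iteration.
for n in range(MAX_FLOAT_REGS + 1):
    print(n, available_fs(n))
```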
To build Jit instructions, first create a Jit compiler object.
var c = new Jit.Compiler();
The Jit compiler has the following methods.
Jit compiler method | Return value | Overview |
---|---|---|
`Jit.Compiler#label()` | label | Adds a label at the current position. |
`Jit.Compiler#makeConst(reg, init)` | ConstTarget | Emits placeholder code for an immediate value that is set after code generation. |
`Jit.Compiler#localp(dst, offset)` |  | Emits code that gets the real address of a local variable and stores it in the register given by `dst`. `offset` is the local variable number. |
`Jit.Compiler#enter(argType)` | label | Creates a function entry. The argument types can be specified (optional). |
`Jit.Compiler#fastEnter(reg)` | label | Creates a function entry without emitting the extra prologue and epilogue. The return address is saved to `reg`. |
`Jit.Compiler#ret(val)` |  | Emits return code that returns `val`. A floating-point `val` is returned in the `FR0` register, anything else in the `R0` register. |
`Jit.Compiler#f2i(dst, op1)` |  | Emits code that casts a double to int64_t. `dst` is a general-purpose register and `op1` is a floating-point register. |
`Jit.Compiler#i2f(dst, op1)` |  | Emits code that casts an int64_t to a double. `dst` is a floating-point register and `op1` is a general-purpose register. |
`Jit.Compiler#mov(dst, op1)` |  | Emits code that assigns `op1` to `dst`. Floating-point and other types are recognized automatically. |
`Jit.Compiler#neg(dst, op1)` |  | Emits code that stores the sign-inverted value of `op1` in `dst`. |
`Jit.Compiler#clz(dst, op1)` |  | Emits code that counts the leading zero bits of `op1` and stores the count in `dst`. |
`Jit.Compiler#add(dst, op1, op2)` |  | Emits code that stores the sum of `op1` and `op2` in `dst`. |
`Jit.Compiler#sub(dst, op1, op2)` |  | Emits code that stores the result of subtracting `op2` from `op1` in `dst`. |
`Jit.Compiler#mul(dst, op1, op2)` |  | Emits code that stores the product of `op1` and `op2` in `dst`. |
`Jit.Compiler#div(dst, op1, op2)` |  | Floating-point only. Emits code that stores the result of dividing `op1` by `op2` in `dst`. |
`Jit.Compiler#div()` |  | Emits code that divides the general-purpose registers as unsigned and stores the quotient in the `R0` register. |
`Jit.Compiler#sdiv()` |  | Emits code that divides the general-purpose registers as signed and stores the quotient in the `R0` register. |
`Jit.Compiler#divmod()` |  | Emits code that divides the general-purpose registers as unsigned, stores the quotient in the `R0` register, and stores the remainder in the `R1` register. |
`Jit.Compiler#sdivmod()` |  | Emits code that divides the general-purpose registers as signed, stores the quotient in the `R0` register, and stores the remainder in the `R1` register. |
`Jit.Compiler#not(dst, op1)` |  | Emits code that stores the bitwise inversion of `op1` in `dst`. |
`Jit.Compiler#and(dst, op1, op2)` |  | Emits code that stores the bitwise AND of `op1` and `op2` in `dst`. |
`Jit.Compiler#or(dst, op1, op2)` |  | Emits code that stores the bitwise OR of `op1` and `op2` in `dst`. |
`Jit.Compiler#xor(dst, op1, op2)` |  | Emits code that stores the bitwise XOR of `op1` and `op2` in `dst`. |
`Jit.Compiler#shl(dst, op1, op2)` |  | Emits code that stores `op1` shifted left by `op2` bits in `dst`. |
`Jit.Compiler#lshr(dst, op1, op2)` |  | Emits code that stores `op1` logically shifted right by `op2` bits in `dst`. |
`Jit.Compiler#ashr(dst, op1, op2)` |  | Emits code that stores `op1` arithmetically shifted right by `op2` bits in `dst`. |
`Jit.Compiler#call(label)` | JumpTarget | Emits code that calls a function defined with `enter()`. Returns a JumpTarget for setting the callee later. If `label` is given, there is no need to set it later. |
`Jit.Compiler#fastCall(label)` | JumpTarget | Emits code that calls a function defined with `fastEnter()`. Returns a JumpTarget for setting the callee later. |
`Jit.Compiler#jmp(label)` | JumpTarget | Emits a `jmp` instruction. If `label` is given, there is no need to set it later. |
`Jit.Compiler#ijmp(dst)` | JumpTarget | Emits a `jmp` instruction. `dst` is a register holding the address, or an immediate value. |
`Jit.Compiler#eq(op1, op2)` | JumpTarget | Emits code that checks `op1 == op2`. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#neq(op1, op2)` | JumpTarget | Emits code that checks `op1 != op2`. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#lt(op1, op2)` | JumpTarget | Emits code that checks `op1 < op2` as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#le(op1, op2)` | JumpTarget | Emits code that checks `op1 <= op2` as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#gt(op1, op2)` | JumpTarget | Emits code that checks `op1 > op2` as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#ge(op1, op2)` | JumpTarget | Emits code that checks `op1 >= op2` as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#slt(op1, op2)` | JumpTarget | Emits code that checks `op1 < op2` as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#sle(op1, op2)` | JumpTarget | Emits code that checks `op1 <= op2` as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#sgt(op1, op2)` | JumpTarget | Emits code that checks `op1 > op2` as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#sge(op1, op2)` | JumpTarget | Emits code that checks `op1 >= op2` as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
`Jit.Compiler#generate()` | JitCode | Generates the code. |
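The comparison methods come in unsigned (`lt`, `le`, `gt`, `ge`) and signed (`slt`, `sle`, `sgt`, `sge`) pairs, and the difference only shows up when the top bit of a 64-bit word is set. The following Python sketch models the two interpretations of the same bits. The function names mirror the table for readability, but this is plain Python, not the Jit API.

```python
MASK64 = (1 << 64) - 1


def as_unsigned(x):
    """Interpret x as an unsigned 64-bit word."""
    return x & MASK64


def as_signed(x):
    """Interpret the low 64 bits of x as a signed (two's complement) word."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def lt(a, b):
    """Unsigned less-than, the kind of test the lt row describes."""
    return as_unsigned(a) < as_unsigned(b)


def slt(a, b):
    """Signed less-than, the kind of test the slt row describes."""
    return as_signed(a) < as_signed(b)


print(lt(-1, 3))   # -1 is 0xffffffffffffffff as unsigned, so this is False
print(slt(-1, 3))  # as signed, -1 < 3 is True
```

For non-negative values that fit in 63 bits the two variants agree, so the choice matters mainly when negative inputs are possible.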
Jit.Compiler#enter(argType)
The function entry is defined with the `enter` method. If `argType` is not specified, `Jit.ArgType.SW_SW_SW` is assumed. Up to three arguments can be declared (by specification), each with one of the following types.

- `SW` ... Signed Word (64bit)
- `UW` ... Unsigned Word (64bit)
- `FP` ... Floating Point (64bit)

In practice `SW` and `UW` behave the same, because the bits received in the register are identical, but this may make a difference in the future. Note that trailing `SW` entries can be omitted, so the following all have the same meaning.

- `Jit.ArgType.SW_SW_SW`
- `Jit.ArgType.SW_SW`
- `Jit.ArgType.SW`
The registers used to pass arguments are fixed, as follows.

Caller side (registers to set before the call):

Type | 1st argument | 2nd argument | 3rd argument |
---|---|---|---|
Integer | `Jit.R0` | `Jit.R1` | `Jit.R2` |
Double | `Jit.FR0` | `Jit.FR1` | `Jit.FR2` |

Callee side (registers in which the arguments are received):

Type | 1st argument | 2nd argument | 3rd argument |
---|---|---|---|
Integer | `Jit.S0` | `Jit.S1` | `Jit.S2` |
Double | `Jit.FS0` | `Jit.FS1` | `Jit.FS2` |

Note that the registers the caller sets and the registers the callee receives are different.
ConstTarget
Set the label address with `setLabel()`.
It is used when you want to store a label address as an immediate value in a register or in memory. There may not be many occasions to use it. I think it could serve as a substitute for a jump table, but I have not prepared a good mechanism for building a table.
You can also set an immediate value with `setValue()`, but since `Jit.IMM(100)` and even floating-point values such as `Jit.IMM(0.1)` can be used normally, there is not much point in using it.
An example of using it for a jump table is given later.
JumpTarget
Set the jump destination or the function call target with `setLabel()`.
For example, branching on a comparison result looks like this.
var c = new Jit.Compiler();
// Function entry point.
c.enter();
// Jump to label0 when the S0 register value >= 3 (unsigned comparison).
var jump0 = c.ge(Jit.S0, Jit.IMM(3));
... // Code for when the condition is false.
var jump1 = c.jmp();
var label0 = c.label();
... // Code for when the condition is true.
var label1 = c.label();
...
// Resolve both jump destinations.
jump0.setLabel(label0);
jump1.setLabel(label1);
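The pattern above, where a jump is emitted first and its destination is filled in later, is usually called backpatching. As a rough illustration of the idea only, here is a toy Python assembler with the same shape. The classes `TinyAssembler`, `Label`, and `JumpTarget` are invented for this sketch and say nothing about how the Jit library or SLJIT implement it internally.

```python
class Label:
    def __init__(self, position):
        self.position = position  # index of the instruction the label points at


class JumpTarget:
    def __init__(self):
        self.label = None  # unresolved until set_label() is called

    def set_label(self, label):
        self.label = label


class TinyAssembler:
    def __init__(self):
        self.code = []  # list of (opcode, operand) pairs

    def emit(self, op, arg=None):
        self.code.append((op, arg))

    def label(self):
        # A label marks the position of the next instruction to be emitted.
        return Label(len(self.code))

    def jmp(self):
        # Emit a jump whose destination is not known yet.
        target = JumpTarget()
        self.code.append(("jmp", target))
        return target

    def generate(self):
        # Replace every JumpTarget with the concrete position of its label.
        resolved = []
        for op, arg in self.code:
            if isinstance(arg, JumpTarget):
                if arg.label is None:
                    raise ValueError("jump destination was never set")
                arg = arg.label.position
            resolved.append((op, arg))
        return resolved


asm = TinyAssembler()
asm.emit("nop")
jump = asm.jmp()          # forward jump, destination unknown here
asm.emit("skipped")
end = asm.label()
asm.emit("ret")
jump.set_label(end)       # resolve the destination afterwards
program = asm.generate()
print(program)
```

The forward jump can be written before the label exists, which is why `setLabel()` is called at the end in the Kinx sample.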
JitCode
If code generation with the `generate()` method succeeds, a JitCode object is returned. The methods of the JitCode object are listed below. Note that at most three arguments can be passed (by specification). Because this is an abstracted assembler, that limit is needed to support a variety of architectures. If you need more, allocate a local variable area and pass its start address. A sample is given later.
Method | Overview |
---|---|
`JitCode#run(a1, a2, a3)` | Runs the code and receives the return value as an Integer. |
`JitCode#frun(a1, a2, a3)` | Runs the code and receives the return value as a Double. |
`JitCode#dump()` | Outputs the generated assembly listing. |
Now let's write the customary example, a recursive version of the code that computes Fibonacci numbers. It is the same as the sample shown at the beginning.
var c = new Jit.Compiler();
var entry1 = c.enter();
var jump0 = c.ge(Jit.S0, Jit.IMM(3));
c.ret(Jit.S0);
var l1 = c.label();
c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
c.call(entry1);
c.mov(Jit.S1, Jit.R0);
c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
c.call(entry1);
c.add(Jit.R0, Jit.R0, Jit.S1);
c.ret(Jit.R0);
jump0.setLabel(l1);
var code = c.generate();
for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}
The result is as follows.
[ 0.000] fib( 1) = 1
[ 0.000] fib( 2) = 2
[ 0.000] fib( 3) = 3
[ 0.000] fib( 4) = 5
[ 0.000] fib( 5) = 8
[ 0.000] fib( 6) = 13
[ 0.000] fib( 7) = 21
[ 0.000] fib( 8) = 34
[ 0.000] fib( 9) = 55
[ 0.000] fib(10) = 89
[ 0.000] fib(11) = 144
[ 0.000] fib(12) = 233
[ 0.000] fib(13) = 377
[ 0.000] fib(14) = 610
[ 0.000] fib(15) = 987
[ 0.000] fib(16) = 1597
[ 0.000] fib(17) = 2584
[ 0.000] fib(18) = 4181
[ 0.000] fib(19) = 6765
[ 0.000] fib(20) = 10946
[ 0.000] fib(21) = 17711
[ 0.000] fib(22) = 28657
[ 0.000] fib(23) = 46368
[ 0.000] fib(24) = 75025
[ 0.000] fib(25) = 121393
[ 0.001] fib(26) = 196418
[ 0.001] fib(27) = 317811
[ 0.001] fib(28) = 514229
[ 0.002] fib(29) = 832040
[ 0.002] fib(30) = 1346269
[ 0.004] fib(31) = 2178309
[ 0.006] fib(32) = 3524578
[ 0.009] fib(33) = 5702887
[ 0.016] fib(34) = 9227465
[ 0.035] fib(35) = 14930352
[ 0.042] fib(36) = 24157817
[ 0.066] fib(37) = 39088169
[ 0.119] fib(38) = 63245986
[ 0.181] fib(39) = 102334155
[ 0.289] fib(40) = 165580141
[ 0.476] fib(41) = 267914296
[ 0.773] fib(42) = 433494437
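To cross-check these values independently of the JIT, the same recurrence can be written in a few lines of Python. This is only a reference model of the arithmetic (return `n` when `n < 3`, otherwise `fib(n - 2) + fib(n - 1)`), computed iteratively so that it finishes instantly. It is not the code used for the benchmark below.

```python
def fib(n):
    """Same recurrence as the JIT sample, for n >= 1: n if n < 3, else fib(n-2) + fib(n-1)."""
    if n < 3:
        return n
    a, b = 1, 2  # fib(1), fib(2)
    for _ in range(n - 2):
        a, b = b, a + b
    return b


for n in (1, 2, 3, 10, 42):
    print("fib(%2d) = %d" % (n, fib(n)))
```

The values for `fib(10)` and `fib(42)` match the JIT output above (89 and 433494437).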
By the way, I measured `fib(42)` with Ruby, Python, PyPy, PHP, HHVM, Kinx, and Kinx (native) and compared the results. The listing above times only `run()` for the JIT library version, so to be fair every entry, including script interpretation and JIT code generation, is measured by the user time of the whole process.
Sorted from fastest, the results are as follows. As expected, emitting native code directly through JIT is remarkably fast. It was a pleasant surprise that Kinx (native) was faster than PyPy, and it is roughly on par with HHVM. Among the plain script interpreters Ruby is the fastest, which is moving for someone who knows the 1.8 era.
language | Version number | User time |
---|---|---|
Kinx(Jit-Lib) | 0.10.0 | 0.828 |
HHVM | 3.21.0 | 2.227 |
Kinx(native) | 0.10.0 | 2.250 |
PyPy | 5.10.0 | 3.313 |
PHP | 7.2.24 | 11.422 |
Ruby | 2.5.1p57 | 14.877 |
Kinx | 0.10.0 | 27.478 |
Python | 2.7.15+ | 41.125 |
Here is the assembly listing generated by the JIT library. It differs between Windows and Linux, and this one is from Linux.
0: 53 push rbx
1: 41 57 push r15
3: 41 56 push r14
5: 48 8b df mov rbx, rdi
8: 4c 8b fe mov r15, rsi
b: 4c 8b f2 mov r14, rdx
e: 48 83 ec 10 sub rsp, 0x10
12: 48 83 fb 03 cmp rbx, 0x3
16: 73 0d jae 0x25
18: 48 89 d8 mov rax, rbx
1b: 48 83 c4 10 add rsp, 0x10
1f: 41 5e pop r14
21: 41 5f pop r15
23: 5b pop rbx
24: c3 ret
25: 48 8d 43 fe lea rax, [rbx-0x2]
29: 48 89 fa mov rdx, rdi
2c: 48 89 c7 mov rdi, rax
2f: e8 cc ff ff ff call 0x0
34: 49 89 c7 mov r15, rax
37: 48 8d 43 ff lea rax, [rbx-0x1]
3b: 48 89 fa mov rdx, rdi
3e: 48 89 c7 mov rdi, rax
41: e8 ba ff ff ff call 0x0
46: 49 03 c7 add rax, r15
49: 48 83 c4 10 add rsp, 0x10
4d: 41 5e pop r14
4f: 41 5f pop r15
51: 5b pop rbx
52: c3 ret
As an example of Const, if I force one, it looks like this. It builds the jump table in local variables, so its weakness is that the table is recreated on every call. Providing a separate interface that only builds a table and passes its address would probably solve that (maybe).
var c = new Jit.Compiler();
c.enter();
c.mov(Jit.R1, Jit.IMM(-1));
var jump0 = c.slt(Jit.S0, Jit.IMM(0));
var jump1 = c.sgt(Jit.S0, Jit.IMM(3));
var const0 = c.makeConst(Jit.VAR(0));
var const1 = c.makeConst(Jit.VAR(1));
var const2 = c.makeConst(Jit.VAR(2));
var const3 = c.makeConst(Jit.VAR(3));
//The address of the local variable is acquired by the offset of the S0 register (first argument) and stored in the R0 register.
c.localp(Jit.R0, Jit.S0);
//Get the value of a local variable itself.
c.mov(Jit.R0, Jit.MEM1(Jit.R0));
//Jump by regarding the contents of local variables as addresses.
c.ijmp(Jit.R0);
var l0 = c.label();
c.mov(Jit.R1, Jit.IMM(102));
c.ret(Jit.R1);
var l1 = c.label();
c.mov(Jit.R1, Jit.IMM(103));
c.ret(Jit.R1);
var l2 = c.label();
c.mov(Jit.R1, Jit.IMM(104));
c.ret(Jit.R1);
var l3 = c.label();
c.mov(Jit.R1, Jit.IMM(105));
var l4 = c.label();
c.ret(Jit.R1);
//The jump address is set before code generation.
jump0.setLabel(l4);
jump1.setLabel(l4);
var code = c.generate();
//The const value is set after code generation.
const0.setLabel(l0);
const1.setLabel(l1);
const2.setLabel(l2);
const3.setLabel(l3);
for (var i = -1; i < 5; ++i) {
    var r = code.run(i);
    System.println(r);
}
The result is as follows.
-1
102
103
104
105
-1
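In higher-level terms, the generated code is a bounds check followed by an indexed dispatch. The Python sketch below models that behavior with a list of callables standing in for the label addresses. The names `HANDLERS` and `dispatch` are illustrative only.

```python
# One handler per table slot, standing in for the labels l0..l3.
HANDLERS = [lambda: 102, lambda: 103, lambda: 104, lambda: 105]


def dispatch(i):
    # Mirrors the slt/sgt guards: out-of-range indexes return the preset -1.
    if i < 0 or i > 3:
        return -1
    # Mirrors the indirect jump: pick the slot by index and go there.
    return HANDLERS[i]()


print([dispatch(i) for i in range(-1, 5)])
```

Here the table is built once at module load, which is the static-table behavior the JIT sample cannot express yet.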
The generated code looks like this. This one is from the Windows version.
0: 53 push rbx
1: 56 push rsi
2: 57 push rdi
3: 48 8b d9 mov rbx, rcx
6: 48 8b f2 mov rsi, rdx
9: 49 8b f8 mov rdi, r8
c: 4c 8b 4c 24 b0 mov r9, [rsp-0x50]
11: 48 83 ec 50 sub rsp, 0x50
15: 48 c7 c2 ff ff ff ff mov rdx, 0xffffffffffffffff
1c: 48 83 fb 00 cmp rbx, 0x0
20: 0f 8c 94 00 00 00 jl 0xba
26: 48 83 fb 03 cmp rbx, 0x3
2a: 0f 8f 8a 00 00 00 jg 0xba
30: 49 b9 95 ff 57 61 89 01 00 00 mov r9, 0x1896157ff95
3a: 4c 89 4c 24 20 mov [rsp+0x20], r9
3f: 49 b9 a7 ff 57 61 89 01 00 00 mov r9, 0x1896157ffa7
49: 4c 89 4c 24 28 mov [rsp+0x28], r9
4e: 49 b9 b9 ff 57 61 89 01 00 00 mov r9, 0x1896157ffb9
58: 4c 89 4c 24 30 mov [rsp+0x30], r9
5d: 49 b9 cb ff 57 61 89 01 00 00 mov r9, 0x1896157ffcb
67: 4c 89 4c 24 38 mov [rsp+0x38], r9
6c: 48 8d 44 24 20 lea rax, [rsp+0x20]
71: 48 6b db 08 imul rbx, rbx, 0x8
75: 48 03 c3 add rax, rbx
78: 48 8b 00 mov rax, [rax]
7b: ff e0 jmp rax
7d: 48 c7 c2 66 00 00 00 mov rdx, 0x66
84: 48 89 d0 mov rax, rdx
87: 48 83 c4 50 add rsp, 0x50
8b: 5f pop rdi
8c: 5e pop rsi
8d: 5b pop rbx
8e: c3 ret
8f: 48 c7 c2 67 00 00 00 mov rdx, 0x67
96: 48 89 d0 mov rax, rdx
99: 48 83 c4 50 add rsp, 0x50
9d: 5f pop rdi
9e: 5e pop rsi
9f: 5b pop rbx
a0: c3 ret
a1: 48 c7 c2 68 00 00 00 mov rdx, 0x68
a8: 48 89 d0 mov rax, rdx
ab: 48 83 c4 50 add rsp, 0x50
af: 5f pop rdi
b0: 5e pop rsi
b1: 5b pop rbx
b2: c3 ret
b3: 48 c7 c2 69 00 00 00 mov rdx, 0x69
ba: 48 89 d0 mov rax, rdx
bd: 48 83 c4 50 add rsp, 0x50
c1: 5f pop rdi
c2: 5e pop rsi
c3: 5b pop rbx
c4: c3 ret
The key point is the `jmp rax` at address 7b. If the table could be defined statically, this would function as a proper jump table (there is no easy way to do that right now...).
It is a little cumbersome, but if you want to pass 4 or more arguments, store the values in the local variable area and pass its address (pointer) as an argument. In the following example, the call first goes through a hook function that stores the arguments in the local variable area. Since every local variable is allocated 8 bytes, note that the offset used for direct access with `Jit.MEM1()` and the like must be a multiple of 8.
var c = new Jit.Compiler();
var entry1 = c.enter();
c.mov(Jit.VAR(0), Jit.S0);
c.mov(Jit.VAR(1), Jit.IMM(3));
c.mov(Jit.VAR(2), Jit.IMM(2));
c.mov(Jit.VAR(3), Jit.IMM(1));
c.localp(Jit.R0);
var call1 = c.call();
c.ret(Jit.R0);
var entry2 = c.enter();
c.mov(Jit.R1, Jit.S0);
c.mov(Jit.S0, Jit.MEM1(Jit.R1, 0));
var jump0 = c.ge(Jit.S0, Jit.MEM1(Jit.R1, 8));
c.ret(Jit.S0);
var l1 = c.label();
c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 16));
c.mov(Jit.VAR(0), Jit.R3);
c.mov(Jit.VAR(1), Jit.IMM(3));
c.mov(Jit.VAR(2), Jit.IMM(2));
c.mov(Jit.VAR(3), Jit.IMM(1));
c.localp(Jit.R0);
c.call(entry2);
c.mov(Jit.S1, Jit.R0);
c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 24));
c.mov(Jit.VAR(0), Jit.R3);
c.mov(Jit.VAR(1), Jit.IMM(3));
c.mov(Jit.VAR(2), Jit.IMM(2));
c.mov(Jit.VAR(3), Jit.IMM(1));
c.localp(Jit.R0);
c.call(entry2);
c.add(Jit.R0, Jit.R0, Jit.S1);
c.ret(Jit.R0);
jump0.setLabel(l1);
call1.setLabel(entry2);
var code = c.generate();
for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}
The output is the same as before.
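The layout that the sample relies on, four values in consecutive 8-byte slots read back at byte offsets 0, 8, 16, and 24, can be modeled in Python with the `struct` module. The helpers `pack_args` and `load` are hypothetical names for this sketch. They only imitate the slot arithmetic and do not touch real stack memory.

```python
import struct

SLOT_SIZE = 8  # every local variable occupies 8 bytes, per the text


def pack_args(*values):
    """Lay out signed 64-bit values in consecutive 8-byte slots."""
    return b"".join(struct.pack("<q", v) for v in values)


def load(area, offset):
    """Read one slot at a byte offset, which must be a multiple of 8."""
    if offset % SLOT_SIZE != 0:
        raise ValueError("offset must be a multiple of 8")
    return struct.unpack_from("<q", area, offset)[0]


area = pack_args(10, 3, 2, 1)  # same shape as VAR(0)..VAR(3) in the sample
print([load(area, off) for off in (0, 8, 16, 24)])
```

Slot `n` lives at byte offset `8 * n`, which is why the sample reads its second, third, and fourth values at offsets 8, 16, and 24.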
I haven't covered Double yet, so here it is. Let's go with Fibonacci again. Apparently I love Fibonacci, which I had not noticed. This is a version that steps by 0.1.
var c = new Jit.Compiler();
var entry1 = c.enter(Jit.ArgType.FP);
c.mov(Jit.FR0, Jit.IMM(0.3));
var jump0 = c.ge(Jit.FS0, Jit.FR0);
c.ret(Jit.FS0);
var l1 = c.label();
c.mov(Jit.FR0, Jit.IMM(0.2));
c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
c.call(entry1);
c.mov(Jit.FS1, Jit.FR0);
c.mov(Jit.FR0, Jit.IMM(0.1));
c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
c.call(entry1);
c.add(Jit.FR0, Jit.FR0, Jit.FS1);
c.ret(Jit.FR0);
jump0.setLabel(l1);
var code = c.generate();
for (var i = 0.1; i < 3.5; i += 0.1) {
    var tmr = new SystemTimer();
    var r = code.frun(i);
    System.println("[%8.3f] fib(%3.1f) = %.1f" % tmr.elapsed() % i % r);
}
Immediate floating-point values cannot yet be used directly in the comparison methods (this should be supported), so they have to be stored in a register first.
Use `frun()` to receive the result as a Double. The result is as follows.
[ 0.000] fib(0.1) = 0.1
[ 0.000] fib(0.2) = 0.2
[ 0.000] fib(0.3) = 0.3
[ 0.000] fib(0.4) = 0.5
[ 0.000] fib(0.5) = 0.8
[ 0.000] fib(0.6) = 1.3
[ 0.000] fib(0.7) = 2.1
[ 0.000] fib(0.8) = 3.4
[ 0.000] fib(0.9) = 5.5
[ 0.000] fib(1.0) = 8.9
[ 0.000] fib(1.1) = 14.4
[ 0.000] fib(1.2) = 23.3
[ 0.000] fib(1.3) = 37.7
[ 0.000] fib(1.4) = 61.0
[ 0.000] fib(1.5) = 98.7
[ 0.000] fib(1.6) = 159.7
[ 0.000] fib(1.7) = 258.4
[ 0.000] fib(1.8) = 418.1
[ 0.000] fib(1.9) = 676.5
[ 0.000] fib(2.0) = 1094.6
[ 0.000] fib(2.1) = 1771.1
[ 0.000] fib(2.2) = 2865.7
[ 0.000] fib(2.3) = 4636.8
[ 0.000] fib(2.4) = 7502.5
[ 0.000] fib(2.5) = 12139.3
[ 0.001] fib(2.6) = 19641.8
[ 0.001] fib(2.7) = 31781.1
[ 0.002] fib(2.8) = 51422.9
[ 0.003] fib(2.9) = 83204.0
[ 0.004] fib(3.0) = 134626.9
[ 0.006] fib(3.1) = 217830.9
[ 0.015] fib(3.2) = 352457.8
[ 0.020] fib(3.3) = 570288.7
[ 0.027] fib(3.4) = 922746.5
The generated code is as follows. This is also from the Windows version. To pass a floating-point number, a simple hook function comes first. SLJIT does not allow floating-point arguments at the entry point of a function, so this works around that.
In that sense as well, this library is nicer than using SLJIT directly: the required size of the local variable area is calculated automatically, and so is the number of save and restore sequences needed for the preserved registers.
0: 53 push rbx
1: 56 push rsi
2: 57 push rdi
3: 48 8b d9 mov rbx, rcx
6: 48 8b f2 mov rsi, rdx
9: 49 8b f8 mov rdi, r8
c: 4c 8b 4c 24 d0 mov r9, [rsp-0x30]
11: 48 83 ec 30 sub rsp, 0x30
15: 0f 29 74 24 20 movaps [rsp+0x20], xmm6
1a: f2 0f 10 03 movsd xmm0, qword [rbx]
1e: 48 89 f2 mov rdx, rsi
21: 49 89 f8 mov r8, rdi
24: 48 89 c1 mov rcx, rax
27: e8 0d 00 00 00 call 0x39
2c: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
31: 48 83 c4 30 add rsp, 0x30
35: 5f pop rdi
36: 5e pop rsi
37: 5b pop rbx
38: c3 ret
39: 53 push rbx
3a: 56 push rsi
3b: 57 push rdi
3c: 48 8b d9 mov rbx, rcx
3f: 48 8b f2 mov rsi, rdx
42: 49 8b f8 mov rdi, r8
45: 4c 8b 4c 24 b0 mov r9, [rsp-0x50]
4a: 48 83 ec 50 sub rsp, 0x50
4e: 0f 29 74 24 20 movaps [rsp+0x20], xmm6
53: f2 0f 11 6c 24 38 movsd [rsp+0x38], xmm5
59: f2 0f 10 f0 movsd xmm6, xmm0
5d: 49 b9 33 33 33 33 33 33 d3 3f mov r9, 0x3fd3333333333333
67: 4c 89 4c 24 40 mov [rsp+0x40], r9
6c: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
72: 66 0f 2e f0 ucomisd xmm6, xmm0
76: 73 17 jae 0x8f
78: f2 0f 10 c6 movsd xmm0, xmm6
7c: f2 0f 10 6c 24 38 movsd xmm5, qword [rsp+0x38]
82: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
87: 48 83 c4 50 add rsp, 0x50
8b: 5f pop rdi
8c: 5e pop rsi
8d: 5b pop rbx
8e: c3 ret
8f: 49 b9 9a 99 99 99 99 99 c9 3f mov r9, 0x3fc999999999999a
99: 4c 89 4c 24 40 mov [rsp+0x40], r9
9e: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
a4: f2 0f 10 e6 movsd xmm4, xmm6
a8: f2 0f 5c e0 subsd xmm4, xmm0
ac: f2 0f 11 e0 movsd xmm0, xmm4
b0: 48 89 c1 mov rcx, rax
b3: e8 81 ff ff ff call 0x39
b8: f2 0f 10 e8 movsd xmm5, xmm0
bc: 49 b9 9a 99 99 99 99 99 b9 3f mov r9, 0x3fb999999999999a
c6: 4c 89 4c 24 40 mov [rsp+0x40], r9
cb: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
d1: f2 0f 10 e6 movsd xmm4, xmm6
d5: f2 0f 5c e0 subsd xmm4, xmm0
d9: f2 0f 11 e0 movsd xmm0, xmm4
dd: 48 89 c1 mov rcx, rax
e0: e8 54 ff ff ff call 0x39
e5: f2 0f 58 c5 addsd xmm0, xmm5
e9: f2 0f 10 6c 24 38 movsd xmm5, qword [rsp+0x38]
ef: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
f4: 48 83 c4 50 add rsp, 0x50
f8: 5f pop rdi
f9: 5e pop rsi
fa: 5b pop rbx
fb: c3 ret
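The 64-bit constants loaded into `r9` in this listing are the IEEE 754 bit patterns of the immediates 0.3, 0.2, and 0.1. A short Python check with the standard `struct` module reproduces them. The helper name `double_bits` is just for this sketch.

```python
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


for value in (0.3, 0.2, 0.1):
    print(value, hex(double_bits(value)))
```

The printed patterns are `0x3fd3333333333333`, `0x3fc999999999999a`, and `0x3fb999999999999a`, the same constants that appear at addresses 5d, 8f, and bc.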
JIT is interesting. If you implement a parser combinator and combine it with this, you can build a small language processor with JIT. Maybe that is a path worth aiming for.
Perhaps there are two possible uses:
See you next time.