I investigated the mechanism of Emscripten by executing the backtrack type Sudoku solver implemented in C in the JavaScript environment using Emscripten. , Is the story.
emscripten uses LLVM / Clang to load C and C ++ programs into browsers and nodejs/[iojs](https:: It is a system that enables execution in a JavaScript environment such as //iojs.org/).
Based on the LLVM bytecode that compiles the C / C ++ source, ** conversion program for executing on JavaScript and runtime environment part ** for executing it, and compatibility with gcc, make, etc. Enhanced ʻemcc](https://github.com/kripken/emscripten/blob/master/emcc) and [ʻemmake
) And other ** command lines ** are provided. The former is run by nodejs and the latter by python.
The ** runtime library (function) used in C / C ++ such as libc is designed to be emulated on the JavaScript side **. .. Standard output to the console, drawing of SDL/OpenGL etc. on nodejs / iojs environment or browser The JS implementation to be realized, etc. are attached as standard to emscripten (for example, the emscripten implementation of libc is [/usr/local/opt/emscripten/libexec/src/library.js
](https: //) github.com/kripken/emscripten/blob/master/src/library.js)).
In the OSX environment, I installed emscripten with homebrew (brew install emscripten
).
If you include emscripten, commands such as ʻemcc(gcc / clang compatible command) will be added, and the first time you execute that command, a
~ / .emscripten` file with the environment set will be created.
However, with OSX, the contents are not reflected properly, such as using the clang
command attached to the current system. Therefore, edit it again as follows.
# ~/.emscripten
EMSCRIPTEN_ROOT = os.path.expanduser(
os.getenv('EMSCRIPTEN') or
'/usr/local/opt/emscripten/libexec')
LLVM_ROOT = os.path.expanduser(
os.getenv('LLVM') or
'/usr/local/opt/emscripten/libexec/llvm/bin')
NODE_JS = os.path.expanduser(
os.getenv('NODE') or
'/usr/local/opt/node/bin/node')
TMP_DIR = '/tmp'
COMPILER_ENGINE = NODE_JS
JS_ENGINES = [NODE_JS]
By specifying the homebrew package in this way, even if the emscripten or nodejs package installed by homebrew upgrade is upgraded, it can be executed without resetting every time. ..
If you run the ʻemcc` command empty with this setting, you will get the following result if it is set correctly.
$ emcc
WARNING root: (Emscripten: settings file has changed, clearing cache)
INFO root: (Emscripten: Running sanity checks)
WARNING root: no input files
First, describe a backtrack type 9x9 square Sudoku solver in C11 specification. Separate the files into the library part and the main
part of the code so that they can be used only by the library later.
// libsudoku.c
#include <stdio.h>
// helpers for board data
static inline unsigned masks(unsigned i) {return i ? 1 << (i - 1) : 0;}
static inline unsigned row(unsigned i) {return i / 9;}
static inline unsigned col(unsigned i) {return i % 9;}
static inline unsigned blk(unsigned i) {return i / 27 * 3 + i % 9 / 3;}
extern void output(unsigned board[])
{
char buffer[(9 + 3) * (9 + 3)];
char* cur = buffer;
for (unsigned y = 0; y < 9; y++) {
for (unsigned x = 0; x < 9; x++) {
*cur++ = board[y * 9 + x] > 0 ? board[y * 9 + x] + '0' : '.';
if (x % 3 == 2) *cur++ = x == 8 ? '\n' : '|';
}
if (y == 8) {
*cur++ = '\0';
} else if (y % 3 == 2) {
for (unsigned i = 0; i < 11; i++) *cur++ = i % 4 == 3 ? '+' : '-';
*cur++ = '\n';
}
}
puts(buffer);
}
// sudoku solver
typedef void (*sudoku_cb)(unsigned board[]);
typedef struct {
unsigned board[81];
unsigned rows[9], cols[9], blks[9];
sudoku_cb callback;
} sudoku_t;
static void sudoku_init(sudoku_t* s, unsigned board[])
{
for (unsigned i = 0; i < 81; i++) {
const unsigned mask = masks(board[i]);
s->rows[row(i)] |= mask, s->cols[col(i)] |= mask, s->blks[blk(i)] |= mask;
s->board[i] = board[i];
}
}
static void sudoku_solve(sudoku_t* s, unsigned i)
{
if (i == 81) {
s->callback(s->board);
} else if (s->board[i] != 0) {
sudoku_solve(s, i + 1);
} else {
const unsigned r = row(i), c = col(i), b = blk(i);
const unsigned used = s->rows[r] | s->cols[c] | s->blks[b];
for (unsigned v = 1; v <= 9; v++) {
const unsigned mask = masks(v);
if (used & mask) continue;
s->board[i] = v;
s->rows[r] |= mask, s->cols[c] |= mask, s->blks[b] |= mask;
sudoku_solve(s, i + 1);
s->rows[r] &= ~mask, s->cols[c] &= ~mask, s->blks[b] &= ~mask;
s->board[i] = 0;
}
}
}
extern void sudoku(unsigned board[], sudoku_cb callback)
{
sudoku_t s = {
.board = {0}, .rows = {0}, .cols = {0}, .blks = {0}, .callback = callback};
sudoku_init(&s, board);
sudoku_solve(&s, 0);
}
// sudoku-main.c
#include <stdio.h>
extern void output(unsigned board[]);
typedef void (*sudoku_cb)(unsigned board[]);
extern void sudoku(unsigned board[], sudoku_cb callback);
// example problem from http://rosettacode.org/wiki/Sudoku
static unsigned problem[] = {
8, 5, 0, 0, 0, 2, 4, 0, 0,
7, 2, 0, 0, 0, 0, 0, 0, 9,
0, 0, 4, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 7, 0, 0, 2,
3, 0, 5, 0, 0, 0, 9, 0, 0,
0, 4, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 8, 0, 0, 7, 0,
0, 1, 7, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 3, 6, 0, 4, 0
};
extern int main(int argc, char* argv[])
{
if (argc <= 1) {
puts("[problem]");
output(problem);
puts("[solutions]");
sudoku(problem, output);
} else {
for (int i = 1; i < argc; i++) {
char ch = -1;
for (int c = 0; c < 81; c++) {
if (ch == '\0') {
problem[c] = 0;
} else {
ch = argv[i][c];
problem[c] = '0' <= ch && ch <= '9' ? ch - '0' : 0;
}
}
puts("[problem]");
output(problem);
puts("[solution]");
sudoku(problem, output);
}
}
return 0;
}
This code is a program that can be normally compiled / executed with gcc / clang as shown below.
$ clang -std=c11 -Wall -Wextra libsudoku.c sudoku-main.c -o sudoku
$ ./sudoku
[problem]
85.|..2|4..
72.|...|..9
..4|...|...
---+---+---
...|1.7|..2
3.5|...|9..
.4.|...|...
---+---+---
...|.8.|.7.
.17|...|...
...|.36|.4.
[solutions]
859|612|437
723|854|169
164|379|528
---+---+---
986|147|352
375|268|914
241|593|786
---+---+---
432|981|675
617|425|893
598|736|241
You can compile it into JavaScript code just by changing the clang
part in the above build to ʻemcc and ** adding
.js` to the output file name **.
The generated js file is ** stand-alone and executable on nodejs / iojs **.
$ emcc -std=c11 -Wall -Wextra libsudoku.c sudoku-main.c -o sudoku.js
$ iojs sudoku.js
[problem]
85.|..2|4..
72.|...|..9
..4|...|...
---+---+---
...|1.7|..2
3.5|...|9..
.4.|...|...
---+---+---
...|.8.|.7.
.17|...|...
...|.36|.4.
[solutions]
859|612|437
723|854|169
164|379|528
---+---+---
986|147|352
375|268|914
241|593|786
---+---+---
432|981|675
617|425|893
598|736|241
The code generated by Emscripten is not ** JS code that can be used as a normal JavaScript function (like a translator such as CoffeeScript) **. What is generated is a combination of the LLVM bytecode (assuming the C language execution model) and its execution environment. For example, there is a fixed memory area management mechanism in the JS code, such as HEAP.
This is the same even if the library is generated, and the ʻextern` function that the library has is the one that executes the bytecode as a ** C function on the execution environment embedded in the library. It is **.
Therefore, in order to use the code generated by Emscripten as a library for JavaScript, some ingenuity is required in some places. The official documentation, Interacting with code, contains the information needed for this. However, if you don't recognize the major difference that these are ** means of access to the C program execution environment in the library **, you will be addicted to it.
To generate libsudoku.js
to be used as a library on nodejs / iojs from libsudoku.c
, you need to specify the attached options -s EXPORTED_FUNCTIONS
and -s RESERVED_FUNCTION_POINTERS
as follows. there is.
$ emcc -Wall -Wextra -std=c11 libsudoku.c -o libsudoku.js \
-s EXPORTED_FUNCTIONS="['_sudoku','_output']" \
-s RESERVED_FUNCTION_POINTERS=20
In the former ʻEXPORTED_FUNCTIONS, specify the function name ʻextern
in C to call it on the JavaScript side. However, instead of listing the C function names as they are, list the names ** prefixed with _
**.
The latter RESERVED_FUNCTION_POINTERS
is a variable needed for ** when a function that passes a pointer to a callback function, such as thesudoku (board, callback)
function, needs to be called from the JavaScript side **. Here, reserve the number of function pointers that can be set on the bytecode execution environment in order to call back the JavaScript function
. Since libsudoku.js
is not executed in parallel, 1
is sufficient for libsudoku.js, but 20
is specified here.
The docs say you can use ccall
and cwrap
, but they only list when the simple number
value is exchanged, and were generated to use an array or function as an argument. I analyzed the code and made trial and error.
ccall
First, the wrapper that calls void output (unsigned board [])
of libsudoku.c
is as follows.
var libsudoku = require("./libsudoku.js");
// use ccall to call emscripten C function
var output = function (board) {
// unsigned[] as Uint8Array view of Uint32Array
var unsignedBoard = new Uint8Array(new Uint32Array(board).buffer);
// ccall(cfuncname, return_type, argument_types, arguments)
// type: "string", "number", "array" or undefined as return type
return libsudoku.ccall("output", undefined, ["array"], [unsignedBoard]);
};
Here, we call ʻArrayin JavaScript as the array value in Emscripten. The argument is
library.ccall (function name, return type, argument type list, argument value list). This function name is ** a function name in C without a leading
_` **, not a compile option.
If the return value is void
, specify ʻundefined. The argument type can be either
" number ",
" string ", or
" array "`. This type specification is not the type in the C function declaration, but the ** JavaScript value type ** that is actually passed when it is called and executed.
The pointer type of the argument of ʻoutput () is ʻunsigned (int)
, but ** Emscripten seems to treat the ʻinttype as a 32-bit integer **. And ** byte order seems to inherit the CPU environment dependency of
TypedArray as it is ** (CPU-dependent
new Uint32Array (buffer) `etc. is described in the generated code).
If you are dealing with a 32-bit array in C language, the ** " array "
type argument when calling with ccall
must be 8-bit ʻInt8Array or ʻUint8Array
**. Looking at the ccall
implementation in the library, the argument passed in the call ** copies its value to a memory area in the library and executes bytecode **. The " string "
and " array "
specify the copy method. In the case of " array "
, it is copied with HEAP8 [...] = array [i]
, so if you pass ʻUint32Array etc. to
ccallas it is, it will be truncated to 8 bits. By the way,
" string "passes to C as a UTF8
char` string.
The following code implements a wrapper to sudoku (board, callback)
using another cwrap
. ccall (function name, return type, argument type list, argument value list)
is equivalent to cwrap (function name, return type, argument type list) (argument value list)
. If you call the emscripten function many times, it is better to cache the function returned by cwrap
and call it by passing only the arguments.
The point here is that the JavaScript callback function is set in the bytecode execution environment, and its pointer value is passed as the argument of the function created by cwrap
as"number"
.
// use cwrap to call emscripten C function with callback
var sudoku = function sudoku(board, callback) {
// NOTE: For using addFunction(),
// emcc option "-s REQUIRED_FUNCTION_POINTERS=10" (more than 1) required
var callbackPtr = libsudoku.Runtime.addFunction(function (resultPtr) {
var r = new Uint32Array(libsudoku.HEAPU8.buffer, resultPtr, 81);
// NOTE: copy memory value as array for async use in callback(like IO)
callback([].slice.call(r));
});
var unsignedBoard = new Uint8Array(new Uint32Array(board).buffer);
sudoku._cfunc(unsignedBoard, callbackPtr);
libsudoku.Runtime.removeFunction(callbackPtr);
};
// pointer as a "number"
sudoku._cfunc = libsudoku.cwrap("sudoku", undefined, ["array", "number"]);
Use ptr = library.Runtime.addFunction (function)
to set JavaScript function
in the bytecode environment and receive its pointer value.
The number of compile-time arguments RESERVED_FUNCTION_POINTERS
refers to the number of JavaScript functions that can be registered with ʻaddFunction` at the same time.
The arguments passed to the function called in the callback are the same numbers in the execution environment (not the values in ccall
etc.). ** If it is a pointer, it will be used as the offset value of HEAP (library.HEAPU8
etc.) at runtime **. To hide this, ** callback functions also need a wrapper **. In this callback wrapper we make a regular ʻArray` from the memory offset to call the actual callback.
var callbackPtr = libsudoku.Runtime.addFunction(function (resultPtr) {
var r = new Uint32Array(libsudoku.HEAPU8.buffer, resultPtr, 81);
callback([].slice.call(r));
});
Since the result
passed in the callback is ʻunsigned [81] on C, cut out an array of 81 32-bit integers from HEAP with ʻUint32Array
, convert it to a normal ʻArray, and use the callback function. I am trying to call.
ccall and
cwrap` perform such HEAP access inside to convert between JavaScript objects and memory values.
Not limited to the function pointer returned by library.Runtime.addFunction (f)
, the pointer value is passed as a " number "
to the argument of the function of ccall
or cwrap
.
After the function call is finished, delete it with library.Runtime.removeFunction (ptr)
and open a place where ʻaddFunction` can be done. In Emscripten libraries, you need to do this ** resource management manually ** as you do in C.
Below is the code that performs the same processing as sudoku-main.c
using the above wrapper.
// sudoku-call-libsudoku.js
var libsudoku = require("./libsudoku.js");
var output = function (board) {
var unsignedBoard = new Uint8Array(new Uint32Array(board).buffer);
return libsudoku.ccall("output", undefined, ["array"], [unsignedBoard]);
};
var sudoku = function sudoku(board, callback) {
var callbackPtr = libsudoku.Runtime.addFunction(function (resultPtr) {
var r = new Uint32Array(libsudoku.HEAPU8.buffer, resultPtr, 81);
callback([].slice.call(r));
});
var unsignedBoard = new Uint8Array(new Uint32Array(board).buffer);
sudoku._cfunc(unsignedBoard, callbackPtr);
libsudoku.Runtime.removeFunction(callbackPtr);
};
sudoku._cfunc = libsudoku.cwrap("sudoku", undefined, ["array", "number"]);
// main
if (process.argv.length <= 2) {
// example problem from http://rosettacode.org/wiki/Sudoku
var problem = [
8, 5, 0, 0, 0, 2, 4, 0, 0,
7, 2, 0, 0, 0, 0, 0, 0, 9,
0, 0, 4, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 7, 0, 0, 2,
3, 0, 5, 0, 0, 0, 9, 0, 0,
0, 4, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 8, 0, 0, 7, 0,
0, 1, 7, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 3, 6, 0, 4, 0
];
console.log("[problem]");
output(problem);
console.log("[solution]");
sudoku(problem, output);
} else {
for (var i = 2; i < process.argv.length; i++) {
var problem = Array(81);
for (var j = 0; j < 81; j++) {
var ch = process.argv[i].charCodeAt(j);
problem[j] = 0 <= ch && ch <= 9 ? v : 0;
}
console.log("[problem]");
output(problem);
console.log("[solution]");
sudoku(problem, output);
}
}
As shown below, it can be executed as it is without ** external library required **, just like when compiling including main
.
$ iojs sudoku-call-libsudoku.js
[problem]
85.|..2|4..
72.|...|..9
..4|...|...
---+---+---
...|1.7|..2
3.5|...|9..
.4.|...|...
---+---+---
...|.8.|.7.
.17|...|...
...|.36|.4.
[solution]
859|612|437
723|854|169
164|379|528
---+---+---
986|147|352
375|268|914
241|593|786
---+---+---
432|981|675
617|425|893
598|736|241
The execution time of sudoku
of native execution, sudoku.js
which converted all C code, and sudoku-call-libsudoku.js
which uses the library libsudoku.js
is as follows.
$ time ./sudoku
real 0m0.020s
user 0m0.015s
sys 0m0.002s
$ time iojs sudoku-call-libsudoku.js
real 0m0.355s
user 0m0.313s
sys 0m0.037s
$ time iojs sudoku.js
real 0m0.888s
user 0m0.312s
sys 0m0.045s
It seems that it is faster to use only the library than to implement main in C (although the external library function avoids printf
etc. and uses onlyputs ()
).
By the way, the execution time in Code implemented backtrack with JavaScript is
$ time iojs sudoku-backtrack.js
real 0m0.723s
user 0m0.684s
sys 0m0.030s
have become.
Solving the board only once is faster than converting the entire C code, probably because the cost of the main part has a large effect. It took twice as long to use the library, and it can be said that the solver part was doubled with Emscripten.
struct
by value in ccall
With ccall
and cwrap
, it looks like there is no way to pass a ** (not pointer-passed) struct
value **, such as vec4
with four float
s.
Similar to gcc / clang, optimization and debugging options include O0-3 and g1-4, but they have different meanings and are Emscripten-specific.
--O0: No optimization (default)
--O1: Debug code is deleted
--O2: Output is minified (.js.mem
file is generated)
--O3: Further optimization (file is slightly smaller than O2)
The .js.mem
file generated at the same time by O2 and O3 is an essential file to execute the generated .js
code.
The debug option is used in combination with the optimization option after O2 to suppress minification **.
--g1: Save whitespace --g2: Save function name --g3: Save variable name --g4: Generate source map (.map file)
The complete source code is in gist with Makefile
.
Recommended Posts