As the title says, I tried implementing part of Ruby itself in Ruby (and C) using Ruby's builtin mechanism. (I borrowed the name "builtin" from the way it is called, because I do not know the official name.)
This article is for readers who are interested in implementing Ruby itself.
builtin is a mechanism for implementing Ruby itself in Ruby (and C). (Is there still no formal name for it?) By calling __builtin_<name of a function defined in C> from Ruby code, as shown below, you can implement Ruby more easily with a mix of Ruby and C.
For example, Hash#delete is implemented in C as follows:
static VALUE
rb_hash_delete_m(VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
        return val;
    }
    else {
        if (rb_block_given_p()) {
            return rb_yield(key);
        }
        else {
            return Qnil;
        }
    }
}
The first argument, hash, receives the hash itself (the receiver), and the second argument, key, receives the key passed to Hash#delete. Incidentally, Ruby-side values such as variables are received as the VALUE type and processed by the C functions.
rb_hash_modify_check(hash);
The rb_hash_modify_check function internally calls the rb_check_frozen function to check whether the hash is frozen.
static void
rb_hash_modify_check(VALUE hash)
{
    rb_check_frozen(hash); // Check whether the object is frozen
}
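The effect of this frozen check can be observed from the Ruby side. The following is a minimal sketch, assuming Ruby 2.5 or later (where the exception class is FrozenError); the hash contents are made up for illustration.

```ruby
# A frozen hash rejects destructive methods such as Hash#delete.
hash = { key: "value" }.freeze

begin
  hash.delete(:key)
  result = :deleted
rescue FrozenError => e
  # The frozen check raised before any entry was removed.
  result = :rejected
  puts e.message
end

puts result
puts hash.size
```

The entry is still in the hash afterwards, because the check runs before the deletion.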
The line val = rb_hash_delete_entry(hash, key); looks up the value for the key received as the argument and deletes the entry at the same time. If no value is paired with the key, val is set to Qundef, a special value used on the C side to mean "undefined".
if (val != Qundef) {
    return val;
}
else {
    if (rb_block_given_p()) {
        return rb_yield(key);
    }
    else {
        return Qnil;
    }
}
The processing branches on the value of val. If it is not Qundef (that is, a value was found for the key and deleted), the deleted value is returned.
If it is Qundef and a block was passed, rb_yield(key) is executed and its result is returned. If it is Qundef and no block was passed, Qnil (nil in Ruby) is returned.
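The three branches described above correspond to the following behavior seen from Ruby. This is a small sketch using the standard Hash#delete; the keys and the block body are made up for illustration.

```ruby
hash = { key: "value" }

found    = hash.delete(:key)                     # key exists: the deleted value
missing  = hash.delete(:nope)                    # no key, no block: nil
fallback = hash.delete(:nope) { |k| "no #{k}" }  # no key, block given: the block's result

p found     # "value"
p missing   # nil
p fallback  # "no nope"
p hash      # {}
```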
As you can see, the Ruby you normally use is implemented in C.
With builtin, the code above can be written as follows.
class Hash
  def delete(key)
    value = __builtin_rb_hash_delete_m(key)
    if value.nil?
      if block_given?
        yield key
      else
        nil
      end
    else
      value
    end
  end
end
static VALUE
rb_hash_delete_m(rb_execution_context_t *ec, VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
        return val;
    }
    else {
        return Qnil;
    }
}
Because block execution is now handled on the Ruby side, I think the C implementation becomes simpler and easier to read.
By using builtin like this, you can implement Ruby with Ruby and a little C code.
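One difference is worth keeping in mind. The original C code uses Qundef to tell "key not found" apart from "key found, value is nil", whereas the Ruby version checks value.nil?, which cannot distinguish the two. The sketch below models this in plain Ruby: wrapper_delete is a hypothetical stand-in for the builtin-based method, and a block-less Hash#delete plays the role of __builtin_rb_hash_delete_m.

```ruby
# Hypothetical stand-in for the builtin-based wrapper. Hash#delete without a
# block plays the role of __builtin_rb_hash_delete_m, which returns nil when
# the key is missing.
def wrapper_delete(hash, key)
  value = hash.delete(key)
  if value.nil?
    block_given? ? yield(key) : nil
  else
    value
  end
end

# The key is present, but its stored value is nil.
original = { a: nil }.delete(:a) { "fallback" }
wrapped  = wrapper_delete({ a: nil }, :a) { "fallback" }

p original  # nil: the key existed, so the block is not called
p wrapped   # "fallback": nil is mistaken for "key not found"
```

For a hash that never stores nil values the two behave the same, so this only matters if the reimplementation is meant to match the original exactly.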
It also seems that in some cases performance improves when a method is implemented in Ruby rather than in C. Mr. Sasada covered this in more detail in his RubyKaigi 2019 talk, so please refer to it:
Write a Ruby interpreter in Ruby for Ruby 3
I found that I could implement a method using Ruby code by using builtin, so I actually tried it.
I started by setting up a Ruby development environment, using WSL + Ubuntu 18.04. For the basic procedure, I followed "(2) MRI source code structure" from the Ruby Hack Challenge.
First, install the libraries to be used.
sudo apt install git ruby autoconf bison gcc make zlib1g-dev libffi-dev libreadline-dev libgdbm-dev libssl-dev
Then create a working directory and change to it.
mkdir workdir
cd workdir
After moving to the working directory, clone the Ruby source code. It takes a lot of time, so it's a good idea to make coffee during this time.
git clone https://github.com/ruby/ruby.git
After cloning the source code, go to the ruby directory and run autoconf. This generates the configure script that will be executed later. After running it, return to workdir.
cd ruby
autoconf
cd ..
Then create a directory for your build and change to it.
mkdir build
cd build
Run ../ruby/configure --prefix=$PWD/../install --enable-shared to generate the Makefile used for the build. The --prefix=$PWD/../install option specifies where Ruby will be installed.
../ruby/configure --prefix=$PWD/../install --enable-shared
Then run make -j to build. -j is an option for compiling in parallel. If you're not in a hurry, plain make is fine.
make -j
Finally, run make install to create an install directory inside the workdir directory and install Ruby there.
make install
The latest Ruby is now installed in workdir/install.
If you want to confirm that it is really installed, try running ../install/bin/ruby -v. If a Ruby version such as ruby 2.8.0dev is displayed, Ruby is installed correctly.
Now that the development environment is in place, we will use builtin to redefine a method. We will reimplement the Hash#delete used in the earlier example.
First, add some settings to common.mk so that the Ruby source file is used when building.
BUILTIN_RB_SRCS is defined around line 1000 of common.mk. Add the file containing the Ruby code to be loaded to this BUILTIN_RB_SRCS list.
common.mk
BUILTIN_RB_SRCS = \
$(srcdir)/ast.rb \
$(srcdir)/gc.rb \
$(srcdir)/io.rb \
$(srcdir)/pack.rb \
$(srcdir)/trace_point.rb \
$(srcdir)/warning.rb \
$(srcdir)/array.rb \
$(srcdir)/prelude.rb \
$(srcdir)/gem_prelude.rb \
$(empty)
BUILTIN_RB_INCS = $(BUILTIN_RB_SRCS:.rb=.rbinc)
This time we are implementing part of Hash, so add hash.rb as follows.
BUILTIN_RB_SRCS = \
$(srcdir)/ast.rb \
$(srcdir)/gc.rb \
$(srcdir)/io.rb \
$(srcdir)/pack.rb \
$(srcdir)/trace_point.rb \
$(srcdir)/warning.rb \
$(srcdir)/array.rb \
$(srcdir)/prelude.rb \
$(srcdir)/gem_prelude.rb \
+ $(srcdir)/hash.rb \
$(empty)
BUILTIN_RB_INCS = $(BUILTIN_RB_SRCS:.rb=.rbinc)
Next, around line 2520, modify the part that specifies the files read when building Hash.
The files to be read, such as hash.c, are listed like this:
common.mk
hash.$(OBJEXT): {$(VPATH)}hash.c
hash.$(OBJEXT): {$(VPATH)}id.h
hash.$(OBJEXT): {$(VPATH)}id_table.h
hash.$(OBJEXT): {$(VPATH)}intern.h
hash.$(OBJEXT): {$(VPATH)}internal.h
hash.$(OBJEXT): {$(VPATH)}missing.h
Add hash.rbinc and builtin.h here.
hash.$(OBJEXT): {$(VPATH)}hash.c
+hash.$(OBJEXT): {$(VPATH)}hash.rbinc
+hash.$(OBJEXT): {$(VPATH)}builtin.h
hash.$(OBJEXT): {$(VPATH)}id.h
hash.$(OBJEXT): {$(VPATH)}id_table.h
hash.$(OBJEXT): {$(VPATH)}intern.h
hash.$(OBJEXT): {$(VPATH)}internal.h
hash.$(OBJEXT): {$(VPATH)}missing.h
hash.rbinc is a file generated automatically when make runs, based on the __builtin_<name of the C function to call> calls found in hash.rb. builtin.h is a header file containing the implementation needed to use builtin.
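To get a feel for the kind of information hash.rbinc is derived from, the sketch below collects the C function names referenced through __builtin_ calls in a piece of Ruby source. This is only a regex-based illustration and not how the actual generator in the Ruby source tree works; builtin_function_names is a hypothetical helper.

```ruby
# Collect the C function names referenced through __builtin_ calls.
# Illustrative only: a simple regex scan, not the real generator.
def builtin_function_names(source)
  source.scan(/__builtin_(\w+)/).flatten.uniq
end

source = <<~RUBY
  class Hash
    def delete(key)
      value = __builtin_rb_hash_delete_m(key)
      value
    end
  end
RUBY

p builtin_function_names(source)  # ["rb_hash_delete_m"]
```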
This completes the changes to common.mk.
Next, modify inits.c. This change is very simple.
inits.c
#define BUILTIN(n) CALL(builtin_##n)
BUILTIN(gc);
BUILTIN(io);
BUILTIN(ast);
BUILTIN(trace_point);
BUILTIN(pack);
BUILTIN(warning);
BUILTIN(array);
Init_builtin_prelude();
}
inits.c registers the Ruby source files that use builtin, as shown above. Add BUILTIN(hash); here in the same way.
#define BUILTIN(n) CALL(builtin_##n)
BUILTIN(gc);
BUILTIN(io);
BUILTIN(ast);
BUILTIN(trace_point);
BUILTIN(pack);
BUILTIN(warning);
BUILTIN(array);
+ BUILTIN(hash);
Init_builtin_prelude();
That is all for inits.c.
Finally, we will modify the code in hash.c.
First, add #include "builtin.h" to the header includes around line 40.
#include "ruby/st.h"
#include "ruby/util.h"
#include "ruby_assert.h"
#include "symbol.h"
#include "transient_heap.h"
+ #include "builtin.h"
Now the structures and other definitions required for builtin can be used in hash.c.
Next, remove the part that defines Hash#delete.
A function called Init_Hash(void) is defined near the bottom of hash.c.
void
Init_Hash(void)
{
    /* The method definitions for Hash and related classes are written here. */
}
The methods of each Ruby class are defined in this function as follows.
rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1);
Think of rb_define_method as the counterpart of a method definition in Ruby. The first argument is the VALUE of the class on which the method is defined, and the second argument is the method name.
The third argument is the function defined in C (the processing executed by the method), and the fourth argument is the number of arguments the method takes.
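As a rough Ruby-side analogy, defining a method dynamically with define_method takes the same kind of information: the class, the method name, and the body. The Greeter class and its greet method below are hypothetical names used only for illustration.

```ruby
class Greeter; end

# Roughly comparable to: rb_define_method(klass, "greet", <C function>, 1)
Greeter.send(:define_method, :greet) { |name| "hello, #{name}" }

p Greeter.new.greet("Ruby")              # "hello, Ruby"
p Greeter.instance_method(:greet).arity  # 1
```

In the C API the argument count is passed explicitly as the fourth argument, while in Ruby it follows from the block's parameter list.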
To define a Ruby method with builtin, this definition must be deleted. Since we are reimplementing Hash#delete, delete the line where delete is defined.
rb_define_method(rb_cHash, "shift", rb_hash_shift, 0);
- rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1);
rb_define_method(rb_cHash, "delete_if", rb_hash_delete_if, 0);
Next, modify rb_hash_delete_m, the function that was registered by the rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1); line you just deleted, so that it can be used from builtin.
The implementation of rb_hash_delete_m is around line 2380.
static VALUE
rb_hash_delete_m(VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
        return val;
    }
    else {
        if (rb_block_given_p()) {
            return rb_yield(key);
        }
        else {
            return Qnil;
        }
    }
}
Modify this as follows.
static VALUE
rb_hash_delete_m(rb_execution_context_t *ec, VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
        return val;
    }
    else {
        return Qnil;
    }
}
The key point is that the function now takes rb_execution_context_t *ec as its first argument, which is required to support builtin.
Now the function defined in C can be called from Ruby.
Finally, load the automatically generated hash.rbinc.
Add #include "hash.rbinc" to the bottom of hash.c.
#include "hash.rbinc"
This completes the modification on the C code side.
Now let's implement Hash#delete in Ruby. Create hash.rb in the same directory as hash.c.
After creating it, add the code below.
class Hash
  def delete(key)
    puts "impl by Ruby(& C)!"
    value = __builtin_rb_hash_delete_m(key)
    if value.nil?
      if block_given?
        yield key
      else
        nil
      end
    else
      value
    end
  end
end
The received argument is passed to __builtin_rb_hash_delete_m, which we made callable through builtin earlier, and the result is assigned to value.
After that, the processing branches on whether value is nil. If it is nil and a block was passed, the block is executed with key as its argument.
puts "impl by Ruby(& C)!" prints a message so that, when we actually try it, we can confirm this implementation is the one being executed.
This completes the builtin implementation!
Let's build it in the same way as when we built the development environment.
make -j && make install
If the build succeeds, you're all set! If it fails, check for typos and similar mistakes.
Let's try the Hash#delete implemented with builtin using irb!
../install/bin/irb
Now let's paste the code below!
hash = {:key => "value"}
hash.delete(:k)
hash.delete(:key)
If the result is displayed as below, the implementation with builtin is complete!
irb(main):001:0> hash = {:key => "value"}
irb(main):002:0> hash.delete(:k)
impl by Ruby(& C)!
=> nil
irb(main):003:0> hash.delete(:key)
impl by Ruby(& C)!
=> "value"
irb(main):004:0>
Since impl by Ruby(& C)! is displayed, you can see that the Hash#delete defined in Ruby is being executed.
You have now implemented Ruby in Ruby (and C)!
By using builtin like this, you can implement Ruby itself using Ruby and a little C code. I therefore think that even people who normally write only Ruby will find it easier to send patches, such as modifications to methods.
Having tried it, I was pleased to find it surprisingly easy, because the processing can be written on the Ruby side.
Personally, I think writing Ruby extensions in C/C++ would become easier if builtin could also be used in extensions, so I am very much looking forward to what comes next.
Write a Ruby interpreter in Ruby for Ruby 3
Complete explanation of Ruby source code