Introduction

As the title says, I tried implementing part of Ruby itself in Ruby (and C) using Ruby's builtin mechanism (I borrowed the name from the way it is called, because I do not know the official name).

This article is for readers who are interested in implementing Ruby itself.

What is builtin?

builtin is a mechanism for implementing Ruby itself in Ruby (and C). (Is there no formal name for it yet?) By calling __builtin_<name of a function defined in C> from Ruby code, as shown below, you can implement Ruby more easily with a mix of Ruby and C.

For example, Hash#delete is implemented in C as follows:

static VALUE
rb_hash_delete_m(VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
	return val;
    }
    else {
	if (rb_block_given_p()) {
	    return rb_yield(key);
	}
	else {
	    return Qnil;
	}
    }
}

The first argument, hash, receives the hash itself, and the second argument, key, receives the key passed to Hash#delete. Incidentally, Ruby-side values such as variables are received as the VALUE type and processed by C functions.

    rb_hash_modify_check(hash);

The rb_hash_modify_check function internally calls rb_check_frozen to check whether the hash is frozen.

static void
rb_hash_modify_check(VALUE hash)
{
    rb_check_frozen(hash); //Check if the object is frozen
}
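
From the Ruby side, the effect of this check can be observed with plain Ruby. The following is a minimal sketch, assuming a stock Ruby 2.5 or later (where FrozenError exists); the variable names are only for illustration.

```ruby
# A frozen hash rejects modification, which is what rb_check_frozen enforces.
frozen = { key: "value" }.freeze

begin
  frozen.delete(:key)
  outcome = :deleted
rescue FrozenError => e
  outcome = :rejected
  puts "delete was rejected: #{e.class}"
end

# The entry is still there because the check runs before any deletion.
puts frozen.key?(:key)
```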

val = rb_hash_delete_entry(hash, key); looks up the value for the key received as an argument and deletes the entry at the same time. If no value is paired with the key, val is set to Qundef, a special value used on the C side to mean "undefined".

    if (val != Qundef) {
	return val;
    }
    else {
	if (rb_block_given_p()) {
	    return rb_yield(key);
	}
	else {
	    return Qnil;
	}
    }

Processing branches on the value of val. If it is not Qundef (that is, a value was found for the key and deleted), the deleted value is returned. If it is Qundef and a block was passed, rb_yield(key) is executed and its result is returned; otherwise Qnil (nil in Ruby) is returned.
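
The three branches can be seen from Ruby with the ordinary Hash#delete. This is just an illustration on a stock Ruby, with made-up variable names, and each call is annotated with the C branch it should correspond to.

```ruby
hash = { key: "value" }

# Key exists: the deleted value is returned (the val != Qundef branch).
deleted = hash.delete(:key)

# Key missing, no block: nil is returned (the Qnil branch).
missing = hash.delete(:k)

# Key missing, block given: the block receives the key (the rb_yield branch).
fallback = hash.delete(:k) { |key| "no entry for #{key}" }

p deleted, missing, fallback
```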

As you can see, the Ruby you usually use is implemented using C.

By using the builtin function, the above code will be as follows.

class Hash
    def delete(key)
        value = __builtin_rb_hash_delete_m(key)

        if value.nil?
            if block_given?
                yield key
            else
                nil
            end
        else
            value
        end
    end
end

static VALUE
rb_hash_delete_m(rb_execution_context_t *ec, VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
	    return val;
    }
    else {
	    return Qnil;
    }
}

Because block execution is now handled on the Ruby side, I think the C implementation is simpler and easier to read.

By using the builtin function like this, you can implement Ruby with Ruby and a little C code.

Also, it seems that in some cases performance improves when a method is implemented in Ruby rather than in C. Mr. Sasada covered the details in his RubyKaigi 2019 talk, so please refer to that:

Write a Ruby interpreter in Ruby for Ruby 3

I tried it

I found that I could implement a method using Ruby code by using builtin, so I actually tried it.

Development environment construction

I started by setting up a Ruby development environment on WSL + Ubuntu 18.04. For the basic procedure, I followed "(2) MRI source code structure" from the Ruby Hack Challenge materials.

First, install the libraries to be used.

sudo apt install git ruby autoconf bison gcc make zlib1g-dev libffi-dev libreadline-dev libgdbm-dev libssl-dev

Then create a working directory and move into it.

mkdir workdir
cd workdir

After moving to the working directory, clone the Ruby source code. It takes a lot of time, so it's a good idea to make coffee during this time.

git clone https://github.com/ruby/ruby.git

After cloning the source code, move into the ruby directory and run autoconf. This generates the configure script that will be run later. Afterwards, return to workdir.

cd ruby
autoconf
cd ..

Then create a directory for your build and change to it.

mkdir build
cd build

Run ../ruby/configure --prefix=$PWD/../install --enable-shared to generate the Makefile for the build. The --prefix=$PWD/../install option specifies where Ruby will be installed.

../ruby/configure --prefix=$PWD/../install --enable-shared

Then run make -j to build. -j is an option to compile in parallel. If you're not in a hurry, just make is fine.

make -j

Finally, run make install to create an install directory inside workdir and install Ruby there.

make install

The latest Ruby is now installed in workdir/install.

If you want to confirm that it is really installed, run ../install/bin/ruby -v. If you see ruby 2.8.0dev followed by revision and platform details, Ruby has been installed correctly.
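
For a slightly more detailed check, a short script can also print where the interpreter was installed. This is a sketch, assuming it is saved under a name of your choice (for example check.rb, a hypothetical file name) and run with ../install/bin/ruby.

```ruby
require "rbconfig"

# Version string of the running interpreter, the same kind of text
# that `ruby -v` prints.
puts RUBY_DESCRIPTION

# Installation prefix recorded at configure time (the --prefix value).
puts RbConfig::CONFIG["prefix"]
```

If the second line ends in workdir/install, the script is being run by the Ruby you just built rather than by the system Ruby.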

Try redefining the method with builtin

Now that the development environment is in place, let's use builtin to redefine a method. We will reimplement Hash#delete, the example from earlier.

Fix common.mk

First, edit common.mk so that the new Ruby source file is picked up at build time. BUILTIN_RB_SRCS is defined around line 1000 of common.mk, and the Ruby files to be loaded through builtin are listed there.

`common.mk`


BUILTIN_RB_SRCS = \
		$(srcdir)/ast.rb \
		$(srcdir)/gc.rb \
		$(srcdir)/io.rb \
		$(srcdir)/pack.rb \
		$(srcdir)/trace_point.rb \
		$(srcdir)/warning.rb \
		$(srcdir)/array.rb \
		$(srcdir)/prelude.rb \
		$(srcdir)/gem_prelude.rb \
		$(empty)
BUILTIN_RB_INCS = $(BUILTIN_RB_SRCS:.rb=.rbinc)

This time, add hash.rb as follows to implement Hash.

BUILTIN_RB_SRCS = \
		$(srcdir)/ast.rb \
		$(srcdir)/gc.rb \
		$(srcdir)/io.rb \
		$(srcdir)/pack.rb \
		$(srcdir)/trace_point.rb \
		$(srcdir)/warning.rb \
		$(srcdir)/array.rb \
		$(srcdir)/prelude.rb \
		$(srcdir)/gem_prelude.rb \
+		$(srcdir)/hash.rb \
		$(empty)
BUILTIN_RB_INCS = $(BUILTIN_RB_SRCS:.rb=.rbinc)
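
The last line uses make's substitution reference to derive each .rbinc name from the corresponding .rb name. As a rough illustration in Ruby (the list is shortened and "srcdir" is only a placeholder string, not a real path):

```ruby
# Shortened stand-in for BUILTIN_RB_SRCS; "srcdir" is a placeholder path.
builtin_rb_srcs = %w[srcdir/array.rb srcdir/prelude.rb srcdir/hash.rb]

# $(BUILTIN_RB_SRCS:.rb=.rbinc) replaces a trailing ".rb" with ".rbinc".
builtin_rb_incs = builtin_rb_srcs.map { |src| src.sub(/\.rb\z/, ".rbinc") }

puts builtin_rb_incs
```

So adding hash.rb to BUILTIN_RB_SRCS is enough for hash.rbinc to appear in BUILTIN_RB_INCS.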

Next, edit the dependency list for hash.$(OBJEXT), around line 2520. It lists the files that the hash build depends on, such as hash.c.

`common.mk`


hash.$(OBJEXT): {$(VPATH)}hash.c
hash.$(OBJEXT): {$(VPATH)}id.h
hash.$(OBJEXT): {$(VPATH)}id_table.h
hash.$(OBJEXT): {$(VPATH)}intern.h
hash.$(OBJEXT): {$(VPATH)}internal.h
hash.$(OBJEXT): {$(VPATH)}missing.h

Add hash.rbinc and builtin.h here.

hash.$(OBJEXT): {$(VPATH)}hash.c
+hash.$(OBJEXT): {$(VPATH)}hash.rbinc
+hash.$(OBJEXT): {$(VPATH)}builtin.h
hash.$(OBJEXT): {$(VPATH)}id.h
hash.$(OBJEXT): {$(VPATH)}id_table.h
hash.$(OBJEXT): {$(VPATH)}intern.h
hash.$(OBJEXT): {$(VPATH)}internal.h
hash.$(OBJEXT): {$(VPATH)}missing.h

hash.rbinc is a file that is generated automatically when make runs, based on the __builtin_<name of the C function to call> calls found in hash.rb. builtin.h is the header file that provides what is needed to use builtin.
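
To get a feel for what "based on the __builtin_ calls" means, here is a rough sketch that collects those names from a piece of Ruby source. It is not the actual generator in the Ruby tree; the helper name builtin_function_names and the regular-expression approach are my own assumptions for illustration.

```ruby
# Hypothetical helper: list the C function names referenced via __builtin_.
def builtin_function_names(ruby_source)
  ruby_source.scan(/\b__builtin_(\w+)/).flatten.uniq
end

source = <<~RUBY
  class Hash
    def delete(key)
      value = __builtin_rb_hash_delete_m(key)
      value
    end
  end
RUBY

p builtin_function_names(source)
```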

This completes the modification in common.mk.

Modification of inits.c

Next, modify inits.c. This fix is very simple.

`inits.c`


#define BUILTIN(n) CALL(builtin_##n)
    BUILTIN(gc);
    BUILTIN(io);
    BUILTIN(ast);
    BUILTIN(trace_point);
    BUILTIN(pack);
    BUILTIN(warning);
    BUILTIN(array);
    Init_builtin_prelude();
}

inits.c registers each Ruby source file that uses builtin, as shown above. Add BUILTIN(hash); in the same way.

#define BUILTIN(n) CALL(builtin_##n)
    BUILTIN(gc);
    BUILTIN(io);
    BUILTIN(ast);
    BUILTIN(trace_point);
    BUILTIN(pack);
    BUILTIN(warning);
    BUILTIN(array);
+    BUILTIN(hash);
    Init_builtin_prelude();

That is all for inits.c.

Modify hash.c

Finally, we will modify the code in hash.c.

Load builtin.h

First, add #include "builtin.h" to the block of includes around line 40.

  #include "ruby/st.h"
  #include "ruby/util.h"
  #include "ruby_assert.h"
  #include "symbol.h"
  #include "transient_heap.h"
+ #include "builtin.h"

Now you can use the structures etc. required for builtin in hash.c.

Delete the definition of Hash#delete

Next, remove the part that defines Hash#delete.

A function called Init_Hash(void) is defined near the bottom of hash.c.

void
Init_Hash(void)
{
    /* The method definitions for Hash and related classes are written here. */
}

The methods of each Ruby class are defined in this function as follows.

rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1);

Think of rb_define_method as the counterpart of a method definition in Ruby. The first argument is the VALUE of the class on which the method is defined, and the second is the method name. The third is the C function that implements the method, and the fourth is the number of arguments the method takes.
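
A loose Ruby analogue of this call is define_method, which also takes a method name and a body. The sketch below uses a made-up subclass (DemoHash) and method name (remove) so that the real Hash is left untouched; it only illustrates the correspondence, not how rb_define_method works internally.

```ruby
# DemoHash and remove are made-up names so the real Hash stays untouched.
class DemoHash < Hash
  # Rough Ruby analogue of
  #   rb_define_method(klass, "remove", c_function, 1);
  # receiver = class, first argument = method name, block = method body.
  define_method(:remove) { |key| delete(key) }
end

demo = DemoHash.new
demo[:key] = "value"

p demo.remove(:key)
# The arity plays the role of the fourth argument (number of arguments).
p DemoHash.instance_method(:remove).arity
```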

To define a Ruby method with builtin, this definition has to be removed. Since we are reimplementing Hash#delete, delete the line that defines delete.

    rb_define_method(rb_cHash, "shift", rb_hash_shift, 0);
-   rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1);
    rb_define_method(rb_cHash, "delete_if", rb_hash_delete_if, 0);

Modify rb_hash_delete_m so it can be called from builtin

Next, modify rb_hash_delete_m, the function that was registered by the rb_define_method(rb_cHash, "delete", rb_hash_delete_m, 1); line you just deleted, so that it can be used from builtin.

The implementation of rb_hash_delete_m is around line 2380.

static VALUE
rb_hash_delete_m(VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef) {
	return val;
    }
    else {
	if (rb_block_given_p()) {
	    return rb_yield(key);
	}
	else {
	    return Qnil;
	}
    }
}

Modify this as follows.

static VALUE
rb_hash_delete_m(rb_execution_context_t *ec, VALUE hash, VALUE key)
{
    VALUE val;

    rb_hash_modify_check(hash);
    val = rb_hash_delete_entry(hash, key);

    if (val != Qundef)
    {
        return val;
    }
    else
    {
        return Qnil;
    }
}

The key point is that the function now receives rb_execution_context_t *ec as its first argument, which is required to support builtin.

Now you can call the functions defined in C from Ruby.

Load hash.rbinc

Finally, load the automatically generated hash.rbinc by adding #include "hash.rbinc" at the bottom of hash.c.

#include "hash.rbinc"

This completes the modification on the C code side.

Creating hash.rb

Now let's implement Hash#delete in Ruby. Create hash.rb in the same directory as hash.c and add the code below.

class Hash
    def delete(key)
        puts "impl by Ruby(& C)!"
        value = __builtin_rb_hash_delete_m(key)

        if value.nil?
            if block_given?
                yield key
            else
                nil
            end
        else
            value
        end
    end
end

The received argument is passed to __builtin_rb_hash_delete_m, the function we just made callable through builtin, and the result is assigned to value.

After that, processing branches on whether value is nil. If it is nil and a block was passed, the block is executed with key as its argument.
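
The same branching can be tried on a stock Ruby by replacing the builtin call with a stand-in. In this sketch, fake_builtin_delete and delete_via_builtin are hypothetical names of my own; the stand-in only mimics the modified C function (deleted value, or nil when the key is missing).

```ruby
# Hypothetical stand-in for the C function: returns the deleted value,
# or nil when the key is missing (mirroring the Qnil return).
def fake_builtin_delete(store, key)
  store.key?(key) ? store.delete(key) : nil
end

# Same branching as the delete method in hash.rb, minus the puts.
def delete_via_builtin(store, key)
  value = fake_builtin_delete(store, key)
  if value.nil?
    block_given? ? yield(key) : nil
  else
    value
  end
end

store = { present: "value", empty: nil }

p delete_via_builtin(store, :present) { |k| "missing #{k}" }
p delete_via_builtin(store, :absent)  { |k| "missing #{k}" }
# Caveat: a stored nil looks like a missing key, so the block runs here,
# whereas the original C version would have returned nil.
p delete_via_builtin(store, :empty)   { |k| "missing #{k}" }
```

The last call shows one difference from the original C implementation: once the C side returns Qnil instead of Qundef, the Ruby side can no longer tell "key not found" apart from "key found with a nil value".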

puts "impl by Ruby(& C)!" prints a message so that, when you actually try it, you can confirm this version is the one running.

This completes the builtin implementation!

Try to build

Let's build it in the same way as when we built the development environment.

make -j && make install

If the build succeeds, you are all set! If it fails, check for typos and similar mistakes.

Actually try with irb

Let's try the builtin-based Hash#delete in irb!

../install/bin/irb

Now let's paste the code below!

hash = {:key => "value"}
hash.delete(:k)
hash.delete(:key)

If the result is displayed as below, the implementation with builtin is complete!

irb(main):001:0> hash = {:key => "value"}
irb(main):002:0> hash.delete(:k)
impl by Ruby(& C)!
=> nil
irb(main):003:0> hash.delete(:key)
impl by Ruby(& C)!
=> "value"
irb(main):004:0>

Since impl by Ruby(& C)! is displayed, you can see that the Hash#delete defined in Ruby is being executed.
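
Another way to look at where a method comes from is Method#source_location, which returns nil for methods defined in C. The snippet below is a sketch for a stock Ruby; whether the patched build reports a Ruby location for the builtin version is an assumption I have not verified.

```ruby
# Where is Hash#delete defined? For methods written in C, stock Ruby
# reports no Ruby source location.
location = Hash.instance_method(:delete).source_location

if location.nil?
  puts "Hash#delete has no Ruby source location (defined in C)"
else
  puts "Hash#delete is defined in Ruby at #{location.join(':')}"
end
```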

You have now implemented Ruby in Ruby (and C)!

At the end

By using builtin like this, you can implement Ruby itself with Ruby and a little C code. I think this will make it easier for people who usually write Ruby to send patches such as method fixes.

Having tried it, I'm glad it turned out to be surprisingly easy, since the processing can be written on the Ruby side.

Personally, I think writing Ruby extensions in C/C++ would become easier if builtin could also be used in extension libraries, so I am very much looking forward to what comes next.

References

Write a Ruby interpreter in Ruby for Ruby 3

Complete explanation of Ruby source code

How Ruby works

I implemented Ruby with Ruby (and C) (I played with builtin)
