How to call Python or Julia from Ruby (experimental implementation)

I am working on a gem called virtual_module that lets you call Python and Julia packages from Ruby. In the example below, the part that reads the man pages of several commands as documents is written in Ruby, while the doc2vec processing is delegated to Python's gensim.

doc2vec.rb


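require 'natto'

# Tokenize the man page of each command with MeCab (via natto).
manpages = {}
natto = Natto::MeCab.new
%w"ps ls cat cd top df du touch mkdir".each do |cmd|
  list = []
  # `col -bx` removes backspace overstrikes and expands tabs in the man output.
  natto.parse(`man #{cmd} | col -bx | cat`) do |n|
    list << n.surface
  end
  manpages[cmd] = list # command name => array of tokens
end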

require 'virtual_module'
py = VirtualModule.new(:methods=><<EOS, :python=>["gensim"])
class LabeledListSentence(object):
    def __init__(self, words_list, label_list):
        self.words_list = words_list
        self.label_list = label_list

    def __iter__(self):
        for i, words in enumerate(self.words_list):
            yield gensim.models.doc2vec.LabeledSentence(words, [self.label_list[i]])

EOS
# Train Doc2Vec on the Python side: token lists (values) labeled by command name (keys).
model = py.gensim.models.doc2vec.Doc2Vec(py.LabeledListSentence(manpages.values, manpages.keys), min_count:0)
p model.docvecs.most_similar(["ps"])
# => [["top", 0.5594387054443359], ["cat", 0.46929454803466797], ["df", 0.3900265693664551],
#     ["mkdir", 0.38811227679252625], ["du", 0.23663029074668884], ["ls", 0.15436093509197235],
#     ["cd", -0.1965409815311432], ["touch", -0.38958919048309326]]

I used this to add a doc2vec-based related-article feature, driven from Ruby, to my blog (built with Sinatra), and it turned out to be fairly convenient. I don't know how many people besides me will find it useful, but scikit-learn can also be used (although quite inconveniently), and I think it could be interesting depending on how you use it, so I'll write down the intended usage here.

Besides doc2vec, examples of using scikit-learn are summarized on my personal blog, so if you are interested, please check it out.

A brief introduction to how VirtualModule works, using the REPL

Here I'll use the REPL to describe how VirtualModule works internally. I assume the following are already installed on your system:

- virtual_module gem (v0.3.0 or higher)
- Python execution environment

First, launch irb.

debussy:~ remore$ irb -r virtual_module
irb(main):001:0> po = VirtualModule.new(:python=>["sklearn"=>"datasets"])
=> #<Module:0x007fb7e1aee818>

Calling `VirtualModule.new` launches a Python (or Julia) process behind the scenes. Once the background process has started successfully, `VirtualModule.new` returns a new Module instance. From then on, all communication with the background process goes through this Module instance; in other words, the instance behaves like a proxy. For convenience, I'll call it a proxy object.

irb(main):002:0> po.int(2.3)
=> 2
irb(main):003:0> po.unknown_method(2.3)
RuntimeError: An error occurred while executing the command in python process: ,name 'unknown_method' is not defined

The behavior of a proxy object is very simple. In the example above, the proxy object receives the method call `int(2.3)` and passes it to the background process as is (msgpack is used to convert the values at this point). The background process returns the Fixnum value `2`, which is printed to the terminal. Because data conversion relies solely on msgpack, the values that can be exchanged between the two sides are those covered by the [msgpack specification](https://github.com/msgpack/msgpack/blob/master/spec.md). If you call a method that is not defined on the background side, as in the `po.unknown_method(2.3)` example, an error is raised. That is essentially all there is to how VirtualModule works.

This alone may leave some things unclear, so let me add a little more.
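The forwarding described above can be pictured with a minimal sketch. This is not the gem's actual implementation: `FakeBackend` and `SketchProxy` are hypothetical names, the "background process" is just an in-process Ruby object, and the stdlib JSON module stands in for msgpack so that the sketch runs without extra gems.

```ruby
require 'json'

# Hypothetical stand-in for the background Python process. It receives a
# serialized request, looks the function up by name, and returns a serialized reply.
class FakeBackend
  FUNCTIONS = {
    'int' => ->(x) { x.to_i } # mimics Python's int()
  }.freeze

  def handle(request_text)
    request = JSON.parse(request_text)
    function = FUNCTIONS[request['name']]
    if function
      JSON.generate('value' => function.call(*request['args']))
    else
      JSON.generate('error' => "name '#{request['name']}' is not defined")
    end
  end
end

# Hypothetical proxy: every unknown method call is serialized and forwarded as is.
class SketchProxy
  def initialize(backend)
    @backend = backend
  end

  def method_missing(name, *args)
    request_text = JSON.generate('name' => name.to_s, 'args' => args)
    reply = JSON.parse(@backend.handle(request_text))
    raise "backend error: #{reply['error']}" if reply.key?('error')

    reply['value']
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

proxy = SketchProxy.new(FakeBackend.new)
puts proxy.int(2.3) # prints 2
```

Calling `proxy.unknown_method(2.3)` on this sketch raises a RuntimeError that carries the backend's message, which mirrors the behavior shown in the irb session.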

irb(main):004:0> po.datasets
=> #<Module:0x007ffd0906c030>
irb(main):005:0> po.datasets.load_iris(:_)
=> #<Module:0x007ffd09074500>
irb(main):006:0> po.datasets.load_iris(:_).vclass
=> "<class 'sklearn.datasets.base.Bunch'>"
irb(main):007:0> po.datasets.load_iris(:_).data[1].to_a
=> [4.9, 3.0, 1.4, 0.2]

This example shows what happens when a value cannot be converted by msgpack. The proxy object (the local variable `po`) first returns a new proxy object (`#<Module:0x007ffd0906c030>`) for the call to `#datasets`, and the subsequent `#load_iris(:_)` call returns yet another proxy object (`#<Module:0x007ffd09074500>`). In Python, `datasets` is a module object and the return value of `load_iris` is an instance of the `sklearn.datasets.base.Bunch` class; neither can be converted via msgpack, so a Module instance is generated instead. For values that msgpack cannot convert, VirtualModule passes only a pointer to the value rather than the value itself.
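The "pass a pointer instead of the value" idea can be sketched roughly as follows. Everything here is an assumption made for illustration: `HandleTable`, `encode_result`, `plain?` and `HandleProxy` are hypothetical names, the table lives in the same Ruby process rather than in Python, and the set of "plain" types is only an approximation of what a serializer such as msgpack can carry.

```ruby
# Hypothetical backend-side table that keeps non-serializable objects alive
# and hands out integer ids ("pointers") for them.
class HandleTable
  def initialize
    @objects = {}
    @next_id = 0
  end

  def store(object)
    @next_id += 1
    @objects[@next_id] = object
    @next_id
  end

  def fetch(id)
    @objects.fetch(id)
  end
end

PLAIN_TYPES = [NilClass, TrueClass, FalseClass, Integer, Float, String].freeze

# True when the value consists only of types a simple serializer can carry.
def plain?(value)
  case value
  when Array then value.all? { |v| plain?(v) }
  when Hash then value.all? { |k, v| plain?(k) && plain?(v) }
  else PLAIN_TYPES.any? { |type| value.is_a?(type) }
  end
end

# Return the value itself when it is serializable, otherwise only a handle.
def encode_result(value, table)
  plain?(value) ? { 'value' => value } : { 'handle' => table.store(value) }
end

# Hypothetical Ruby-side proxy that knows only the handle id.
class HandleProxy
  def initialize(table, id)
    @table = table
    @id = id
  end

  def method_missing(name, *args)
    target = @table.fetch(@id)
    return super unless target.respond_to?(name)

    reply = encode_result(target.public_send(name, *args), @table)
    reply.key?('handle') ? HandleProxy.new(@table, reply['handle']) : reply['value']
  end

  def respond_to_missing?(name, include_private = false)
    @table.fetch(@id).respond_to?(name) || super
  end
end

# A Struct stands in for sklearn's Bunch, which cannot be serialized directly.
bunch = Struct.new(:data, :target).new([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]], [0, 0])
table = HandleTable.new
iris = HandleProxy.new(table, encode_result(bunch, table)['handle'])
p iris.data[1] # prints [4.9, 3.0, 1.4, 0.2]
```

Unlike the real session, where `data` is a numpy array and therefore comes back as another proxy that needs `to_a`, the nested array in this sketch is plain and is returned by value.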

irb(main):008:0> po.datasets.vclass
=> "<type 'module'>"
irb(main):009:0> iris = po.datasets.load_iris(:_)
=> #<Module:0x007ffd09057568>
irb(main):010:0> iris.target.vclass
=> "<type 'numpy.ndarray'>"
irb(main):011:0> iris.target.vmethods
=> ["T", "__abs__", "__add__", "__and__", "__array__", "__array_finalize__", "__array_interface__", "__array_prepare__", "__array_priority__", "__array_struct__", "__array_wrap__", "__class__", "__contains__", "__copy__", "__deepcopy__", "__delattr__", "__delitem__", "__delslice__", "__div__", "__divmod__", "__doc__", "__eq__", "__float__", "__floordiv__", "__format__", "__ge__", "__getattribute__", "__getitem__", "__getslice__", "__gt__", "__hash__", "__hex__", "__iadd__", "__iand__", "__idiv__", "__ifloordiv__", "__ilshift__", "__imod__", "__imul__", "__index__", "__init__", "__int__", "__invert__", "__ior__", "__ipow__", "__irshift__", "__isub__", "__iter__", "__itruediv__", "__ixor__", "__le__", "__len__", "__long__", "__lshift__", "__lt__", "__mod__", "__mul__", "__ne__", "__neg__", "__new__", "__nonzero__", "__oct__", "__or__", "__pos__", "__pow__", "__radd__", "__rand__", "__rdiv__", "__rdivmod__", "__reduce__", "__reduce_ex__", "__repr__", "__rfloordiv__", "__rlshift__", "__rmod__", "__rmul__", "__ror__", "__rpow__", "__rrshift__", "__rshift__", "__rsub__", "__rtruediv__", "__rxor__", "__setattr__", "__setitem__", "__setslice__", "__setstate__", "__sizeof__", "__str__", "__sub__", "__subclasshook__", "__truediv__", "__xor__", "all", "any", "argmax", "argmin", "argpartition", "argsort", "astype", "base", "byteswap", "choose", "clip", "compress", "conj", "conjugate", "copy", "ctypes", "cumprod", "cumsum", "data", "diagonal", "dot", "dtype", "dump", "dumps", "fill", "flags", "flat", "flatten", "getfield", "imag", "item", "itemset", "itemsize", "max", "mean", "min", "nbytes", "ndim", "newbyteorder", "nonzero", "partition", "prod", "ptp", "put", "ravel", "real", "repeat", "reshape", "resize", "round", "searchsorted", "setfield", "setflags", "shape", "size", "sort", "squeeze", "std", "strides", "sum", "swapaxes", "take", "tobytes", "tofile", "tolist", "tostring", "trace", "transpose", "var", "view"]
irb(main):012:0> iris.target.to_a
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

In Ruby, you can inspect an object's state with `Object#class` and `Object#methods`. VirtualModule provides analogous methods, `#vclass` and `#vmethods`. As you might guess, `#vclass` asks the background process for the type of the value and returns it, and `#vmethods` returns the methods available on that object.

That concludes the explanation, but if you want to see more, there are some other examples on GitHub (under `/tree/master/example`). It's an experimental implementation, so I expect it is hard to use in many places, but if anyone tries it, I'd be happy to hear what you think.
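For comparison, here is the plain-Ruby introspection that `#vclass` and `#vmethods` are modeled on, together with a rough guess at what the background side might compute. The helper names `describe_class` and `describe_methods` are hypothetical, and the sketch inspects Ruby objects rather than Python ones.

```ruby
# Ordinary Ruby introspection on a local object.
row = [4.9, 3.0, 1.4, 0.2]
p row.class                   # prints Array
p row.methods.include?(:to_a) # prints true

# Hypothetical backend-side helpers: return descriptions as plain strings,
# so that they can always be serialized and sent back to the caller.
def describe_class(object)
  object.class.to_s
end

def describe_methods(object)
  object.public_methods.map(&:to_s).sort
end

p describe_class(row) # prints "Array"
```

Returning strings rather than class or method objects keeps the reply within what a serializer can carry, which matches the string results seen in the irb session.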
