[PYTHON] The real cause of the Jubatus client timing out and failing with an error

Hello everyone

I had been wanting to try machine learning for a while, so I tried out Jubatus, which looked approachable. The tutorial itself was relatively easy, but I got stuck on one problem along the way, so I will share the cause and the fix.

jubatus

http://jubat.us/ja/index.html

It is a distributed processing framework for online machine learning. I had assumed it was one of the machine learning tools that have recently been coming out of Japan, but it has actually been around for quite some time. There is plenty of Japanese documentation, so I decided to give it a try.

Installation

For installation I simply followed the quick start. As usual, I set everything up inside a virtual machine.

An official Ubuntu trusty64 box is available for Vagrant, so I will use that.

vagrant init ubuntu/trusty64; vagrant up --provider virtualbox

Now that the virtual machine is up, log in and install Jubatus by following the quick start: http://jubat.us/ja/quickstart.html

Let's use Python for the Jubatus client, since it ships with most Linux distributions.

$ sudo vim /etc/apt/sources.list.d/jubatus.list # Add the repository
$ cat /etc/apt/sources.list.d/jubatus.list
deb http://download.jubat.us/apt binary/

$ sudo apt-get update
$ sudo apt-get install jubatus #jubatus is installed

$ sudo apt-get install python-pip
$ sudo pip install jubatus #The python jubatus client is installed

$ source /opt/jubatus/profile # This has to be sourced in every new shell, so it's a good idea to add it to ~/.bash_profile

Run the tutorial

The installation finished without any problems, so let's move on to the tutorial. A fresh Ubuntu box does not seem to come with git, so I cloned the repository on the host:

git clone https://github.com/jubatus/jubatus-tutorial-python.git
cd jubatus-tutorial-python

The cloned repository serves as the working directory as is, so download the test data into it.

$ wget http://qwone.com/~jason/20Newsgroups//20news-bydate.tar.gz
$ tar xvzf 20news-bydate.tar.gz

The archive is fairly large, so be patient until the download completes. Once it has been extracted, start the Jubatus server.

jubaclassifier --configpath config.json

The Jubatus server appears to have started. Finally, let's start the client and feed the test data to Jubatus.

An error...

python tutorial.py

The client starts, and then...

msgpackrpc.error.RPCError: Request timed out

The client prints a timeout error and stops. I wondered whether this was just a simple timeout, but on the server side:

starting load from /tmp/10.0.2.15_9199_classifier_tutorial.jubatus
Killed

The server has died with output like this. A simple timeout should not take the server down.

Looking a little closer, the server went down first and the client timed out afterwards. Since the server was killed abruptly with no message of its own, it was most likely killed by the OS. In that case, syslog is the place to look.

$ tail /var/log/syslog
Feb  8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828429] [ 9500]  1000  9497   128839    93956     234        0             0 jubaclassifier
Feb  8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828431] [ 9501]  1000  9501    12649     2238      29        0             0 python
Feb  8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828432] Out of memory: Kill process 9500 (jubaclassifier) score 750 or sacrifice child
Feb  8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.829547] Killed process 9500 (jubaclassifier) total-vm:515356kB, anon-rss:375824kB, file-rss:0kB
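Scanning syslog by eye works, but the same check can be sketched in Python. The snippet below is a minimal sketch, not part of the Jubatus tooling: the function name `find_oom_kills` is my own, and the regular expression assumes the kernel message format shown in the log above, which can differ between kernel versions.

```python
import re

# Matches the OOM killer's summary line, e.g.
# "Killed process 9500 (jubaclassifier) total-vm:515356kB, anon-rss:375824kB, ..."
# Assumption: the format is the one seen in the log above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per process that the OOM killer terminated."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "anon_rss_kb": int(match.group("anon_rss_kb")),
            })
    return kills


# Sample line copied from the syslog excerpt above.
sample = (
    "Feb  8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.829547] "
    "Killed process 9500 (jubaclassifier) total-vm:515356kB, "
    "anon-rss:375824kB, file-rss:0kB"
)
for kill in find_oom_kills([sample]):
    print(kill["name"], kill["pid"], kill["anon_rss_kb"])
```

To check a real machine, pass an open file instead of the sample, for example `find_oom_kills(open("/var/log/syslog"))`; reading that file may require elevated permissions on some systems.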

Out of memory! I have been bitten by the same thing with MySQL before: it complained that the process file could not be found and I could not tell why, but the log showed that the engine had failed to start for lack of memory.

So the fix is to add memory. On AWS I would create a swap file, but this is Vagrant, so raising the memory limit in the Vagrantfile is easy:

config.vm.provider "virtualbox" do |vb|
  # Display the VirtualBox GUI when booting the machine
  # vb.gui = true

  # Customize the amount of memory on the VM:
  vb.memory = "2048"
end

As expected, with 2 GB of memory it works.
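To catch this kind of problem before starting the server, the VM's total memory can be read from /proc/meminfo. This is only a sketch under assumptions: the function names are hypothetical, the sample text is illustrative rather than taken from the VM in this post, and the 1 GiB threshold is an arbitrary choice of mine, not a measured requirement of jubaclassifier.

```python
def mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:        2049924 kB".
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, required_kb):
    """Return True if the machine's total memory is at least required_kb."""
    return mem_total_kb(meminfo_text) >= required_kb


# Hypothetical threshold for illustration only (1 GiB expressed in kB).
REQUIRED_KB = 1024 * 1024

# Illustrative sample; on a real machine use open("/proc/meminfo").read().
sample = "MemTotal:         501692 kB\nMemFree:           12345 kB\n"
if not has_enough_memory(sample, REQUIRED_KB):
    print("Total memory looks low; the OOM killer may stop the server.")
```

Total memory is only a rough indicator, since what matters in practice is how much is free when the server loads its model, but it is enough to flag a VM that is clearly too small.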

And the result is

OK,sci.med, sci.med, 0.584435760975
NG,sci.electronics, soc.religion.christian, 0.37116548419
OK,talk.politics.guns, talk.politics.guns, 1.49401080608

...
OK,rec.sport.hockey, rec.sport.hockey, 1.0699224472
===================
OK: 5398
NG: 2134
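The OK and NG totals can be turned into an accuracy figure. The sketch below assumes each result line starts with `OK,` or `NG,` as in the output above; the helper names `count_results` and `accuracy` are my own and are not part of the tutorial script.

```python
def count_results(lines):
    """Count result lines that start with "OK," or "NG,"."""
    ok = sum(1 for line in lines if line.startswith("OK,"))
    ng = sum(1 for line in lines if line.startswith("NG,"))
    return ok, ng


def accuracy(ok, ng):
    """Fraction of correctly classified documents."""
    total = ok + ng
    if total == 0:
        raise ValueError("no results to evaluate")
    # float() keeps the division exact under Python 2 as well.
    return float(ok) / total


# Totals reported by the tutorial run above.
print("{:.1%}".format(accuracy(5398, 2134)))  # prints 71.7%
```

So roughly 72% of the 7,532 test documents were classified correctly in this run.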

It worked well.

Summary

If a process is going to be killed for running out of memory, I wish it would simply say "out of memory" itself... Well, I suppose having it recorded in syslog is good enough.

In any case, when it is the server rather than the client that has gone down, check syslog.
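One quick way to tell a dead server from a slow one is to see whether its port still accepts connections. The following is a minimal sketch using only the standard library. The function name is hypothetical, and port 9199 is an assumption taken from the model file name in the server log (10.0.2.15_9199_classifier_tutorial.jubatus).

```python
import socket


def is_server_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, unreachable, or timed out.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Assumed host and port; adjust to wherever jubaclassifier is running.
    if is_server_listening("127.0.0.1", 9199):
        print("Server is up; the timeout may be a genuine slow response.")
    else:
        print("Server is not listening; check syslog for the reason.")
```

If the client reports a timeout and this check fails, the server process itself is gone, which is the situation described in this post.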

References

Jubatus: Distributed Processing Framework for Online Machine Learning
Jubatus timed out error fits instead of timed out error (Someone had the same problem a day ago ...)
