Hello everyone
I've wanted to try machine learning for a while, so I gave Jubatus a go because it looked easy to get started with. The tutorial itself was relatively simple, but I got stuck along the way, so I'll share the cause and the fix.
jubatus
http://jubat.us/ja/index.html
It's a distributed processing framework for online machine learning. I had assumed it was one of the made-in-Japan machine learning tools that have been appearing recently, but it seems to have been around for quite some time. There is plenty of Japanese documentation, which is why I decided to try it.
For the installation I simply followed the quick start. As usual, I did everything inside a virtual environment.
Ubuntu's trusty64 box is officially provided, so I'll use that as the virtual environment.
vagrant init ubuntu/trusty64; vagrant up --provider virtualbox
Now that the virtual environment is up, log in and do the installation right away. http://jubat.us/ja/quickstart.html
I'll use Python as the Jubatus client, because it's included in most Linux distributions.
$ sudo vim /etc/apt/sources.list.d/jubatus.list # Add the repository
$ cat /etc/apt/sources.list.d/jubatus.list
deb http://download.jubat.us/apt binary/
$ sudo apt-get update
$ sudo apt-get install jubatus #jubatus is installed
$ sudo apt-get install python-pip
$ sudo pip install jubatus #The python jubatus client is installed
$ source /opt/jubatus/profile # This has to be loaded in every session, so it's a good idea to add it to ~/.bash_profile
The installation finished without any problems, so let's move on to the tutorial. A fresh Ubuntu box doesn't seem to come with git, so I cloned the repository on the host:
git clone https://github.com/jubatus/jubatus-tutorial-python.git
cd jubatus-tutorial-python
The cloned repository serves as the working directory as is, so I'll put the test data here.
$ wget http://qwone.com/~jason/20Newsgroups//20news-bydate.tar.gz
$ tar xvzf 20news-bydate.tar.gz
The archive is fairly large, so be patient until the download finishes. Once it's done, start Jubatus right away.
jubaclassifier --configpath config.json
The Jubatus server appears to have started. Finally, start the client and feed the test data to Jubatus.
python tutorial.py
The client starts, and then...
msgpackrpc.error.RPCError: Request timed out
It prints a timeout error and stops. I wondered whether this was just a simple timeout, but on the server side...
starting load from /tmp/10.0.2.15_9199_classifier_tutorial.jubatus
Killed
The server had gone down with output like this. A simple timeout should not stop the server.
Looking into it a bit more, it turned out that the server stopped first and the client stopped afterwards. And since the server was killed abruptly with no message of its own, it looked like the OS was killing it. In that case, syslog is the place to look.
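To tell a genuinely slow server apart from one that has already died, one quick check is whether anything is still accepting connections on the server port. The following is only a minimal sketch, assuming Python 3; the function name server_is_listening is my own, and 9199 is simply the port that appears in the server log above.

```python
import socket


def server_is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass when the connection
        # is refused or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9199 is taken from the log line in this article; adjust as needed.
    print(server_is_listening("127.0.0.1", 9199))
```

If this prints False right after the client reports "Request timed out", the server process is probably gone and the timeout is only a symptom.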
$ tail /var/log/syslog
Feb 8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828429] [ 9500] 1000 9497 128839 93956 234 0 0 jubaclassifier
Feb 8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828431] [ 9501] 1000 9501 12649 2238 29 0 0 python
Feb 8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.828432] Out of memory: Kill process 9500 (jubaclassifier) score 750 or sacrifice child
Feb 8 12:29:49 vagrant-ubuntu-trusty-64 kernel: [56821.829547] Killed process 9500 (jubaclassifier) total-vm:515356kB, anon-rss:375824kB, file-rss:0kB
Out of memory!! This is the same thing that bit me with MySQL: I couldn't figure out why the PID file wasn't found, and when I looked at the log, the engine had failed to start for lack of memory...
So the fix is to increase the memory. On AWS I would create a swap file, but this time it's Vagrant, so I can simply give the VM more memory.
config.vm.provider "virtualbox" do |vb|
# # Display the VirtualBox GUI when booting the machine
# vb.gui = true
#
# # Customize the amount of memory on the VM:
vb.memory = "2048"
end
As expected, with 2 GB of memory it works.
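After changing the Vagrantfile and restarting the VM, it can be worth confirming from inside the guest that the new memory size actually took effect. Here is a small sketch, assuming Python 3 on a Linux guest where /proc/meminfo exists; the function names are hypothetical helpers of mine, not part of Jubatus or Vagrant.

```python
def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:        2049988 kB"
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def read_mem_total_mb(path="/proc/meminfo"):
    """Read the guest's total memory in MiB (Linux only)."""
    with open(path) as f:
        return mem_total_mb(f.read())
```

Calling read_mem_total_mb() inside the VM should return a value a little under 2048 once vb.memory = "2048" is in effect, since the kernel reserves part of the memory for itself. For comparison, the syslog above shows jubaclassifier holding about 367 MB of resident memory (anon-rss:375824kB) at the moment it was killed.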
And the result is
OK,sci.med, sci.med, 0.584435760975
NG,sci.electronics, soc.religion.christian, 0.37116548419
OK,talk.politics.guns, talk.politics.guns, 1.49401080608
▽
OK,rec.sport.hockey, rec.sport.hockey, 1.0699224472
===================
OK: 5398
NG: 2134
It worked well.
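To put a number on "worked well", the OK and NG counts above can be turned into an accuracy figure. This is just arithmetic on the totals printed by the tutorial, written as a short Python 3 sketch; the accuracy helper is my own.

```python
def accuracy(ok, ng):
    """Fraction of correct classifications given OK and NG counts."""
    total = ok + ng
    if total == 0:
        raise ValueError("no results to evaluate")
    return ok / total


# Counts taken from the tutorial output above.
print("accuracy: {:.1%}".format(accuracy(5398, 2134)))  # accuracy: 71.7%
```

So roughly 72% of the 7532 test documents were assigned to the correct newsgroup.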
If the process is going to be killed for running out of memory, I wish it would say so... Well, I suppose having it recorded in syslog is good enough.
In any case, if it is the server rather than the client that goes down, check syslog.
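Checking syslog by eye works, but the OOM-killer entries can also be pulled out with a few lines of Python. The sketch below assumes Python 3 and matches only the message wording shown in the log in this article ("Out of memory: Kill process ..."); other kernel versions may phrase it differently, and find_oom_kills is a name I made up.

```python
import re

# Matches e.g. "Out of memory: Kill process 9500 (jubaclassifier) score 750 ..."
OOM_KILL = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for each OOM-killer line."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(log_text)]
```

For example, reading /var/log/syslog (which may require sudo) and passing its contents to find_oom_kills would report process 9500, jubaclassifier, for the log shown earlier.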
Jubatus: Distributed Processing Framework for Online Machine Learning
Getting stuck on a "timed out" error in Jubatus that isn't really a timeout (someone had the same problem just a day earlier...)