[PYTHON] A story about displaying article-linked ads on Jubatus

Introduction

I participated in Jubatus Hackathon #1 last October.

http://connpass.com/event/8233/

The presentation slides from the event are neatly summarized here, so please refer to them.

http://blog.jubat.us/2014/10/jubatus.html

Even though I took it from planning to implementation in a single day, I have some regrets about how it turned out, so I'll add a few notes here.

What I made

Shortly before the hackathon, I went to one of the largest advertising events in Japan. Inspired by that, I suggested to a junior colleague on my team that we build something advertising-related, got an OK, and so I built it.

The specific requirements and assumptions were as follows:

- I have so-called owned media (articles) and want to display advertisements that match them.
- "Matching" means that the content of the article and the content of the advertisement are similar.
- (I don't want to think too hard, because this is pretty tough to do after work.)

The last condition is especially important and carries a very large weight.

To give a slightly more serious reason: I have a personal rule to keep the structure as simple as possible whenever I'm unsure, because complex systems have higher maintenance costs. Especially when machine learning is involved, if the system has a bug (or behaves in an undesirable way), the cause could be any of the following:

- a middleware bug
- behavior that is actually correct for the algorithm
- a lack of training data
- inappropriate features
- insufficient tuning

With that many things to consider, I would stop wanting to work at all (seriously). Engineers basically automate things to make life easier, so ending up with more work would defeat the purpose (?).

By the way, the code probably won't run as it is, but it's on GitHub.

https://github.com/chase0213/jubatus-hackathon-01

The presentation slides are here.

http://www.slideshare.net/chisatohasegawa370/jubatus-hackathon-1hiyoshi

System construction

As you may have noticed on GitHub, we initially aimed for a three-server configuration of Rails + Elasticsearch + Jubatus. However, the Elasticsearch part was not essential, so it was dropped. I did want to use it, though...

The flow of the request is as follows.

- Owned media (articles) are posted on the Rails server.
- Ads are in JSON format (including a title and an ad summary) and are learned by Jubatus.
- Users enter search keywords to search for articles.
- The Rails server sends the content of the narrowed-down articles by POST to the Jubatus backend (received by Python + Django).
- The Jubatus server that receives the request uses the recommender to return to the Rails server the id of the ad whose content is closest to the article.
- The Rails server fetches the ad for the id received from the Jubatus server and displays it to the user.

So in the end, everything is simply fed into the recommender.

To judge the similarity between an article and an ad, the text is morphologically analyzed with MeCab + ipadic and held as features using Jubatus's feature vector converter.
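The request flow above can be sketched without any servers. The following is a minimal, self-contained stand-in, not the Jubatus API: the `AdIndex` class, its method names, the sample ads, and the whitespace tokenizer (used here in place of real morphological analysis) are all assumptions made for illustration, and cosine similarity over token counts is just one simple way to express "the ad closest to the article".

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for morphological analysis: lowercase whitespace split.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdIndex:
    """Toy in-memory replacement for the recommender: ad id -> token counts."""

    def __init__(self):
        self.rows = {}

    def update_row(self, ad_id, ad):
        # An ad is a dict with "title" and "summary", like the JSON ads in the flow.
        self.rows[ad_id] = Counter(tokenize(ad["title"] + " " + ad["summary"]))

    def similar_ads(self, article_text, size=1):
        # Score every stored ad against the article and return the top `size`.
        query = Counter(tokenize(article_text))
        scored = [(ad_id, cosine(query, vec)) for ad_id, vec in self.rows.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:size]


index = AdIndex()
index.update_row("ad-1", {"title": "Cheap flights",
                          "summary": "book flights and hotels for your trip"})
index.update_row("ad-2", {"title": "Python course",
                          "summary": "learn python and machine learning online"})

# The Rails side would POST the article text; here we call the index directly.
article = "An introduction to machine learning with python"
best_id, score = index.similar_ads(article, size=1)[0]
print(best_id, round(score, 3))
```

In the real system the index would live in the Jubatus recommender and the returned id would go back to the Rails server, which looks up and renders the ad.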

http://jubat.us/ja/fv_convert.html

This converter is good enough that I did very little feature engineering of my own. As a result, the features are noisy and the accuracy is not very high, but I'll come back to that later.
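For reference, a recommender configuration that routes every string field through a MeCab splitter might look roughly like the sketch below. This is an assumption modeled on the fv_converter documentation linked above, not the configuration actually used at the hackathon: the `inverted_index` method, the `tf`/`idf` weights, and the plugin path are illustrative choices, so check them against the documentation for your Jubatus version.

```python
import json

# Assumed configuration; key names follow the fv_converter documentation,
# but the method, weights, and plugin path are illustrative, not the author's.
config = {
    "method": "inverted_index",
    "converter": {
        "string_filter_types": {},
        "string_filter_rules": [],
        "num_filter_types": {},
        "num_filter_rules": [],
        "string_types": {
            # Load the MeCab splitter as a dynamic plugin.
            "mecab": {
                "method": "dynamic",
                "path": "libmecab_splitter.so",
                "function": "create",
            }
        },
        "string_rules": [
            # Apply the MeCab splitter to every string key in the datum.
            {"key": "*", "type": "mecab",
             "sample_weight": "tf", "global_weight": "idf"}
        ],
        "num_types": {},
        "num_rules": [],
    },
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

With no string filters configured, every token the splitter produces becomes a feature, which is one way the noise mentioned above can creep in.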

Demo

I deployed it on AWS for the demo. I had picked a machine with fairly decent specs, so I have since shut it down (for financial reasons). With some appropriate tweaks to the code it should run locally, so please feel free to play with it.

Concept

The only "learning" in this system was that ads could be added dynamically, so it did not really show off what a learner is good for. Most people imagine that machine learning gets smarter every time it is used, so I will briefly comment on that part (I won't write any code, sorry).

First of all, when it comes to using a learner in an advertising system, I think the point is to vary the impression rate between what was actually clicked and what wasn't. That is, the most-clicked article / ad combinations get more impressions, and the ones that are not clicked are shown only occasionally.

I haven't actually built this, but I think it can be done with a classifier. Specifically, take the union of the content of the article narrowed down by the search system and the content of the clicked ad, and feed it to the classifier with a "clicked" label. Similarly, when an ad is not clicked, you would store the article and ad union with a "not clicked" label, but there is one caveat here.

Even if a link isn't clicked, it's premature to conclude that the user isn't interested in it.
In many cases users are offered many other options,
so not clicking does not actively indicate a lack of interest in the link.

I made it look like a quote, but it's my personal opinion. Put another way, click information should be weighted more heavily than non-click information.

So, if you tune this part well statistically and feed the data into a classifier for binary classification, you can predict whether a user is likely to click on a given combination. If you pass that value back to the upstream layer (the recommender) and apply it as a coefficient, frequently clicked combinations will be displayed more often.
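As a rough illustration of this idea, here is a minimal sketch that uses a tiny logistic regression in plain Python as a stand-in for a Jubatus classifier. Everything in it is hypothetical: the `ClickModel` class, the `CLICK_WEIGHT` and `NO_CLICK_WEIGHT` values, the sample log, and the `rerank` helper. The only points it tries to show are that clicked examples are weighted more heavily than non-clicked ones, and that the predicted click probability is applied as a coefficient on the similarity score.

```python
import math
from collections import defaultdict

# Hypothetical weights: a click is treated as stronger evidence than a non-click.
CLICK_WEIGHT = 1.0
NO_CLICK_WEIGHT = 0.2


def features(article_tokens, ad_tokens):
    # Union of article and ad content; prefixes keep the two sides distinct.
    return {"article:" + t for t in article_tokens} | {"ad:" + t for t in ad_tokens}


class ClickModel:
    """Tiny logistic regression trained by SGD (stand-in for a real classifier)."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def predict(self, feats):
        # Probability that this article / ad combination gets clicked.
        z = sum(self.weights[f] for f in feats)
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, feats, clicked, sample_weight):
        # One gradient step, scaled by how much we trust this observation.
        error = (1.0 if clicked else 0.0) - self.predict(feats)
        for f in feats:
            self.weights[f] += self.learning_rate * sample_weight * error


model = ClickModel()
log = [
    (["python", "tutorial"], ["python", "course"], True),    # shown and clicked
    (["python", "tutorial"], ["cheap", "flights"], False),   # shown, not clicked
]
for _ in range(20):
    for article_tokens, ad_tokens, clicked in log:
        weight = CLICK_WEIGHT if clicked else NO_CLICK_WEIGHT
        model.train(features(article_tokens, ad_tokens), clicked, weight)


def rerank(similarity, article_tokens, ad_tokens):
    # Use the predicted click probability as a coefficient on the similarity.
    return similarity * model.predict(features(article_tokens, ad_tokens))


p_course = model.predict(features(["python", "tutorial"], ["python", "course"]))
p_flights = model.predict(features(["python", "tutorial"], ["cheap", "flights"]))
print(p_course > p_flights)
```

How large the gap between the two weights should be is exactly the statistical tuning mentioned above; the values here are placeholders.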

At the end

The last section is armchair theory, so I expect pitfalls are scattered all over the place. Also, I described the features as noisy earlier, but this should be tolerated to some extent, because ads with no noise at all can be boring to users (advertising is outside my specialty, so I won't go into it too deeply).
