[PYTHON] RECRUIT Challenge @ kaggle

A competition like this is currently running, so I'm going to take part for a while.

https://www.kaggle.com/c/coupon-purchase-prediction http://www.recruit.jp/news_data/release/2015/0716_15946.html

What is Kaggle?

Kaggle describes itself as "The world's largest community of data scientists compete to solve your most valuable problems." Simply put, it's a platform that makes it easy to run and enter data analysis competitions. A web search will turn up more thorough explanations.

Competing is about 100 times more fun than working alone, and it also tends to produce better results. That's why the site is so active.

A data analysis competition? What's the point? Well, you can win prize money. It can run to five figures in dollars, or more. That said, your opponents are professionals from around the world, so winning is not easy...

RECRUIT Challenge?

As mentioned at the beginning, RECRUIT Holdings posted a problem on Kaggle, and that is the RECRUIT Challenge. The task is purchase prediction for the coupon site "Ponpare": predict each user's purchases in the following week from roughly one year of coupon browsing and purchase data.

As with most Kaggle problems, the prize money is substantial. (https://www.kaggle.com/c/coupon-purchase-prediction/details/prizes)

- 1st place: $30,000
- 2nd place: $10,000
- 3rd place: $5,000

Moreover, students can receive additional prize money and perks. (http://challenge.recruit.ai/studentAward.html) Since the application is in Japanese, is it effectively a contest among Japanese students?

- 1st place: 100,000 yen
- 2nd place: 50,000 yen
- 3rd place: 30,000 yen
- Plus: a chance to talk with a well-known professor in the AI field

Or so they say.

Yeah, I want money!

Let's try it

It would be a shame to stumble right at the start (don't tell anyone, but I hit a format error four times), so I wrote a script that uses pandas to output 10 randomly chosen coupons as the prediction. It assumes the CSV files are in the dat folder.

random_prediciton.py


# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np


ul = pd.read_csv('./dat/user_list.csv')
cl_test = pd.read_csv('./dat/coupon_list_test.csv')

# Random permutation of cl_test's row positions
sampler = np.random.permutation(len(cl_test))
# Take the first 10 shuffled rows and pull out their COUPON_ID_hash values
cids = cl_test.take(sampler[:10]).COUPON_ID_hash
# Join the IDs with spaces (this is the required output format)
cids = " ".join(cids)
# Build the output DataFrame; every user gets the same 10 coupons
output = pd.DataFrame({"USER_ID_hash": ul.USER_ID_hash, "PURCHASED_COUPONS": cids},
                      columns=["USER_ID_hash", "PURCHASED_COUPONS"])
output.to_csv("./output_random.csv", index=False)
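Since format errors are easy to hit, here is a minimal sketch of a variant that draws a different random set of coupons for each user and then checks the result before it is written out. The helper names random_submission and check_submission are hypothetical, the toy ID lists stand in for user_list.csv and coupon_list_test.csv, and the checks (exact column names, at most 10 space-separated IDs per row, no duplicates) are assumptions based on the format described above rather than the official validation rules.

```python
import numpy as np
import pandas as pd


def random_submission(user_ids, coupon_ids, n_pick=10, seed=0):
    """Build a submission frame with n_pick random coupons per user (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    coupon_ids = np.asarray(coupon_ids)
    n_pick = min(n_pick, len(coupon_ids))
    # Draw a fresh sample without replacement for every user
    picks = [
        " ".join(rng.choice(coupon_ids, size=n_pick, replace=False))
        for _ in user_ids
    ]
    return pd.DataFrame(
        {"USER_ID_hash": list(user_ids), "PURCHASED_COUPONS": picks},
        columns=["USER_ID_hash", "PURCHASED_COUPONS"],
    )


def check_submission(df, n_users, max_coupons=10):
    """Raise AssertionError if the frame breaks the assumed format."""
    assert list(df.columns) == ["USER_ID_hash", "PURCHASED_COUPONS"]
    assert len(df) == n_users
    assert df["USER_ID_hash"].is_unique
    for cell in df["PURCHASED_COUPONS"]:
        ids = cell.split(" ")
        assert len(ids) <= max_coupons    # assumed limit of 10 per user
        assert len(ids) == len(set(ids))  # no coupon listed twice


# Toy stand-ins for the real ID columns
users = ["u1", "u2", "u3"]
coupons = ["c%02d" % i for i in range(20)]
sub = random_submission(users, coupons)
check_submission(sub, n_users=len(users))
```

With the real data, ul.USER_ID_hash and cl_test.COUPON_ID_hash could be passed in place of the toy lists, and sub.to_csv(..., index=False) would write the file as before.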

What data is provided?

The competition page describes the data that is provided, but I summarized it in a simple PowerPoint diagram.

(Figure: recruit.png)
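To get a quick look at the files yourself, a small sketch like the following lists the column names of every CSV in a folder. The function name summarize_csvs is hypothetical, and the code only assumes that the competition files sit in ./dat as in the script above; it reads just a few rows per file so that large files stay cheap to inspect.

```python
from pathlib import Path

import pandas as pd


def summarize_csvs(folder, nrows=5):
    """Return {file name: list of column names} for each CSV in folder."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        head = pd.read_csv(path, nrows=nrows)  # read only the first few rows
        summary[path.name] = list(head.columns)
    return summary


if __name__ == "__main__":
    # Assumes the CSV files are in ./dat; prints nothing if the folder is empty
    for name, cols in summarize_csvs("./dat").items():
        print(name, cols)
```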
