This competition is currently running, so I'm going to take part for a while.
https://www.kaggle.com/c/coupon-purchase-prediction http://www.recruit.jp/news_data/release/2015/0716_15946.html
Kaggle describes itself as "The world's largest community of data scientists compete to solve your most valuable problems." Simply put, it is a platform that makes it easy to host data analysis competitions. A quick web search will turn up more thorough explanations.
Competing is about 100 times more fun than working alone, and it produces a lot of good results. That's why the site is so lively.
A data analysis competition? What's in it for me? Fair question: there are prizes. These can run to five-digit dollar amounts, which is serious money. That said, you are up against professionals from around the world, so winning is not easy...
What is the RECRUIT Challenge? As mentioned at the beginning, RECRUIT Holdings posted a problem on Kaggle, and that is the RECRUIT Challenge. The task is purchase prediction for the coupon site "Ponpare": predict the next week's purchases from about a year of coupon browsing and purchase data.
As with other Kaggle problems, the prize money is large. (https://www.kaggle.com/c/coupon-purchase-prediction/details/prizes)
- First place: $30,000
- Second place: $10,000
- Third place: $5,000
On top of that, students can win additional prize money and perks. (http://challenge.recruit.ai/studentAward.html) Since the application is in Japanese, is this effectively a contest among Japanese students?
- First place: 100,000 yen
- Second place: 50,000 yen
- Third place: 30,000 yen
- Plus: a chance to talk with a well-known professor in the AI field
At least, that is what the page says.
Let's give it a try. It would be a shame to stumble right at the start (keep it a secret that I made a format error 4 times), so I wrote some code with pandas that outputs 10 randomly chosen coupons as the prediction. Note that it assumes the csv files are in the dat folder.
random_prediciton.py
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

ul = pd.read_csv('./dat/user_list.csv')
cl_test = pd.read_csv('./dat/coupon_list_test.csv')
# Random permutation of the row positions of cl_test
sampler = np.random.permutation(len(cl_test))
# Take the first 10 shuffled rows and keep their COUPON_ID_hash values
cids = cl_test.take(sampler[:10]).COUPON_ID_hash
# Join the IDs with spaces (this is the required output format)
cids = " ".join(cids)
# Build the output DataFrame; the same 10 coupons are assigned to every user
output = pd.DataFrame({"USER_ID_hash": ul.USER_ID_hash, "PURCHASED_COUPONS": cids}, columns=["USER_ID_hash", "PURCHASED_COUPONS"])
output.to_csv("./output_random.csv", index=False)
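Since format errors are easy to make, it may help to sanity-check the DataFrame before uploading. The following is only a rough sketch, not an official validator: the function name check_submission, the max_coupons limit of 10, and the tiny demo frames are all my own assumptions for illustration. It checks the two column names used above, looks for duplicate users, and confirms that every space-separated ID is a known coupon.

```python
import pandas as pd

EXPECTED_COLUMNS = ["USER_ID_hash", "PURCHASED_COUPONS"]


def check_submission(output, valid_coupon_ids, max_coupons=10):
    """Return a list of format problems (empty list if none were found).

    Hypothetical helper: max_coupons=10 is an assumption, not an official rule.
    """
    problems = []
    if list(output.columns) != EXPECTED_COLUMNS:
        problems.append("unexpected columns: %s" % list(output.columns))
        return problems
    if output["USER_ID_hash"].duplicated().any():
        problems.append("duplicate USER_ID_hash rows")
    valid = set(valid_coupon_ids)
    for user, coupons in zip(output["USER_ID_hash"], output["PURCHASED_COUPONS"]):
        if pd.isna(coupons):
            continue  # an empty prediction has nothing to check
        ids = str(coupons).split()  # IDs must be separated by whitespace
        if len(ids) > max_coupons:
            problems.append("%s: more than %d coupons" % (user, max_coupons))
        unknown = [c for c in ids if c not in valid]
        if unknown:
            problems.append("%s: unknown coupon IDs %s" % (user, unknown))
    return problems


# Tiny made-up frames standing in for the real files (IDs are hypothetical)
demo_coupons = pd.Series(["c1", "c2", "c3"])
good = pd.DataFrame({"USER_ID_hash": ["u1", "u2"],
                     "PURCHASED_COUPONS": ["c1 c2", "c3"]},
                    columns=EXPECTED_COLUMNS)
bad = pd.DataFrame({"USER_ID_hash": ["u1", "u1"],
                    "PURCHASED_COUPONS": ["c1,c2", "c3"]},
                   columns=EXPECTED_COLUMNS)

print(check_submission(good, demo_coupons))
print(check_submission(bad, demo_coupons))
```

With the real data, the same call would be check_submission(output, cl_test.COUPON_ID_hash), using the variables from random_prediciton.py.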
The competition page describes what data is provided, but I also put together a brief summary in a pptx.