[PYTHON] How Kaggle ranking points work

kaggle_ranking.png

This article is the Day 14 entry of the Nikkei xTECH Business AI ① Advent Calendar 2019, organized by Nikkei xTECH's "Road to the AI dojo Kaggle".

Kaggle has a ranking system in addition to the Grandmaster / Master / Expert / Contributor / Novice tiers, which are determined by the color and number of medals won. The ranking seems to matter less than the tier, but this article explains how it works. (* Competitions only)

Kaggle notebook https://www.kaggle.com/d1348k/learn-aboout-competition-points

github https://github.com/uratatsu/kaggle_ranking

What are Kaggle points?

In addition to gold / silver / bronze medals, Kaggle competitions also award competition points to participants according to their final placement. Points are calculated with the following formula.

Mechanism of points

 \Biggl[\frac{100000}{\sqrt{N_{teammates}}}\Biggr]\Bigl[Rank^{-0.75}\Bigr]\bigl[\log_{10} (1+\log_{10} (N_{teams})) \bigr]\biggl[e^{-t/500}\biggr]

The starting point is a base of 100,000 points. From there, factors that depend on the number of teammates, your rank, the number of participating teams, and the days elapsed are applied, each in practice between 0 and 1, to determine the final points earned.

Effect of ranking

\Bigl[Rank^{-0.75}\Bigr]

Not surprisingly, the factor with the biggest impact is your final rank on the private leaderboard.

kaggle_points_rank.png

The coefficient decays with rank as shown in the graph above.

Ranking | Coefficient
1st | 1.0
2nd | 0.5946
3rd | 0.4387
10th | 0.1778
50th | 0.05318
100th | 0.03162

The gap between 1st and 2nd place is very large: 2nd place earns only about 60% of the points awarded for 1st place. 10th place earns only about 18%, and 100th place about 3%.
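The coefficients in the table above can be reproduced with a few lines of Python. This is only a minimal sketch of the rank term; the function name `rank_factor` is my own and does not come from Kaggle.

```python
def rank_factor(rank: int) -> float:
    """Rank term of the points formula: Rank ** -0.75 (hypothetical helper name)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return rank ** -0.75


# Print the coefficient for the ranks listed in the table.
for rank in (1, 2, 3, 10, 50, 100):
    print(f"{rank:>3}: {rank_factor(rank):.4f}")
```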

Effect of the number of participating teams

\bigl[\log_{10} (1+\log_{10} (N_{teams})) \bigr]

This term depends on the number of teams participating in the competition. The more teams participate, the larger the coefficient, but as the graph below shows, it is only about 0.7 even with 10,000 teams (the largest competition so far had 8,802 teams). With 1,000 teams it is about 0.6, so even a tenfold increase in participation raises the points by a factor of only 1.16.

kaggle_points_teams.png

Kaggle's reasoning is described on the official blog. It seems to be based on the idea that the skill required to win a 100-team competition and a 1,000-team competition does not differ that much. The formula previously used log10(N_teams), which gave a 1.5-fold difference between 100 teams and 1,000 teams.
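To see how much the nested logarithm flattens the effect of competition size, the sketch below compares the current term with the plain log10 form mentioned above. The function names are my own, and `old_teams_factor` is only my reading of the earlier form as described in this article, not something taken from Kaggle's code.

```python
import math


def teams_factor(n_teams: int) -> float:
    """Current team-count term: log10(1 + log10(N_teams))."""
    return math.log10(1 + math.log10(n_teams))


def old_teams_factor(n_teams: int) -> float:
    """Earlier form as described in the text: log10(N_teams) (assumed)."""
    return math.log10(n_teams)


# Ratio of the term between competition sizes that differ by a factor of 10.
for small, large in ((100, 1000), (1000, 10000)):
    new_ratio = teams_factor(large) / teams_factor(small)
    old_ratio = old_teams_factor(large) / old_teams_factor(small)
    print(f"{small} -> {large} teams: current x{new_ratio:.2f}, earlier x{old_ratio:.2f}")
```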

Effect of the number of team members

 \frac{1}{\sqrt{N_{teammates}}}

Each member's points are multiplied by the factor above, which depends on the number of teammates.

kaggle_points_teammates.png

On a two-person team each member earns about 70%, and on a four-person team about half. My impression is that the attenuation from team size is smaller than I expected.

Number of people | Coefficient
1 | 1.0
2 | 0.7071
3 | 0.5774
4 | 0.5
5 | 0.4472
8 | 0.3536
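One consequence of the square root is easy to check numerically: since each of N members receives 1/sqrt(N) of the solo points, the team as a whole collects sqrt(N) times the solo points. The sketch below only illustrates that arithmetic; the function names are my own.

```python
import math


def teammate_factor(n_teammates: int) -> float:
    """Per-member coefficient: 1 / sqrt(N_teammates)."""
    return 1 / math.sqrt(n_teammates)


def team_total_factor(n_teammates: int) -> float:
    """Sum of the per-member coefficients over the whole team (equals sqrt(N))."""
    return n_teammates * teammate_factor(n_teammates)


for n in (1, 2, 3, 4, 5, 8):
    print(f"{n} members: each {teammate_factor(n):.4f}, team total {team_total_factor(n):.4f}")
```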

Effect of elapsed days

\biggl[e^{-t/500}\biggr]

The last term is the decay with the number of days elapsed, t.

kaggle_ranking_days.png

The points halve after about 346 days, a little under a year.
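The half-life follows directly from the exponent: solving e^{-t/500} = 1/2 gives t = 500 ln 2. A minimal sketch of that calculation, with names of my own choosing:

```python
import math

TIME_CONSTANT_DAYS = 500  # the 500 in e^{-t/500}


def decay_factor(days: float) -> float:
    """Time-decay term of the points formula."""
    return math.exp(-days / TIME_CONSTANT_DAYS)


# Solve exp(-t / 500) = 0.5 for t.
half_life_days = TIME_CONSTANT_DAYS * math.log(2)

print(f"half-life: {half_life_days:.1f} days")
print(f"factor after one year: {decay_factor(365):.3f}")
```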

Relationship between ranking and number of teammates

Of these factors, the only ones you can control are the number of teammates and your rank. Forming a team reduces the points each member earns, but team merges generally tend to raise the final rank, so where the gain from a higher rank is large enough, teaming up to improve the final rank can yield more points. The heat map below shows the relationship between rank and number of teammates.

kaggle_ranking_heatmap.png

For example, if someone in 2nd place merges with one other person and the team finishes 1st, the points increase, because the coefficient goes from 59.5% to 70.7%.

That said, merging teams purely on the basis of this kind of calculation feels hollow, so I don't think most people need to worry about it. It may matter for those in, or aiming for, the top 30 of the Kaggle ranking.
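The same comparison can be made for any pair of ranks and team sizes. The sketch below is only an illustration: `merge_gain` is a hypothetical helper that returns the ratio of per-member points after a merge to points before it, holding the number of teams and the elapsed days fixed. A value above 1 means the merge pays off in points.

```python
import math


def merge_gain(rank_before: int, rank_after: int,
               members_before: int = 1, members_after: int = 2) -> float:
    """Ratio of per-member points after a merge to points before it (hypothetical helper)."""
    before = rank_before ** -0.75 / math.sqrt(members_before)
    after = rank_after ** -0.75 / math.sqrt(members_after)
    return after / before


# Solo 2nd place merging into a two-person team that finishes 1st.
print(f"2nd -> 1st: x{merge_gain(2, 1):.3f}")
# Solo 10th place merging into a two-person team that finishes 8th.
print(f"10th -> 8th: x{merge_gain(10, 8):.3f}")
```

Rearranging the inequality gives a rule of thumb under these assumptions: going from solo to a two-person team pays off only if the new rank is below roughly 63% of the old one, since (1/2)^(2/3) ≈ 0.63.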

Point acquisition example

import numpy as np

def calculate_points(teammates, rank, teams, days):
    # days is the elapsed time t in the formula; the decay exponent must be negative
    points = 100000 * 1/np.sqrt(teammates) * np.power(rank, -0.75) * np.log10(1+np.log10(teams)) * np.exp(-days/500)
    return points

Calculating a few cases with this formula, with days set to 0 (no decay yet), gives the following points.

Ranking | Number of participating teams | Number of teammates | Medal | Earned points
1 | 1000 | 1 | Gold | 60206
1 | 1000 | 5 | Gold | 26925
5 | 1000 | 1 | Gold | 18006
25 | 1000 | 1 | Silver | 5385
75 | 1000 | 1 | Bronze | 2362
100 | 1000 | 1 | Bronze | 1904
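The "Earned points" column can be regenerated with a short script. This is a self-contained sketch that restates the formula with the standard `math` module and assumes zero elapsed days, which is what the values in the table correspond to. The medal column is not computed here.

```python
import math


def calculate_points(teammates: int, rank: int, teams: int, days: float = 0) -> float:
    """Competition points for one member; days=0 means no time decay (assumption for the table)."""
    return (100000 / math.sqrt(teammates)
            * rank ** -0.75
            * math.log10(1 + math.log10(teams))
            * math.exp(-days / 500))


# (rank, participating teams, teammates) for each row of the table.
cases = [(1, 1000, 1), (1, 1000, 5), (5, 1000, 1),
         (25, 1000, 1), (75, 1000, 1), (100, 1000, 1)]

for rank, teams, teammates in cases:
    points = calculate_points(teammates, rank, teams)
    print(f"rank {rank:>3}, {teams} teams, {teammates} member(s): {round(points)}")
```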

The impact of a solo win is tremendous: the points from a single one are equivalent to 32nd place in the Kaggle competitions ranking (* as of December 14, 2019). Incidentally, bestfitting, who is ranked 1st in the Kaggle competitions ranking shown in the first image, has 20 solo golds (!) and 3 solo wins (!!), which is unrivaled.

In conclusion

I knew that the points change with the number of teams and the rank, but visualizing the decay rates turned out to be surprisingly interesting. There are probably many ways to interpret how points are awarded, but I hope this helps you understand the calculation correctly.
