This is the Day 15 article of the QuantumCore Advent Calendar 2019, "Machine learning and applied technologies other than deep learning".
The goal is to predict NBA player performance using time-series information.
This time, I predict the points per minute played of Donovan Mitchell, who plays for the Utah Jazz.
Donovan Mitchell (Basketball Reference)
By the way, I am a beginner in basketball.
**"Aren't the ups and downs of a player's performance within a season the same every season?"**
For example, "He struggled at the beginning of the season, got back on track in the middle, and peaked at the end."
It's a pretty rough assumption, but I wanted to see how far I could get with it and the Qore SDK.
I scraped Basketball Reference to get each player's box score for every game.
Basketball Reference has game data going back to around 1954, and I scraped all of it for the time being. If you are interested in the data, please contact me.
--Explanatory variable: Number of days since the season opened
--Objective variable: Points per minute played (PTS / MP)
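As a rough illustration of the two variables above, here is a minimal sketch that derives them from a single box-score row. It assumes MP is stored as an "MM:SS" string and uses October 1 as the reference date; the function names and the sample rows are hypothetical and are not taken from the scraped data.

```python
import datetime

# Assumed reference date: October 1, before the 2018/19 season starts
SEASON_REFERENCE = datetime.date(2018, 10, 1)


def minutes_played(mp: str) -> float:
    """Convert an 'MM:SS' string into fractional minutes (assumed format)."""
    minutes, seconds = mp.split(":")
    return int(minutes) + int(seconds) / 60


def make_features(game_date: datetime.date, pts: int, mp: str):
    """Return (days since the reference date, points per minute played)."""
    days = (game_date - SEASON_REFERENCE).days
    return days, pts / minutes_played(mp)


# Made-up rows for illustration only (not real game data)
rows = [
    (datetime.date(2018, 10, 17), 24, "36:00"),
    (datetime.date(2018, 10, 19), 19, "30:30"),
]
features = [make_features(*row) for row in rows]
for days, pts_per_min in features:
    print(days, round(pts_per_min, 3))
```

A row with zero minutes played would cause a division by zero here, so such rows would need to be dropped beforehand.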
Donovan Mitchell's points per minute played through mid-November are as follows.
This time, I train on data from the 18/19 season and validate on the 19/20 season up to its midpoint (without changing the model). After that, I will predict Donovan's performance in actual games.
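The season-based split described here can be sketched as follows. This is only one possible reading of the plan: each game is assigned to a season by its date, and the explanatory variable is counted from October 1 of that season, so that the same point in different seasons gets a similar value. The helper names and the sample dates are hypothetical.

```python
import datetime


def season_reference(game_date: datetime.date) -> datetime.date:
    """October 1 of the year in which the game's season starts (assumed convention)."""
    start_year = game_date.year if game_date.month >= 10 else game_date.year - 1
    return datetime.date(start_year, 10, 1)


def days_into_season(game_date: datetime.date) -> int:
    """Number of days between the season's reference date and the game."""
    return (game_date - season_reference(game_date)).days


def split_by_season(dates, train_start_year=2018, valid_start_year=2019):
    """Put 18/19 games into the training set and 19/20 games into the validation set."""
    train, valid = [], []
    for d in dates:
        start_year = season_reference(d).year
        if start_year == train_start_year:
            train.append(days_into_season(d))
        elif start_year == valid_start_year:
            valid.append(days_into_season(d))
    return train, valid


# Made-up game dates for illustration only
dates = [
    datetime.date(2018, 10, 17),
    datetime.date(2019, 4, 10),
    datetime.date(2019, 10, 23),
    datetime.date(2019, 11, 15),
]
train_days, valid_days = split_by_season(dates)
print(train_days, valid_days)
```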
from qore_sdk.client import WebQoreClient
client = WebQoreClient(username="", password="p@$$w0rd", endpoint="")
Suppose Donovan.csv contains the scraped results for one player.
import pandas as pd
donovan = pd.read_csv('Donovan.csv', parse_dates=['date'], index_col='date')
(Recap)
--Explanatory variable: Number of days since the season opened
--Objective variable: Points per minute played (PTS / MP)
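import datetime

# The reference date is October 1, before the start of the season.
# (Using the actual opening day may be better.)
season_start = datetime.datetime(year=2018, month=10, day=1)
season_end = datetime.datetime(year=2019, month=10, day=1)
# Use the same mask for X and y so that they have the same length
in_season = (donovan.index > season_start) & (donovan.index < season_end)
X = (donovan.index[in_season] - season_start).days  # days since the reference date
y = donovan.loc[in_season, 'PTS per mn']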
Data split
import sklearn.model_selection as model_selection
X_train, X_valid, y_train, y_valid = model_selection.train_test_split(
X, y, shuffle=False, random_state=44
)
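As far as I understand, with shuffle=False train_test_split keeps the rows in their original order and, by default, holds out the last 25% of the samples, so random_state has no effect in this call. The following pure-Python sketch mimics that behavior to make the split explicit; chronological_split is a hypothetical helper and the values are made up.

```python
import math


def chronological_split(xs, ys, test_size=0.25):
    """Keep the order: the first part is for training, the last part for validation."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n_test = math.ceil(len(xs) * test_size)  # size of the held-out tail
    n_train = len(xs) - n_test
    return xs[:n_train], xs[n_train:], ys[:n_train], ys[n_train:]


# Made-up values for illustration only
days = list(range(10))
scores = [0.5 + 0.01 * d for d in days]
x_train, x_valid, y_train, y_valid = chronological_split(days, scores)
print(x_train, x_valid)
```

Because the validation rows are simply the latest games in the selected range, this split tests how well the model extrapolates toward the end of that range.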
Training and prediction
# Train
res = client.regression_train(X=X_train.values, Y=y_train.values)
# Predict
pred = client.regression_predict(X=X_valid.values)
Validation (checking the accuracy)
import sklearn.metrics as metrics
rmse = metrics.mean_squared_error(y_valid, pred["Y"])**.5
mae = metrics.mean_absolute_error(y_valid, pred["Y"])
print("RMSE=", rmse)
print("MAE=", mae)
print("RMSE/MAE=", rmse/mae)
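To make the three printed values concrete, here is a small pure-Python sketch of the same metrics. RMSE is never smaller than MAE, so the ratio is at least 1; for roughly normally distributed errors it is about sqrt(pi/2) ≈ 1.25, and a much larger value suggests that a few large errors dominate. The numbers below are made up and are not results from the model.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Made-up values for illustration only
y_true = [0.5, 0.6, 0.7, 0.8]
y_pred = [0.6, 0.5, 0.9, 0.6]

rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
ratio = rmse_value / mae_value
print("RMSE=", rmse_value)
print("MAE=", mae_value)
print("RMSE/MAE=", ratio)
```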
The error "HTTP Error: HTTP Error 500: INTERNAL SERVER ERROR" occurred during training. I will update the article as soon as it is resolved.
--The assumption is too simple
--Factors other than time should probably be considered
--To begin with, "points per minute played" is a questionable axis for evaluating performance
--First of all, I need to actually get a result
Training and predicting on different datasets is certainly quick and easy. I think tools like this, which make it easy to use techniques such as reservoir computing and to run complex calculations, deserve to be more widely known.
On the other hand, it can feel like a black box, so you need to understand explanations such as the first-day article of the calendar and the [Documentation](https://qcore-info.github.io/advent-calendar-2019/index.html#qore_sdk.client.WebQoreClient.classifier_train). In particular, I felt it would be even better if the documentation had a background/theory page like the one in the first-day article.
In any case, I would like to express my respect for everyone at QuantumCore who devotes their time to developing such tools.