This is my first post on Qiita. I am close to a beginner in data analysis, so there are probably many mistakes; please point them out. This time, I used the Qore SDK from Quantum Core Co., Ltd.
How to use the Qore SDK is explained in the following articles: "The world of reservoir computing ~ with Qore ~" and "Introduction of Qore SDK and detection of arrhythmia with Qore".
The task is to predict wins and losses in the English Premier League (soccer). Specifically, I predict the results of matches played in the 2019-2020 season using data from 2010-2018.
The dataset was downloaded from the following site. http://football-data.co.uk/englandm.php
All of the datasets and the preprocessing code are on GitHub, so please refer to the repository. https://github.com/obameyan/QoreSDK-Premire-League
It is hard to describe all of the preprocessing here, so I only show the data before and after it. In brief, I converted the data into a time series that can be fed into the Qore SDK, taking into account factors such as the opponent, the match result, goal counts, and hat tricks.
The following is the data before preprocessing (only part of it is shown).
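To give a feel for this kind of conversion, here is a minimal sketch that turns a chronological list of matches into per-match features built from what each team had accumulated before kickoff. It is only an illustration: the toy matches, the `points` and `running_features` helpers, and the feature names are hypothetical and are not the actual preprocessing in the repository.

```python
from collections import defaultdict

# Hypothetical toy matches in chronological order:
# (home team, away team, home goals, away goals)
matches = [
    ("A", "B", 2, 1),
    ("C", "A", 0, 0),
    ("B", "C", 1, 3),
    ("A", "C", 1, 2),
]

def points(goals_for, goals_against):
    """3 points for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0

def running_features(matches):
    """For each match, record what both sides had accumulated before kickoff."""
    pts = defaultdict(int)     # cumulative points per team
    scored = defaultdict(int)  # cumulative goals scored per team
    rows = []
    for home, away, hg, ag in matches:
        rows.append({
            "home_points": pts[home],
            "away_points": pts[away],
            "home_goals_scored": scored[home],
            "away_goals_scored": scored[away],
            "home_win": int(hg > ag),  # hypothetical binary label
        })
        # Update the totals only after the row is recorded,
        # so a match result never leaks into its own features.
        pts[home] += points(hg, ag)
        pts[away] += points(ag, hg)
        scored[home] += hg
        scored[away] += ag
    return rows

features = running_features(matches)
for row in features:
    print(row)
```

The important design point is the update order: features are read first and totals are updated afterwards, which keeps the series usable for prediction.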
import pandas as pd
# Original data (only part of it is shown)
raw_data = pd.read_csv('./data/PremierLeague/2018-19.csv')
raw_data.head()
Div | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | HTR | Referee | HS | AS | HST | AST | HF | AF | HC | AC | HY | AY | HR | AR | B365H | B365D | B365A | BWH | BWD | BWA | IWH | IWD | IWA | PSH | PSD | PSA | WHH | WHD | WHA | VCH | VCD | VCA | Bb1X2 | BbMxH | BbAvH | BbMxD | BbAvD | BbMxA | BbAvA | BbOU | BbMx>2.5 | BbAv>2.5 | BbMx<2.5 | BbAv<2.5 | BbAH | BbAHh | BbMxAHH | BbAvAHH | BbMxAHA | BbAvAHA | PSCH | PSCD | PSCA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
E0 | 10/08/2018 | Man United | Leicester | 2 | 1 | H | 1 | 0 | H | A Marriner | 8 | 13 | 6 | 4 | 11 | 8 | 2 | 5 | 2 | 1 | 0 | 0 | 1.57 | 3.9 | 7.50 | 1.53 | 4.0 | 7.50 | 1.55 | 3.80 | 7.00 | 1.58 | 3.93 | 7.50 | 1.57 | 3.8 | 6.00 | 1.57 | 4.0 | 7.00 | 39 | 1.60 | 1.56 | 4.20 | 3.92 | 8.05 | 7.06 | 38 | 2.12 | 2.03 | 1.85 | 1.79 | 17 | -0.75 | 1.75 | 1.70 | 2.29 | 2.21 | 1.55 | 4.07 | 7.69 |
E0 | 11/08/2018 | Bournemouth | Cardiff | 2 | 0 | H | 1 | 0 | H | K Friend | 12 | 10 | 4 | 1 | 11 | 9 | 7 | 4 | 1 | 1 | 0 | 0 | 1.90 | 3.6 | 4.50 | 1.90 | 3.4 | 4.40 | 1.90 | 3.50 | 4.10 | 1.89 | 3.63 | 4.58 | 1.91 | 3.5 | 4.00 | 1.87 | 3.6 | 4.75 | 39 | 1.93 | 1.88 | 3.71 | 3.53 | 4.75 | 4.37 | 38 | 2.05 | 1.98 | 1.92 | 1.83 | 20 | -0.75 | 2.20 | 2.13 | 1.80 | 1.75 | 1.88 | 3.61 | 4.70 |
E0 | 11/08/2018 | Fulham | Crystal Palace | 0 | 2 | A | 0 | 1 | A | M Dean | 15 | 10 | 6 | 9 | 9 | 11 | 5 | 5 | 1 | 2 | 0 | 0 | 2.50 | 3.4 | 3.00 | 2.45 | 3.3 | 2.95 | 2.40 | 3.30 | 2.95 | 2.50 | 3.46 | 3.00 | 2.45 | 3.3 | 2.80 | 2.50 | 3.4 | 3.00 | 39 | 2.60 | 2.47 | 3.49 | 3.35 | 3.05 | 2.92 | 38 | 2.00 | 1.95 | 1.96 | 1.87 | 22 | -0.25 | 2.18 | 2.11 | 1.81 | 1.77 | 2.62 | 3.38 | 2.90 |
E0 | 11/08/2018 | Huddersfield | Chelsea | 0 | 3 | A | 0 | 2 | A | C Kavanagh | 6 | 13 | 1 | 4 | 9 | 8 | 2 | 5 | 2 | 1 | 0 | 0 | 6.50 | 4.0 | 1.61 | 6.25 | 3.9 | 1.57 | 6.20 | 4.00 | 1.55 | 6.41 | 4.02 | 1.62 | 5.80 | 3.9 | 1.57 | 6.50 | 4.0 | 1.62 | 38 | 6.85 | 6.09 | 4.07 | 3.90 | 1.66 | 1.61 | 37 | 2.05 | 1.98 | 1.90 | 1.84 | 23 | 1.00 | 1.84 | 1.80 | 2.13 | 2.06 | 7.24 | 3.95 | 1.58 |
E0 | 11/08/2018 | Newcastle | Tottenham | 1 | 2 | A | 1 | 2 | A | M Atkinson | 15 | 15 | 2 | 5 | 11 | 12 | 3 | 5 | 2 | 2 | 0 | 0 | 3.90 | 3.5 | 2.04 | 3.80 | 3.5 | 2.00 | 3.70 | 3.35 | 2.05 | 3.83 | 3.57 | 2.08 | 3.80 | 3.2 | 2.05 | 3.90 | 3.4 | 2.10 | 39 | 4.01 | 3.83 | 3.57 | 3.40 | 2.12 | 2.05 | 38 | 2.10 | 2.01 | 1.88 | 1.81 | 20 | 0.25 | 2.20 | 2.12 | 1.80 | 1.76 | 4.74 | 3.53 | 1.89 |
Next is the data after preprocessing (only part of it is shown).
import pandas as pd
# These files are intermediate outputs of the preprocessing, not the original data
data = pd.read_csv("./data/PremierLeague/allAtt_onehot_large_train.csv")  # training data
dataT = pd.read_csv("./data/PremierLeague/allAtt_onehot_large_test.csv")  # test data
# Data after preprocessing: keep only the columns used below
data = data[['HTGS', 'ATGS', 'HTP', 'ATP', 'HM1', 'AM1', 'DiffLP', 'final1']]
dataT = dataT[['HTGS', 'ATGS', 'HTP', 'ATP', 'HM1', 'AM1', 'DiffLP', 'final1']]
df = data[200:210]  # the 10 rows shown below
HTGS | ATGS | HTP | ATP | HM1 | AM1 | DiffLP | final1 |
---|---|---|---|---|---|---|---|
0.4737 | 0.2568 | 1.3333 | 1.0476 | 3 | 3 | 1 | 0 |
0.3289 | 0.3784 | 0.9048 | 1.0476 | 1 | 1 | -3 | 1 |
0.4342 | 0.3243 | 2.0952 | 1.2857 | 3 | 3 | -12 | 1 |
0.4342 | 0.2703 | 1.8571 | 1.8095 | 3 | 3 | 1 | 0 |
0.3553 | 0.2432 | 1.0000 | 1.2857 | 0 | 1 | -1 | 0 |
0.2763 | 0.3378 | 1.1905 | 1.1905 | 3 | 1 | 9 | 1 |
0.4342 | 0.3919 | 1.3810 | 0.9524 | 1 | 1 | -2 | 0 |
0.3289 | 0.3378 | 1.0476 | 1.7143 | 1 | 3 | 2 | 1 |
0.4474 | 0.3784 | 1.1905 | 0.8095 | 3 | 0 | -8 | 1 |
0.3816 | 0.3919 | 0.8571 | 1.6667 | 1 | 0 | 15 | 1 |
Next, I preprocess the data with the Qore SDK. Specifically, I use qore_sdk.utils.sliding_window() to convert the training data to the shape (number of samples, time steps, features) and the correct labels to the shape (number of samples, 1).
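import numpy as np
from qore_sdk.utils import sliding_window

# Convert the DataFrames to NumPy arrays
x = np.array(data)
x_t = np.array(dataT)
# The first 7 columns are the features; the last column (final1) is the label
x_train = np.array(x[:, :7])
x_test = np.array(x_t[:, :7])
y_train = np.array(x[:, 7])
y_test = np.array(x_t[:, 7])
# 10-row windows (the 5 is presumably the step between windows); each window is labeled with the mode
X, y = sliding_window(x_train, 10, 5, axis=0, y=y_train, y_def='mode', y_axis=0)
X_test, y_test = sliding_window(x_test, 10, 5, axis=0, y=y_test, y_def='mode', y_axis=0)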
print(X.shape, y.shape, X_test.shape, y_test.shape)
>> (653, 10, 7), (653, 1), (159, 10, 7), (159, 1)
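To make the windowing step concrete, here is a minimal pure-Python sketch of the idea: cut consecutive rows into overlapping windows and label each window with the most common label inside it. This is not the Qore SDK implementation. The function name `sliding_window_sketch`, the toy data, and the reading of the arguments 10 and 5 as window width and step are assumptions, and the SDK may handle ties and edges differently.

```python
from collections import Counter

def sliding_window_sketch(rows, labels, width, step):
    """Cut rows into overlapping windows and give each window a mode label.

    Returns (windows, window_labels), where windows has the nested shape
    (n_windows, width, n_features) and window_labels has (n_windows, 1).
    """
    windows, window_labels = [], []
    for start in range(0, len(rows) - width + 1, step):
        stop = start + width
        windows.append(rows[start:stop])
        # most_common(1) returns [(label, count)]; on a tie, the label seen first wins
        mode = Counter(labels[start:stop]).most_common(1)[0][0]
        window_labels.append([mode])
    return windows, window_labels

# Toy data: 25 "matches" with 7 features each
rows = [[float(i)] * 7 for i in range(25)]
labels = [0] * 12 + [1] * 13

X_demo, y_demo = sliding_window_sketch(rows, labels, width=10, step=5)
# Number of windows, time steps per window, features per time step
print(len(X_demo), len(X_demo[0]), len(X_demo[0][0]))
print(y_demo)
```

With this scheme the number of windows is `(n_rows - width) // step + 1`, so neighboring windows share half of their rows when the step is half the width.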
Enter the account information that was issued to you here.
from qore_sdk.client import WebQoreClient
username = '*****'
password = '*****'
endpoint = '*****'
client = WebQoreClient(username, password, endpoint=endpoint)
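If you would rather not write the account information into a notebook or script, one option is to read it from environment variables. The sketch below assumes hypothetical variable names (`QORE_USERNAME`, `QORE_PASSWORD`, `QORE_ENDPOINT`) and a hypothetical helper `load_qore_credentials`; neither is part of the Qore SDK.

```python
import os

def load_qore_credentials(env=os.environ):
    """Read username, password and endpoint from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    names = ("QORE_USERNAME", "QORE_PASSWORD", "QORE_ENDPOINT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)

# Usage with the client from this article (needs qore_sdk and real credentials):
# username, password, endpoint = load_qore_credentials()
# client = WebQoreClient(username, password, endpoint=endpoint)
```

Failing early with the list of missing variables makes a misconfigured environment easier to spot than an authentication error from the server.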
Now let's actually train the model.
client.classifier_train(X, y)
>> {'res': 'ok', 'train_time': 0.8582723140716553}
Training finished in under a second. Next, I check the accuracy using the test data.
from sklearn.metrics import classification_report

res = client.classifier_predict(X_test)
report = classification_report(y_test, res['Y'])
print(report)
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.73 | 0.90 | 0.81 | 104 |
1.0 | 0.68 | 0.38 | 0.49 | 55 |
accuracy | | | 0.72 | 159 |
macro avg | 0.71 | 0.64 | 0.65 | 159 |
weighted avg | 0.71 | 0.72 | 0.70 | 159 |
The accuracy was 72%. To be honest, that is mediocre (recall for class 1 is only 0.38), and I suspect the preprocessing is the cause. In hindsight, I should have augmented the data and examined the correlations between features more carefully before preprocessing.
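To put the 72% in context, the sketch below recomputes the report's numbers from a confusion matrix and compares the accuracy with a baseline that always predicts the more frequent class. The four counts are not output from the run; they are inferred from the rounded precision, recall, and support values in the report, so treat them as an assumption.

```python
# Counts inferred from the rounded report (an assumption, not actual output).
# Rows are true classes, columns are predicted classes.
confusion = [
    [94, 10],  # true class 0: 94 predicted as 0, 10 predicted as 1
    [34, 21],  # true class 1: 34 predicted as 0, 21 predicted as 1
]

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total

def precision(cls):
    """Share of predictions for cls that were correct."""
    predicted = confusion[0][cls] + confusion[1][cls]
    return confusion[cls][cls] / predicted

def recall(cls):
    """Share of true cls samples that were found."""
    return confusion[cls][cls] / sum(confusion[cls])

# Baseline: always predict the more frequent class
baseline = max(sum(row) for row in confusion) / total

print(f"accuracy={accuracy:.2f} baseline={baseline:.2f}")
for cls in (0, 1):
    print(f"class {cls}: precision={precision(cls):.2f} recall={recall(cls):.2f}")
```

Under these inferred counts the model beats the majority-class baseline by only a few points, and most of its errors are class 1 windows predicted as class 0.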
I had wanted to do data analysis regularly but never got around to it, so I am grateful to Quantum Core for giving me this opportunity. I would like to use it as a starting point to keep taking on data analysis challenges.