[PYTHON] I tried to predict the victory or defeat of the Premier League using the Qore SDK

Introduction

This is my first post on Qiita. I am almost a beginner at data analysis, so there are probably many mistakes; please point them out. This time, I used the Qore SDK from Quantum Core Inc.

How to use the Qore SDK is explained in the following articles: "The world of reservoir computing ~ with Qore ~" and "Introduction of the Qore SDK and detection of arrhythmia with Qore".

The task is predicting wins and losses in the soccer Premier League. Specifically, I predict the results of matches played in the 2019-2020 season using data from 2010-2018.

The dataset was downloaded from the following site. http://football-data.co.uk/englandm.php

Data preprocessing

I posted all the datasets and preprocessing code on GitHub, so please refer to it: https://github.com/obameyan/QoreSDK-Premire-League

Since it is difficult to describe all of the preprocessing here, I only show the data before and after preprocessing. Briefly, I converted the data so that it could be fed to the Qore SDK as time series data, taking into account factors such as the opposing team, the match result, goals scored and conceded, and hat tricks.
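The actual preprocessing is in the repository above; as a rough illustration of the idea only, here is a hypothetical sketch (with made-up match rows, not the real dataset) of how per-team cumulative features such as goals scored (HTGS/ATGS) and league points (HTP/ATP) can be accumulated match by match:

```python
import pandas as pd

# Toy match rows in the same shape as the raw CSV (hypothetical values)
raw = pd.DataFrame({
    'HomeTeam': ['Man United', 'Bournemouth', 'Man United'],
    'AwayTeam': ['Leicester', 'Cardiff', 'Cardiff'],
    'FTHG': [2, 2, 3],   # full-time home goals
    'FTAG': [1, 0, 0],   # full-time away goals
    'FTR':  ['H', 'H', 'H'],  # full-time result: H / D / A
})

goals = {}   # cumulative goals scored per team BEFORE each match
points = {}  # cumulative league points per team BEFORE each match
rows = []
for _, m in raw.iterrows():
    h, a = m['HomeTeam'], m['AwayTeam']
    # Record the pre-match cumulative stats as that match's features
    rows.append({
        'HTGS': goals.get(h, 0), 'ATGS': goals.get(a, 0),
        'HTP': points.get(h, 0), 'ATP': points.get(a, 0),
    })
    # Then update the running totals with this match's outcome
    goals[h] = goals.get(h, 0) + m['FTHG']
    goals[a] = goals.get(a, 0) + m['FTAG']
    if m['FTR'] == 'H':
        points[h] = points.get(h, 0) + 3
    elif m['FTR'] == 'A':
        points[a] = points.get(a, 0) + 3
    else:
        points[h] = points.get(h, 0) + 1
        points[a] = points.get(a, 0) + 1

features = pd.DataFrame(rows)
```

In the actual preprocessed data shown below, columns such as HTGS/ATGS hold fractions rather than raw counts, so some normalization is also applied; that detail is omitted from this sketch.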

The following is the data before preprocessing (only part of it is shown).

import pandas as pd

# Original data (only part of it is shown below)
raw_data = pd.read_csv('./data/PremierLeague/2018-19.csv')
raw_data.head()
Div Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR Referee HS AS HST AST HF AF HC AC HY AY HR AR B365H B365D B365A BWH BWD BWA IWH IWD IWA PSH PSD PSA WHH WHD WHA VCH VCD VCA Bb1X2 BbMxH BbAvH BbMxD BbAvD BbMxA BbAvA BbOU BbMx>2.5 BbAv>2.5 BbMx<2.5 BbAv<2.5 BbAH BbAHh BbMxAHH BbAvAHH BbMxAHA BbAvAHA PSCH PSCD PSCA
E0 10/08/2018 Man United Leicester 2 1 H 1 0 H A Marriner 8 13 6 4 11 8 2 5 2 1 0 0 1.57 3.9 7.50 1.53 4.0 7.50 1.55 3.80 7.00 1.58 3.93 7.50 1.57 3.8 6.00 1.57 4.0 7.00 39 1.60 1.56 4.20 3.92 8.05 7.06 38 2.12 2.03 1.85 1.79 17 -0.75 1.75 1.70 2.29 2.21 1.55 4.07 7.69
E0 11/08/2018 Bournemouth Cardiff 2 0 H 1 0 H K Friend 12 10 4 1 11 9 7 4 1 1 0 0 1.90 3.6 4.50 1.90 3.4 4.40 1.90 3.50 4.10 1.89 3.63 4.58 1.91 3.5 4.00 1.87 3.6 4.75 39 1.93 1.88 3.71 3.53 4.75 4.37 38 2.05 1.98 1.92 1.83 20 -0.75 2.20 2.13 1.80 1.75 1.88 3.61 4.70
E0 11/08/2018 Fulham Crystal Palace 0 2 A 0 1 A M Dean 15 10 6 9 9 11 5 5 1 2 0 0 2.50 3.4 3.00 2.45 3.3 2.95 2.40 3.30 2.95 2.50 3.46 3.00 2.45 3.3 2.80 2.50 3.4 3.00 39 2.60 2.47 3.49 3.35 3.05 2.92 38 2.00 1.95 1.96 1.87 22 -0.25 2.18 2.11 1.81 1.77 2.62 3.38 2.90
E0 11/08/2018 Huddersfield Chelsea 0 3 A 0 2 A C Kavanagh 6 13 1 4 9 8 2 5 2 1 0 0 6.50 4.0 1.61 6.25 3.9 1.57 6.20 4.00 1.55 6.41 4.02 1.62 5.80 3.9 1.57 6.50 4.0 1.62 38 6.85 6.09 4.07 3.90 1.66 1.61 37 2.05 1.98 1.90 1.84 23 1.00 1.84 1.80 2.13 2.06 7.24 3.95 1.58
E0 11/08/2018 Newcastle Tottenham 1 2 A 1 2 A M Atkinson 15 15 2 5 11 12 3 5 2 2 0 0 3.90 3.5 2.04 3.80 3.5 2.00 3.70 3.35 2.05 3.83 3.57 2.08 3.80 3.2 2.05 3.90 3.4 2.10 39 4.01 3.83 3.57 3.40 2.12 2.05 38 2.10 2.01 1.88 1.81 20 0.25 2.20 2.12 1.80 1.76 4.74 3.53 1.89

Next is the data after preprocessing (only part of it is shown).

import pandas as pd

# These are intermediate files produced by the preprocessing, not the raw data
data = pd.read_csv("./data/PremierLeague/allAtt_onehot_large_train.csv")   # training data
dataT = pd.read_csv("./data/PremierLeague/allAtt_onehot_large_test.csv")   # test data

# Keep only the columns used as features and the label
data = data[['HTGS', 'ATGS', 'HTP', 'ATP', 'HM1', 'AM1', 'DiffLP', 'final1']]
dataT = dataT[['HTGS', 'ATGS', 'HTP', 'ATP', 'HM1', 'AM1', 'DiffLP', 'final1']]
data[200:210]
HTGS ATGS HTP ATP HM1 AM1 DiffLP final1
0.4737 0.2568 1.3333 1.0476 3 3 1 0
0.3289 0.3784 0.9048 1.0476 1 1 -3 1
0.4342 0.3243 2.0952 1.2857 3 3 -12 1
0.4342 0.2703 1.8571 1.8095 3 3 1 0
0.3553 0.2432 1.0000 1.2857 0 1 -1 0
0.2763 0.3378 1.1905 1.1905 3 1 9 1
0.4342 0.3919 1.3810 0.9524 1 1 -2 0
0.3289 0.3378 1.0476 1.7143 1 3 2 1
0.4474 0.3784 1.1905 0.8095 3 0 -8 1
0.3816 0.3919 0.8571 1.6667 1 0 15 1

Preprocessing with the Qore SDK

Here, we perform preprocessing with the Qore SDK itself. Specifically, we use qore_sdk.utils.sliding_window() to convert the training data to shape (number of samples, time steps, features) and the labels to shape (number of samples, 1).

import numpy as np
from qore_sdk.utils import sliding_window

x = np.array(data)
x_t = np.array(dataT)

x_train = x[:, :7]   # first 7 columns are the features
x_test = x_t[:, :7]
y_train = x[:, 7]    # last column ('final1') is the label
y_test = x_t[:, 7]

X, y = sliding_window(x_train, 10, 5, axis=0, y=y_train, y_def='mode', y_axis=0)
X_test, y_test = sliding_window(x_test, 10, 5, axis=0, y=y_test, y_def='mode', y_axis=0)
print(X.shape, y.shape, X_test.shape, y_test.shape)
>>  (653, 10, 7), (653, 1), (159, 10, 7), (159, 1)
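To make the windowing concrete, here is a rough NumPy-only sketch of what `sliding_window` appears to do in this call (my own approximation, not the SDK's implementation): windows of length 10 are taken every 5 steps along the time axis, and each window is labeled with the most frequent label inside it (`y_def='mode'`).

```python
import numpy as np

def sliding_window_sketch(x, width, stride, y=None):
    """Extract windows of shape (n_windows, width, n_features) from a 2D array."""
    starts = range(0, len(x) - width + 1, stride)
    Xw = np.stack([x[s:s + width] for s in starts])
    if y is None:
        return Xw
    # 'mode'-style labels: the most frequent label inside each window
    Yw = np.array([[np.bincount(y[s:s + width].astype(int)).argmax()]
                   for s in starts])
    return Xw, Yw

x_toy = np.arange(40, dtype=float).reshape(20, 2)  # 20 time steps, 2 features
y_toy = np.array([0] * 12 + [1] * 8)
Xw, Yw = sliding_window_sketch(x_toy, 10, 5, y_toy)
print(Xw.shape, Yw.shape)  # (3, 10, 2) (3, 1)
```

With 20 time steps, width 10, and stride 5, three windows start at indices 0, 5, and 10; only the last window contains a majority of label 1.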

Learning and prediction using the Qore SDK

Enter the account information that was issued to you.

from qore_sdk.client import WebQoreClient

username = '*****'
password = '*****'
endpoint = '*****'

client = WebQoreClient(username, password, endpoint=endpoint)

Now let's actually train the model.

client.classifier_train(X, y)
>> {'res': 'ok', 'train_time': 0.8582723140716553}

Training finished in an instant. Next, let's check the accuracy on the test data.

from sklearn.metrics import classification_report

res = client.classifier_predict(X_test)
report = classification_report(y_test, res['Y'])
print(report)
              precision    recall  f1-score   support

         0.0       0.73      0.90      0.81       104
         1.0       0.68      0.38      0.49        55

   accuracy                            0.72       159
   macro avg       0.71      0.64      0.65       159
weighted avg       0.71      0.72      0.70       159

The accuracy was 72%. Honestly, it's a middling result, but I think this comes down to the preprocessing... I should have augmented the data and examined the correlations between features more carefully before preprocessing...
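For reference, a quick check of the majority-class baseline implied by the support counts in the report above (104 samples of class 0 vs. 55 of class 1) shows what always predicting class 0 would score:

```python
# Class supports taken from the classification report above
support = {0.0: 104, 1.0: 55}

# Accuracy of always predicting the majority class
baseline = max(support.values()) / sum(support.values())
print(round(baseline, 3))  # 0.654 -- so 72% does beat this trivial baseline
```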

Summary

I had wanted to do data analysis on a daily basis but never managed to, so I am grateful to Quantum Core for giving me the opportunity to do this kind of analysis. I would like to take this opportunity to keep taking on data analysis challenges.
