Calculating a mathematical model of the experience points required to level up in DQ Walk (1)
Since we finished checking the data last time, the next step is to build the mathematical model. (I'm still studying, so I'd be happy if you could point out any mistakes.)
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
sns.set()
%matplotlib inline
%precision 3
df = pd.read_csv('data.csv', names=['EXP'])
df['CUMSUM_EXP'] = df['EXP'].cumsum()  # cumulative experience points
df.index = df.index + 1                # shift the index so it matches the level (1-based)
df.head()
reg = linear_model.LinearRegression()
X = df.index
Y = df['EXP']
#Create a predictive model
reg.fit(X, Y)
#Regression coefficient
print(reg.coef_)
#Intercept
print(reg.intercept_)
#R2 (coefficient of determination)
print(reg.score(X, Y))
*** Jupyter threw an error when I ran it!! ***
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-69fa63dab1be> in <module>
6 Y = df['EXP']
7 #Create a predictive model
----> 8 reg.fit(X, Y)
9 #Regression coefficient
10 print(reg.coef_)
/usr/local/lib/python3.7/site-packages/sklearn/linear_model/base.py in fit(self, X, y, sample_weight)
461 n_jobs_ = self.n_jobs
462 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 463 y_numeric=True, multi_output=True)
464
465 if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:
/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
717 ensure_min_features=ensure_min_features,
718 warn_on_dtype=warn_on_dtype,
--> 719 estimator=estimator)
720 if multi_output:
721 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
519 "Reshape your data either using array.reshape(-1, 1) if "
520 "your data has a single feature or array.reshape(1, -1) "
--> 521 "if it contains a single sample.".format(array))
522
523 # in the future np.flexible dtypes will be handled like object dtypes
ValueError: Expected 2D array, got 1D array instead:
array=[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Apparently, X needs to be a 2-D array. Sure enough, the sklearn reference says the same thing.
So, although it is a little crude, let's prepare a two-dimensional array.
X = []
for i in range(1, 56):
    X.append([i])
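As an aside, NumPy's reshape can build the same thing in one line; a minimal sketch, assuming the df defined in the earlier cells:

```python
# reshape(-1, 1) turns the 1-D level index into an (n_samples, 1) column
# vector, which is the 2-D shape sklearn expects for X.
X = df.index.values.reshape(-1, 1)
```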
Now, let's run it again.
reg = linear_model.LinearRegression()
Y = df['EXP']
#Create a predictive model
reg.fit(X, Y)
#Regression coefficient
print(reg.coef_)
#Intercept
print(reg.intercept_)
#R2 (coefficient of determination)
print(reg.score(X, Y))
*** The regression analysis succeeded! *** (The resulting formula is written up separately in a Markdown cell in Jupyter.)
plt.plot(df.index, df['EXP'], label="EXP")
plt.plot(X, reg.predict(X), label="LinearRegression")
plt.xlabel('LEVEL')
plt.ylabel('EXP')
plt.grid(True)
plt.legend()  # without this, the labels set above are never displayed
This is no good... Sure enough, R2 is 0.377, so the model tells us nothing. (That's fine, this is just for study.)
reg2 = linear_model.LinearRegression()
Y2 = df['CUMSUM_EXP']
#Create a predictive model
reg2.fit(X, Y2)
#Regression coefficient
print(reg2.coef_)
#Intercept(error)
print(reg2.intercept_)
#R2 (coefficient of determination)
print(reg2.score(X, Y2))
This one is no good either... R2 is 0.575, so it is still meaningless, albeit better than before. (That's fine, this is just for study.)
It is clear that a first-degree (linear) equation will not do, so let's raise the degree of the polynomial.
First, create the explanatory variables: the level and its square.
D1 = []
D2 = []
for i in range(1, 56):
    D1.append(i)
    D2.append(i**2)
df_x = pd.DataFrame({"D1": D1, "D2": D2})
df_x.head()
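Incidentally, scikit-learn's PolynomialFeatures can generate these columns automatically; a minimal sketch equivalent to the loop above:

```python
from sklearn.preprocessing import PolynomialFeatures

levels = np.arange(1, 56).reshape(-1, 1)           # levels 1..55 as a column vector
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(levels)                # columns: level, level**2
```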
reg3 = linear_model.LinearRegression()
X3 = df_x
Y3 = df['EXP']
#Create a predictive model
reg3.fit(X3, Y3)
#Regression coefficient
print(reg3.coef_)
#Intercept(error)
print(reg3.intercept_)
#R2 (coefficient of determination)
print(reg3.score(X3, Y3))
plt.plot(df.index, df['EXP'], label="EXP")
plt.plot(df.index, reg3.predict(X3), label="LinearRegression")
plt.xlabel('LEVEL')
plt.ylabel('EXP')
plt.grid(True)
plt.legend()
R2 is 0.644, which is better than before, but it is still hard to call this a good fit.
reg4 = linear_model.LinearRegression()
X4 = df_x
Y4 = df['CUMSUM_EXP']
#Create a predictive model
reg4.fit(X4, Y4)
#Regression coefficient
print(reg4.coef_)
#Intercept(error)
print(reg4.intercept_)
#R2 (coefficient of determination)
print(reg4.score(X4, Y4))
plt.plot(df.index, df['CUMSUM_EXP'], label="CUMSUM_EXP")
plt.plot(df.index, reg4.predict(X4), label="LinearRegression")
plt.xlabel('LEVEL')
plt.ylabel('EXP')
plt.grid(True)
plt.legend()
R2 is 0.860, so this is starting to look pretty good, isn't it?
The program is the same apart from the added higher-degree terms, so from here I'll show only the graph results.
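Since the code itself is omitted here, below is a rough sketch of what the next iteration might look like, assuming a cubic term is added (D3, df_x3, and reg5 are my own hypothetical names, not from the original notebook):

```python
# Hypothetical sketch: add a cubic term and refit on the cumulative EXP.
D3 = [i**3 for i in range(1, 56)]
df_x3 = df_x.assign(D3=D3)                  # D1, D2 from before, plus D3

reg5 = linear_model.LinearRegression()
reg5.fit(df_x3, df['CUMSUM_EXP'])
print(reg5.score(df_x3, df['CUMSUM_EXP']))  # R2 of the cubic cumulative model
```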
The cumulative model now fits quite well; R2 is up to 0.9733.
Isn't it fair to say that both now fit reasonably well? R2 is 0.961 for the experience points required for the next level and 0.987 for the cumulative experience points.
Let's calculate the cumulative experience points that would be required if the current level cap of 55 were raised to 99, as in the original games.
The calculated model formula is as follows. Visualizing it:
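(The original notebook shows the formula and graph here; as a hedged sketch, the extrapolation and plot could be produced like this, reusing the hypothetical reg5 cubic model from the sketch above.)

```python
# Hypothetical sketch: extrapolate the cumulative-EXP model out to level 99.
levels = np.arange(1, 100)
X_ext = pd.DataFrame({"D1": levels, "D2": levels**2, "D3": levels**3})
pred = reg5.predict(X_ext)

plt.plot(df.index, df['CUMSUM_EXP'], label="CUMSUM_EXP (observed)")
plt.plot(levels, pred, label="model (extrapolated)")
plt.xlabel('LEVEL')
plt.ylabel('EXP')
plt.grid(True)
plt.legend()

print(pred[-1])  # predicted cumulative EXP at level 99
```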
** You will need nearly 1.4 billion XP (1,392,549,526) to reach level 99. (It's a prediction, just a prediction.) ** By the way, the cumulative experience required to reach level 55 is 3,441,626 (about 3.5 million), so that is the equivalent of climbing to level 55 about 404 times.
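A quick sanity check on that conversion, using only the two numbers above:

```python
# 1,392,549,526 cumulative XP at level 99 vs. 3,441,626 at level 55
print(1_392_549_526 / 3_441_626)  # ≈ 404.6, i.e. roughly 404 runs to level 55
```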
That works out to 66,311 Metal Hoimin, 132,623 Hagure Metal (Liquid Metal Slime), or 904,252 Metal Slime kills. Hmm, we would need to overhunt them in numbers big enough to get the WWF moving.
Game Over