I used Kaggle's House Sales in King County, USA dataset to train a machine learning model with XGBoost and then turned that trained model into an API server with Flask. Building this machine learning API server takes four main steps. First, exploratory data analysis (EDA) is carried out to understand the state of the House Sales data. Next, the data is preprocessed so that it can be used for training. Then a model is trained with machine learning; this time I use XGBoost. Finally, the API server is implemented with Flask.
Tools and libraries such as Anaconda, XGBoost, joblib, Flask, and flask-cors are already installed.
First, load the required libraries and the House Sales dataset downloaded from Kaggle. You can set the number of columns to display with set_option.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 4)
df = pd.read_csv('house_sales/kc_house_data.csv')
df.head()
| | id | date | ... | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|
| 0 | 7129300520 | 20141013T000000 | ... | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | ... | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | ... | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | ... | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | ... | 1800 | 7503 |
5 rows × 21 columns
First, display a histogram of the sqft_living feature. Looking at it, you can see that the shape deviates from a normal distribution because the data contain some very large values.
plt.figure(figsize = (12,8))
plt.hist(df["sqft_living"])
plt.savefig('House_Sales_Explanatory Data Analysis_hist01.png', bbox_inches='tight')
Next, display a histogram of price, which is the value to be predicted this time. As before, there are large values and outliers, so the distribution is bunched up on the left side.
plt.figure(figsize = (12,8))
plt.hist(df["price"])
plt.savefig('House_Sales_Explanatory Data Analysis_hist02.png', bbox_inches='tight')
This function applies the interquartile range (IQR) rule: rows whose value lies more than 1.5 × IQR below the first quartile or above the third quartile are removed as outliers.
def outlier_iqr(df, columns = None):
    # Default to every column when none are given
    if columns is None:
        columns = df.columns
    for col in columns:
        q1 = df[col].describe()['25%']
        q3 = df[col].describe()['75%']
        iqr = q3 - q1
        outlier_min = q1 - iqr * 1.5
        outlier_max = q3 + iqr * 1.5
        # Keep only the rows inside [outlier_min, outlier_max]
        df = df[(df[col] >= outlier_min) & (df[col] <= outlier_max)]
    return df
df_1 = outlier_iqr(df, columns = ['price'])
df_1.shape
(20454, 21)
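To make the IQR rule concrete, here is a minimal sketch that uses only the Python standard library instead of pandas. The function name iqr_bounds and the sample values are made up for illustration; the "inclusive" quantile method interpolates linearly between data points, which should correspond to the 25% and 75% values that describe() reports.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule for a list of numbers."""
    # method="inclusive" interpolates linearly between neighbouring data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up values; 100 plays the role of an extreme house price
prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = iqr_bounds(prices)
kept = [p for p in prices if lower <= p <= upper]
print(lower, upper, kept)
```

Only the value outside the two fences is dropped, which is why the row count in the article shrinks only modestly.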
Let's display the price histogram again after the IQR filtering. The outliers have been removed, so the histogram now has a shape closer to a normal distribution. If you compare the sizes with shape, you can see that the number of rows has decreased somewhat (from 21,613 to 20,454), but not substantially.
plt.figure(figsize = (12,8))
plt.hist(df_1["price"])
plt.savefig('House_Sales_Explanatory Data Analysis_hist03.png', bbox_inches='tight')
df.shape
(21613, 21)
For now, let's also check the histograms of the other numeric features. You can see that the shapes are generally well balanced.
fig, axes = plt.subplots(2,3, figsize = (18, 12))
axes.ravel()[0].hist(df_1["sqft_living"])
axes.ravel()[1].hist(df_1["sqft_above"])
axes.ravel()[2].hist(df_1["sqft_basement"])
axes.ravel()[3].hist(df_1["lat"])
axes.ravel()[4].hist(df_1["long"])
axes.ravel()[5].hist(df_1["sqft_living15"])
axes.ravel()[0].set_title("sqft_living")
axes.ravel()[1].set_title("sqft_above")
axes.ravel()[2].set_title("sqft_basement")
axes.ravel()[3].set_title("lat")
axes.ravel()[4].set_title("long")
axes.ravel()[5].set_title("sqft_living15")
plt.savefig('House_Sales_Explanatory Data Analysis_hist04.png', bbox_inches='tight')
Here we drop the features that are not needed for training: id, date, sqft_lot, sqft_lot15, and zipcode.
df_1 = df_1.drop(columns = ['id', 'date', 'sqft_lot','sqft_lot15','zipcode'])
The yr_built feature is the year the house was built. A raw year is awkward to use as training data, so a new feature, age, is added that holds the age of the building. Likewise, yr_renovated holds the year of renovation, so it is converted into the number of years since the renovation (renovated_age).
df_1["age"] = 2020 - df_1["yr_built"]
# yr_renovated == 0 means the house was never renovated
df_1.loc[(df_1['yr_renovated'] == 0), 'yr_renovated'] = 2020
# Years since renovation; get_dummies below expects this column
df_1["renovated_age"] = 2020 - df_1["yr_renovated"]
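The conversion from calendar years to elapsed years can be sketched for a single house as below. The helper years_since and the sample years are hypothetical; 2020 is the reference year the listing above subtracts from, and renovated_age is the column name that the later get_dummies call refers to.

```python
REFERENCE_YEAR = 2020  # same constant the listing above subtracts from

def years_since(yr_built, yr_renovated, reference_year=REFERENCE_YEAR):
    """Return (age, renovated_age) for one house."""
    age = reference_year - yr_built
    # A 0 in yr_renovated marks "never renovated"; map it to the reference
    # year so that renovated_age becomes 0
    if yr_renovated == 0:
        yr_renovated = reference_year
    renovated_age = reference_year - yr_renovated
    return age, renovated_age

never_renovated = years_since(1955, 0)     # illustrative values
renovated = years_since(1951, 1991)        # illustrative values
print(never_renovated, renovated)
```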
The numeric features are standardized here with StandardScaler.
from sklearn.preprocessing import StandardScaler
num_feature = ['sqft_living', 'sqft_above', 'sqft_basement', 'lat', 'long', 'sqft_living15']
for col in num_feature:
    # Fit a separate scaler per column (zero mean, unit variance)
    scaler = StandardScaler()
    df_1[col] = scaler.fit_transform(np.array(df_1[col].values).reshape(-1, 1))
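What StandardScaler computes for each column is the z-score: subtract the column mean and divide by the population standard deviation. The following standard-library sketch shows that arithmetic on a handful of made-up square-footage values; the function name standardize is an assumption for illustration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    return [(v - mu) / sigma for v in values]

sqft = [1180, 2570, 770, 1960, 1680]  # made-up sample values
z = standardize(sqft)
print([round(v, 3) for v in z])
```

Because this is a linear rescaling, the histograms keep their shape and only the axis values change.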
For now, check the histograms of the numeric features again. You can see that the shapes match the ones displayed earlier; only the scale of the values has changed.
fig, axes = plt.subplots(2,3, figsize = (18, 12))
axes.ravel()[0].hist(df_1["sqft_living"])
axes.ravel()[1].hist(df_1["sqft_above"])
axes.ravel()[2].hist(df_1["sqft_basement"])
axes.ravel()[3].hist(df_1["lat"])
axes.ravel()[4].hist(df_1["long"])
axes.ravel()[5].hist(df_1["sqft_living15"])
axes.ravel()[0].set_title("sqft_living")
axes.ravel()[1].set_title("sqft_above")
axes.ravel()[2].set_title("sqft_basement")
axes.ravel()[3].set_title("lat")
axes.ravel()[4].set_title("long")
axes.ravel()[5].set_title("sqft_living15")
plt.savefig('House_Sales_Explanatory Data Analysis_hist05.png', bbox_inches='tight')
Finally, save the data to CSV for use in machine learning.
df_price = df_1["price"]
df_price.to_csv('House_Sales_Explanatory_Price.csv')
Columns that are not needed as training data are dropped, the categorical features are converted into dummy variables with get_dummies, and the result is saved as CSV.
df_1 = df_1.drop(columns = ['price', 'yr_built', 'yr_renovated'])
df_1 = pd.get_dummies(df_1, columns = ['bedrooms', 'bathrooms', 'floors', 'waterfront', 'view', 'condition', 'grade', 'age', 'renovated_age'], drop_first = True)
df_1.to_csv('House_Sales_Explanatory_Preprocessing.csv')
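As a rough illustration of what get_dummies with drop_first = True produces, the sketch below one-hot encodes a single categorical column in plain Python. The helper one_hot_drop_first is hypothetical and the grade values are made up; it only mirrors the naming scheme (column name, underscore, category value) and the dropping of the first category.

```python
def one_hot_drop_first(values, prefix):
    """One-hot encode values, omitting the first (smallest) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category becomes the all-zero baseline
    return [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]

rows = one_hot_drop_first([7, 6, 8, 7], prefix="grade")  # made-up grades
for row in rows:
    print(row)
```

A row that belongs to the dropped category (grade 6 here) is represented by zeros in all remaining dummy columns.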
Here we carry out machine learning using the data preprocessed above. It is better to keep this step as a separate project from the preprocessing.
The libraries required for this step are imported again.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
Load the preprocessed CSV file saved earlier into pandas. It contains an unnecessary column named Unnamed: 0, so drop it. set_option limits the number of displayed columns to four.
pd.set_option('display.max_columns', 4)
df = pd.read_csv('House_Sales_Explanatory_Preprocessing.csv')
df = df.drop(columns = ['Unnamed: 0'])
df.head()
| | sqft_living | sqft_above | ... | renovated_age_80 | renovated_age_86 |
|---|---|---|---|---|---|
| 0 | -1.026685 | -0.725963 | ... | 0 | |
| 1 | 0.769106 | 0.635702 | ... | 0 | |
| 2 | -1.556379 | -1.289885 | ... | 0 | |
| 3 | -0.018975 | -0.904768 | ... | 0 | |
| 4 | -0.380717 | -0.038253 | ... | 0 | |
The price, which is the target (ground-truth) data, can be read in the same way.
df_price = pd.read_csv('House_Sales_Explanatory_Price.csv', header=None, names=['price'])
df_price.head()
| | price |
|---|---|
| 0 | 221900.0 |
| 1 | 538000.0 |
| 2 | 180000.0 |
| 3 | 604000.0 |
| 4 | 510000.0 |
In earlier training runs I compared the accuracy of a model trained on all of the preprocessed data with that of a model trained on data from which the less important features had been removed. Based on that comparison, I decided to use only the following features. Essentially, all numeric features are used; of the categorical features only grade is kept, and the other categorical data are dropped. The main reason is that I expected it to be tedious to implement a front end for several categorical inputs when an application is eventually built on top of the machine learning model.
df = df[["sqft_living","sqft_above","sqft_basement","lat","long","sqft_living15","grade_3","grade_4","grade_5","grade_6","grade_7"
,"grade_8","grade_9","grade_10","grade_11","grade_12"]]
| | sqft_living | sqft_above | ... | grade_11 | grade_12 |
|---|---|---|---|---|---|
| 0 | -1.026685 | -0.725963 | ... | 0 | 0 |
| 1 | 0.769106 | 0.635702 | ... | 0 | 0 |
| 2 | -1.556379 | -1.289885 | ... | 0 | 0 |
| 3 | -0.018975 | -0.904768 | ... | 0 | 0 |
| 4 | -0.380717 | -0.038253 | ... | 0 | 0 |
5 rows × 16 columns
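The 16 selected column names follow a regular pattern, so the list can also be generated instead of typed out. This is only a sketch of an alternative way to write the same selection; the variable names numeric_features, grade_features, and selected are made up.

```python
# The six standardized numeric columns used in the article
numeric_features = ["sqft_living", "sqft_above", "sqft_basement",
                    "lat", "long", "sqft_living15"]
# Dummy columns grade_3 ... grade_12
grade_features = [f"grade_{g}" for g in range(3, 13)]

selected = numeric_features + grade_features
print(len(selected), selected)
```

With this list, the selection would read df = df[selected], and the same list could serve as the column order expected by the API later on.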
Here XGBoost is imported and used for training. The hyperparameters are left almost at their default values.
import xgboost as xgb
X_train, X_test, y_train, y_test = train_test_split(df, df_price, random_state = 0)
params = {
    # Newer XGBoost releases use 'verbosity' instead of 'silent'
    'silent' : 1,
    'max_depth' : 6,
    'min_child_weight' : 1,
    'eta' : 0.1,
    'tree_method' : 'exact',
    # Newer XGBoost releases name this objective 'reg:squarederror'
    'objective' : 'reg:linear',
    'eval_metric' : 'rmse',
    'predictor' : 'cpu_predictor'
}
dtrain = xgb.DMatrix(X_train, label = y_train)
dtest = xgb.DMatrix(X_test, label = y_test)
model = xgb.train(params = params,
                  dtrain = dtrain,
                  num_boost_round = 200,
                  early_stopping_rounds = 10,
                  evals = [(dtest, 'test')])
[0] test-rmse:471544
Will train until test-rmse hasn't improved in 10 rounds.
[1] test-rmse:427350
[2] test-rmse:387757
[3] test-rmse:352314
[4] test-rmse:320602
[5] test-rmse:292132
[6] test-rmse:266667
[7] test-rmse:244148
[8] test-rmse:223983
[9] test-rmse:206046
[10] test-rmse:190112
[11] test-rmse:176111
[12] test-rmse:163754
[13] test-rmse:152820
[14] test-rmse:143269
[15] test-rmse:134879
[16] test-rmse:127772
[17] test-rmse:121362
[18] test-rmse:115939
[19] test-rmse:111405
[20] test-rmse:107280
[21] test-rmse:103750
[22] test-rmse:100928
[23] test-rmse:98446.5
[24] test-rmse:96280.3
[25] test-rmse:94419.2
[26] test-rmse:92933.6
[27] test-rmse:91644.1
[28] test-rmse:90581.3
[29] test-rmse:89422.8
[30] test-rmse:88575.7
[31] test-rmse:88038.8
[32] test-rmse:87254.6
[33] test-rmse:86857.1
[34] test-rmse:86527.8
[35] test-rmse:86238.3
[36] test-rmse:85950
[37] test-rmse:85705
[38] test-rmse:85532.4
[39] test-rmse:85346.7
[40] test-rmse:85204.1
[41] test-rmse:85058.9
[42] test-rmse:84926.7
[43] test-rmse:84845.4
[44] test-rmse:84671.9
[45] test-rmse:84539.6
[46] test-rmse:84380.6
[47] test-rmse:84287.2
[48] test-rmse:84254.7
[49] test-rmse:84168.9
[50] test-rmse:84106.6
[51] test-rmse:83858.5
[52] test-rmse:83829.8
[53] test-rmse:83809.5
[54] test-rmse:83726
[55] test-rmse:83704.2
[56] test-rmse:83650.4
[57] test-rmse:83422.6
[58] test-rmse:83405.8
[59] test-rmse:83281
[60] test-rmse:83293.6
[61] test-rmse:83289.4
[62] test-rmse:83251.9
[63] test-rmse:83237.5
[64] test-rmse:83055.6
[65] test-rmse:83051.9
[66] test-rmse:82938.8
[67] test-rmse:82932.7
[68] test-rmse:82933.2
[69] test-rmse:82859
[70] test-rmse:82829.6
[71] test-rmse:82840.5
[72] test-rmse:82823
[73] test-rmse:82827.4
[74] test-rmse:82834.6
[75] test-rmse:82845.9
[76] test-rmse:82839.4
[77] test-rmse:82828.5
[78] test-rmse:82829.7
[79] test-rmse:82651.8
[80] test-rmse:82660
[81] test-rmse:82637.3
[82] test-rmse:82514.6
[83] test-rmse:82497.6
[84] test-rmse:82484.7
[85] test-rmse:82486.3
[86] test-rmse:82486.8
[87] test-rmse:82496
[88] test-rmse:82491.4
[89] test-rmse:82486.6
[90] test-rmse:82290.3
[91] test-rmse:82265.1
[92] test-rmse:82261.5
[93] test-rmse:82236.5
[94] test-rmse:82236.4
[95] test-rmse:82111.9
[96] test-rmse:82111.1
[97] test-rmse:82111.3
[98] test-rmse:82108
[99] test-rmse:82097.1
[100] test-rmse:82077.4
[101] test-rmse:82041.9
[102] test-rmse:82040
[103] test-rmse:82042.6
[104] test-rmse:82044.2
[105] test-rmse:82033.7
[106] test-rmse:82041.1
[107] test-rmse:82028.4
[108] test-rmse:82030.7
[109] test-rmse:82036.4
[110] test-rmse:82028.6
[111] test-rmse:82020.3
[112] test-rmse:82025.5
[113] test-rmse:82024.9
[114] test-rmse:82034
[115] test-rmse:82025.2
[116] test-rmse:81957.5
[117] test-rmse:81950.9
[118] test-rmse:81959.8
[119] test-rmse:81936.7
[120] test-rmse:81935.9
[121] test-rmse:81937
[122] test-rmse:81945.8
[123] test-rmse:81894.8
[124] test-rmse:81885.2
[125] test-rmse:81899.3
[126] test-rmse:81877
[127] test-rmse:81875.7
[128] test-rmse:81859.6
[129] test-rmse:81849.7
[130] test-rmse:81851.2
[131] test-rmse:81839.4
[132] test-rmse:81850.8
[133] test-rmse:81846
[134] test-rmse:81836.2
[135] test-rmse:81827.2
[136] test-rmse:81832.3
[137] test-rmse:81859.6
[138] test-rmse:81856.6
[139] test-rmse:81850
[140] test-rmse:81847.6
[141] test-rmse:81842.8
[142] test-rmse:81794.5
[143] test-rmse:81803.8
[144] test-rmse:81829.3
[145] test-rmse:81815.9
[146] test-rmse:81813.6
[147] test-rmse:81741
[148] test-rmse:81728.8
[149] test-rmse:81714.4
[150] test-rmse:81708.6
[151] test-rmse:81592.3
[152] test-rmse:81621.7
[153] test-rmse:81624.8
[154] test-rmse:81629.3
[155] test-rmse:81615.7
[156] test-rmse:81617.7
[157] test-rmse:81613.9
[158] test-rmse:81612.9
[159] test-rmse:81594.9
[160] test-rmse:81595.1
[161] test-rmse:81581.7
[162] test-rmse:81595.3
[163] test-rmse:81603.8
[164] test-rmse:81601.2
[165] test-rmse:81600.5
[166] test-rmse:81552.3
[167] test-rmse:81557.6
[168] test-rmse:81565.5
[169] test-rmse:81566.6
[170] test-rmse:81581.9
[171] test-rmse:81570.5
[172] test-rmse:81571.8
[173] test-rmse:81569.4
[174] test-rmse:81494.3
[175] test-rmse:81476.3
[176] test-rmse:81454
[177] test-rmse:81422.6
[178] test-rmse:81426.1
[179] test-rmse:81410.8
[180] test-rmse:81425.1
[181] test-rmse:81418.2
[182] test-rmse:81419.4
[183] test-rmse:81409.6
[184] test-rmse:81392.1
[185] test-rmse:81389.3
[186] test-rmse:81391.1
[187] test-rmse:81414.5
[188] test-rmse:81369.9
[189] test-rmse:81368.3
[190] test-rmse:81358.4
[191] test-rmse:81347.7
[192] test-rmse:81355.4
[193] test-rmse:81349.2
[194] test-rmse:81343
[195] test-rmse:81346.3
[196] test-rmse:81345.5
[197] test-rmse:81374.6
[198] test-rmse:81358.5
[199] test-rmse:81359.4
I run a grid search to see whether the model can be made a little more accurate. First, generate the parameter combinations for the grid search.
gridsearch_params = [
(max_depth, eta)
for max_depth in [6, 7, 8]
for eta in [0.1, 0.05, 0.01]
]
gridsearch_params
[(6, 0.1), (6, 0.05), (6, 0.01), (7, 0.1), (7, 0.05), (7, 0.01), (8, 0.1), (8, 0.05), (8, 0.01)]
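The same grid can also be built with itertools.product from the standard library, which scales more comfortably if further parameters are added later. This is just an equivalent sketch, not a change to the approach above.

```python
from itertools import product

max_depths = [6, 7, 8]
etas = [0.1, 0.05, 0.01]

# product() iterates the last argument fastest, matching the nested comprehension
gridsearch_params = list(product(max_depths, etas))
print(gridsearch_params)
```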
Here we compute which combination of parameters gives the lowest cross-validated RMSE. The result is that the best parameters are (8, 0.01).
min_rmse = float('Inf')
best_param = []
for max_depth, eta in gridsearch_params:
    print('max_depth = {}, eta = {}'.format(max_depth, eta))
    params['max_depth'] = max_depth
    params['eta'] = eta
    # 5-fold cross-validation with early stopping
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round = 1000,
        seed = 0,
        nfold = 5,
        metrics = {'rmse'},
        early_stopping_rounds = 5
    )
    mean_rmse = cv_results['test-rmse-mean'].min()
    boost_rounds = cv_results['test-rmse-mean'].argmin()
    print('RMSE {} for {} rounds'.format(mean_rmse, boost_rounds))
    # Remember the combination with the lowest mean RMSE
    if mean_rmse < min_rmse:
        min_rmse = mean_rmse
        best_param = (max_depth, eta)
print('Best params {}, RMSE {}'.format(best_param, min_rmse))
max_depth = 6, eta = 0.1
RMSE 81689.0296874 for 123 rounds
max_depth = 6, eta = 0.05
RMSE 81545.2953126 for 267 rounds
max_depth = 6, eta = 0.01
RMSE 82118.7765624 for 999 rounds
max_depth = 7, eta = 0.1
RMSE 81372.990625 for 161 rounds
max_depth = 7, eta = 0.05
RMSE 81372.7171876 for 202 rounds
max_depth = 7, eta = 0.01
RMSE 81308.89999979999 for 999 rounds
max_depth = 8, eta = 0.1
RMSE 81277.4515624 for 96 rounds
max_depth = 8, eta = 0.05
RMSE 81155.2687498 for 201 rounds
max_depth = 8, eta = 0.01
RMSE 81080.3156252 for 849 rounds
Best params (8, 0.01), RMSE 81080.3156252
Train the model again with the parameters found by the grid search.
params['max_depth'] = 8
params['eta'] = 0.01
model = xgb.train(params = params,
dtrain = dtrain,
num_boost_round = 1000,
early_stopping_rounds = 5,
evals = [(dtest, 'test')])
[0] test-rmse:515961
Will train until test-rmse hasn't improved in 5 rounds.
[1] test-rmse:511040
[2] test-rmse:506173
[3] test-rmse:501356
[4] test-rmse:496588
[5] test-rmse:491874
[6] test-rmse:487210
[7] test-rmse:482584
[8] test-rmse:478012
[9] test-rmse:473483
[10] test-rmse:468999
[11] test-rmse:464566
[12] test-rmse:460175
[13] test-rmse:455822
[14] test-rmse:451526
[15] test-rmse:447271
[16] test-rmse:443066
[17] test-rmse:438893
[18] test-rmse:434766
[19] test-rmse:430688
[20] test-rmse:426657
[21] test-rmse:422664
[22] test-rmse:418718
[23] test-rmse:414801
[24] test-rmse:410941
[25] test-rmse:407098
[26] test-rmse:403307
[27] test-rmse:399547
[28] test-rmse:395829
[29] test-rmse:392152
[30] test-rmse:388512
[31] test-rmse:384923
[32] test-rmse:381353
[33] test-rmse:377832
[34] test-rmse:374343
[35] test-rmse:370902
[36] test-rmse:367481
[37] test-rmse:364100
[38] test-rmse:360764
[39] test-rmse:357461
[40] test-rmse:354187
[41] test-rmse:350960
[42] test-rmse:347750
[43] test-rmse:344572
[44] test-rmse:341426
[45] test-rmse:338314
[46] test-rmse:335245
[47] test-rmse:332213
[48] test-rmse:329190
[49] test-rmse:326207
[50] test-rmse:323269
[51] test-rmse:320358
[52] test-rmse:317476
[53] test-rmse:314619
[54] test-rmse:311801
[55] test-rmse:309001
[56] test-rmse:306233
[57] test-rmse:303498
[58] test-rmse:300790
[59] test-rmse:298113
[60] test-rmse:295470
[61] test-rmse:292855
[62] test-rmse:290269
[63] test-rmse:287704
[64] test-rmse:285165
[65] test-rmse:282666
[66] test-rmse:280184
[67] test-rmse:277741
[68] test-rmse:275313
[69] test-rmse:272907
[70] test-rmse:270532
[71] test-rmse:268190
[72] test-rmse:265866
[73] test-rmse:263562
[74] test-rmse:261295
[75] test-rmse:259046
[76] test-rmse:256816
[77] test-rmse:254625
[78] test-rmse:252448
[79] test-rmse:250296
[80] test-rmse:248164
[81] test-rmse:246063
[82] test-rmse:243969
[83] test-rmse:241918
[84] test-rmse:239880
[85] test-rmse:237853
[86] test-rmse:235862
[87] test-rmse:233881
[88] test-rmse:231939
[89] test-rmse:230004
[90] test-rmse:228095
[91] test-rmse:226206
[92] test-rmse:224346
[93] test-rmse:222499
[94] test-rmse:220670
[95] test-rmse:218861
[96] test-rmse:217075
[97] test-rmse:215311
[98] test-rmse:213567
[99] test-rmse:211827
[100] test-rmse:210120
[101] test-rmse:208421
[102] test-rmse:206737
[103] test-rmse:205088
[104] test-rmse:203457
[105] test-rmse:201827
[106] test-rmse:200212
[107] test-rmse:198636
[108] test-rmse:197085
[109] test-rmse:195530
[110] test-rmse:194010
[111] test-rmse:192494
[112] test-rmse:191011
[113] test-rmse:189524
[114] test-rmse:188077
[115] test-rmse:186631
[116] test-rmse:185212
[117] test-rmse:183809
[118] test-rmse:182411
[119] test-rmse:181043
[120] test-rmse:179675
[121] test-rmse:178325
[122] test-rmse:177006
[123] test-rmse:175698
[124] test-rmse:174401
[125] test-rmse:173124
[126] test-rmse:171857
[127] test-rmse:170612
[128] test-rmse:169374
[129] test-rmse:168161
[130] test-rmse:166952
[131] test-rmse:165766
[132] test-rmse:164596
[133] test-rmse:163434
[134] test-rmse:162284
[135] test-rmse:161156
[136] test-rmse:160034
[137] test-rmse:158914
[138] test-rmse:157821
[139] test-rmse:156745
[140] test-rmse:155676
[141] test-rmse:154628
[142] test-rmse:153593
[143] test-rmse:152568
[144] test-rmse:151559
[145] test-rmse:150558
[146] test-rmse:149572
[147] test-rmse:148603
[148] test-rmse:147644
[149] test-rmse:146701
[150] test-rmse:145766
[151] test-rmse:144831
[152] test-rmse:143911
[153] test-rmse:143000
[154] test-rmse:142102
[155] test-rmse:141215
[156] test-rmse:140345
[157] test-rmse:139482
[158] test-rmse:138627
[159] test-rmse:137799
[160] test-rmse:136970
[161] test-rmse:136155
[162] test-rmse:135347
[163] test-rmse:134549
[164] test-rmse:133771
[165] test-rmse:132997
[166] test-rmse:132246
[167] test-rmse:131489
[168] test-rmse:130746
[169] test-rmse:130024
[170] test-rmse:129296
[171] test-rmse:128587
[172] test-rmse:127886
[173] test-rmse:127192
[174] test-rmse:126505
[175] test-rmse:125824
[176] test-rmse:125160
[177] test-rmse:124501
[178] test-rmse:123857
[179] test-rmse:123216
[180] test-rmse:122583
[181] test-rmse:121954
[182] test-rmse:121339
[183] test-rmse:120737
[184] test-rmse:120148
[185] test-rmse:119561
[186] test-rmse:118982
[187] test-rmse:118408
[188] test-rmse:117840
[189] test-rmse:117286
[190] test-rmse:116739
[191] test-rmse:116198
[192] test-rmse:115670
[193] test-rmse:115143
[194] test-rmse:114633
[195] test-rmse:114128
[196] test-rmse:113628
[197] test-rmse:113133
[198] test-rmse:112648
[199] test-rmse:112167
[200] test-rmse:111694
[201] test-rmse:111232
[202] test-rmse:110769
[203] test-rmse:110309
[204] test-rmse:109870
[205] test-rmse:109429
[206] test-rmse:109001
[207] test-rmse:108584
[208] test-rmse:108159
[209] test-rmse:107745
[210] test-rmse:107338
[211] test-rmse:106934
[212] test-rmse:106543
[213] test-rmse:106161
[214] test-rmse:105774
[215] test-rmse:105404
[216] test-rmse:105032
[217] test-rmse:104666
[218] test-rmse:104306
[219] test-rmse:103951
[220] test-rmse:103605
[221] test-rmse:103256
[222] test-rmse:102918
[223] test-rmse:102581
[224] test-rmse:102258
[225] test-rmse:101929
[226] test-rmse:101614
[227] test-rmse:101305
[228] test-rmse:101001
[229] test-rmse:100687
[230] test-rmse:100393
[231] test-rmse:100106
[232] test-rmse:99803.7
[233] test-rmse:99521.5
[234] test-rmse:99228
[235] test-rmse:98952.9
[236] test-rmse:98687
[237] test-rmse:98407.6
[238] test-rmse:98145.9
[239] test-rmse:97895.6
[240] test-rmse:97630.3
[241] test-rmse:97373.5
[242] test-rmse:97131.5
[243] test-rmse:96879.7
[244] test-rmse:96638.5
[245] test-rmse:96409.2
[246] test-rmse:96174.5
[247] test-rmse:95950.8
[248] test-rmse:95724
[249] test-rmse:95504.3
[250] test-rmse:95286
[251] test-rmse:95063.2
[252] test-rmse:94852.8
[253] test-rmse:94646.3
[254] test-rmse:94438.7
[255] test-rmse:94227.9
[256] test-rmse:94032
[257] test-rmse:93828.1
[258] test-rmse:93637
[259] test-rmse:93447.4
[260] test-rmse:93264
[261] test-rmse:93072.1
[262] test-rmse:92886.1
[263] test-rmse:92699.4
[264] test-rmse:92519.7
[265] test-rmse:92341.1
[266] test-rmse:92158.9
[267] test-rmse:91984.2
[268] test-rmse:91818.9
[269] test-rmse:91667.4
[270] test-rmse:91508.6
[271] test-rmse:91340.9
[272] test-rmse:91179.8
[273] test-rmse:91036.4
[274] test-rmse:90880.2
[275] test-rmse:90730.6
[276] test-rmse:90586.1
[277] test-rmse:90440.3
[278] test-rmse:90301.1
[279] test-rmse:90168.4
[280] test-rmse:90031.9
[281] test-rmse:89908.5
[282] test-rmse:89775.1
[283] test-rmse:89654.3
[284] test-rmse:89526.7
[285] test-rmse:89395.2
[286] test-rmse:89275.8
[287] test-rmse:89160.1
[288] test-rmse:89035.6
[289] test-rmse:88924.4
[290] test-rmse:88812.4
[291] test-rmse:88696.1
[292] test-rmse:88588.3
[293] test-rmse:88483.3
[294] test-rmse:88367.4
[295] test-rmse:88265.7
[296] test-rmse:88159
[297] test-rmse:88060.3
[298] test-rmse:87956.8
[299] test-rmse:87859.5
[300] test-rmse:87763.9
[301] test-rmse:87660.7
[302] test-rmse:87573.6
[303] test-rmse:87475.7
[304] test-rmse:87378.3
[305] test-rmse:87287.8
[306] test-rmse:87194.3
[307] test-rmse:87113.9
[308] test-rmse:87024.7
[309] test-rmse:86936.5
[310] test-rmse:86847.3
[311] test-rmse:86761.9
[312] test-rmse:86679.8
[313] test-rmse:86612.5
[314] test-rmse:86528
[315] test-rmse:86449.6
[316] test-rmse:86374.7
[317] test-rmse:86297.3
[318] test-rmse:86216.9
[319] test-rmse:86147
[320] test-rmse:86085.8
[321] test-rmse:86018.1
[322] test-rmse:85941.5
[323] test-rmse:85878.8
[324] test-rmse:85815.2
[325] test-rmse:85755.3
[326] test-rmse:85691.4
[327] test-rmse:85631.7
[328] test-rmse:85554.2
[329] test-rmse:85478.2
[330] test-rmse:85420.4
[331] test-rmse:85355
[332] test-rmse:85282.9
[333] test-rmse:85212.7
[334] test-rmse:85156.1
[335] test-rmse:85089.2
[336] test-rmse:85042.1
[337] test-rmse:84977.9
[338] test-rmse:84916.3
[339] test-rmse:84865
[340] test-rmse:84819.4
[341] test-rmse:84764.9
[342] test-rmse:84698.7
[343] test-rmse:84655.8
[344] test-rmse:84595.1
[345] test-rmse:84546.1
[346] test-rmse:84496.5
[347] test-rmse:84446.9
[348] test-rmse:84401.1
[349] test-rmse:84349.7
[350] test-rmse:84312.6
[351] test-rmse:84263.9
[352] test-rmse:84217.4
[353] test-rmse:84176.9
[354] test-rmse:84126.8
[355] test-rmse:84081.5
[356] test-rmse:84037.6
[357] test-rmse:84001.1
[358] test-rmse:83961.8
[359] test-rmse:83922.8
[360] test-rmse:83884.8
[361] test-rmse:83842.4
[362] test-rmse:83805.7
[363] test-rmse:83771.6
[364] test-rmse:83738.9
[365] test-rmse:83701.5
[366] test-rmse:83668
[367] test-rmse:83633.7
[368] test-rmse:83591.7
[369] test-rmse:83552.1
[370] test-rmse:83514.7
[371] test-rmse:83479.3
[372] test-rmse:83440.2
[373] test-rmse:83412.3
[374] test-rmse:83380.3
[375] test-rmse:83346.3
[376] test-rmse:83309.6
[377] test-rmse:83272.6
[378] test-rmse:83243.7
[379] test-rmse:83211.3
[380] test-rmse:83184.4
[381] test-rmse:83151.7
[382] test-rmse:83119.6
[383] test-rmse:83089.4
[384] test-rmse:83056.3
[385] test-rmse:83023.5
[386] test-rmse:82994.4
[387] test-rmse:82964.4
[388] test-rmse:82936.3
[389] test-rmse:82907.3
[390] test-rmse:82873.7
[391] test-rmse:82845.4
[392] test-rmse:82816.8
[393] test-rmse:82790.7
[394] test-rmse:82766
[395] test-rmse:82740.9
[396] test-rmse:82719.9
[397] test-rmse:82695.1
[398] test-rmse:82672.3
[399] test-rmse:82647.9
[400] test-rmse:82629.9
[401] test-rmse:82602
[402] test-rmse:82581.2
[403] test-rmse:82562.3
[404] test-rmse:82541
[405] test-rmse:82524.3
[406] test-rmse:82504
[407] test-rmse:82490.7
[408] test-rmse:82472
[409] test-rmse:82448
[410] test-rmse:82424.8
[411] test-rmse:82408.9
[412] test-rmse:82395.4
[413] test-rmse:82373.6
[414] test-rmse:82358.9
[415] test-rmse:82336.1
[416] test-rmse:82322.6
[417] test-rmse:82301.7
[418] test-rmse:82282.6
[419] test-rmse:82268.4
[420] test-rmse:82253.9
[421] test-rmse:82229.1
[422] test-rmse:82207
[423] test-rmse:82188.9
[424] test-rmse:82176.5
[425] test-rmse:82170.7
[426] test-rmse:82157
[427] test-rmse:82151.2
[428] test-rmse:82139.1
[429] test-rmse:82126.4
[430] test-rmse:82108.7
[431] test-rmse:82098.1
[432] test-rmse:82087.3
[433] test-rmse:82075.5
[434] test-rmse:82063.7
[435] test-rmse:82054
[436] test-rmse:82039.1
[437] test-rmse:82027.3
[438] test-rmse:82014.6
[439] test-rmse:82005.3
[440] test-rmse:81993.7
[441] test-rmse:81984.7
[442] test-rmse:81973.3
[443] test-rmse:81955.5
[444] test-rmse:81943.4
[445] test-rmse:81932.8
[446] test-rmse:81918.6
[447] test-rmse:81909.1
[448] test-rmse:81899.2
[449] test-rmse:81886.5
[450] test-rmse:81873.7
[451] test-rmse:81863
[452] test-rmse:81854.2
[453] test-rmse:81842.5
[454] test-rmse:81831.3
[455] test-rmse:81821.2
[456] test-rmse:81811.4
[457] test-rmse:81804.7
[458] test-rmse:81789.8
[459] test-rmse:81784.3
[460] test-rmse:81779.4
[461] test-rmse:81771.3
[462] test-rmse:81756.4
[463] test-rmse:81751.9
[464] test-rmse:81739.6
[465] test-rmse:81730.1
[466] test-rmse:81719.8
[467] test-rmse:81710.2
[468] test-rmse:81701.1
[469] test-rmse:81689.9
[470] test-rmse:81685
[471] test-rmse:81675.6
[472] test-rmse:81670.9
[473] test-rmse:81659.8
[474] test-rmse:81651.6
[475] test-rmse:81641.8
[476] test-rmse:81632.5
[477] test-rmse:81629.2
[478] test-rmse:81619.2
[479] test-rmse:81611
[480] test-rmse:81608
[481] test-rmse:81599.1
[482] test-rmse:81588.6
[483] test-rmse:81578.9
[484] test-rmse:81573.4
[485] test-rmse:81570.2
[486] test-rmse:81558.9
[487] test-rmse:81554.5
[488] test-rmse:81544.7
[489] test-rmse:81533.8
[490] test-rmse:81526.6
[491] test-rmse:81518.9
[492] test-rmse:81512.2
[493] test-rmse:81498.3
[494] test-rmse:81495.6
[495] test-rmse:81488.1
[496] test-rmse:81478.6
[497] test-rmse:81469.1
[498] test-rmse:81463
[499] test-rmse:81462.4
[500] test-rmse:81454.6
[501] test-rmse:81453.7
[502] test-rmse:81450.8
[503] test-rmse:81443
[504] test-rmse:81434
[505] test-rmse:81430.1
[506] test-rmse:81427.5
[507] test-rmse:81421.4
[508] test-rmse:81421.2
[509] test-rmse:81415.5
[510] test-rmse:81413
[511] test-rmse:81409.5
[512] test-rmse:81396.6
[513] test-rmse:81395.1
[514] test-rmse:81395.3
[515] test-rmse:81392.2
[516] test-rmse:81391.4
[517] test-rmse:81388.2
[518] test-rmse:81383.8
[519] test-rmse:81379.3
[520] test-rmse:81379.8
[521] test-rmse:81376.9
[522] test-rmse:81376.9
[523] test-rmse:81375.1
[524] test-rmse:81370.2
[525] test-rmse:81365.3
[526] test-rmse:81364.3
[527] test-rmse:81363.6
[528] test-rmse:81362.5
[529] test-rmse:81358.9
[530] test-rmse:81354.8
[531] test-rmse:81353.9
[532] test-rmse:81355.2
[533] test-rmse:81355.2
[534] test-rmse:81356.4
[535] test-rmse:81356.3
[536] test-rmse:81352.6
[537] test-rmse:81347.5
[538] test-rmse:81347.6
[539] test-rmse:81349.5
[540] test-rmse:81350.3
[541] test-rmse:81351.2
[542] test-rmse:81351.6
Stopping. Best iteration:
[537] test-rmse:81347.5
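The stopping behaviour visible at the end of this log can be sketched as a simple patience rule: remember the best score so far and stop once it has not improved for a given number of rounds. The function early_stop below is a hypothetical illustration of that rule, not XGBoost's own implementation; the scores are the last few values from the log above, with index 0 standing for round 536, plus one made-up trailing value that is never reached.

```python
def early_stop(scores, patience):
    """Return (best_round, stop_round) for a metric where lower is better."""
    best_score = float("inf")
    best_round = -1
    for rnd, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            # No improvement for `patience` consecutive rounds
            return best_round, rnd
    return best_round, len(scores) - 1

# Rounds 536..542 from the log, then a made-up value that is never evaluated
tail = [81352.6, 81347.5, 81347.6, 81349.5, 81350.3, 81351.2, 81351.6, 81000.0]
best, stopped = early_stop(tail, patience=5)
print(best + 536, stopped + 536)
```

With a patience of 5, the best round is 537 and training ends after round 542, which is consistent with the log.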
Let's display which features were most important in the model just trained. Looking at the plot, you can see that features such as long, lat, and sqft_living have high importance.
fig, ax = plt.subplots(figsize = (12,12))
xgb.plot_importance(model, max_num_features = 12, height = 0.8, ax = ax)
plt.savefig('house_sails_feature_importance03.png', bbox_inches='tight')
Let's measure the accuracy of the trained model. The r2_score is about 0.847.
from sklearn.metrics import r2_score
preds = model.predict(dtest)
r2 = r2_score(y_test, preds)
print(r2)
0.8473346069012444
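For reference, the coefficient of determination that r2_score reports is 1 minus the ratio of the residual sum of squares to the total sum of squares. The plain-Python sketch below spells out that formula on made-up numbers; the function name r2 is chosen here for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

perfect = r2([1, 2, 3], [1, 2, 3])    # predictions equal the targets
baseline = r2([1, 2, 3], [2, 2, 2])   # always predicting the mean
print(perfect, baseline)
```

A value of about 0.847 therefore means the model explains roughly 85% of the variance in price on the test split.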
Finally, use joblib to save the trained model as a pkl file. You now have a model trained with machine learning.
# Newer scikit-learn releases no longer provide sklearn.externals.joblib
import joblib
joblib.dump(model, 'house_sales_model.pkl')
['house_sales_model.pkl']
Here we use the trained model as an API server. Flask, a Python micro web framework, is used to develop the API server. The development flow is to create a virtual environment with conda, test a simple API server, and then put the model trained with XGBoost on it.
The virtual environment uses Anaconda's conda. In the terminal, create a folder for app development (housesails_api in this case) and move into it. Then conda create creates the virtual environment and conda activate activates it.
mkdir housesails_api
cd housesails_api
conda create -n housesailsenv
conda activate housesailsenv
To develop an API server with Flask, first create and test a simple API server. Create the following folders and files inside the folder created above. If you write the code below into each file, start the API server, and can communicate with it via curl, the simple API server test has succeeded.
Create the folders and files so that they form the following hierarchy. To create an empty file, the touch command is convenient.
housesails_api
├── api
│ ├── __init__.py
│ └── views
│ └── user.py
├── housesails_app.py
└── house_sales_model.pkl
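If you prefer to create the empty files from Python rather than with mkdir and touch, a sketch with pathlib could look like the following. The function create_layout is hypothetical, and the demonstration runs inside a temporary directory so that nothing is written to the working directory; house_sales_model.pkl is not created here because it comes from the joblib.dump step.

```python
import tempfile
from pathlib import Path

def create_layout(root):
    """Create the folders and empty .py files of the API project under root."""
    root = Path(root)
    views = root / "api" / "views"
    views.mkdir(parents=True, exist_ok=True)
    files = [root / "api" / "__init__.py",
             views / "user.py",
             root / "housesails_app.py"]
    for f in files:
        f.touch(exist_ok=True)
    # Report the created files relative to the project root
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(Path(tmp) / "housesails_api")
print(created)
```

Calling create_layout("housesails_api") from the parent folder would create the structure in place.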
Write the following code into the files created above. Three files are needed to test a simple API server: api/views/user.py, api/__init__.py, and housesails_app.py. It is convenient to use vim when writing in the terminal and Atom when writing in a GUI.
api/views/user.py
from flask import Blueprint, request, make_response, jsonify
# Routing settings
user_router = Blueprint('user_router', __name__)
# Specify the path and the HTTP method
@user_router.route('/users', methods=['GET'])
def get_user_list():
    return make_response(jsonify({
        'users': [
            {
                'id': 1,
                'name': 'John'
            }
        ]
    }))
api/__init__.py
from flask import Flask, make_response, jsonify
from .views.user import user_router
def create_app():
    app = Flask(__name__)
    # Serve the blueprint's routes under /api
    app.register_blueprint(user_router, url_prefix='/api')
    return app
app = create_app()
housesails_app.py
import json
from flask import Flask
from flask import request
from flask import abort
import pandas as pd
import joblib
import xgboost as xgb
# Load the model saved in the training step
model = joblib.load("house_sales_model.pkl")
app = Flask(__name__)
# Get headers for payload
headers = ['sqft_living','sqft_above','sqft_basement','lat','long','sqft_living15','grade_3','grade_4','grade_5','grade_6','grade_7','grade_8','grade_9','grade_10','grade_11','grade_12']
@app.route('/house_sails', methods=['POST'])
def housesails():
    if not request.json:
        abort(400)
    # "data" is a comma-separated string of the 16 feature values
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))
if __name__ == "__main__":
    app.run(debug=True, port=5000)
After rewriting the code, start the API server again with python housesails_app.py. Once the API server is running, carry out the communication test with the curl command shown below. The test succeeds if a numeric prediction (the predicted price) is returned for the JSON data you sent. You have now turned the model trained with machine learning into an API server.
curl http://localhost:5000/house_sails -s -X POST -H "Content-Type: application/json" -d '{"data": "-1.026685, -0.725963, -0.652987, -0.323607, -0.307144, -0.946801, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0"}'
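The request body sent by curl is a JSON object whose data field holds the 16 feature values as one comma-separated string. The sketch below shows how such a body can be parsed and checked before it reaches the model. The function parse_payload and the length check are assumptions added for illustration; the handler in the article does not validate the number of values.

```python
import json

HEADERS = ['sqft_living', 'sqft_above', 'sqft_basement', 'lat', 'long',
           'sqft_living15', 'grade_3', 'grade_4', 'grade_5', 'grade_6',
           'grade_7', 'grade_8', 'grade_9', 'grade_10', 'grade_11', 'grade_12']

def parse_payload(body):
    """Turn the JSON request body into a {feature name: float} mapping."""
    data = json.loads(body)["data"]
    values = [float(v) for v in data.split(",")]  # float() ignores the spaces
    if len(values) != len(HEADERS):
        raise ValueError(f"expected {len(HEADERS)} values, got {len(values)}")
    return dict(zip(HEADERS, values))

# Same body as in the curl command above
body = ('{"data": "-1.026685, -0.725963, -0.652987, -0.323607, -0.307144, '
        '-0.946801, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0"}')
features = parse_payload(body)
print(features["sqft_living"], features["grade_7"])
```

In this example the single 1 among the dummy values falls on grade_7, so the request describes a house of grade 7.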
Finally, add authentication to the API server. This time we implement Basic authentication. The Flask-HTTPAuth library, which provides HTTPBasicAuth, must be installed. You can implement it by adding the parts marked # BasicAuth to the earlier housesails_app.py code.
housesails_app.py
import json
from flask import Flask
from flask import request
from flask import abort
from flask_httpauth import HTTPBasicAuth
import pandas as pd
import joblib
import xgboost as xgb
model = joblib.load("house_sales_model.pkl")
app = Flask(__name__)
# BasicAuth
auth = HTTPBasicAuth()
users = {
    "user01": "password01",
    "user02": "password02"
}
@auth.get_password
def get_pw(username):
    # Return the stored password, or None for unknown users
    if username in users:
        return users.get(username)
    return None
# Get headers for payload
headers = ['sqft_living','sqft_above','sqft_basement','lat','long','sqft_living15','grade_3','grade_4','grade_5','grade_6','grade_7','grade_8','grade_9','grade_10','grade_11','grade_12']
@app.route('/house_sails', methods=['POST'])
# BasicAuth
@auth.login_required
def housesails():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))
if __name__ == "__main__":
    app.run(debug=True, port=5000)
Start the API server again with python housesails_app.py. Once the API server is running, carry out the communication test with the curl command shown below. You are authenticated by adding --user user01:password01. If the request returns a prediction, the test has succeeded.
curl http://localhost:5000/house_sails --user user01:password01 -s -X POST -H "Content-Type: application/json" -d '{"data": "-1.026685, -0.725963, -0.652987, -0.323607, -0.307144, -0.946801, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0"}'
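Under HTTP Basic authentication, the --user option makes curl send an Authorization header that contains the Base64-encoded string username:password. The sketch below builds that header by hand to show what the server receives; the function name basic_auth_header is made up for illustration.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"

header = basic_auth_header("user01", "password01")
print(header)
```

Base64 is an encoding, not encryption, so Basic authentication should only be relied on together with HTTPS.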