Regarding unseen data

While studying PyCaret, I noticed that unseen data is easily mistaken for test data. Unseen data is a kind of test data, but to explain in more detail, the flow is as follows.

First, create a predictive model with the training data. Next, create the final prediction model by combining the training data with the test data. Finally, feed the unseen data into the model to check its accuracy.
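The three roles of the data can be sketched in plain Python. This is only an illustration of the idea, not PyCaret's own code: the helper name `split_dataset`, the 5% unseen fraction, the 30% test fraction, and the total of 24,000 records are all assumptions chosen so that the unseen part comes out to 1,200 records.

```python
import random

def split_dataset(rows, unseen_frac=0.05, test_frac=0.3, seed=123):
    """Split rows into train, test, and unseen parts (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]                  # copy so the caller's list is untouched
    rng.shuffle(shuffled)

    n_unseen = round(len(shuffled) * unseen_frac)
    unseen = shuffled[:n_unseen]        # set aside, never used while building the model
    modeling = shuffled[n_unseen:]      # everything the modeling steps may use

    n_test = round(len(modeling) * test_frac)
    test = modeling[:n_test]            # used to check for overfitting
    train = modeling[n_test:]           # used to train the first model
    return train, test, unseen

rows = list(range(24000))               # stand-in for the records (assumed size)
train, test, unseen = split_dataset(rows)
print(len(train), len(test), len(unseen))
```

The point of the sketch is that the unseen part is separated first, so neither the training step nor the finalizing step ever touches it.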

Review of last time

This article continues from "Machine learning experience in just a few lines (first part). Explain PyCaret in detail. From dataset preparation to accuracy comparison of multiple models." Last time, we covered everything from preparing the dataset to comparing the accuracy of multiple models.

Goal for this time

In Part 2, we will create a model, plot its performance, and create the final model.

Create a model using training data

The purpose of compare_models() is not to produce a trained model, but to evaluate the available models and pick high-performing candidates. This time, we will train a random forest model.

`code.py`


rf = create_model('rf')

tune_model() tunes the hyperparameters with a random grid search. By default, it optimizes accuracy.

`code.py`


tuned_rf = tune_model('rf')

For example, if you want to tune the random forest for a high AUC instead, the code looks like this:

`code.py`


tuned_rf_auc = tune_model('rf', optimize = 'AUC')

The model created with tune_model() is 1.45% more accurate than the untuned model, so I will use it.
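To make "random grid search" concrete, here is a minimal sketch in plain Python. It is not how PyCaret implements tune_model(); the function `random_search`, the grid values, and the toy scoring function are all hypothetical. In a real run the score would be a cross-validated metric such as accuracy or AUC.

```python
import random

def random_search(param_grid, score_fn, n_iter=10, seed=123):
    """Try n_iter random combinations from param_grid and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter at random
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid; these values are for illustration only
param_grid = {"n_estimators": [10, 50, 70, 100], "max_depth": [5, 10, 20]}

def toy_score(params):
    # Toy stand-in for a cross-validated metric; peaks at n_estimators=70, max_depth=10
    return 1.0 - abs(params["n_estimators"] - 70) / 100 - abs(params["max_depth"] - 10) / 100

best_params, best_score = random_search(param_grid, toy_score, n_iter=10)
print(best_params, best_score)
```

Changing the metric to optimize corresponds to swapping the scoring function: the search loop stays the same, and only the number being maximized changes.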

Plot the accuracy of the model

AUC Plot

`code.py`


plot_model(tuned_rf, plot = 'auc')
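As a reminder of what the AUC number on this plot means, the following sketch computes it directly from labels and scores in plain Python. AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `roc_auc` and the four sample values are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 pairs are ranked correctly, so 0.75
```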

Precision-Recall Curve

`code.py`


plot_model(tuned_rf, plot = 'pr')
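Each point on a precision-recall curve comes from one decision threshold. The sketch below, in plain Python with made-up labels and scores, computes a few such points by hand. The helper `precision_recall_at` is hypothetical, and returning a precision of 1.0 when nothing is predicted positive is a convention assumed here.

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall when scores >= threshold are predicted as positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # assumed convention: no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for threshold in (0.2, 0.5, 0.9):
    precision, recall = precision_recall_at(labels, scores, threshold)
    print(threshold, precision, recall)
```

Raising the threshold usually trades recall for precision, which is the trade-off the curve visualizes.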

Feature Importance Plot

`code.py`


plot_model(tuned_rf, plot='feature')

`code.py`


evaluate_model(tuned_rf)

Confusion Matrix

`code.py`


plot_model(tuned_rf, plot = 'confusion_matrix')
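The numbers in a confusion matrix are simple counts of (actual, predicted) pairs. A minimal plain-Python sketch for a binary problem, with made-up labels and a hypothetical `confusion_matrix` helper, looks like this:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """2x2 matrix for labels 0/1: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # true negatives, false positives
        [counts[(1, 0)], counts[(1, 1)]],  # false negatives, true positives
    ]

actual    = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 1, 0]
matrix = confusion_matrix(actual, predicted)
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)
print(matrix, accuracy)
```

Accuracy is the diagonal of the matrix divided by the total, so the plot also shows where the errors come from, not just how many there are.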

Create a prediction model by combining training data and test data

Before finalizing the predictive model, use the test data to check that the trained model is not overfitted. If the accuracy on the test data differs greatly from the cross-validation accuracy seen during training, the model needs to be reconsidered. This time there is no big difference, so we will proceed.

`code.py`


predict_model(tuned_rf);
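The check described here amounts to comparing two numbers. The sketch below shows one way to write it down. The helper `overfit_gap`, the tolerance of 0.02, and the accuracy values are all hypothetical, and what counts as a "large" difference is a judgment call rather than a fixed rule.

```python
def overfit_gap(cv_accuracy, holdout_accuracy, tolerance=0.02):
    """Return the accuracy gap and whether it exceeds the (assumed) tolerance."""
    gap = cv_accuracy - holdout_accuracy
    return gap, gap > tolerance

# Made-up numbers: cross-validation accuracy vs. accuracy on the test data
gap, suspicious = overfit_gap(cv_accuracy=0.82, holdout_accuracy=0.81)
print(round(gap, 4), suspicious)
```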

With that, the final version of the prediction model is complete. This model is trained on the combined training and test data.

`code.py`


final_rf = finalize_model(tuned_rf)
print(final_rf)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=10, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=2, min_samples_split=10, min_weight_fraction_leaf=0.0, n_estimators=70, n_jobs=None, oob_score=False, random_state=123, verbose=0, warm_start=False)

`code.py`


predict_model(final_rf);

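The accuracy and AUC are higher than before. This is because the final model was also trained on the test data, so it has already seen the records it is being scored on. These numbers are therefore optimistic, which is why the unseen data is needed for the real evaluation.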

Model evaluation using unseen data

Finally, we will use the unseen data (a dataset of 1,200 records) to evaluate the predictive model.

`code.py`


unseen_predictions = predict_model(final_rf, data=data_unseen)
unseen_predictions.head()

The Label and Score columns have been added to the dataset. Label is the label predicted by the model. Score is the predicted probability.
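Once the Label column is available, the accuracy on the unseen data can be checked by comparing it with the true target. The sketch below does this in plain Python on a few made-up rows. The target column name `default` and the helper `accuracy_from_predictions` are assumptions for illustration, so substitute the target column of your own dataset.

```python
def accuracy_from_predictions(rows, target="default", label="Label"):
    """Share of rows where the predicted label matches the true target."""
    correct = sum(1 for row in rows if row[target] == row[label])
    return correct / len(rows)

# Made-up rows shaped like the prediction output (true target, Label, Score)
unseen_rows = [
    {"default": 0, "Label": 0, "Score": 0.91},
    {"default": 1, "Label": 1, "Score": 0.73},
    {"default": 1, "Label": 0, "Score": 0.58},
    {"default": 0, "Label": 0, "Score": 0.88},
]
unseen_accuracy = accuracy_from_predictions(unseen_rows)
print(unseen_accuracy)  # 3 of 4 rows match, so 0.75
```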

Save model

When more new data arrives for prediction, it is a hassle to start over from scratch. PyCaret provides save_model(), which lets you save the model.

`code.py`


save_model(final_rf,'Final RF Model')

Transformation Pipeline and Model Succesfully Saved

Loading the saved model

To load the model, do the following:

`code.py`


saved_final_rf = load_model('Final RF Model')

Transformation Pipeline and Model Sucessfully Loaded

Predict on the same unseen data as before. The result is identical to the earlier one, so I will omit it.

`code.py`


new_prediction = predict_model(saved_final_rf, data=data_unseen)

`code.py`


new_prediction.head()
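The idea behind saving and loading can be illustrated with the standard library alone. This sketch is not PyCaret's internal implementation: it pickles a toy "model" (just a dictionary of fitted parameters), loads it back, and confirms that the predictions are unchanged. The file name `toy_model.pkl` and the `predict` helper are hypothetical.

```python
import os
import pickle
import tempfile

def predict(model, values):
    """Toy prediction rule: label 1 when the value reaches the stored threshold."""
    return [1 if v >= model["threshold"] else 0 for v in values]

model = {"name": "toy threshold model", "threshold": 0.5}  # stand-in for a fitted model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)           # save the fitted state to disk
    with open(path, "rb") as f:
        restored = pickle.load(f)       # load it back, as a later session would

values = [0.2, 0.5, 0.9]
same = predict(model, values) == predict(restored, values)
print(same)
```

Because the restored object carries the same fitted state, predicting on the same data gives the same result, which matches what the article observes with the reloaded model.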

In closing

I worked through the Beginner-level tutorial and explained it along the way. I'm surprised that this much can be done with about a dozen lines of code. I feel that the hurdle for getting started with machine learning has become even lower.

If you have any suggestions, please comment. Thank you for reading.

[PYTHON] Machine learning experience in just a few lines (Part 2). Explain PyCaret in detail. Model building and evaluation analysis.
