I immediately tried out PyCaret, the machine learning library released the other day. I realized that anyone can build models easily. It was really easy! You can go from preprocessing through tuning and prediction in fewer than 10 lines of code! There are still many parts I haven't figured out, such as the arguments, but I decided to write this PyCaret article first. If you notice anything, please leave a comment.
Execute the code below to install it. It's just my impression, but it only took a few minutes. When I installed it locally I got an error, so I gave up on that for now.
! pip install pycaret
This time we will use the Boston housing dataset. You can get the data with the following code.
from pycaret.datasets import get_data
boston_data = get_data('boston')
Perform the preprocessing.
The data and the target variable are defined and initialized in setup().
Since we are solving a regression problem this time, we import from pycaret.regression.
For classification problems, specify pycaret.classification instead (a short classification sketch appears after the setup below).
You can also handle tasks such as natural language processing and clustering.
setup() handles missing values, categorical encoding, the train/test split, and so on. For more information, see here.
from pycaret.regression import *
exp1 = setup(boston_data, target = 'medv')
Run it to complete the setup.
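By the way, setting up a classification task looks almost the same. Here is a minimal sketch (run in a separate notebook, since we imported the regression module here); the 'juice' dataset and its 'Purchase' target are my assumptions based on PyCaret's sample datasets.

from pycaret.classification import *
from pycaret.datasets import get_data

# 'juice' and its 'Purchase' target column are assumptions based on PyCaret's sample datasets
juice_data = get_data('juice')
exp_clf = setup(juice_data, target = 'Purchase')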

Let's compare models and select one. You can compare models with the single line below. It took a few minutes. It is convenient to see all the evaluation metrics in one table! By default, 10-fold cross-validation is used. You can specify the number of folds and the metric to sort by with arguments (a sketch follows below). (I ran it with the default settings.)
compare_models()
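For example, I believe you can pass the number of folds and the sort metric like this (the argument names are my assumption based on the version I used):

# the fold and sort argument names are assumptions based on the PyCaret version I used
compare_models(fold = 5, sort = 'RMSE')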
(The execution results are displayed as a table comparing each model on every evaluation metric.)

Select a model and build it. This time I'm using Random Forest (chosen purely by feel). This function returns a table of the k-fold scores and a trained model object. You can also check the SD, which is very convenient!
rf = create_model('rf')

By typing a period after the trained object, you can inspect its attributes and methods as follows.
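As a minimal sketch, the returned object appears to be a scikit-learn RandomForestRegressor under the hood, so the usual estimator attributes should work:

# rf returned by create_model('rf') appears to be a scikit-learn RandomForestRegressor,
# so the usual estimator attributes are available
print(rf.n_estimators)          # number of trees
print(rf.feature_importances_)  # impurity-based feature importances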

Tuning can also be done in one line.
tuned_rf = tune_model('rf')

You can get the parameters as follows.
tuned_rf.get_params()
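Since get_params() returns a dictionary, you can also look up a single hyperparameter (n_estimators is just an illustrative key for a random forest):

# get_params() returns a dict of hyperparameters
params = tuned_rf.get_params()
print(params['n_estimators'])   # e.g. the number of trees after tuning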

Let's visualize the accuracy of the model. The regression plot is shown below, but for classification problems you can choose the output to match the metric you care about. I now regret not picking a classification problem here, since classification has many more visualization variations.
plot_model(tuned_rf)
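As far as I can tell, plot_model also takes a plot argument to switch the visualization. The plot names below are my assumption based on the version I used:

# the plot names are assumptions based on the PyCaret version I used
plot_model(tuned_rf, plot = 'error')    # prediction error plot
plot_model(tuned_rf, plot = 'feature')  # feature importance plot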

The model is interpreted using SHAP. Check the SHAP GitHub repository for how to read the graphs and how to interpret the model.
interpret_model(tuned_rf)
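It also seems that interpret_model accepts a plot argument; the 'correlation' plot and the 'lstat' feature below are assumptions for illustration:

# the 'correlation' plot name and the 'lstat' feature are assumptions for illustration
interpret_model(tuned_rf, plot = 'correlation', feature = 'lstat')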

Prediction on the test data is written as follows. The call returns predictions for the 30% hold-out set created by the train/test split in setup().
rf_holdout_pred = predict_model(rf)

When making predictions on new data, pass the dataset to the data argument.
predictions = predict_model(rf, data=boston_data)
The prediction results are added as a column on the far right of the returned DataFrame.
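As a small usage sketch, you can pull that column out like any other DataFrame column (the column name 'Label' is my assumption from the version I used):

# 'Label' as the prediction column name is an assumption from the version I used
print(predictions['Label'].head())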

Thank you for reading to the end. If you have any questions, please leave a comment.