I tried MLflow on Databricks

Operating environment

If you want to install additional external libraries
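A minimal sketch, assuming you are working in a Databricks notebook: %pip installs packages into the notebook-scoped Python environment (the package names below are only examples; both ship with the ML runtime). Libraries can also be attached to the cluster from its Libraries tab.

%pip install mlflow lightgbm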

Evaluate the model with MLflow Tracking

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
import lightgbm as lgb

import mlflow
import mlflow.lightgbm

def train(learning_rate, colsample_bytree, subsample):

  # Prepare the iris dataset and split it into train/test sets
  iris = datasets.load_iris()
  X = iris.data
  y = iris.target
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  # Convert the training data to LightGBM Dataset format
  train_set = lgb.Dataset(X_train, label=y_train)
  
  # Enable automatic logging of LightGBM parameters, metrics, and the trained model
  mlflow.lightgbm.autolog()
  
  with mlflow.start_run():

      # Train the model
      params = {
          "objective": "multiclass",
          "num_class": 3,
          "learning_rate": learning_rate,
          "metric": "multi_logloss",
          "colsample_bytree": colsample_bytree,
          "subsample": subsample,
          "seed": 42,
      }
      model = lgb.train(
          params, train_set, num_boost_round=10, valid_sets=[train_set], valid_names=["train"]
      )

      # Evaluate the model on the held-out test set
      y_proba = model.predict(X_test)
      y_pred = y_proba.argmax(axis=1)
      loss = log_loss(y_test, y_proba)
      acc = accuracy_score(y_test, y_pred)

      # Log the evaluation metrics to the active MLflow run
      mlflow.log_metrics({"log_loss": loss, "accuracy": acc})
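
# Run the training function with three different hyperparameter combinations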
train(0.1, 1.0, 1.0)
train(0.2, 0.8, 0.9)
train(0.4, 0.7, 0.8)

Register the model in the Model Registry
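The model can be registered from the run's page in the UI, or programmatically. A minimal sketch using the MLflow API, assuming the run ID is a placeholder you copy from the MLflow UI or mlflow.search_runs(); the name iris_model matches the serving URL used later:

import mlflow

# Placeholder: the ID of the run whose model you want to register
run_id = "<run_id>"

# mlflow.lightgbm.autolog() logs the trained model under the "model" artifact path
model_uri = f"runs:/{run_id}/model"

# Register the model (creates the registered model "iris_model" or adds a new version)
mlflow.register_model(model_uri, "iris_model")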

Launch an inference API using Model Serving

Change the stage of the model
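A minimal sketch using the MLflow client API, assuming the version to promote is 1 (check the Model Registry for the actual number); the same transition can also be done from the model version's page in the UI:

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote the version to Production so that it is served at the
# /model/iris_model/Production/invocations endpoint used below
client.transition_model_version_stage(
    name="iris_model",
    version=1,  # assumption: the version you want to promote
    stage="Production",
)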

Enable Model Serving

Use API from client side

export DATABRICKS_TOKEN={token}

cat <<EOF > ./data.json
 [
   {
     "sepal length(cm)": 4.6,
     "sepal width(cm)": 3.6,
     "petal length(cm)": 1,
     "petal width(cm)": 0.2
   }
 ]
EOF

curl \
  -u token:$DATABRICKS_TOKEN \
  -H "Content-Type: application/json; format=pandas-records" \
  [email protected] \
  https://dbc-xxxxxxxxxxxxx.cloud.databricks.com/model/iris_model/Production/invocations
The response is the predicted probability for each of the three iris classes:

[[0.9877602676352799, 0.006085719008512947, 0.006154013356207185]]
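The same endpoint can also be called from Python. A minimal sketch using requests, where the workspace URL is the placeholder from the curl example and the token is read from the DATABRICKS_TOKEN environment variable:

import os
import requests

# Placeholder workspace URL; replace with your own Databricks host
url = "https://dbc-xxxxxxxxxxxxx.cloud.databricks.com/model/iris_model/Production/invocations"

headers = {"Content-Type": "application/json; format=pandas-records"}
data = [
    {
        "sepal length(cm)": 4.6,
        "sepal width(cm)": 3.6,
        "petal length(cm)": 1,
        "petal width(cm)": 0.2,
    }
]

# Authenticate with a personal access token, as in the curl -u token:... example
response = requests.post(
    url,
    headers=headers,
    json=data,
    auth=("token", os.environ["DATABRICKS_TOKEN"]),
)
print(response.json())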

Finally
