[Python] Using MLflow with Databricks ④ --Calling the model--

Introduction

In the following articles, I used Databricks' managed MLflow to train a model and manage its lifecycle.

Using MLflow with Databricks ① --Experiment tracking on notebook--
Using MLflow with Databricks ② --Visualization of experiment parameters and metrics--
Using MLflow with Databricks ③ --Model lifecycle management--

This time I would like to load the trained model, which was promoted to Staging, from another notebook. The idea is to load the trained model as a PySpark user-defined function and run predictions on a PySpark DataFrame in a distributed fashion.

Setup

Look up the ["Run ID"](https://qiita.com/knt078/items/c40c449a512b79c7fd6e#%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E7%99%BB%E9%8C%B2) of the model you want to call.

python


# Run ID of the training run whose model we want to load
# run_id = "<run-id>"
run_id = "d35dff588112486fa1684f38******"
model_uri = "runs:/" + run_id + "/model"
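If you do not want to copy the Run ID by hand, you can also look it up with MLflow's search API. Below is a minimal sketch, assuming you know the experiment ID; the `"<experiment-id>"` placeholder is hypothetical.

python


import mlflow

# Sketch: fetch the most recent run of the experiment instead of
# hard-coding the Run ID ("<experiment-id>" is a placeholder).
runs = mlflow.search_runs(
    experiment_ids=["<experiment-id>"],
    order_by=["start_time DESC"],
    max_results=1,
)
run_id = runs.loc[0, "run_id"]
model_uri = "runs:/" + run_id + "/model"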

Load the scikit-learn model

Load the trained model from the experiment using the MLflow API.

python


import mlflow.sklearn

# Load the logged scikit-learn model from the run's artifacts
model = mlflow.sklearn.load_model(model_uri=model_uri)
model.coef_
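Because the previous article registered this model in the Model Registry and promoted it to Staging, it can also be loaded by registered name and stage instead of by Run ID. A minimal sketch, assuming a hypothetical registered model name `diabetes-model`:

python


import mlflow.sklearn

# Sketch: load the Staging version from the Model Registry
# ("diabetes-model" is a hypothetical registered model name).
staging_model = mlflow.sklearn.load_model(model_uri="models:/diabetes-model/Staging")
staging_model.coef_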

Next, load the diabetes dataset that was also used for training and drop the "progression" target column. Then convert the resulting pandas DataFrame to a PySpark DataFrame.

python


# Import sklearn, numpy, and pandas
from sklearn import datasets
import numpy as np
import pandas as pd

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Build a pandas DataFrame with the features and the "progression" target
Y = np.array([y]).transpose()
d = np.concatenate((X, Y), axis=1)
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'progression']
data = pd.DataFrame(d, columns=cols)

# Drop the target and convert to a PySpark DataFrame
dataframe = spark.createDataFrame(data.drop(columns=["progression"]))
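Before distributing the predictions, it can be useful to sanity-check the loaded model locally on the pandas DataFrame. A quick single-node check (my own addition, not from the original steps):

python


# Local sanity check: predict on the first few rows with the
# scikit-learn model loaded above (runs on the driver only).
features = data.drop(columns=["progression"])
print(model.predict(features.head()))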

Call the MLflow model

Load the trained model as a PySpark user-defined function using the MLflow API.

python


import mlflow.pyfunc

# Wrap the logged model as a PySpark UDF (returns a double by default)
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)

Make predictions using the user-defined function.

python


feature_cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
predicted_df = dataframe.withColumn("prediction", pyfunc_udf(*feature_cols))
display(predicted_df)
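As a quick consistency check (an addition of mine, assuming row order is preserved for this small, locally created DataFrame), the distributed predictions can be compared with what the scikit-learn model returns locally:

python


# Compare distributed predictions with local scikit-learn predictions;
# they should agree up to floating-point tolerance.
spark_preds = [row["prediction"] for row in predicted_df.select("prediction").collect()]
local_preds = model.predict(data.drop(columns=["progression"]))
print(np.allclose(spark_preds, local_preds))  # expected: True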

I was able to run the model's predictions as distributed processing with the PySpark UDF.

(Screenshot: prediction results displayed in the notebook, 2020-11-05_10h45_19.png)

In conclusion

This time I was able to load the trained model with the MLflow API and run its predictions in a distributed way with PySpark. Databricks is constantly being updated with new features that make it easier to use, and I would like to keep up with them.
