[In-Database Python Analysis Tutorial with SQL Server 2017] Step 6: Using the model

From the beginning of the tutorial

In-Database Python Analysis for SQL Developers

Previous step

Step 5: Training and saving a model using T-SQL

Step 6: Use the model

In this step you will learn how to use the model trained in the previous step. "Use" here means "deploying the model to production for scoring." This is easy to expand because the Python code is contained in the stored procedure. To make new observation predictions from your application, just call the stored procedure.

There are two ways to call a Python model from a stored procedure:

-** Batch scoring mode : Use a SELECT query to serve multiple rows of data. The stored procedure returns a table of observations corresponding to the input - Individual scoring mode **: Pass a set of individual parameter values as input. The stored procedure returns a single record or value.

Scoring using the scikit-learn model

The stored procedure PredictTipSciKitPy uses the scikit-learn model.

The stored procedure PredictTipSciKitPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.

  2. Right-click on PredictTipSciKitPy and select Modify to open the Transact-SQL script in a new query window.

    DROP PROCEDURE IF EXISTS PredictTipSciKitPy;
    GO
    
    CREATE PROCEDURE [dbo].[PredictTipSciKitPy](@model varchar(50), @inquery nvarchar(max))
    AS
    BEGIN
      DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    
      EXEC sp_execute_external_script 
    	@language = N'Python',
        @script = N'
    import pickle
    import numpy
    # import pandas
    from sklearn import metrics
    
    mod = pickle.loads(lmodel2)
    
    X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
    y = numpy.ravel(InputDataSet[["tipped"]])
    
    prob_array = mod.predict_proba(X)
    prob_list = [item[1] for item in prob_array]
    
    prob_array = numpy.asarray(prob_list)
    fpr, tpr, thresholds = metrics.roc_curve(y, prob_array)
    auc_result = metrics.auc(fpr, tpr)
    print("AUC on testing data is:", auc_result)
    
    OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
    ',	
    	@input_data_1 = @inquery,
    	@input_data_1_name = N'InputDataSet',
    	@params = N'@lmodel2 varbinary(max)',
    	@lmodel2 = @lmodel2
      WITH RESULT SETS ((Score float));
    
    END
    GO
    

--Specify the model name to use in the input parameters of the stored procedure. --Fetch the serialized model from the nyc_taxi_models table based on the specified model name. --The serialized model is stored in the Python variable mod. --The new scored case is taken from the Transact-SQL query specified by @ input_data_1. The result of this query is saved in the default data frame ʻInputDat a Set. --This data frame is passed to the function predict_proba of the logistic regression model modcreated using the scikit-learn model. --The functionpredict_proba returns a float value that indicates the probability of receiving an arbitrary amount of chips. --In addition, calculate the precision metric ʻAUC (area under curve). Precision metrics such as AUC are only generated if you specify the objective variable (ie the tapped column) in addition to the explanatory variables. The prediction does not require the objective variable (variable Y), but it is required to calculate the accuracy metric. Therefore, if the data to be scored does not have an objective variable, remove the AUC calculation block from the source code and modify the stored procedure to simply return the probability of receiving a chip by the explanatory variable (variable X).

Scoring using the revoscalepy model

The stored procedure PredictTipRxPy uses a model created using the revoscalepy library. This works much like the procedure PredictTipSciKitPy, but with some changes to the revoscalepy function.

The stored procedure PredictTipRxPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.

  2. Right-click on PredictTipRxPy and select Modify to open the Transact-SQL script in a new query window.

    DROP PROCEDURE IF EXISTS PredictTipRxPy;
    GO
    
    CREATE PROCEDURE [dbo].[PredictTipRxPy](@model varchar(50), @inquery nvarchar(max))
    AS
    BEGIN
      DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    
      EXEC sp_execute_external_script 
    	@language = N'Python',
        @script = N'
    import pickle
    import numpy
    # import pandas
    from sklearn import metrics
    from revoscalepy.functions.RxPredict import rx_predict
    
    mod = pickle.loads(lmodel2)
    X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
    y = numpy.ravel(InputDataSet[["tipped"]])
    
    prob_array = rx_predict(mod, X)
    prob_list = prob_array["tipped_Pred"].values
    
    prob_array = numpy.asarray(prob_list)
    fpr, tpr, thresholds = metrics.roc_curve(y, prob_array)
    auc_result = metrics.auc(fpr, tpr)
    print("AUC on testing data is:", auc_result)
    OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
    ',	
    	@input_data_1 = @inquery,
    	@input_data_1_name = N'InputDataSet',
    	@params = N'@lmodel2 varbinary(max)',
    	@lmodel2 = @lmodel2
      WITH RESULT SETS ((Score float));
    
    END
    GO
    

Perform batch scoring

The stored procedures PredictTipSciKitPy and PredictTipRxPy require two input parameters.

--Query to extract data to be scored --Trained model identifier used for scoring

In this section, you'll learn how to pass these arguments to a stored procedure to easily modify both the model and the data used for scoring.

  1. Define the input data and call the stored procedure for scoring as follows: This example uses the stored procedure PredictTipSciKitPy for scoring and passes the model name and query string.

    DECLARE @query_string nvarchar(max) -- Specify input query
      SET @query_string='
      select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance,
      dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
      from nyctaxi_sample_testing'
    EXEC [dbo].[PredictTipSciKitPy] 'SciKit_model', @query_string;
    

    sqldev-python-step6-1-gho9o9.png

    sqldev-python-step6-2-gho9o9.png

The stored procedure returns an estimated probability of receiving a tip for each operation record passed as part of an input query. The predicted values are displayed in the results pane of Management Studio. The accuracy metric ʻAUC (area under the curve)` is output in the message pane.

  1. To use the revoscalepy model for scoring, call the stored procedure PredictTipRxPy.

    DECLARE @query_string nvarchar(max) -- Specify input query
      SET @query_string='
      select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance,
      dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
      from nyctaxi_sample_testing'
    EXEC [dbo].[PredictTipRxPy] 'revoscalepy_model', @query_string;
    

    sqldev-python-step6-3-gho9o9.png

    sqldev-python-step6-4-gho9o9.png

Performing individual scoring

In some cases, instead of batch scoring, you want to pass a single case and get a single result based on that value. For example, configure an Excel worksheet, web application, or Reporting Services report to call a stored procedure based on user input.

In this section, you will learn how to call the stored procedures PredictTipSingleModeSciKitPy and PredictTipSingleModeRxPy to create a single prediction.

The stored procedures PredictTipSingleModeSciKitPy and PredictTipSingleModeRxPy are defined in SQL Server through Step 2: Import data into SQL Server using PowerShell.

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.

  2. Right-click PredictTipSingleModeSciKitPy or PredictTipSingleModeRxPy and select Modify to open the Transact-SQL script in a new query window.

These stored procedures use the scikit-learn and revoscalepy models to perform scoring as follows: --The model name and multiple single values are provided as input. These inputs include the number of passengers, driving distance, and so on. --The table-valued function fnEngineerFeatures takes latitude and longitude as input and converts them directly to distance. --When calling a stored procedure from an external application, make sure that the input data matches the required input functionality of the Python model. This includes casting the input data to a Python data type and validating the data type and data length. --The stored procedure creates a score based on the stored Python model.

Below is a stored procedure PredictTipSingleModeSciKitPy that performs scoring using the scikit-learn model.

```SQL:PredictTipSingleModeSciKitPy
CREATE PROCEDURE [dbo].[PredictTipSingleModeSciKitPy](@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
    DECLARE @inquery nvarchar(max) = N'
    SELECT * FROM [dbo].[fnEngineerFeatures]( 
    @passenger_count,
    @trip_distance,
    @trip_time_in_secs,
    @pickup_latitude,
    @pickup_longitude,
    @dropoff_latitude,
    @dropoff_longitude)
    '
    DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    EXEC sp_execute_external_script 
        @language = N'Python',
        @script = N'
import pickle
import numpy
# import pandas

# Load model and unserialize
mod = pickle.loads(model)

# Get features for scoring from input data
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]

# Score data to get tip prediction probability as a list (of float)
prob = [mod.predict_proba(X)[0][1]]

# Create output data frame
OutputDataSet = pandas.DataFrame(data=prob, columns=["predictions"])
',
    @input_data_1 = @inquery,
    @params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
        @trip_time_in_secs int ,
        @pickup_latitude float ,
        @pickup_longitude float ,
        @dropoff_latitude float ,
        @dropoff_longitude float',
    @model = @lmodel2,
        @passenger_count =@passenger_count ,
        @trip_distance=@trip_distance,
        @trip_time_in_secs=@trip_time_in_secs,
        @pickup_latitude=@pickup_latitude,
        @pickup_longitude=@pickup_longitude,
        @dropoff_latitude=@dropoff_latitude,
        @dropoff_longitude=@dropoff_longitude
    WITH RESULT SETS ((Score float));
END
GO
```

Below is a stored procedure PredictTipSingleModeRxPy that performs scoring using the revoscalepy model.

```SQL
CREATE PROCEDURE [dbo].[PredictTipSingleModeRxPy](@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
    DECLARE @inquery nvarchar(max) = N'
    SELECT * FROM [dbo].[fnEngineerFeatures]( 
    @passenger_count,
    @trip_distance,
    @trip_time_in_secs,
    @pickup_latitude,
    @pickup_longitude,
    @dropoff_latitude,
    @dropoff_longitude)
    '
    DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    EXEC sp_execute_external_script 
        @language = N'Python',
        @script = N'
import pickle
import numpy
# import pandas
from revoscalepy.functions.RxPredict import rx_predict

# Load model and unserialize
mod = pickle.loads(model)

# Get features for scoring from input data
x = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]

# Score data to get tip prediction probability as a list (of float)

prob_array = rx_predict(mod, x)

prob_list = prob_array["tipped_Pred"].values

# Create output data frame
OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
',
    @input_data_1 = @inquery,
    @params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
        @trip_time_in_secs int ,
        @pickup_latitude float ,
        @pickup_longitude float ,
        @dropoff_latitude float ,
        @dropoff_longitude float',
    @model = @lmodel2,
        @passenger_count =@passenger_count ,
        @trip_distance=@trip_distance,
        @trip_time_in_secs=@trip_time_in_secs,
        @pickup_latitude=@pickup_latitude,
        @pickup_longitude=@pickup_longitude,
        @dropoff_latitude=@dropoff_latitude,
        @dropoff_longitude=@dropoff_longitude
    WITH RESULT SETS ((Score float));
END
GO
```
  1. Open a new query window in Management Studio and enter the explanatory variable column to call the stored procedure.

    -- Call stored procedure PredictTipSingleModeSciKitPy to score using SciKit-Learn model
    EXEC [dbo].[PredictTipSingleModeSciKitPy] 'SciKit_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303
    -- Call stored procedure PredictTipSingleModeRxPy to score using revoscalepy model
    EXEC [dbo].[PredictTipSingleModeRxPy] 'revoscalepy_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303
    

The seven explanatory variable values are in the following order: - passenger_count - trip_distance - trip_time_in_secs - pickup_latitude - pickup_longitude - dropoff_latitude - dropoff_longitude

As a result, the probability that a tip will be paid in an operation with the above parameters is returned.

![sqldev-python-step6-5-gho9o9.png](https://qiita-image-store.s3.amazonaws.com/0/195839/27bc40a6-07ae-595d-fec1-4c030e6c04ab.png)

Summary

In this tutorial, you learned how to work with Python code embedded in stored procedures. We found that integration with Transact-SQL makes it even easier to deploy Python models for forecasting and incorporate model retraining as part of an enterprise data workflow.

Link

Previous step

Step 5: Training and saving a model using T-SQL

From the beginning of the tutorial

In-Database Python Analysis for SQL Developers

Source

Step 6: Operationalize the Model

Related item

Machine Learning Services with Python

Recommended Posts

[In-Database Python Analysis Tutorial with SQL Server 2017] Step 6: Using the model
[In-Database Python Analysis Tutorial with SQL Server 2017]
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 5: Training and saving a model using T-SQL
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 2: Import data to SQL Server using PowerShell
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 1: Download sample data
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 4: Feature extraction of data using T-SQL
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 3: Data Exploration and Visualization
Behind the flyer: Using Docker with Python
Recommendation tutorial using association analysis (python implementation)
Working with OpenStack using the Python SDK
Deploy and use the prediction model created in Python on SQL Server
Solving the Lorenz 96 model with Julia and Python
Execute raw SQL using python data source with redash and display the result
Data analysis with python 2
Explanation of the concept of regression analysis using python Part 2
Data analysis using Python 0
Voice analysis with python
Calculate the regression coefficient of simple regression analysis with python
Explanation of the concept of regression analysis using Python Part 1
Principal component analysis using python from nim with nimpy
Explanation of the concept of regression analysis using Python Extra 1
Voice analysis with python
Data analysis with Python
Local server with python
ODBC access to SQL Server from Linux with Python
I tried using the Python library from Ruby with PyCall
[Python] LASSO regression with equation constraints using the multiplier method
Enjoy the Gray-Scott model with short code using matrix math
[Python] The first step to making a game with Pyxel
Touch NoSQL with Python using the Oracle NoSQL Database Cloud Simulator
Morphological analysis using Igo + mecab-ipadic-neologd in Python (with Ruby bonus)
This and that for using Step Functions with CDK + Python
[Python] Morphological analysis with MeCab
[S3] CRUD with S3 using Python [Python]
[Co-occurrence analysis] Easy co-occurrence analysis with Python! [Python]
Using Quaternion with Python ~ numpy-quaternion ~
[Python] Using OpenCV with Python (Basic)
Sentiment analysis with Python (word2vec)
Planar skeleton analysis with Python
Japanese morphological analysis with Python
Easy HTTP server with Python
Calibrate the model with PyCaret
Call the API with python3.
Data analysis using python pandas
Muscle jerk analysis with Python
Using OpenCV with Python @Mac
Send using Python with Gmail
Control the motor with a motor driver using python on Raspberry Pi 3!
Using Python with SPSS Modeler extension node (2) Model creation using Spark MLlib
From the introduction of JUMAN ++ to morphological analysis of Japanese with Python
I made a poker game server chat-holdem using websocket with python
Complement python with emacs using company-jedi
Harmonic mean with Python Harmonic mean (using SciPy)
Extract the xz file with python
[Python] Using OpenCV with Python (Image Filtering)
3D skeleton structure analysis with Python
Using Rstan from Python with PypeR
[Python] Using OpenCV with Python (Image transformation)
Impedance analysis (EIS) with python [impedance.py]
[Python] Using OpenCV with Python (Edge Detection)
Web scraping with Python First step