[In-Database Python Analysis Tutorial with SQL Server 2017] Step 5: Training and saving a model using T-SQL

From the beginning of the tutorial

In-Database Python Analysis for SQL Developers

Previous step

Step 4: Feature extraction of data using T-SQL

Next step

Step 6: Manipulate the model

Step 5: Training and saving the model using T-SQL

In this step, you will learn how to train a machine learning model using the Python packages scikit-learn and revoscalepy. These Python packages are already installed with SQL Server Machine Learning Services, so you can load modules and call the required functions from within your stored procedure. Train your model with the data features you create and store the trained model in a SQL Server table.

Split sample data into training set and test set with stored procedure TrainTestSplit

The stored procedure TrainTestSplit is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.
  2. Right-click TrainTestSplit and select Modify to open the Transact-SQL script in a new query window.

TrainTestSplit splits the data in the nyctaxi_sample table into two tables, nyctaxi_sample_training and nyctaxi_sample_testing.

```SQL:TrainTestSplit
CREATE PROCEDURE [dbo].[TrainTestSplit](@pct int)
AS

DROP TABLE IF EXISTS dbo.nyctaxi_sample_training
SELECT * into nyctaxi_sample_training FROM nyctaxi_sample WHERE (ABS(CAST(BINARY_CHECKSUM(medallion,hack_license)  as int)) % 100) < @pct

DROP TABLE IF EXISTS dbo.nyctaxi_sample_testing
SELECT * into nyctaxi_sample_testing FROM nyctaxi_sample
WHERE (ABS(CAST(BINARY_CHECKSUM(medallion,hack_license)  as int)) % 100) > @pct
GO
```
  1. Run the stored procedure and enter an integer that represents the percentage you want to assign to the training set. For example, the following statement assigns 60% of the data to a training set. Training and test data are stored in two separate tables.

    EXEC TrainTestSplit 60
    GO
    

    sqldev-python-step5-1-gho9o9.png

Build a logistic regression model using scikit-learn

In this section, you will use the training data you created to create a stored procedure that trains your model. This stored procedure uses the scikit-learn function to train a logistic regression model. This is implemented by using the system stored procedure sp_execute_external_script to call the Python runtime installed with SQL Server.

The model is facilitated by retraining the model by defining new training data as parameters and creating a stored procedure that wraps the call to the system stored procedure sp_execute_exernal_script.

The stored procedure TrainTipPredictionModelSciKitPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.

  2. Right-click TrainTipPredictionModelSciKitPy and select Modify to open the Transact-SQL script in a new query window.

    DROP PROCEDURE IF EXISTS TrainTipPredictionModelSciKitPy;
    GO
    
    CREATE PROCEDURE [dbo].[TrainTipPredictionModelSciKitPy](@trained_model varbinary(max) OUTPUT)
    AS
    BEGIN
      EXEC sp_execute_external_script
      @language = N'Python',
      @script = N'
    import numpy
    import pickle
    # import pandas
    from sklearn.linear_model import LogisticRegression
    
    ##Create SciKit-Learn logistic regression model
    X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
    y = numpy.ravel(InputDataSet[["tipped"]])
    
    SKLalgo = LogisticRegression()
    logitObj = SKLalgo.fit(X, y)
    
    ##Serialize model
    trained_model = pickle.dumps(logitObj)
    ',
      @input_data_1 = N'
      select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance, 
      dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
      from nyctaxi_sample_training
      ',
      @input_data_1_name = N'InputDataSet',
      @params = N'@trained_model varbinary(max) OUTPUT',
      @trained_model = @trained_model OUTPUT;
      ;
    END;
    GO
    
  3. Execute the following SQL statement to register the trained model in the nyc_taxi_models table.

    DECLARE @model VARBINARY(MAX);
    EXEC TrainTipPredictionModelSciKitPy @model OUTPUT;
    INSERT INTO nyc_taxi_models (name, model) VALUES('SciKit_model', @model);
    

    sqldev-python-step5-2-gho9o9.png

  4. Make sure that one new record is added to the nyc_taxi_models table and the serialized model is registered.

    sqldev-python-step5-3-gho9o9.png

Build a logistic regression model using the revoscalepy package

Then train your logistic regression model with the stored procedure TrainTipPredictionModelRxPy using the new release RevoScalePy package. Python's RevoScalePy package contains algorithms for object definition, data processing, and machine learning similar to those provided by R's RevoScaleR package. This library allows you to train predictive models using common algorithms such as logistic, linear regression, and decision trees, create computational contexts, move data between computational contexts, and process data. For more information on RevoScalePy, see Introducing RevoScalePy.

The stored procedure TrainTipPredictionModelRxPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

  1. In Management Studio's Object Explorer, expand Programming> Stored Procedures.

  2. Right-click TrainTipPredictionModelRxPy and select Modify to open the Transact-SQL script in a new query window.

    DROP PROCEDURE IF EXISTS TrainTipPredictionModelRxPy;
    GO
    
    CREATE PROCEDURE [dbo].[TrainTipPredictionModelRxPy](@trained_model varbinary(max) OUTPUT)
    AS
    BEGIN
    EXEC sp_execute_external_script 
      @language = N'Python',
      @script = N'
    import numpy
    import pickle
    # import pandas
    from revoscalepy.functions.RxLogit import rx_logit
    
    ## Create a logistic regression model using rx_logit function from revoscalepy package
    logitObj = rx_logit("tipped ~ passenger_count + trip_distance + trip_time_in_secs + direct_distance", data = InputDataSet);
    
    ## Serialize model
    trained_model = pickle.dumps(logitObj)
    ',
    @input_data_1 = N'
    select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance, 
    dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
    from nyctaxi_sample_training
    ',
    @input_data_1_name = N'InputDataSet',
    @params = N'@trained_model varbinary(max) OUTPUT',
    @trained_model = @trained_model OUTPUT;
    ;
    END;
    GO
    

--The SELECT query uses the custom scalar function fnCalculateDistance to calculate the direct distance between the boarding position and the disembarking position. The result of the query is stored in the default Python input variable ʻInputDataset. --The Python script calls the logisticRegression function of revoscalepy included in Machine Learning Services to create a logistic regression model. --Create a model with tapped as the objective variable (label) and passenger_count, trip_distance, trip_time_in_secs, and direct_distance as the explanatory variables (feature). --The trained model indicated by the Python variable logitObj` is serialized and returned as an output parameter. By registering this output in the nyc_taxi_models table, you can use it repeatedly for future prediction.

  1. Execute the following SQL statement to register the trained model in the nyc_taxi_models table.

    DECLARE @model VARBINARY(MAX);
    EXEC TrainTipPredictionModelRxPy @model OUTPUT;
    INSERT INTO nyc_taxi_models (name, model) VALUES('revoscalepy_model', @model);
    

Data processing and model fitting can take a few minutes. Messages that are piped to Python's stdout stream are displayed in the Management Studio message window.

![sqldev-python-step5-4-gho9o9.png](https://qiita-image-store.s3.amazonaws.com/0/195839/ed97cdcb-3322-c5c2-c24d-e7d5d71bcf21.png)
  1. Make sure that one new record is added to the nyc_taxi_models table and the serialized model is registered.

    sqldev-python-step5-5-gho9o9.png

The next step is to create a forecast using the trained model.

Link

Next step

Step 6: Manipulate the model

Previous step

Step 4: Feature extraction of data using T-SQL

From the beginning of the tutorial

In-Database Python Analysis for SQL Developers

Source

Step 5: Train and save a model using T-SQL

Related item

Machine Learning Services with Python

Recommended Posts

[In-Database Python Analysis Tutorial with SQL Server 2017] Step 5: Training and saving a model using T-SQL
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 6: Using the model
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 4: Feature extraction of data using T-SQL
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 3: Data Exploration and Visualization
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 2: Import data to SQL Server using PowerShell
[In-Database Python Analysis Tutorial with SQL Server 2017]
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 1: Download sample data
Launch a web server with Python and Flask
Library for specifying a name server and dig with python
This and that for using Step Functions with CDK + Python
Time series analysis using a general Gaussian state-space model using Python [Implementation example considering exogenous and seasonality]
I made a poker game server chat-holdem using websocket with python
I made a Chatbot using LINE Messaging API and Python (2) ~ Server ~
Deploy and use the prediction model created in Python on SQL Server
A memo with Python2.7 and Python3 on CentOS
Recommendation tutorial using association analysis (python implementation)
Tweet analysis with Python, Mecab and CaboCha
I'm using tox and Python 3.3 with Travis-CI
Implement a model with state and behavior
Put Docker in Windows Home and run a simple web server with Python
Execute raw SQL using python data source with redash and display the result
Until you install Python with pythonbrew and run Flask on a WSGI server
[Python] How to create a local web server environment with SimpleHTTPServer and CGIHTTPServer
Python: Time Series Analysis: Building a SARIMA Model
Building a python environment with virtualenv and direnv
Create a web map using Python and GDAL
Solving the Lorenz 96 model with Julia and Python
Create a Mac app using py2app and Python3! !!
Get a large amount of Starbucks Twitter data with python and try data analysis Part 1
Quickly create a Python data analysis dashboard with Streamlit and deploy it to AWS
The procedure from generating and saving a learning model by machine learning, making it an API server, and communicating with JSON from a browser