Introduction

In "Basic machine learning procedure: (1) Classification model", I imported data from BigQuery into a Python environment and analyzed it with scikit-learn.

Recently, however, services such as BigQuery ML have made it possible to run machine learning entirely within BigQuery. This time, I will try BigQuery ML.

Analytical environment

- Google BigQuery
- Google Colaboratory

Referenced page

- Google Cloud launches "BigQuery ML" for machine learning with SQL statements
- Bridging the gap between data and insights
- BigQuery ML documentation

Target data

As in the previous article, the data has result as the campaign response (1 or 0) and product1 through product5 as the purchase amount for each product.

id	result	product1	product2	product3	product4	product5
001	1	2500	1200	1890	530	null
002	0	750	3300	null	1250	2000

1. Build a model

Until now, BigQuery datasets held only TABLE and VIEW objects, but a trained model can now also be saved as a MODEL object. (There are other object types as well, such as FUNCTION.)

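from google.cloud import bigquery

# Client for the project that holds the dataset
client = bigquery.Client(project="myproject")

# input_label_cols is the documented option that names the objective variable
query = """CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
OPTIONS
  (model_type='logistic_reg', input_label_cols=['result']) AS #Objective variable (the column to predict)

#Predict using the following variables
SELECT result, product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_training`
"""

job = client.query(query)
result = job.result()  # Wait until training has finished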

At the time of writing, model_type can be one of the following three. (It seems TensorFlow models can be used as well, but I will omit them here.)

- logistic_reg: Logistic regression (the objective variable is categorical)
- linear_reg: Linear regression (the objective variable is numerical)
- kmeans: Cluster analysis

This time I use logistic_reg, because the objective variable is whether or not a customer responds to the promotion.
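As a rough sketch of how the model_type choice could be parameterized, the helper below builds a CREATE MODEL statement from a model type, a label column, and a feature list. The function name build_create_model_sql is my own invention for illustration (it is not part of BigQuery or its client library), and it assumes the input_label_cols option and only the three model types listed above.

```python
SUPPORTED_MODEL_TYPES = ("logistic_reg", "linear_reg", "kmeans")


def build_create_model_sql(model, table, model_type, features, label=None):
    """Return a CREATE MODEL statement (hypothetical helper for illustration)."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type}")
    options = [f"model_type='{model_type}'"]
    columns = list(features)
    if model_type != "kmeans":
        # Regression models need an objective variable; clustering does not
        if label is None:
            raise ValueError("label is required for regression models")
        options.append(f"input_label_cols=['{label}']")
        columns = [label] + columns
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS ({', '.join(options)}) AS\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM `{table}`"
    )


sql = build_create_model_sql(
    model="myproject.mydataset.mymodel",
    table="myproject.mydataset.mytable_training",
    model_type="logistic_reg",
    features=["product1", "product2", "product3", "product4", "product5"],
    label="result",
)
print(sql)
```

The resulting string could then be passed to client.query() in the same way as the hand-written query in step 1.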

2. Evaluate the model

Call the created model with ML.EVALUATE and validate it against test data.

query = """
SELECT
  roc_auc, precision, recall
FROM
  ML.EVALUATE(MODEL `myproject.mydataset.mymodel`,  ( #Call the created model

#Validate with different test data
SELECT result, product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_test`
))
"""

job = client.query(query)
result = job.result()

# Show the evaluation metrics
for row in result:
    print(row.roc_auc, row.precision, row.recall)

The model's performance on the test data is evaluated by ROC AUC, precision, and recall.
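To make the meaning of precision and recall concrete, here is a minimal sketch that computes them from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not results from this model.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    # Precision: of the customers predicted to respond, how many actually did
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who actually responded, how many were predicted
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives
precision, recall = precision_recall(tp=30, fp=10, fn=20)
print(precision, recall)
```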

3. Apply the model

Call the created model with ML.PREDICT and apply it to new data.

query = """
SELECT
*
FROM
  ML.PREDICT(MODEL `myproject.mydataset.mymodel`,  ( #Call the created model

#Apply the model to the new data
SELECT product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable`)
);
"""

# Project, dataset, and table name for the output
project = "myproject"
dataset = "mydataset"
table = "mytable_predict"
client = bigquery.Client(project=project)

# Write the query result to the destination table
job_config = bigquery.QueryJobConfig(destination=f"{project}.{dataset}.{table}")
job = client.query(query, job_config=job_config)

result = job.result()  # Wait until the prediction has finished

Use ML.EVALUATE to evaluate the model and ML.PREDICT to apply it; in both cases you just call the created model. It's pretty easy to use.
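As a sketch of how the prediction output might be read afterwards, the snippet below pulls the probability of a response out of each row. It assumes, based on my understanding of ML.PREDICT for logistic regression, that each row has a predicted_result column and a predicted_result_probs column holding a list of label/prob pairs. The rows here are hypothetical dictionaries written by hand, not actual query results.

```python
def positive_probability(row, label_col="result", positive_label=1):
    """Return the predicted probability of the positive label, or None."""
    # Assumed column name pattern: predicted_<label>_probs
    for entry in row[f"predicted_{label_col}_probs"]:
        if entry["label"] == positive_label:
            return entry["prob"]
    return None


# Hypothetical rows shaped like the assumed ML.PREDICT output
rows = [
    {
        "predicted_result": 1,
        "predicted_result_probs": [
            {"label": 1, "prob": 0.8},
            {"label": 0, "prob": 0.2},
        ],
    },
    {
        "predicted_result": 0,
        "predicted_result_probs": [
            {"label": 1, "prob": 0.3},
            {"label": 0, "prob": 0.7},
        ],
    },
]

probabilities = [positive_probability(row) for row in rows]
print(probabilities)
```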

Conclusion

The available methods are still limited, but this is easier than building the model as in "Basic machine learning procedure: (1) Classification model".

On the other hand, when a model can be built this easily, it is not obvious what to do when trying to improve it. Perhaps the result changes depending on which variables are used.
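One way to explore that question might be to train one model per subset of variables and compare the roc_auc values from ML.EVALUATE. The sketch below only generates the candidate variable sets and the corresponding CREATE MODEL statements; the model name suffixes (mymodel_0, mymodel_1, ...) and the helper function are hypothetical, and I have not checked whether this actually improves the model.

```python
from itertools import combinations

FEATURES = ["product1", "product2", "product3", "product4", "product5"]


def candidate_feature_sets(features, min_size):
    """Yield every subset of features with at least min_size columns."""
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            yield list(subset)


candidates = list(candidate_feature_sets(FEATURES, min_size=4))

statements = []
for i, features in enumerate(candidates):
    # Hypothetical naming scheme: one model per candidate variable set
    statements.append(
        f"CREATE OR REPLACE MODEL `myproject.mydataset.mymodel_{i}`\n"
        "OPTIONS (model_type='logistic_reg', input_label_cols=['result']) AS\n"
        f"SELECT result, {', '.join(features)}\n"
        "FROM `myproject.mydataset.mytable_training`"
    )

print(len(statements))
print(statements[0])
```

Each statement could be run with client.query() and then evaluated with ML.EVALUATE as in step 2, keeping in mind that every training run is a separate BigQuery job.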
