In the earlier article "Basic machine learning procedure: ① Classification model", data was imported from BigQuery into a Python environment and analyzed with scikit-learn.
Recently, however, BigQuery ML has made it possible to do machine learning entirely within BigQuery. This time, I will try BigQuery ML.
Environment: Google BigQuery, Google Colaboratory
- Reference: Google Cloud launches "BigQuery ML" for machine learning with SQL statements
As in the previous article, the table has `result` (the campaign response, 1 or 0) and `product1` to `product5` (the purchase amount for each product).
id | result | product1 | product2 | product3 | product4 | product5 |
---|---|---|---|---|---|---|
001 | 1 | 2500 | 1200 | 1890 | 530 | null |
002 | 0 | 750 | 3300 | null | 1250 | 2000 |
Until now, BigQuery objects were only TABLEs and VIEWs, but with BigQuery ML a trained model can also be saved as a MODEL. (There are other object types as well, such as FUNCTION.)
```python
from google.cloud import bigquery

# The client must exist before any query is sent
client = bigquery.Client(project="myproject")

query = """CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
OPTIONS
  (model_type='logistic_reg', input_label_cols=['result']) AS  # label (target) column
# Predict using the following variables
SELECT result, product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_training`
"""
job = client.query(query)
result = job.result()  # blocks until training has finished
```
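Because the feature columns are typed into the SQL by hand, it can be convenient to generate the statement from a Python list, for example to try models that leave out one product column at a time. The helper below is a hypothetical sketch (`build_create_model_query` is my own name, not part of the BigQuery client library); it only assembles the SQL string, which you would then pass to `client.query()` as above.

```python
from typing import Sequence


def build_create_model_query(
    model_id: str,
    table_id: str,
    label: str,
    features: Sequence[str],
    model_type: str = "logistic_reg",
) -> str:
    """Assemble a BigQuery ML CREATE MODEL statement (hypothetical helper).

    Only builds the SQL text; it does not contact BigQuery.
    """
    if not features:
        raise ValueError("at least one feature column is required")
    columns = ", ".join([label, *features])
    return (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        f"OPTIONS (model_type='{model_type}', input_label_cols=['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{table_id}`"
    )


features = [f"product{i}" for i in range(1, 6)]
sql = build_create_model_query(
    "myproject.mydataset.mymodel",
    "myproject.mydataset.mytable_training",
    "result",
    features,
)
print(sql)

# One query per dropped column, each writing to its own (hypothetical) model name
variants = {
    dropped: build_create_model_query(
        f"myproject.mydataset.mymodel_without_{dropped}",
        "myproject.mydataset.mytable_training",
        "result",
        [f for f in features if f != dropped],
    )
    for dropped in features
}
```

Comparing the ML.EVALUATE results of these variants is one rough way to see which purchase columns matter.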
At the time of writing, `model_type` can be one of the following three. (It seems TensorFlow models can also be used, but I omit them here.)
- `logistic_reg`: logistic regression (the target variable is categorical)
- `linear_reg`: linear regression (the target variable is numeric)
- `kmeans`: cluster analysis
Here we use `logistic_reg` because the target is whether or not a customer responded to the promotion, a binary outcome.
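To make concrete what `logistic_reg` fits, here is a minimal pure-Python sketch of how a logistic regression turns the product columns into a response probability. The weights, intercept, and column means are made-up placeholders, not values BigQuery ML would learn. Filling NULLs with the training-column mean is an assumption based on my understanding of BigQuery ML's automatic preprocessing, and the sketch skips any feature scaling BigQuery ML may apply.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(
    row: Sequence[Optional[float]],
    weights: Sequence[float],
    intercept: float,
    column_means: Sequence[float],
) -> float:
    """Probability that result = 1 for one customer (illustrative only)."""
    # Assumption: a missing (NULL) purchase amount is replaced by the training mean
    filled = [m if x is None else x for x, m in zip(row, column_means)]
    # Linear combination of the features, then squashed into a probability
    z = intercept + sum(w * x for w, x in zip(weights, filled))
    return sigmoid(z)


# Hypothetical parameters, chosen only for illustration
weights = [0.0004, -0.0002, 0.0003, 0.0001, 0.0002]
intercept = -1.0
column_means = [1500.0, 2000.0, 1800.0, 900.0, 1600.0]

row_001 = [2500, 1200, 1890, 530, None]  # customer 001 from the table above
p = predict_proba(row_001, weights, intercept, column_means)
print(round(p, 3), "-> predicted result:", int(p > 0.5))
```

The 0.5 cutoff in the last line reflects what I understand to be the default decision threshold for binary logistic regression in BigQuery ML.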
Use ML.EVALUATE to call the created model and validate it on the test data.
```python
query = """
SELECT
  roc_auc, precision, recall
FROM
  ML.EVALUATE(MODEL `myproject.mydataset.mymodel`, (  # call the created model
    # validate with separate test data
    SELECT result, product1, product2, product3, product4, product5
    FROM `myproject.mydataset.mytable_test`
  ))
"""
job = client.query(query)
result = job.result()
for row in result:
    print(row.roc_auc, row.precision, row.recall)
```
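Performance on the test data is evaluated with ROC AUC, precision, and recall. (For logistic regression models, ML.EVALUATE also returns other metrics such as accuracy, f1_score, and log_loss.)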
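ML.EVALUATE computes these metrics inside BigQuery. To check what the numbers mean, here is a small pure-Python sketch of precision, recall, and ROC AUC on made-up labels and scores. It illustrates the definitions only and is not BigQuery's implementation; the 0.5 cutoff mirrors what I understand to be ML.EVALUATE's default threshold for logistic regression.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half. O(n^2), which is fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up test labels and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.6, 0.2]
y_pred = [int(s > 0.5) for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} roc_auc={auc:.3f}")
```

Precision and recall depend on the chosen threshold, while ROC AUC looks only at how the scores rank customers, which is why it is useful to look at both.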
Use ML.PREDICT to call the created model and apply it to new data.
```python
query = """
SELECT
  *
FROM
  ML.PREDICT(MODEL `myproject.mydataset.mymodel`, (  # call the created model
    # apply the model to the new data; id is passed through so predictions can be matched to customers
    SELECT id, product1, product2, product3, product4, product5
    FROM `myproject.mydataset.mytable`
  ))
"""
# Project, dataset, and table name for the output
project = "myproject"
client = bigquery.Client(project=project)
dataset = "mydataset"
table = "mytable_predict"
job_config = bigquery.QueryJobConfig()
# Write the prediction results to myproject.mydataset.mytable_predict
job_config.destination = bigquery.DatasetReference(project, dataset).table(table)
job = client.query(query, job_config=job_config)
result = job.result()
```
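As I understand it, for a `logistic_reg` model ML.PREDICT adds a `predicted_result` column and a `predicted_result_probs` column (an array of label/probability pairs) to the input columns, so if the input query also selects `id`, each prediction can be tied back to a customer. The sketch below works on made-up rows in that assumed shape and picks the customers most likely to respond; the helper names are hypothetical. With the Python client, a result row can be turned into a plain dict with `dict(row.items())`, and I assume the STRUCT array arrives as a list of dicts.

```python
from typing import Any, Dict, List


def positive_probability(row: Dict[str, Any], positive_label: int = 1) -> float:
    """Pick the probability of the positive class out of predicted_result_probs."""
    for entry in row["predicted_result_probs"]:
        if entry["label"] == positive_label:
            return entry["prob"]
    raise KeyError(f"label {positive_label!r} not found in predicted_result_probs")


def rank_customers(rows: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Return ids whose positive-class probability exceeds threshold, highest first."""
    scored = [(positive_probability(r), r["id"]) for r in rows]
    selected = [(p, cid) for p, cid in scored if p > threshold]
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in selected]


# Made-up rows mimicking the assumed shape of the ML.PREDICT output
rows = [
    {"id": "001", "predicted_result": 1,
     "predicted_result_probs": [{"label": 1, "prob": 0.72}, {"label": 0, "prob": 0.28}]},
    {"id": "002", "predicted_result": 0,
     "predicted_result_probs": [{"label": 1, "prob": 0.31}, {"label": 0, "prob": 0.69}]},
    {"id": "003", "predicted_result": 1,
     "predicted_result_probs": [{"label": 1, "prob": 0.88}, {"label": 0, "prob": 0.12}]},
]

print(rank_customers(rows))                 # default cutoff of 0.5
print(rank_customers(rows, threshold=0.8))  # stricter cutoff for a smaller campaign
```

Raising the threshold trades recall for precision, so the cutoff can be tuned to the size of the campaign.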
To evaluate the model, you just call it with ML.EVALUATE; to apply it, you just call it with ML.PREDICT. It's quite easy to use.
The available methods are still limited, but it is easier than building the model as in "Basic machine learning procedure: ① Classification model".
On the other hand, because a model can be built so easily, it is less obvious what to do when you want to improve it. Perhaps the results change depending on which variables are used.