I tried Python-like loop processing with BigQuery Scripting

Introduction

BigQuery Scripting was released on Beta on October 3rd of last year (2019). If you use it well, you can replace Python processing, right? I was enthusiastic, but I (+ team) who is not so good at programming do not know how to use it.

You are actually using "[BigQuery Scripting has been released Beta, so let's walk through lightly](https://medium.com/google-cloud-jp/bigquery-scripting%E3%81%8Cbeta%E3%83" % AA% E3% 83% AA% E3% 83% BC% E3% 82% B9% E3% 81% 95% E3% 82% 8C% E3% 81% 9F% E3% 81% AE% E3% 81% A7 % E8% BB% BD% E3% 81% 8F% E3% 82% A6% E3% 82% A9% E3% 83% BC% E3% 82% AF% E3% 82% B9% E3% 83% AB% E3 % 83% BC% E3% 81% 97% E3% 81% A6% E3% 81% BF% E3% 82% 8B-1408bab2c026) ", but it is difficult to use.

After a few months of such a pain, I finally got a Python-like loop process, so I will share it.

What is BigQuery Scripting?

As you know, BigQuery can execute SQL processing, but there are times when you want to conditional branch or loop the SQL to be executed by an external variable. Normally, it is called in another language such as Python to branch and repeat SQL processing, but Scripting is to incorporate it into BigQuery to that extent. (Perhaps)

The most understandable processing that can be used is "Standard SQL Scripting", but the following processing is possible. There is. (Representative example)

--DECLARE: Variable declaration --SET: Assigning a value to a variable --IF ~ ELSE ~ END IF: Conditional branch --LOOP / WHILE: Loop processing

Thing you want to do

For example, if you want to turn SQL for each store and put out the total sales in different tables, you can call BigQuery from Python as follows.

for store in stores:
  query=f"""
  SELECT
   store, SUM(Purchase price)AS total amount
   FROM `myproject.mydataset.transaction_*`
   WHERE store = {store_cd}
   GROUP BY store
  """

BigQuery Scripting also has Loop processing, so it can be done easily like this. I was thinking. However, "[BigQuery Scripting has been released Beta, so let's walk through lightly](https://medium.com/google-cloud-jp/bigquery-scripting%E3%81%8Cbeta%E3%83%AA%" E3% 83% AA% E3% 83% BC% E3% 82% B9% E3% 81% 95% E3% 82% 8C% E3% 81% 9F% E3% 81% AE% E3% 81% A7% E8% BB% BD% E3% 81% 8F% E3% 82% A6% E3% 82% A9% E3% 83% BC% E3% 82% AF% E3% 82% B9% E3% 83% AB% E3% 83% BC% E3% 81% 97% E3% 81% A6% E3% 81% BF% E3% 82% 8B-1408bab2c026) " You can't do array iterations like you would in a Python loop.

I wanted to do it somehow.

What i did

The next process was finally made through various trials and errors to see if something could be done.

#Variable declaration
DECLARE stores ARRAY<STRING>; #Make an array called stores and run the loop
DECLARE x INT64 DEFAULT 1; #Arguments used in stores (default value = 1)

#Assignment to a variable
#I wanted to automatically create an ARRAY value (store list), so ARRAY_Created with AGG

SET stores = (
SELECT ARRAY_AGG(store_cd) as list
FROM (SELECT store_cd FROM `myproject.mydataset.mytable` GROUP BY store_cd ORDER BY store_cd)
);

#Loop processing
#Repeat up to the length of stores

WHILE x <= array_length(stores) DO
  SELECT store_cd, SUM(Purchase price)AS total amount
  FROM `myproject.mydataset.mytable`
  WHERE store_cd=stores [ORDINAL(x)] #Get the Xth value of stores
  GROUP BY store_cd;

  SET x = x + 1;
END WHILE;

Yup. Somehow, I was able to do Loop processing that seems to be used in old JAVA.

in conclusion

For the time being, I was able to loop processing only during BigQuery. In the first place, I wrote without using ARRAY so much, so it was a pain to use that process. (If you can, it seems easy)

In the future, I would like to take on the challenge of conditional branching (IF statement).

Recommended Posts

I tried Python-like loop processing with BigQuery Scripting
I tried natural language processing with transformers.
I tried simple image processing with Google Colaboratory.
I tried fp-growth with python
I tried scraping with Python
I tried Learning-to-Rank with Elasticsearch!
I tried using BigQuery ML
I tried 100 language processing knock 2020
I tried gRPC with Python
I tried scraping with python
I tried trimming efficiently with OpenCV
I tried summarizing sentences with summpy
I tried machine learning with liblinear
I tried moving food with SinGAN
I tried 100 language processing knock 2020: Chapter 3
I tried implementing DeepPose with PyTorch
I tried face detection with MTCNN
I tried 100 language processing knock 2020: Chapter 1
I tried running prolog with python 3.8.2.
I tried SMTP communication with Python
I tried sentence generation with GPT-2
I tried learning LightGBM with Yellowbrick
I tried face recognition with OpenCV
I tried 100 language processing knock 2020: Chapter 2
I tried 100 language processing knock 2020: Chapter 4
I tried asynchronous processing using asyncio
I tried natural number expression and arithmetic processing only with list processing
I tried multiple regression analysis with polynomial regression
I tried sending an SMS with Twilio
I tried using Amazon SQS with django-celery
I tried to implement Autoencoder with TensorFlow
I tried linebot with flask (anaconda) + heroku
I tried to visualize AutoEncoder with TensorFlow
I tried to get started with Hy
I tried scraping Yahoo News with Python
I tried using Selenium with Headless chrome
I tried factor analysis with Titanic data!
I tried learning with Kaggle's Titanic (kaggle②)
I tried sending an email with python.
I tried non-photorealistic rendering with Python + opencv
I tried a functional language with Python
I tried batch normalization with PyTorch (+ note)
I tried recursion with Python ② (Fibonacci sequence)
I tried implementing DeepPose with PyTorch PartⅡ
I tried to create CSV upload, data processing, download function with Django
I tried to implement CVAE with PyTorch
I tried playing with the image with Pillow
I tried to solve TSP with QAOA
I tried simple image recognition with Jupyter
I tried CNN fine tuning with Resnet
I tried using the BigQuery Storage API
#I tried something like Vlookup with Python # 2
I tried to extract named entities with the natural language processing library GiNZA
I tried handwriting recognition of runes with scikit-learn
I tried to predict next year with AI
I tried "smoothing" the image with Python + OpenCV
I tried hundreds of millions of SQLite with python
I tried to detect Mario with pytorch + yolov3
I tried to implement reading Dataset with PyTorch
I tried to use lightGBM, xgboost with Boruta
I tried image recognition of CIFAR-10 with Keras-Learning-