[PYTHON] Code snippets often used when using BigQuery with Google Colab

Assumption: Authentication

from google.colab import auth
auth.authenticate_user()

To use BigQuery from Colab, you need to authenticate first.

Screenshot 2020-06-23 at 13.46.39.png

It's a short code so it's hard to remember, but there's a code snippet that you can use to add quickly.

Store query results in a DataFrame

You can use the Magic command of google.cloud.bigquery to get the query result as a Pandas DataFrame in one shot.

%%bigquery df --project myproject
SELECT * FROM `myproject.foo.logs`

A DataFrame is created with the name specified in the argument of the Magic command and can be referenced from the Python side.

df.head(5)

Pass parameters

%%bigquery --project myproject df --params {"user_id": 123}
SELECT * FROM `myproject.foo.logs` WHERE user_id = @user_id

You can also pass dictionary parameters with --params. The passed parameters can be referenced by @parameter name.

Save the result as CSV

import os
df.to_csv(os.path.join("output.csv"), index=False)

The acquired DataFrame can be written to a CSV file with to_csv. If you mount Google Drive, you can export it on Google Drive, which is convenient.

Create a table from the results of a query

Depending on the content of the analysis, creating an intermediate table can save the amount of calculation, and the intermediate results can be visualized / checked to improve efficiency.

In the latest google-cloud-bigquery, by specifying --destination_table, a table is created as it is from the query execution result. You can, but as of June 2020, this option is not available in the version of google-cloud-bigquery that is included by default in Google Colab, so write code to create a job without using Magic commands.

from google.cloud import bigquery

client = bigquery.Client(project="myproject")
table_id = "myproject.foo.purchase_logs"

job_config = bigquery.QueryJobConfig(destination=table_id)

sql = r'''
    SELECT transaction_id, user_id, transaction_date, product_id
    FROM `myproject.foo.transactions`
    WHERE action = "purchase"
'''

query_job = client.query(sql, job_config=job_config)
query_job.result()

Official documentation: https://cloud.google.com/bigquery/docs/writing-results?hl=ja

Overwrite when table exists

QueryJobConfig [WriteDisposition](https: / If you specify WRITE_TRUNCATE in (/cloud.google.com/bigquery/docs/reference/auditlogs/rest/Shared.Types/WriteDisposition), you can discard the contents and recreate the table if it already exists. (Note that the data in the original table will be discarded)

job_config = bigquery.QueryJobConfig(destination=table_id, write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)

If WRITE_APPEND is specified, more data will be added to the existing table.

Create a table from the contents of the CSV file

If you want to create a table from a CSV file, use the bq command bg load. And upload.

!bq load --project_id=myproject --autodetect --source_format=CSV myproject:foo.products sample.csv

The contents of the specified parameters are changed as appropriate depending on the contents of the CSV to be read.

If you specify --replace, you can discard the contents of the table if it already exists and then recreate it.

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv?hl=ja

reference

Recommended Posts

Code snippets often used when using BigQuery with Google Colab
Sample code summary when working with Google Spreadsheets from Google Colab
Cheat sheet when scraping with Google Colaboratory (Colab)
Easy way to scrape with python using Google Colab
When using optparse with iPython
About learning with google colab
Python frequently used code snippets
Debug settings in virtual environment when using Pipenv with VS Code
[SEO] Flow / sample code when using Google Analytics API in Python
Play with Turtle on Google Colab
When using MeCab with virtualenv python
Precautions when using six with Python 2.5
[VS Code] ~ Tips when using python ~
How to deal with OAuth2 error when using Google APIs from Python
How to use VS Code (code server) with Google Colab in just 3 lines
TypeError: concat () got an unexpected keyword argument'join_axes' when using pandas_profling (Google Colab)
Use MeCab and neologd with Google Colab
Usual processing notes when using Google Colaboratory
Try using Python with Google Cloud Functions
Summary of snippets when developing with Go
Using Java's Jupyter Kernel with Google Colaboratory
Machine learning with Pytorch on Google Colab
Image segment using Oxford_iiit_pet on Google Colab
A memorandum of method often used when analyzing data with pandas (for beginners)