To use BigQuery from Colab, you need to authenticate first. The code is short but hard to remember, so it's handy that Colab provides a snippet you can insert to add it quickly.

from google.colab import auth
auth.authenticate_user()
You can use the %%bigquery magic command provided by google.cloud.bigquery
to get a query result as a pandas DataFrame in one shot.
%%bigquery df --project myproject
SELECT * FROM `myproject.foo.logs`
A DataFrame is created with the name specified as an argument to the magic command and can be referenced from the Python side.
df.head(5)
You can also pass parameters as a dictionary with --params; they can be referenced in the query as @parameter_name.

%%bigquery df --project myproject --params {"user_id": 123}
SELECT * FROM `myproject.foo.logs` WHERE user_id = @user_id
The resulting DataFrame can be written to a CSV file with to_csv. If you mount Google Drive, you can export directly to Drive, which is convenient.

df.to_csv("output.csv", index=False)
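A minimal sketch of exporting to Drive after mounting it (the bq_output folder is a hypothetical path; adjust it to your own Drive layout):

from google.colab import drive
drive.mount("/content/drive")  # follow the authorization prompt

# "bq_output" is a hypothetical folder under My Drive.
df.to_csv("/content/drive/My Drive/bq_output/output.csv", index=False)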
Depending on the analysis, creating an intermediate table can save computation, and being able to visualize and check intermediate results improves efficiency.
In recent versions of google-cloud-bigquery, specifying --destination_table
on the magic command creates a table directly from the query result. However, as of June 2020, this option is not available in the version of google-cloud-bigquery
bundled with Google Colab by default, so instead we write code that creates a query job without using the magic command.
from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# Write the query result to this table.
table_id = "myproject.foo.purchase_logs"
job_config = bigquery.QueryJobConfig(destination=table_id)

sql = r'''
SELECT transaction_id, user_id, transaction_date, product_id
FROM `myproject.foo.transactions`
WHERE action = "purchase"
'''

# Run the query and wait for the job to finish.
query_job = client.query(sql, job_config=job_config)
query_job.result()
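For reference, with a newer google-cloud-bigquery installed (e.g. after !pip install -U google-cloud-bigquery), the same result should be obtainable with the magic command; a sketch, assuming the --destination_table option documented for newer versions, which takes a dataset.table value:

%%bigquery --project myproject --destination_table foo.purchase_logs
SELECT transaction_id, user_id, transaction_date, product_id
FROM `myproject.foo.transactions`
WHERE action = "purchase"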
Official documentation: https://cloud.google.com/bigquery/docs/writing-results?hl=ja
If you specify WRITE_TRUNCATE for [WriteDisposition](https://cloud.google.com/bigquery/docs/reference/auditlogs/rest/Shared.Types/WriteDisposition) in QueryJobConfig, an existing table's contents are discarded and the table is recreated. (Note that the data in the original table will be lost.)
job_config = bigquery.QueryJobConfig(destination=table_id, write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
If WRITE_APPEND is specified, the query result is appended to the existing table.
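For example, appending to the same destination table as above looks like this:

job_config = bigquery.QueryJobConfig(destination=table_id, write_disposition=bigquery.WriteDisposition.WRITE_APPEND)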
If you want to create a table from a CSV file, upload the file and use the bq
command's bq load.
!bq load --project_id=myproject --autodetect --source_format=CSV myproject:foo.products sample.csv
Adjust the flags as appropriate for the contents of the CSV file being loaded.
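For example, if the CSV has a header row and you want to pin down the schema explicitly instead of relying on --autodetect, a sketch (column names and types are hypothetical):

!bq load --project_id=myproject --source_format=CSV --skip_leading_rows=1 myproject:foo.products sample.csv product_id:STRING,name:STRING,price:INTEGER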
If you specify --replace
, the contents of an existing table are discarded and the table is recreated.
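For example, rerunning the earlier load with --replace overwrites the table:

!bq load --project_id=myproject --replace --autodetect --source_format=CSV myproject:foo.products sample.csv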
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv?hl=ja