To use BigQuery from Colab, you need to authenticate first. The code is short but hard to remember, so it's handy that Colab provides a snippet you can insert to add it quickly.

from google.colab import auth
auth.authenticate_user()
You can use the %%bigquery magic command provided by google.cloud.bigquery
to get a query result as a pandas DataFrame in one shot.
%%bigquery df --project myproject
SELECT * FROM `myproject.foo.logs`
A DataFrame is created with the name specified as an argument to the magic command and can be referenced from the Python side.
df.head(5)
You can also pass parameters as a dictionary with --params; they can be referenced in the query as @parameter_name.

%%bigquery df --project myproject --params {"user_id": 123}
SELECT * FROM `myproject.foo.logs` WHERE user_id = @user_id
The resulting DataFrame can be written to a CSV file with to_csv. If you mount Google Drive, you can export directly to Drive, which is convenient.

df.to_csv("output.csv", index=False)
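A minimal sketch of exporting to Drive after mounting it (the bq_output folder is a hypothetical path; adjust it to your own Drive layout):

from google.colab import drive
drive.mount("/content/drive")  # follow the authorization prompt

# "bq_output" is a hypothetical folder under My Drive.
df.to_csv("/content/drive/My Drive/bq_output/output.csv", index=False)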
Depending on the analysis, creating an intermediate table can save computation, and being able to visualize and check intermediate results improves efficiency.
In recent versions of google-cloud-bigquery, specifying --destination_table
on the magic command creates a table directly from the query result. However, as of June 2020, this option is not available in the version of google-cloud-bigquery
bundled with Google Colab by default, so instead we write code that creates a query job without using the magic command.
from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# Write the query result to this table.
table_id = "myproject.foo.purchase_logs"
job_config = bigquery.QueryJobConfig(destination=table_id)

sql = r'''
SELECT transaction_id, user_id, transaction_date, product_id
FROM `myproject.foo.transactions`
WHERE action = "purchase"
'''

# Run the query and wait for the job to finish.
query_job = client.query(sql, job_config=job_config)
query_job.result()
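For reference, with a newer google-cloud-bigquery installed (e.g. after !pip install -U google-cloud-bigquery), the same result should be obtainable with the magic command; a sketch, assuming the --destination_table option documented for newer versions, which takes a dataset.table value:

%%bigquery --project myproject --destination_table foo.purchase_logs
SELECT transaction_id, user_id, transaction_date, product_id
FROM `myproject.foo.transactions`
WHERE action = "purchase"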
Official documentation: https://cloud.google.com/bigquery/docs/writing-results?hl=ja
If you specify WRITE_TRUNCATE for [WriteDisposition](https://cloud.google.com/bigquery/docs/reference/auditlogs/rest/Shared.Types/WriteDisposition) in QueryJobConfig, an existing table's contents are discarded and the table is recreated. (Note that the data in the original table will be lost.)
job_config = bigquery.QueryJobConfig(destination=table_id, write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
If WRITE_APPEND is specified, the query result is appended to the existing table.
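For example, appending to the same destination table as above looks like this:

job_config = bigquery.QueryJobConfig(destination=table_id, write_disposition=bigquery.WriteDisposition.WRITE_APPEND)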
If you want to create a table from a CSV file, upload the file and use the bq
command's bq load.
!bq load --project_id=myproject --autodetect --source_format=CSV myproject:foo.products sample.csv
Adjust the flags as appropriate for the contents of the CSV file being loaded.
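For example, if the CSV has a header row and you want to pin down the schema explicitly instead of relying on --autodetect, a sketch (column names and types are hypothetical):

!bq load --project_id=myproject --source_format=CSV --skip_leading_rows=1 myproject:foo.products sample.csv product_id:STRING,name:STRING,price:INTEGER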
If you specify --replace
, the contents of an existing table are discarded and the table is recreated.
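For example, rerunning the earlier load with --replace overwrites the table:

!bq load --project_id=myproject --replace --autodetect --source_format=CSV myproject:foo.products sample.csv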
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv?hl=ja