BigQuery-Python was useful when working with BigQuery from Python

I think there are various ways to handle it from Python, but I personally use BigQuery-Python, so I'll make a brief note.

Installation

$ pip install bigquery-python

document

https://github.com/tylertreat/BigQuery-Python/blob/master/README.md

Preparation

Get the following information

--Service account --Key file (p12 or pem)

Simple example

Get BigQueryClient

from bigquery import get_client

PROJECT_ID = 'project-999'
SERVICE_ACCOUNT = '[email protected]'
PRIVATE_KEY_PATH = '/path.to/keyfile.p12'

with open(PRIVATE_KEY_PATH, 'rb') as f:
    private_key = f.read()

client = get_client(PROJECT_ID,
                    private_key=private_key,
                    service_account= SERVICE_ACCOUNT,
                    readonly=False)

Creating dataset

DATASET = 'spam'

if not client.check_dataset(DATASET):
    client.create_dataset(DATASET)

Creating a table

schema.json


[
  {
    "name": "id",
    "type": "INTEGER"
  },
  {
    "name": "name",
    "type": "STRING"
  },
  {
    "name": "time",
    "type": "TIMESTAMP"
  }
]
DATASET = 'spam'
TABLE_NAME = 'egg'
SCHEMA_PATH = '/path.to/schema.json'

with open(SCHEMA_PATH, 'r') as f:
    table_schema = json.load(f)

if not client.check_dataset(DATASET):
    raise

if not client.check_table(DATASET, TABLE_NAME):
	client.create_table(DATASET, TABLE_NAME, table_schema)

(It is not necessary to set the schema to json, but it seems that it can be reused, so this time it is set to json.)

Import from Google Cloud Storage

//spam/egg.csv


"id", "name", "time"
1, "S", "2015-05-18 00:00:00"
2, "Y", "2015-11-02 00:00:00"
from bigquery import JOB_SOURCE_FORMAT_CSV

KEY_NAME = 'gs://spam/egg.csv'
TABLE_NAME = 'egg'

client.import_data_from_uris(
    TABLE_NAME,
    KEY_NAME,
    field_delimiter='\t',  #Tab delimited
    source_format=JOB_SOURCE_FORMAT_CSV,  # csv
    skip_leading_rows=1)  #Skip the first line of csv

Execute query

Specify timeout to get it synchronously. If it takes longer than that, a BigQueryTimeoutException will occur.

from bigquery.errors import BigQueryTimeoutException

query = 'SELECT id, name, time FROM [spam.egg] LIMIT 10'

try:
    job_id, results = client.query(query, timeout=60)
except BigQueryTimeoutException as e:
    print e

# job_id: u'job_xxxx_xxxxx'
# results: [
#    {u'id': 1, u'name': 'S',  u'time': "2015-11-02 00:00:00", }, 
#    {u'id': 2, u'name': 'Y',  u'time': "2015-11-02 00:00:00", }, 
# ]

If it seems to take time, you can consider getting it asynchronously. In that case, do not specify timeout.

query = 'SELECT id, name, time FROM [spam.egg] LIMIT 10'
job_id, results = client.query(query)

# job_id: u'job_xxxx_xxxxx'
# results: []

Since job_id is issued, you can get the contents by putting job_id in the argument of get_query_rows.

completed, _total_rows = client.check_job(job_id)

if completed:
    results = client.get_query_rows(job_id)

# results: [
#    {u'id': 1, u'name': 'S',  u'time': "2015-11-02 00:00:00", }, 
#    {u'id': 2, u'name': 'Y',  u'time': "2015-11-02 00:00:00", }, 
# ]

Other

In addition, you can see that you can do various things by looking inside the BigQuery Client.

Query Builder

There is also a mechanism that makes a nice query by calling render_query. If I used it, it might have been a bad way to find it, but I couldn't handle BigQuery-specific functions such as adding LIMIT, table decorator, and JSON_EXTRACT, so I had to write it myself. It seems to be a bad idea. https://github.com/tylertreat/BigQuery-Python#query-builder

Google API documentation

Since the base is Google's API, Google's API documentation may be useful depending on the situation. https://cloud.google.com/bigquery/docs/reference/v2/

Operation in Linux environment

If you handle the key file by the method described here, an error may occur in the handling related to encryption in the Linux environment. The Python libraries that depend on BigQuery-Python are installed at the same time, but in a Linux environment it may not be installed due to lack of dependent tools. One of them is a library called cryptography.

BigQuery-I'm trying to catch ʻImportErrorinside Python and use another library, but if that doesn't work, I get an error when running the script, which is annoying. The installation ofcryptography` itself can be installed below.

$ pip install cryptography

However, you have to install the dependent tools with yum etc. for that. As detailed in the documentation, I did the following in my environment:

$ sudo yum install gcc libffi-devel python-devel openssl-devel

Cryptography document http://cryptography.readthedocs.org/en/latest/installation/

Recommended Posts

BigQuery-Python was useful when working with BigQuery from Python
Working with Azure CosmosDB from Python Part.2
Use BigQuery from python.
Working with Azure CosmosDB from Python (quick start digging)
Connect to BigQuery with Python
Working with LibreOffice in Python
Useful when debugging with TouchDesigner
Python: Working with Firefox with selenium
Working with sounds in Python
Error when playing with python
With skype, notify with skype from python!
Problem not knowing parameters when dealing with Blender from Python
Xpath summary when extracting data from websites with Python Scrapy
Sample code summary when working with Google Spreadsheets from Google Colab
Call C from Python with DragonFFI
Using Rstan from Python with PypeR
Working with LibreOffice in Python: import
Install Python from source with Ansible
Create folders from '01' to '12' with python
Tips you should know when programming competitive programming with Python2 (useful library)
Run Aprili from Python with Orange
Call python from nim with Nimpy
Precautions when using phantomjs from python
When matplotlib doesn't work with python2.7
Read fbx from python with cinema4d
When using MeCab with virtualenv python
Precautions when using six with Python 2.5
Working with DICOM images in Python
[Python] Format when to_csv with pandas
A story I was addicted to when inserting from Python to a PostgreSQL table
How to deal with OAuth2 error when using Google APIs from Python
Use Python from Java with Jython. I was also addicted to it.
Collecting information from Twitter with Python (Twitter API)
Receive textual data from mysql with python
Get html from element with Python selenium
[Note] Get data from PostgreSQL with Python
Snippet when searching all bits with python
Play audio files from Python with interrupts
Create wordcloud from your tweet with python3
Try working with binary data in Python
Note when creating an environment with python
Tweet from python with Twitter Developer + Tweepy
Precautions when solving DP problems with Python
Post Test 3 (Working with PosgreSQL in Python)
How to work with BigQuery in Python
Business efficiency starting from scratch with Python
Decrypt files encrypted with openssl from python with openssl
Working with OpenStack using the Python SDK
Image acquisition from camera with Python + OpenCV
Getting started with Dynamo from Python boto
Try calling Python from Ruby with thrift
Scraping from an authenticated site with python
Manipulate BigQuery tables from a Python client
Use C ++ functions from python with pybind11
Working with GPS on Raspberry Pi 3 Python
Three things I was addicted to when using Python and MySQL with Docker
A note I was addicted to when running Python with Visual Studio Code
A story that I was addicted to when I made SFTP communication with python
Error when installing a module with Python pip
Collecting information from Twitter with Python (Environment construction)
Csv output from Google search with [Python]! 【Easy】