Introduction

This is a record of my investigation into issuing BigQuery queries from Lambda. I mostly work on AWS, but I now have to query GCP's BigQuery on a regular basis, so I thought it would be convenient to run those queries from Lambda.

Environment overview

Call the GCP SDK from Python on Lambda. The GCP SDK goes in a Lambda layer. GCP authentication has to be set up on the AWS side.

The SDK is the Python client library. https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html

Procedure

Prerequisites

--You have an AWS account.
--You have a GCP account.
--BigQuery can be used through the API.
--A BigQuery table has already been created.
--You have an AWS access key.

Create a Lambda function.

The Python code only runs a BigQuery query. First, let's get this working.

The Lambda function settings are as follows.

--Create new.
--Python 3.7
--Access rights are only the automatically generated CloudWatch Logs permissions.
--The timeout is set to 30 seconds.
--The layer and environment variables are added later.

import json
from google.cloud import bigquery

def lambda_handler(event, context):
    client = bigquery.Client()
    sql = """
        SELECT *
        FROM `<my-project>.<my-dataset>.<my-table>`
        LIMIT 10
    """
    
    # Run a Standard SQL query using the environment's default project
    results = client.query(sql).result()
    for row in results:
        print(row)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
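
The handler above only prints the rows and returns a fixed body. As a minimal sketch of returning the query result instead, the helper below turns rows into a JSON string. `rows_to_json` is a hypothetical name, not part of the SDK. It assumes each row offers `items()`, which BigQuery `Row` objects do, and that falling back to `str` is acceptable for values such as dates and decimals.

```python
import json


def rows_to_json(rows):
    """Serialize an iterable of mapping-like rows to a JSON array string."""
    records = [dict(row.items()) for row in rows]
    # default=str is an assumption: values json cannot encode natively
    # (dates, datetimes, Decimal) are written out as text.
    return json.dumps(records, default=str)


if __name__ == "__main__":
    import datetime

    # Plain dicts stand in for BigQuery rows in this local demo.
    sample = [{"name": "a", "day": datetime.date(2019, 11, 24)}]
    print(rows_to_json(sample))
```

In the handler this could be used as `'body': rows_to_json(results)` in place of the fixed greeting.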

Build the GCP SDK package to register as a Lambda layer.

Add the SDK to a layer so that `import bigquery` works in Lambda's Python. Get it with pip and zip it. Here is a quick way to do that: boot Linux on an EC2 Spot Instance, build the zip there, and put it on S3.

--Create an Amazon Linux 2 instance as an EC2 Spot Instance.
--Small specs are enough.
--For the IAM role, attach only `AmazonEC2RoleforSSM`, so that you can connect with Session Manager in Systems Manager.
--The security group has no inbound rules.
--No key pair.

--Once the instance has launched, connect to it with Session Manager in Systems Manager.

The commands to run are listed below. Replace the values in <> with your own.

# Become ec2-user
sudo su - ec2-user

# Install pip
sudo yum install python3 -y
curl -O https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py

export PATH=$PATH:/usr/local/bin

# Install the SDK and zip it
pip install google-cloud-bigquery -t ./python/
zip -r google-cloud-bigquery.zip python
# protobuf is required, so upgrade it and add it to the same zip
pip install protobuf --upgrade -t ./python/
zip -r google-cloud-bigquery.zip ./python/google/protobuf

# AWS CLI settings
aws configure
# Enter the following at the prompts:
  AWS Access Key ID [None]: <my-access-key>
  AWS Secret Access Key [None]: <my-secret-key>
  Default region name [None]: ap-northeast-1
  Default output format [None]: json

# Save to S3
aws s3 mb s3://<my-bucket>
aws s3 cp google-cloud-bigquery.zip s3://<my-bucket>
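
Lambda extracts a layer under /opt, and for Python it looks for packages in the layer's top-level python/ directory, so the zip has to keep that prefix. As a rough sketch, the layout of the archive built above could be checked before uploading. The function names and the list of expected members are my own assumptions for illustration, not something the SDK or the AWS CLI provides.

```python
import zipfile

# Assumed markers: package files that should exist if both the BigQuery
# client and protobuf made it into the archive.
EXPECTED_MEMBERS = (
    "python/google/cloud/bigquery/__init__.py",
    "python/google/protobuf/__init__.py",
)


def entries_outside_python_dir(zip_path):
    """Return archive entries that are not under the top-level python/ directory."""
    with zipfile.ZipFile(zip_path) as archive:
        return [n for n in archive.namelist() if not n.startswith("python/")]


def missing_members(zip_path, expected=EXPECTED_MEMBERS):
    """Return the expected entries that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [member for member in expected if member not in names]


if __name__ == "__main__":
    path = "google-cloud-bigquery.zip"
    print("outside python/:", entries_outside_python_dir(path))
    print("missing:", missing_members(path))
```

Two empty lists would suggest the archive has the layout a Python layer needs and includes protobuf.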

After saving the SDK to S3, you can delete the Spot Instance.

Register the created library as a Lambda layer.

Return to Lambda.

--Create a layer.

For the compatible runtimes, I added Python 3.7 and Python 3.8.

--Add the layer to the function.

--Select the layer and press Add Layer.

Select the name from "Custom layers", then select the version you created.

--The layer has been added.

Once the layer has been created, you can safely delete the file on S3.

Get a GCP credentials file.

--You need to set up authentication.

https://cloud.google.com/docs/authentication/production

--Create a JSON service account key.

Go to the Create Service Account Key page from the link above. I chose "BigQuery Admin" as the role.

Register the GCP credentials file with Lambda.

In the Lambda code editor, I created a file with New File and pasted in the JSON key text. Then add the environment variable GOOGLE_APPLICATION_CREDENTIALS, set to the path of that file.
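
Because the key was pasted in by hand, it can help to confirm that the file GOOGLE_APPLICATION_CREDENTIALS points to is readable and looks like a service account key before running a query. The following is only a sketch using the standard library. The helper names are hypothetical, and the field list is my assumption of what matters in the downloaded JSON key.

```python
import json
import os

# Fields assumed to be present in a downloaded service account key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def describe_key_file(path):
    """Load a key file and return non-secret details, or raise ValueError."""
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected key type: {info['type']!r}")
    # The private key is deliberately left out so it never reaches the logs.
    return {"project_id": info["project_id"], "client_email": info["client_email"]}


def describe_configured_key():
    """Check the file named by GOOGLE_APPLICATION_CREDENTIALS."""
    # Raises KeyError when the environment variable is not set.
    return describe_key_file(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```

Calling `describe_configured_key()` at the top of the handler and printing the result would show in CloudWatch Logs which project and service account the function is about to use.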

Test run

I was able to run a test from the Lambda console!

Sticking points

Without protobuf, I got an error and was stuck... I found a similar case on StackOverflow and solved it.

In conclusion

I'm not sure everything here is done the right way, but I'm posting it because I got it working! Some things I'm still unsure about:
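
One way to spot this kind of problem earlier is to check which modules can be found before importing the SDK. This is a sketch under my own assumptions: `missing_modules` is a hypothetical helper, and the module names in the demo are the ones I would expect the BigQuery client to need, not an exhaustive list.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located on the import path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised for dotted names when a parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed names; adjust to whatever the layer is supposed to provide.
    print(missing_modules(["google.cloud.bigquery", "google.protobuf"]))
```

Run inside the Lambda function, an empty list suggests the layer provides both packages, while a non-empty list names what still has to be added to the zip.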

--Should the SDK be placed directly under python or in site-packages? I put it directly under python so that it is not tied to a specific Python version.

--Is this the right way to build the SDK package, including what gets added to the zip?

--Can the GCP credentials file be hidden better, for example with environment variables, KMS, or Parameter Store?
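
On the last question, one possible direction is to keep the JSON key in Systems Manager Parameter Store as a SecureString and build the client from it at run time. I have not tried this in the setup above, so treat it as a sketch. The parameter name is hypothetical, boto3 is assumed to be available in the Lambda runtime, and the function's role would need permission to read the parameter, which the role in this article does not have.

```python
import json

# Hypothetical parameter name holding the service account key JSON.
PARAMETER_NAME = "/lambda/bigquery/service-account-key"


def load_service_account_info(ssm_client, parameter_name=PARAMETER_NAME):
    """Fetch and parse the key JSON from Parameter Store."""
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)
    return json.loads(response["Parameter"]["Value"])


def make_bigquery_client(info):
    """Build a BigQuery client from key data held in memory."""
    # Imported here so the helper above can be used without the GCP SDK.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_info(info)
    return bigquery.Client(project=info["project_id"], credentials=credentials)


def build_client_from_parameter_store():
    """Assumed entry point for the handler; requires boto3 and the GCP SDK."""
    import boto3

    info = load_service_account_info(boto3.client("ssm"))
    return make_bigquery_client(info)
```

With this approach the key file and the GOOGLE_APPLICATION_CREDENTIALS variable would no longer be needed in the function itself.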

[PYTHON] Run BigQuery from Lambda