Issue a BigQuery query from Lambda: a record of what I tried. I mainly use AWS, but I regularly need to look at data in GCP's BigQuery, so I figured it would be convenient to be able to run queries easily from Lambda.
The approach: use the GCP SDK from Lambda's Python runtime, keep the SDK in a Lambda layer, and set up GCP authentication on the AWS side.
The SDK here is the BigQuery Python client library: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html
Prerequisites:
--You have an AWS account.
--You have a GCP account.
--BigQuery can be used via the API.
--A BigQuery table has already been created.
--You have an AWS access key.
Here is Python code that does nothing but run a BigQuery query. Let's get this working first.
--The Lambda function is as follows.
import json
from google.cloud import bigquery

def lambda_handler(event, context):
    client = bigquery.Client()
    sql = """
        SELECT *
        FROM `<my-project>.<my-dataset>.<my-table>`
        LIMIT 10
    """
    # Run a Standard SQL query using the environment's default project
    results = client.query(sql).result()
    for row in results:
        print(row)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
To make `from google.cloud import bigquery` work in Lambda's Python, add the SDK as a layer: get it with pip and zip it up. Here are the steps I used to quickly boot Linux on an EC2 Spot Instance, build the zip, and put it on S3.
--Launch Amazon Linux 2 on an EC2 Spot Instance.
--A small instance type is enough.
--The IAM role has only `AmazonEC2RoleforSSM` attached, so that we can connect with Systems Manager Session Manager.
--The security group opens no inbound ports.
--No key pair.
--Once the instance is running, connect to it from Systems Manager Session Manager.
The commands are listed below. Replace anything in <> with your own values.
# become ec2-user
sudo su - ec2-user

# install pip
sudo yum install python3 -y
curl -O https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
export PATH=$PATH:/usr/local/bin

# install the SDK and zip it
pip install google-cloud-bigquery -t ./python/
zip -r google-cloud-bigquery.zip python

# protobuf is also required, so add it
pip install protobuf --upgrade -t ./python/
zip -r google-cloud-bigquery.zip ./python/google/protobuf

# configure the aws cli
aws configure
# enter the following:
AWS Access Key ID [None]: <my-access-key>
AWS Secret Access Key [None]: <my-secret-key>
Default region name [None]: ap-northeast-1
Default output format [None]: json

# upload to S3
aws s3 mb s3://<my-bucket>
aws s3 cp google-cloud-bigquery.zip s3://<my-bucket>
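The -t ./python/ option matters here: a Python layer zip is extracted under /opt and Lambda adds /opt/python to the import path, so the packages have to sit under a top-level python/ directory. A rough sketch of the resulting layout (dependency names are indicative only):

google-cloud-bigquery.zip
└── python/
    ├── google/
    │   ├── cloud/
    │   │   └── bigquery/
    │   └── protobuf/
    └── (other dependencies)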
Once the zip is saved to S3, the Spot Instance can be deleted.
Return to Lambda.
--Create a layer from the zip on S3. For compatible runtimes I added Python 3.7 and Python 3.8.
--Add the layer to the function.
--In the function's Layers section, press "Add a layer".
--Under "Custom layers", select the layer by name and choose the version you just created.
--The layer has been added.
Once the layer has been created, the S3 file can safely be deleted.
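For reference, the same two steps can also be done from the AWS CLI instead of the console. A rough sketch, assuming the layer name google-cloud-bigquery and a function called <my-function>; the ARN passed to the second command is the LayerVersionArn returned by the first:

# publish the zip on S3 as a new layer version
aws lambda publish-layer-version \
  --layer-name google-cloud-bigquery \
  --content S3Bucket=<my-bucket>,S3Key=google-cloud-bigquery.zip \
  --compatible-runtimes python3.7 python3.8

# attach the layer to the function, using the LayerVersionArn returned above
aws lambda update-function-configuration \
  --function-name <my-function> \
  --layers <layer-version-arn>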
--Authentication still needs to be set up.
https://cloud.google.com/docs/authentication/production
--Create a JSON service account key from the "Create service account key" page linked above. I chose "BigQuery Admin" as the role.
--In the Lambda code editor, create a new file and paste in the contents of the JSON key.
--Add the environment variable GOOGLE_APPLICATION_CREDENTIALS.
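A minimal sketch of how the client then finds the key. The file name gcp-key.json is just an assumption here; use whatever name you gave the pasted file (the function code, including that file, is deployed under /var/task):

from google.cloud import bigquery

# Assumption: the pasted key was saved as gcp-key.json next to the handler,
# i.e. /var/task/gcp-key.json. With GOOGLE_APPLICATION_CREDENTIALS set to that
# path, the default constructor picks it up automatically:
client = bigquery.Client()

# Alternatively, skip the environment variable and load the key explicitly:
client = bigquery.Client.from_service_account_json("gcp-key.json")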
I was able to run a test from the Lambda console!
Without protobuf I got an error and was stuck for a while ... I found a similar case on Stack Overflow, which solved it.
I'm not sure this is really the right way to do it, but it works, so I'm posting it!
--Should the SDK go directly under python/ or under python/lib/python3.x/site-packages/? I put it directly under python/ so that the layer is not tied to a specific Python version.
--Is this really the right way to build the SDK layer? I'm unsure which paths need to go into the zip.
--Can the GCP credentials file be hidden better, e.g. with environment variables, KMS, or Parameter Store? A sketch of the Parameter Store idea follows below.
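On that last point, a minimal sketch, assuming the key JSON is stored as a SecureString parameter named /gcp/bigquery-key (a hypothetical name) in SSM Parameter Store and the Lambda execution role is allowed to read and decrypt it:

import json
import boto3
from google.cloud import bigquery
from google.oauth2 import service_account

ssm = boto3.client("ssm")

def make_bigquery_client():
    # Fetch the service-account JSON from Parameter Store, decrypting the
    # SecureString with the KMS key associated with the parameter.
    response = ssm.get_parameter(Name="/gcp/bigquery-key", WithDecryption=True)
    key_info = json.loads(response["Parameter"]["Value"])

    # Build the credentials in memory, so no key file ships with the function.
    credentials = service_account.Credentials.from_service_account_info(
        key_info,
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    return bigquery.Client(credentials=credentials, project=key_info["project_id"])

With this, the Lambda role needs ssm:GetParameter (plus kms:Decrypt for the key used by the parameter), and the raw JSON never has to live in the deployment package or in a plain environment variable.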