[PYTHON] [AWS] Do SSI-like things with S3 / Lambda

2020.11.16 update: I have also made an Alibaba Cloud version.

TL;DR: HTML containing SSI directives (include virtual) is uploaded to a specific S3 bucket; a Lambda function merges in the included sources and stores the result in another bucket.

What I wanted to do

Since S3 static site hosting serves only static files, server-side processing such as SSI is not possible in principle. The same goes for CloudFront, the CDN that caches the content downstream.

The standard workaround is to build a local development environment, keep the shared parts as separate files, and combine them at build time. But for an existing site that already uses SSI and is being migrated to S3 + CloudFront, that workflow unfortunately cannot be introduced in every project.

So, for sites that already use SSI, I set things up on the AWS side so that the existing files can be used as they are, without rewriting them.

What I did

(This assumes the IAM settings are already in place.)

Basically, I followed this article and customized it for my own use: Do something like SSI with S3 and Lambda

One thing that bothered me about the reference article was having to add a .ssi suffix to the original file's extension, so I prepared a separate temp bucket and adjusted the flow so that files can be processed without any suffix.

Architecture

Basically, the only services used are S3 and Lambda, plus CloudFront if you need it.

(Architecture diagram: ARMS_TECHSTACK (1) (1).png)

Setup

S3

Prepare two buckets: a temp bucket to upload files to, and a bucket for publishing.

Public bucket

The name can be anything. This time it is s3-ssi-include.

Bucket for temp

The name can be anything. This time it is s3-ssi-include-base.

Each setting

It is assumed that the access permissions are set appropriately. The temp bucket only stores files to be passed on to the public bucket, so it does not need to be public. If you put CloudFront in front of your publishing bucket, that bucket does not need to be public either.

Bucket for temp

From the temp bucket's details page, go to "Properties" -> "Event notifications" -> "Create event notification" so that the Lambda function is invoked when a file is uploaded (PUT event).

- Event type: PUT
- Destination: Lambda function

Select these, then save the settings for now. After creating the Lambda function later, return to this screen and set:

- Specify Lambda function: choose the Lambda function created earlier under "Select from Lambda functions"
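For reference, the same event notification can also be set up from code. Below is a minimal sketch with boto3; the Lambda ARN (region, account ID) is a placeholder, and the bucket names follow this article's examples.

```python
# Notification configuration equivalent to the console steps above.
# The function ARN is a placeholder; the Lambda function must already
# exist (and allow S3 to invoke it) before this is applied.
notification = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "run-ssi-include-on-put",
            "LambdaFunctionArn": "arn:aws:lambda:ap-northeast-1:123456789012:function:s3-ssi-include-func",
            "Events": ["s3:ObjectCreated:Put"],
        }
    ]
}

# Applying it requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="s3-ssi-include-base",
#     NotificationConfiguration=notification,
# )
```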

Lambda

Create a function that, on the PUT event, checks the uploaded HTML file for SSI (Server Side Includes) directives, performs the includes, and stores the result in the other bucket. On the S3 side, you also need to edit the function's resource-based policy so that the temp bucket is allowed to invoke it. This should be helpful: Lambda resource-based policy when triggered by S3
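As a sketch of that resource-based policy grant with boto3 (the function name is a placeholder since the article does not name it; the bucket ARN follows this article's temp bucket):

```python
# Parameters for lambda add_permission, allowing PUT events from the
# temp bucket to invoke the function. All names are placeholders.
permission = {
    "FunctionName": "s3-ssi-include-func",
    "StatementId": "AllowS3InvokeFromTempBucket",
    "Action": "lambda:InvokeFunction",
    "Principal": "s3.amazonaws.com",
    "SourceArn": "arn:aws:s3:::s3-ssi-include-base",
}

# Requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("lambda").add_permission(**permission)
```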

Function code

import os
import logging
import re
import urllib.parse

import boto3
from botocore.errorfactory import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')

def lambda_handler(event, context):
    logger.info('## ENVIRONMENT VARIABLES')
    logger.info(os.environ)
    logger.info('## EVENT')
    logger.info(event)

    input_bucket = event['Records'][0]['s3']['bucket']['name']
    output_bucket = os.environ['S3_BUCKET_TARGET']

    logger.info('## INPUT BUCKET')
    logger.info(input_bucket)

    # Object keys in the event are URL-encoded (e.g. spaces become "+")
    input_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    logger.info('## INPUT KEY')
    logger.info(input_key)

    try:
        if not input_key.endswith('.html'):
            # Non-HTML files are copied to the public bucket unchanged
            s3.copy_object(Bucket=output_bucket, Key=input_key,
                           CopySource={'Bucket': input_bucket, 'Key': input_key})
            return

        # Get the uploaded HTML file
        response = s3.get_object(Bucket=input_bucket, Key=input_key)
        input_html = response['Body'].read().decode('utf-8')
        output_html = input_html

        # Collect the paths of all SSI include directives
        include_path_base = re.findall(r'<!--#include virtual="/(.*?)" -->', input_html)
        logger.info('## PATH BASE')
        logger.info(include_path_base)

        for include_path in include_path_base:
            logger.info('## PATH')
            logger.info(include_path)
            try:
                # Get the include file and splice it in place of the directive
                include = s3.get_object(Bucket=input_bucket, Key=include_path)
                include_html = include['Body'].read().decode('utf-8')
                output_html = output_html.replace(
                    '<!--#include virtual="/' + include_path + '" -->', include_html)
            except ClientError:
                # Missing include files leave the directive untouched
                pass

        logger.info('## OUTPUT BUCKET')
        logger.info(output_bucket)

        output_key = input_key
        logger.info('## OUTPUT KEY')
        logger.info(output_key)

        # Store the merged HTML in the public bucket; set ContentType so
        # S3 static hosting serves it as HTML rather than a download
        s3.put_object(Bucket=output_bucket, Key=output_key,
                      Body=output_html.encode('utf-8'),
                      ContentType='text/html')
    except Exception as e:
        logger.info(e)
        raise e
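The include logic itself can be tried locally without any AWS resources. Here is a small sketch that applies the same directive pattern to an in-memory dict standing in for the temp bucket (the paths and markup are made up for illustration):

```python
import re

# In-memory stand-in for the temp bucket's objects (hypothetical path)
objects = {
    "includes/header.html": "<header>Common header</header>",
}

input_html = (
    "<body>\n"
    '<!--#include virtual="/includes/header.html" -->\n'
    "<p>Page body</p>\n"
    "</body>\n"
)

# Same substitution the Lambda function performs against S3
output_html = input_html
for path in re.findall(r'<!--#include virtual="/(.*?)" -->', input_html):
    if path in objects:
        output_html = output_html.replace(
            '<!--#include virtual="/' + path + '" -->', objects[path]
        )

print(output_html)
# The directive line is now "<header>Common header</header>"
```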

Other settings

Environment variable

On the Lambda management screen, set the environment variable S3_BUCKET_TARGET to the public bucket name (here, s3-ssi-include).
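The same setting can be applied from code; a sketch assuming the function is named s3-ssi-include-func (the article does not name it):

```python
# Configuration equivalent to setting S3_BUCKET_TARGET in the console.
# The function name is a placeholder; the bucket name is this article's.
config = {
    "FunctionName": "s3-ssi-include-func",
    "Environment": {"Variables": {"S3_BUCKET_TARGET": "s3-ssi-include"}},
}

# Requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("lambda").update_function_configuration(**config)
```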

Then save everything so far, and back on the S3 side complete the remaining step: __Specify Lambda function: choose the Lambda function created earlier under "Select from Lambda functions"__.

In conclusion

This completes the setup: when a file is uploaded to the temp S3 bucket, Lambda embeds the include files and transfers the result to the public S3 bucket.

Files can be stored in S3 as if they were being uploaded to an ordinary web server, so when you need to migrate a site that uses SSI to S3 + CloudFront, you can move it without replacing all the SSI references to the shared files at once. If you instead inline the shared files that SSI managed into every page, those common parts become hard-coded, which increases operational effort and risk going forward; and a bulk find-and-replace itself carries a risk of human error, so frankly I would rather avoid it.

Given that, I think this mechanism is quite convenient. The disadvantages are that it costs a little more because it uses two buckets, and the flow is a little hard to follow, so it needs to be properly shared with the team.

Nowadays, with the spread of CI/CD and Docker, I think fewer projects face this kind of problem in the first place. But not every site in the world is built that way, so I suspect there is still some demand for this.

That's all from the field.
