[PYTHON] [AWS] Do SSI-like things with S3 / Lambda

2020.11.16 update: I have also made an Alibaba Cloud version.

TL;DR: HTML containing SSI directives (include virtual) is uploaded to a specific S3 bucket; a Lambda function merges in the included sources and stores the result in another bucket.

What I wanted to do

Since S3 static site hosting serves only static files, server-side processing such as SSI is not possible in principle. The same goes for CloudFront, the CDN that caches the content downstream.

The standard workaround is to build a local development environment, keep the shared parts as separate files, and combine them at build time. But for an existing site that already uses SSI and is being migrated to S3 + CloudFront, that workflow unfortunately cannot be introduced in every project.

So, for sites that already use SSI, I set things up on the AWS side so that the existing files can be used as they are, without rewriting them.

What I did

(This assumes the IAM settings are already in place.)

Basically, I followed this article and customized it for my own use: Do something like SSI with S3 and Lambda

One thing that bothered me about the reference article was having to add a .ssi suffix to the original file's extension, so I prepared a separate temp bucket and adjusted the flow so that files can be processed without any suffix.

Architecture

Basically, the only services used are S3 and Lambda, plus CloudFront if you need it.

(Architecture diagram: ARMS_TECHSTACK (1) (1).png)

Setup

S3

Prepare two buckets: a temp bucket to upload files to, and a bucket for publishing.

Public bucket

The name can be anything. This time it is s3-ssi-include.

Bucket for temp

The name can be anything. This time it is s3-ssi-include-base.

Each setting

It is assumed that the access permissions are set appropriately. The temp bucket only stores files to be passed on to the public bucket, so it does not need to be public. If you put CloudFront in front of your publishing bucket, that bucket does not need to be public either.

Bucket for temp

From the temp bucket's details page, go to "Properties" -> "Event notifications" -> "Create event notification" so that the Lambda function is invoked when a file is uploaded (PUT event).

- Event type: PUT
- Destination: Lambda function

Select these, then save the settings for now. After creating the Lambda function later, return to this screen and set:

- Specify Lambda function: choose the Lambda function created earlier under "Select from Lambda functions"
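For reference, the same event notification can also be set up from code. Below is a minimal sketch with boto3; the Lambda ARN (region, account ID) is a placeholder, and the bucket names follow this article's examples.

```python
# Notification configuration equivalent to the console steps above.
# The function ARN is a placeholder; the Lambda function must already
# exist (and allow S3 to invoke it) before this is applied.
notification = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "run-ssi-include-on-put",
            "LambdaFunctionArn": "arn:aws:lambda:ap-northeast-1:123456789012:function:s3-ssi-include-func",
            "Events": ["s3:ObjectCreated:Put"],
        }
    ]
}

# Applying it requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="s3-ssi-include-base",
#     NotificationConfiguration=notification,
# )
```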

Lambda

Create a function that, on the PUT event, checks the uploaded HTML file for SSI (Server Side Includes) directives, performs the includes, and stores the result in the other bucket. On the S3 side, you also need to edit the function's resource-based policy so that the temp bucket is allowed to invoke it. This should be helpful: Lambda resource-based policy when triggered by S3
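As a sketch of that resource-based policy grant with boto3 (the function name is a placeholder since the article does not name it; the bucket ARN follows this article's temp bucket):

```python
# Parameters for lambda add_permission, allowing PUT events from the
# temp bucket to invoke the function. All names are placeholders.
permission = {
    "FunctionName": "s3-ssi-include-func",
    "StatementId": "AllowS3InvokeFromTempBucket",
    "Action": "lambda:InvokeFunction",
    "Principal": "s3.amazonaws.com",
    "SourceArn": "arn:aws:s3:::s3-ssi-include-base",
}

# Requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("lambda").add_permission(**permission)
```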

Function code

import os
import logging
import re
import urllib.parse

import boto3
from botocore.errorfactory import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')

def lambda_handler(event, context):
    logger.info('## ENVIRONMENT VARIABLES')
    logger.info(os.environ)
    logger.info('## EVENT')
    logger.info(event)

    input_bucket = event['Records'][0]['s3']['bucket']['name']
    output_bucket = os.environ['S3_BUCKET_TARGET']

    logger.info('## INPUT BUCKET')
    logger.info(input_bucket)

    # Object keys in the event are URL-encoded (e.g. spaces become "+")
    input_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    logger.info('## INPUT KEY')
    logger.info(input_key)

    try:
        if not input_key.endswith('.html'):
            # Non-HTML files are copied to the public bucket unchanged
            s3.copy_object(Bucket=output_bucket, Key=input_key,
                           CopySource={'Bucket': input_bucket, 'Key': input_key})
            return

        # Get the uploaded HTML file
        response = s3.get_object(Bucket=input_bucket, Key=input_key)
        input_html = response['Body'].read().decode('utf-8')
        output_html = input_html

        # Collect the paths of all SSI include directives
        include_path_base = re.findall(r'<!--#include virtual="/(.*?)" -->', input_html)
        logger.info('## PATH BASE')
        logger.info(include_path_base)

        for include_path in include_path_base:
            logger.info('## PATH')
            logger.info(include_path)
            try:
                # Get the include file and splice it in place of the directive
                include = s3.get_object(Bucket=input_bucket, Key=include_path)
                include_html = include['Body'].read().decode('utf-8')
                output_html = output_html.replace(
                    '<!--#include virtual="/' + include_path + '" -->', include_html)
            except ClientError:
                # Missing include files leave the directive untouched
                pass

        logger.info('## OUTPUT BUCKET')
        logger.info(output_bucket)

        output_key = input_key
        logger.info('## OUTPUT KEY')
        logger.info(output_key)

        # Store the merged HTML in the public bucket; set ContentType so
        # S3 static hosting serves it as HTML rather than a download
        s3.put_object(Bucket=output_bucket, Key=output_key,
                      Body=output_html.encode('utf-8'),
                      ContentType='text/html')
    except Exception as e:
        logger.info(e)
        raise e
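The include logic itself can be tried locally without any AWS resources. Here is a small sketch that applies the same directive pattern to an in-memory dict standing in for the temp bucket (the paths and markup are made up for illustration):

```python
import re

# In-memory stand-in for the temp bucket's objects (hypothetical path)
objects = {
    "includes/header.html": "<header>Common header</header>",
}

input_html = (
    "<body>\n"
    '<!--#include virtual="/includes/header.html" -->\n'
    "<p>Page body</p>\n"
    "</body>\n"
)

# Same substitution the Lambda function performs against S3
output_html = input_html
for path in re.findall(r'<!--#include virtual="/(.*?)" -->', input_html):
    if path in objects:
        output_html = output_html.replace(
            '<!--#include virtual="/' + path + '" -->', objects[path]
        )

print(output_html)
# The directive line is now "<header>Common header</header>"
```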

Other settings

Environment variable

On the Lambda management screen, set the environment variable S3_BUCKET_TARGET to the public bucket name (here, s3-ssi-include).
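The same setting can be applied from code; a sketch assuming the function is named s3-ssi-include-func (the article does not name it):

```python
# Configuration equivalent to setting S3_BUCKET_TARGET in the console.
# The function name is a placeholder; the bucket name is this article's.
config = {
    "FunctionName": "s3-ssi-include-func",
    "Environment": {"Variables": {"S3_BUCKET_TARGET": "s3-ssi-include"}},
}

# Requires AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("lambda").update_function_configuration(**config)
```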

Then save everything so far, and back on the S3 side complete the remaining step: __Specify Lambda function: choose the Lambda function created earlier under "Select from Lambda functions"__.

In conclusion

This completes the setup: when a file is uploaded to the temp S3 bucket, Lambda embeds the include files and transfers the result to the public S3 bucket.

Files can be stored in S3 as if they were being uploaded to an ordinary web server, so when you need to migrate a site that uses SSI to S3 + CloudFront, you can move it without replacing all the SSI references to the shared files at once. If you instead inline the shared files that SSI managed into every page, those common parts become hard-coded, which increases operational effort and risk going forward; and a bulk find-and-replace itself carries a risk of human error, so frankly I would rather avoid it.

Given that, I think this mechanism is quite convenient. The disadvantages are that it costs a little more because it uses two buckets, and the flow is a little hard to follow, so it needs to be properly shared with the team.

Nowadays, with the spread of CI/CD and Docker, I think fewer projects face this kind of problem in the first place. But not every site in the world is built that way, so I suspect there is still some demand for this.

That's all from the field.
