[PYTHON] Building a way to download a large zip file behind Basic authentication on AWS

Article content in three lines

--Just use EC2 and a web framework
--It cannot be built serverless (it can be assembled, but fails because of a payload size limit)
--API Gateway is convenient (and difficult)

Overview

When sharing a large file by e-mail, anything over roughly 8 MB tends to be blocked by the mail server. Even when it is not blocked, a large attachment is a nuisance because it puts load on the server. So I want to use a storage service instead. However, a service like GigaFile (ギガファイル便) is likely to get me in trouble under security agreements (I used it a lot in the past), and with Google Drive I worry about sharing with people who have no Google account (the file cannot be reached without knowing the URL, but I am still uneasy about leaving it open to anyone).

All I want is a URL where I can temporarily set an ID and password and share a large file (and then delete it afterwards, or make it private once the time limit has passed).

I recently passed the AWS Solutions Architect Associate (SAA) certification, so as a review I challenged myself to build a simple file-sharing mechanism on AWS. That is how this article came about.

Personally, I did not want to start, stop, or monitor a server, so I wondered whether I could build it serverless.

For reference, the file I want to share is a zip file of about 100 MB.

Team up with API Gateway, Lambda and S3

Role of each service

API Gateway

Used to expose the Lambda functions and to apply Basic authentication to the published URL. Looking into it, besides authenticating with an access key, API Gateway has a feature called the "API Gateway Lambda authorizer", which lets you write a Lambda function and use it to control access to the API.

Lambda

Prepare one Lambda function that performs Basic authentication for the "API Gateway Lambda authorizer", and two functions that serve the pages for downloading a zip file after authentication.

The first displays an HTML page and asks the user to select the file to download. At first I thought it would be enough to start the download as soon as the URL was opened, but what if there are multiple files? So I decided to prepare an HTML page. (In hindsight I thought a query parameter alone would have been enough, but that did not work either. See below.)

The other is a Lambda function that downloads a file from S3 and returns a zip file (binary).

S3

Use it as a place to store zip files you want to download and a place to store HTML templates.

Diagram

TODO: Make and paste a diagram

Implementation / setting

Lambda

Function name: download___auth

Lambda function that performs Basic authentication.

import json
import os
import base64


def lambda_handler(event, context):
    # Start from a Deny policy and switch it to Allow only when authentication succeeds.
    policy = {
        'principalId': 'user',
        'policyDocument': {
          'Version': '2012-10-17',
          'Statement': [
            {
              'Action': 'execute-api:Invoke',
              'Effect': 'Deny',
              'Resource': event['methodArn']
            }
          ]
        }
    }

    if not basic_auth(event):
        print('Auth Error!!!')
        return policy
        
    policy['policyDocument']['Statement'][0]['Effect'] = 'Allow'
    return policy
    

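def basic_auth(event):
    # Header names may arrive in any case, so look them up case-insensitively.
    headers = {k.lower(): v for k, v in (event.get('headers') or {}).items()}
    auth_header = headers.get('authorization')
    if auth_header is None:
        # API Gateway turns an exception with the message 'Unauthorized' into a 401.
        raise Exception('Unauthorized')

    # Get the credentials from the Lambda environment variables.
    # (Do not print os.environ here: it would write the password to the logs.)
    user = os.environ['USER']
    password = os.environ['PASSWORD']

    _b64 = base64.b64encode('{}:{}'.format(user, password).encode('utf-8'))
    auth_str = 'Basic {}'.format(_b64.decode('utf-8'))
    return auth_header == auth_str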
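
As a variation on basic_auth above, the header can be decoded instead of re-encoding the expected credentials, and the comparison can be done in constant time. The following is only a minimal sketch under those assumptions; `parse_basic_header` and `credentials_match` are hypothetical helper names that do not appear in the original function.

```python
import base64
import binascii
import hmac


def parse_basic_header(value):
    """Return (user, password) from a 'Basic ...' header value, or None if malformed."""
    scheme, _, encoded = value.partition(' ')
    if scheme.lower() != 'basic' or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode('utf-8')
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(':')
    if not sep:
        return None
    return user, password


def credentials_match(header_value, user, password):
    """Hypothetical helper: compare the header against the expected credentials."""
    parsed = parse_basic_header(header_value)
    if parsed is None:
        return False
    # compare_digest takes the same time wherever the first mismatch occurs
    user_ok = hmac.compare_digest(parsed[0].encode('utf-8'), user.encode('utf-8'))
    pass_ok = hmac.compare_digest(parsed[1].encode('utf-8'), password.encode('utf-8'))
    return user_ok and pass_ok
```

Inside the authorizer, `credentials_match(auth_header, os.environ['USER'], os.environ['PASSWORD'])` could stand in for the string comparison, if this approach suits you.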

Function name: download___index

Lambda function that lists the downloadable files as HTML. The HTML templates are fetched from S3, and the template engine is Jinja2.

from jinja2 import Template
import boto3
from botocore.exceptions import ClientError

import os
import logging


logger = logging.getLogger()
S3 = boto3.resource('s3')
TEMPLATE_AWS_S3_BUCKET_NAME = 'hogehoge-downloader'
BUCKET = S3.Bucket(TEMPLATE_AWS_S3_BUCKET_NAME)


def get_object(bucket, object_name):
    """Retrieve an object from an Amazon S3 bucket

    :param bucket: boto3 Bucket resource
    :param object_name: string (object key)
    :return: object content as bytes. If error, return None.
    """
    try:
        response = bucket.Object(object_name).get()
    except ClientError as e:
        # Typically the bucket or object was not found, or access was denied
        logger.error(e)
        return None
    # Read the StreamingBody fully and return its bytes
    return response['Body'].read()



def main():
    index_html = get_object(BUCKET,
                            os.path.join('template', 'index.html')) \
                            .decode('utf8')
    li_html = get_object(BUCKET,
                          os.path.join('template', 'file_li.html')) \
                          .decode('utf8')

    index_t = Template(index_html)
    insert_list = []
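    objs = BUCKET.meta.client.list_objects_v2(Bucket=BUCKET.name,
                                              Prefix='files')
    # 'Contents' is missing when nothing matches the prefix
    for obj in objs.get('Contents', []):
        k = obj.get('Key')
        ks = k.split('/')
        # Skip keys that have no file name under files/ (e.g. the folder placeholder)
        if len(ks) < 2 or ks[1] == '':
            continue

        file_name = ks[1]
        print(k)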
        li_t = Template(li_html)
        insert_list.append(li_t.render(
            file_url='#',
            file_name=file_name
        ))

    output_html = index_t.render(file_li=''.join(insert_list))
    return output_html


def lambda_handler(event, context):
    output_html = main()
    
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": 'text/html'
        },
        "isBase64Encoded": False,
        "body": output_html
    }
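
The key-splitting loop in main() can also be pulled out into a small pure function, which makes it easy to check without touching S3. This is a hedged sketch: `file_names_from_listing` is a hypothetical name, the input is assumed to have the same shape as a list_objects_v2 response, and unlike the original loop it skips keys in nested folders.

```python
def file_names_from_listing(listing, prefix='files/'):
    """Extract downloadable file names from a list_objects_v2-style response dict."""
    names = []
    # 'Contents' is absent when no key matches the prefix
    for obj in listing.get('Contents', []):
        key = obj.get('Key', '')
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        # Skip the folder placeholder ('files/') and anything in nested folders
        if not name or '/' in name:
            continue
        names.append(name)
    return names


# Hand-written sample data in the shape of a list_objects_v2 response
sample = {'Contents': [{'Key': 'files/'},
                       {'Key': 'files/a.zip'},
                       {'Key': 'files/sub/b.zip'}]}
print(file_names_from_listing(sample))
```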

Function name: download___download

Lambda function that downloads a file from S3 and returns it.

import boto3
from botocore.exceptions import ClientError

import os
import logging
import base64

logger = logging.getLogger()
S3 = boto3.resource('s3')
TEMPLATE_AWS_S3_BUCKET_NAME = 'hogehoge-downloader'
TEMPLATE_BUCKET = S3.Bucket(TEMPLATE_AWS_S3_BUCKET_NAME)


def get_object(bucket, object_name):
    """Retrieve an object from an Amazon S3 bucket

    :param bucket: boto3 Bucket resource
    :param object_name: string (object key)
    :return: object content as bytes. If error, return None.
    """
    try:
        response = bucket.Object(object_name).get()
    except ClientError as e:
        # Typically the bucket or object was not found, or access was denied
        logger.error(e)
        return None
    # Read the StreamingBody fully and return its bytes
    return response['Body'].read()


def lambda_handler(event, context):
    # queryStringParameters is None when the request has no query string
    params = event.get('queryStringParameters') or {}
    file_name = params.get('fileName')
    body = None
    if file_name:
        body = get_object(TEMPLATE_BUCKET, os.path.join('files', file_name))
    if body is None:
        return {
            "statusCode": 404,
            "headers": {"Content-Type": 'text/plain'},
            "isBase64Encoded": False,
            "body": 'File not found'
        }
    return {
        "statusCode": 200,
        "headers": {
            "Content-Disposition": 'attachment;filename="{}"'.format(file_name),
            "Content-Type": 'application/zip'
        },
        # API Gateway decodes the base64 body back to binary
        "isBase64Encoded": True,
        "body": base64.b64encode(body).decode('utf-8')
    }

API Gateway

Initial construction

--Select REST API as API type

Built like this.

image.png

Authorizer settings

Configure the authorizer to use the Lambda function that performs Basic authentication.

image.png

Set it as follows. Be careful to uncheck authorization caching. (If you leave it checked and attach the authorizer to multiple resources, it behaves strangely.)

Screenshot 2020-03-15 11.41.06.png

This completes the authorizer settings. All that remains is to attach it to each method.
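
One plausible explanation for the strange behavior with caching: the policy returned by download___auth names only event['methodArn'], so when the cached policy is reused for a request to a different resource, it contains no Allow statement for that resource. If you would rather keep caching on, a possible alternative is to allow every method in the stage. The sketch below assumes the usual methodArn layout; `wildcard_resource` is a hypothetical helper name, and the ARN in the example is made up.

```python
def wildcard_resource(method_arn):
    """Turn one method's ARN into an ARN covering every method of the same stage."""
    # Assumed layout:
    # arn:aws:execute-api:<region>:<account>:<api-id>/<stage>/<METHOD>/<path...>
    parts = method_arn.split(':', 5)
    arn_prefix = ':'.join(parts[:5])
    api_id, stage = parts[5].split('/')[:2]
    return '{}:{}/{}/*/*'.format(arn_prefix, api_id, stage)


# Made-up ARN for illustration only
example_arn = 'arn:aws:execute-api:ap-northeast-1:123456789012:abcdef1234/prod/GET/download'
print(wildcard_resource(example_arn))
```

In the authorizer, `'Resource': wildcard_resource(event['methodArn'])` would then take the place of the single-method ARN. Whether one shared policy for the whole stage is acceptable depends on how the API is laid out.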

Creating resource methods

Prepare a resource method as shown below.

image.png

The detailed settings for each will be explained below.

/html

Check "Use Lambda Proxy integration". With this enabled, response headers and the like can be defined on the Lambda side.

image.png

In the method request, set the authorization to the Basic authentication authorizer.

image.png

Modify the method response so that HTML is recognized: change the content type of the response body to text/html.

image.png

This completes the setting.

/download

As on the HTML side, check "Use Lambda Proxy integration".

image.png

Set Basic authentication here as well.

Also, since the name of the file to download is passed as a parameter, add a parameter named fileName under URL query string parameters. (Check "Required" if you want to make it mandatory.)

image.png

Since the response is a zip file, change the content type of the method response to application/zip.

image.png

Finally, add a binary media type under Settings in the left menu. Once the target content type is registered here, the base64 return value from Lambda is converted to binary. (However, the request must specify `application/zip` in its Content-Type or Accept header.)
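
To illustrate the header requirement from the client side, here is a minimal sketch that builds (but does not send) such a request with the standard library. The URL and credentials are placeholders, and `build_zip_request` is a hypothetical helper name.

```python
import base64
import urllib.request


def build_zip_request(url, user, password):
    """Build a GET request carrying Basic credentials and Accept: application/zip."""
    token = base64.b64encode('{}:{}'.format(user, password).encode('utf-8')).decode('ascii')
    return urllib.request.Request(url, headers={
        'Authorization': 'Basic ' + token,
        # Without a matching Accept header the body may come back as base64 text
        'Accept': 'application/zip',
    })


# Placeholder URL and credentials; nothing is sent until urlopen() is called.
req = build_zip_request('https://example.invalid/prod/download?fileName=files.zip',
                        'user', 'pass')
print(req.get_header('Accept'))
```

Passing `req` to `urllib.request.urlopen` would perform the actual download against a real endpoint.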

image.png

S3

Create a bucket named hogehoge-downloader and create the following directories in it.

--files: stores the zip files
--template: holds the HTML template files

The HTML templates are as follows.

index.html

<!DOCTYPE html>
<html>
   <head>
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width, initial-scale=1">
      <title>Downloader</title>
      <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/css/bulma.min.css">
      <script defer src="https://use.fontawesome.com/releases/v5.3.1/js/all.js"></script>
   </head>
   <body>
      <section class="section">
         <div class="container">
            <div class="container">
               <ul>{{ file_li }}</ul>
         </div>
         </div>
      </section>
   </body>
  <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
  <script>
      function download(e) {
        // Keep the browser from following href="#"
        e.preventDefault();
        const zipUrl = 'https://xxx.aws.com/prod/download?fileName=' + encodeURIComponent(this.filename);
        axios.get(zipUrl, {
            responseType: 'blob',
            headers: {
                Accept: 'application/zip'
            },
        }).then(response => {
            // Turn the received blob into a temporary URL and click it to save the file
            window.URL = window.URL || window.webkitURL;
            const uri = window.URL.createObjectURL(response.data);
            const link = document.createElement('a');
            link.download = this.filename;
            link.href = uri;
            link.click();
        }).catch(error => {
            console.log(error);
        });
    }

    // Attach the handler to every download link; `this` inside download() is the listener object
    const links = Array.from(document.getElementsByClassName('downloadLink'));
    links.forEach(l => l.addEventListener('click', {filename: l.dataset.filename, handleEvent: download}));
  </script>
</html>

file_li.html

<li>
	<a href="{{ file_url }}" class="downloadLink" data-filename="{{ file_name }}">{{ file_name }}</a>
</li>

Pitfalls I ran into

--If authorizer caching is enabled, authentication comes out OK or NG depending on the resource, for some reason → avoided by disabling the cache
--The binary media type in the settings is not matched (?) against the response; it is decided by the Content-Type and Accept headers of the request
--To use Content-Disposition, set the header in Lambda and check "Use Lambda Proxy integration"

Problem

However, this configuration does not work. I did not notice until I had actually built it and hit the error.

Please check this page.

https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/limits.html

Invocation payload (request and response): 6 MB (synchronous)

What?! 6 MB?!

Dead end.

~ The End ~
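
To put numbers on that limit: base64 encoding emits 4 characters for every 3 bytes, so the roughly 100 MB zip becomes about 133 MB of response body, far beyond 6 MB. The sketch below is a rough estimate only. It assumes 1 MB = 1024 * 1024 bytes and ignores headers and the JSON envelope; the function names are hypothetical.

```python
LAMBDA_SYNC_PAYLOAD_LIMIT = 6 * 1024 * 1024  # 6 MB, from the limits page quoted above


def base64_size(raw_bytes):
    """Length of the base64 text for raw_bytes bytes of input (4 chars per 3 bytes, padded)."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_lambda_response(raw_bytes):
    """Rough check that ignores headers and the JSON wrapper around the body."""
    return base64_size(raw_bytes) <= LAMBDA_SYNC_PAYLOAD_LIMIT


zip_size = 100 * 1024 * 1024  # the ~100 MB zip from the overview
print(round(base64_size(zip_size) / (1024 * 1024), 1), 'MB after base64')
print('fits:', fits_in_lambda_response(zip_size))
```

By the same arithmetic, the largest file this configuration could return is only about 4.5 MB.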

Build with EC2

Yes

Diagram

Just EC2. I set up an Ubuntu 18.04 instance.

Implementation / setting

I ran a Bottle API in a Python container under Docker. Bottle handles both the Basic authentication and the zip file download.

It serves the zip file placed in the current directory.

app.py

import bottle


#BASIC authentication username and password
USERNAME = "user"
PASSWORD = "pass"


def check(username, password):
    """
    Check the Basic authentication username and password.
    Applied with @bottle.auth_basic(check).
    """
    return username == USERNAME and password == PASSWORD


@bottle.route("/zip")
@bottle.auth_basic(check)
def download_zip():
    zip_filename = 'files.zip'
    with open(zip_filename, 'rb') as f:
        body = f.read()

    # Set the headers on Bottle's response object, then return the bytes as the body
    bottle.response.content_type = 'application/zip'
    bottle.response.set_header('Content-Disposition', 'attachment; filename="{}"'.format(zip_filename))
    bottle.response.set_header('Content-Length', str(len(body)))
    return body


if __name__ == '__main__':
    bottle.run(host='0.0.0.0', port=80, debug=True)

$ docker run -p 80:80 -v $(pwd):/app -it docker-image-hogehoge python3 /app/app.py

Now I have a URL that can serve even a large file (although the download takes time)!!
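
One caveat with app.py is that it reads the whole zip into memory before responding. A hedged alternative is to hand Bottle a generator that yields the file in chunks, since Bottle accepts an iterable of byte strings as a response body. The sketch below shows only the chunked reader and checks it against a throwaway file; `iter_file` is a hypothetical helper name.

```python
import os
import tempfile


def iter_file(path, chunk_size=1024 * 1024):
    """Yield the file at path as byte chunks of at most chunk_size bytes."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Quick check with a throwaway 2500-byte file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'a' * 2500)
chunks = list(iter_file(tmp.name, chunk_size=1000))
os.remove(tmp.name)
print([len(c) for c in chunks])
```

In the route, returning `iter_file(zip_filename)` after setting the headers would stream the file. Bottle's own `static_file` helper is another option worth considering.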

Miscellaneous impressions

I went in all fired up, thinking "It's for study! Let's go serverless!", but I should have just used EC2 from the beginning...
