Quickly implement S3 compatible storage with python-flask

What is this?

This article describes building S3-compatible storage in Python. It is mainly a learning exercise.

Observing the client's behavior

After skimming the official AWS API docs, I decided to find out what an S3 client actually sends.

Prerequisites

-- Use Python and Flask for the dummy server app
-- Use Cyberduck as the client (it was at hand and seemed easy to use)
-- Use HTTPS (Cyberduck's default port for S3 is 443)
-- Assume path-style access rather than virtual-hosted-style access

Operations

Object listing
-- The prefix and delimiter settings are passed in the GET query string
-- The request hits the root of the bucket

127.0.0.1 - - [29/Jan/2017 01:02:24] "GET /test-bucket/?max-keys=1000&prefix&delimiter=%2F HTTP/1.1"

[Request header]
Authorization: AWS hogehoge_user1:ysH68SkszwcudzrtZAlxlV9z8WA=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
X-Amz-Request-Payer: requester
Date: Sun, 29 Jan 2017 01:02:23 GMT
Content-Type:
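
To make the listing request above concrete, here is a minimal sketch of how a server might interpret the prefix and delimiter parameters. The function names parse_listing_query and list_keys are hypothetical (they are not from simple-s3-clone), and the sketch assumes the stored keys are available as a plain list of strings. It also ignores pagination, and applies max-keys only to the object list, which is a simplification.

```python
from urllib.parse import urlsplit, parse_qs


def parse_listing_query(request_target):
    """Extract prefix, delimiter and max-keys from a listing request target."""
    # keep_blank_values=True so that a bare "prefix" (no "=value") is kept as ''.
    query = parse_qs(urlsplit(request_target).query, keep_blank_values=True)
    return {
        "prefix": query.get("prefix", [""])[0],
        "delimiter": query.get("delimiter", [""])[0],
        "max_keys": int(query.get("max-keys", ["1000"])[0]),
    }


def list_keys(all_keys, prefix="", delimiter="", max_keys=1000):
    """Split keys under `prefix` into direct objects and folder-like prefixes."""
    contents, common_prefixes = [], []
    for key in sorted(all_keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the first delimiter is rolled up into one "folder".
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            contents.append(key)
    # Simplification: max_keys only truncates the object list here.
    return contents[:max_keys], common_prefixes
```

With the request line logged above, parse_listing_query yields an empty prefix and "/" as the delimiter, so list_keys returns the objects directly under the bucket root plus one entry per top-level folder.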

Upload
-- The PUT method is used (not POST)
-- The path of the object to upload appears in the HTTP path
-- Content-Length is sent

127.0.0.1 - - [29/Jan/2017 01:04:19] "PUT /test-bucket/gopher.png HTTP/1.1"

[Request header]
Authorization: AWS hogehoge_user1:suTrxv+XQuecbq7vUMYoQ3rWBcM=
Content-Length: 114063
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
Date: Sun, 29 Jan 2017 01:04:18 GMT
Content-Type: image/png

Download
-- The GET method is used
-- The path of the object to download appears in the HTTP path

127.0.0.1 - - [29/Jan/2017 01:05:09] "GET /test-bucket/gopher.png HTTP/1.1"

[Request header]
Authorization: AWS hogehoge_user1:rhqMjHlbcYg/APr7bv9PH7tbyy4=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
X-Amz-Request-Payer: requester
Date: Sun, 29 Jan 2017 01:05:09 GMT
Content-Type:

Delete
-- The DELETE method is used
-- The path of the object to delete appears in the HTTP path

127.0.0.1 - - [29/Jan/2017 01:05:59] "DELETE /test-bucket/gopher.png HTTP/1.1"

[Request header]
Authorization: AWS hogehoge_user1:U2NEDsKLvJ08mLYdPIB43R+IAu0=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Date: Sun, 29 Jan 2017 01:05:59 GMT
Host: b.tgr.tokyo
Content-Type:

Impressions

The protocol is pretty simple, so an app server with the following functions should be enough to build an S3 that can be accessed from Cyberduck.

-- Check the Authorization header and authenticate (common to all requests)
-- When a GET request hits the root of the bucket, list the objects and return the list
-- When a GET request hits an object directly, return the object itself
-- When a file arrives via the PUT method, write the HTTP payload to a file
-- When an object is hit with DELETE, delete the file

Implementation

The finished version is pushed to simple-s3-clone. Please see there for the detailed code. There is no concept of a bucket yet and error handling is still lax, so I'll add those later.

The code shown below has been trimmed for the sake of explanation. I hope that at least the concept comes across.

Authentication part

This is where I had the most trouble. Authentication basically follows the procedure below, but I forgot about the X-Amz headers and was frustrated for a long time because my signature never matched the one generated by the client.

-- Build the string to sign (it can be generated from the request method, path, and header information)
-- Hash this string with the secret access key using HMAC-SHA1 to generate the signature
-- If the AccessKeyId and signature generated on the server side match the ones sent by the user, the request is OK (if they do not match, 403 is returned)


import base64
import hashlib
import hmac

from flask import request

# `app` (the Flask application) and `exception` (the project's error classes)
# are defined elsewhere in the project.


def get_x_amz_headers():
    # Collect the X-Amz-* request headers as (name, value) pairs
    return [h for h in request.headers.items() if h[0].startswith('X-Amz-')]


def generate_x_amz_string():
    # Sort the X-Amz headers and concatenate them as "name:value\n" lines
    ret = ''
    for name, value in sorted(get_x_amz_headers()):
        ret += '{}:{}\n'.format(name.lower(), value)
    return ret


def generate_auth_string():
    # String to sign: method, Content-MD5, Content-Type, Date, X-Amz headers, path
    return '{}\n{}\n{}\n{}\n{}{}'.format(
        request.method,
        request.headers.get('Content-Md5', ''),
        request.headers.get('Content-Type', ''),
        request.headers.get('Date', ''),
        generate_x_amz_string(),
        request.path
    )


def auth_check(auth_raw_string):
    auth_info = request.headers.get('Authorization', '')
    access_key_id = 'hogehoge_key_id'
    secret_access_key = 'hogehoge_secret'
    # Hash the string to sign with the secret key using HMAC-SHA1 (signature generation).
    # hmac works on bytes, so both the key and the message are encoded first.
    hashed = hmac.new(secret_access_key.encode('utf-8'),
                      auth_raw_string.encode('utf-8'),
                      hashlib.sha1).digest()
    # Build the same form as the Authorization header sent by the user (AWS AccessKeyId:Signature)
    generated_signature = 'AWS {}:{}'.format(
        access_key_id, base64.b64encode(hashed).decode('ascii'))
    # Compare with the signature sent by the user (constant-time comparison)
    if not hmac.compare_digest(auth_info.encode('utf-8'),
                               generated_signature.encode('utf-8')):
        raise exception.SignatureDoesNotMatch()


@app.before_request
def before_request():
    # Authenticate before every request
    s = generate_auth_string()
    auth_check(s)
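
To try the signature calculation without a running Flask app, the same procedure can be written as pure functions. This is a minimal sketch under the assumptions described in this article (HMAC-SHA1 over the method, Content-MD5, Content-Type, Date, X-Amz headers, and path); the names sign_v2 and parse_authorization are hypothetical and do not come from simple-s3-clone.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_access_key, method, date, resource,
            content_md5="", content_type="", amz_headers=None):
    """Return the base64-encoded HMAC-SHA1 signature for one request."""
    # X-Amz headers: lowercase the names, sort, one "name:value" line each.
    amz = "".join(
        "{}:{}\n".format(name, value)
        for name, value in sorted(
            (k.lower(), v) for k, v in (amz_headers or {}).items())
    )
    string_to_sign = "{}\n{}\n{}\n{}\n{}{}".format(
        method, content_md5, content_type, date, amz, resource)
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def parse_authorization(header_value):
    """Split 'AWS AccessKeyId:Signature' into (access_key_id, signature)."""
    scheme, _, credentials = header_value.partition(" ")
    if scheme != "AWS" or ":" not in credentials:
        raise ValueError("not an 'AWS AccessKeyId:Signature' header")
    access_key_id, _, signature = credentials.partition(":")
    return access_key_id, signature
```

Calling sign_v2 once with and once without amz_headers={"X-Amz-Request-Payer": "requester"} gives two different signatures, which is exactly the mismatch described above when the X-Amz headers are left out of the string to sign.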

Object listing and download

@app.route("/<path:path>")
def get_request_with_path(path):
    if g.resource_path == '':
        return process_object_list()
    else:
        return download_object()
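
The body of process_object_list is not shown here. S3 answers a listing request with a ListBucketResult XML document, so a listing handler could build its response roughly as in the sketch below. This is only an illustration: build_list_bucket_result is a hypothetical helper, the input is assumed to be a list of (key, size, last-modified) tuples plus a list of folder-like prefixes, and fields such as ETag are omitted, so a given client may need more than this.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_list_bucket_result(bucket, prefix, delimiter, objects, common_prefixes):
    """Build a ListBucketResult XML string.

    objects: list of (key, size_in_bytes, last_modified_iso8601) tuples.
    common_prefixes: list of folder-like prefixes such as "docs/".
    """
    root = ET.Element("ListBucketResult", xmlns=S3_NS)
    ET.SubElement(root, "Name").text = bucket
    ET.SubElement(root, "Prefix").text = prefix
    ET.SubElement(root, "Delimiter").text = delimiter
    ET.SubElement(root, "MaxKeys").text = "1000"
    # Assumption: everything fits in one response, so no pagination.
    ET.SubElement(root, "IsTruncated").text = "false"
    for key, size, last_modified in objects:
        contents = ET.SubElement(root, "Contents")
        ET.SubElement(contents, "Key").text = key
        ET.SubElement(contents, "LastModified").text = last_modified
        ET.SubElement(contents, "Size").text = str(size)
        ET.SubElement(contents, "StorageClass").text = "STANDARD"
    for folder in common_prefixes:
        cp = ET.SubElement(root, "CommonPrefixes")
        ET.SubElement(cp, "Prefix").text = folder
    return ET.tostring(root, encoding="unicode")
```

In a Flask view the returned string would be sent back with an XML content type such as application/xml.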

Object upload, folder creation

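@app.route("/<path:path>", methods=['PUT'])
def put_request_with_path(path):
    # Reject the request when Content-Length is missing or does not match the payload
    if request.content_length != len(request.data):
        raise exception.MissingContentLength()
    # A trailing slash means a folder (prefix); otherwise it is an object
    if g.resource_path.endswith('/'):
        return create_prefix()
    else:
        return create_object()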
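
The body of create_object is not shown here. As a rough idea of "write the HTTP payload to the file", the sketch below maps an object key to a path under a storage directory and writes the bytes there. The helper names resolve_object_path and store_object are hypothetical, as is the assumption that objects are stored as plain files under one root directory. The check against keys like "../x" is an addition of this sketch, not something the article describes.

```python
import os


def resolve_object_path(root_dir, key):
    """Map an object key to a file path under root_dir, rejecting escapes."""
    root = os.path.realpath(root_dir)
    target = os.path.realpath(os.path.join(root, key))
    # The resolved path must stay strictly inside the storage root.
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError("key escapes the storage root: {!r}".format(key))
    return target


def store_object(root_dir, key, payload):
    """Write payload (bytes) to the file for key and return the byte count."""
    target = resolve_object_path(root_dir, key)
    # Create intermediate "folders" for keys such as "docs/gopher.png".
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(payload)
    return len(payload)
```

In the PUT handler, payload would be request.data and key the part of the path after the bucket name.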

Delete object, delete folder


@app.route("/<path:path>", methods=['DELETE'])
def delete_request_with_path(path):
    # A trailing slash means a folder (prefix); otherwise it is an object
    if g.resource_path.endswith('/'):
        return delete_prefix()
    else:
        return delete_object()

It seems to work

[Animated demo: simple-s3-clone.gif]

s3cmd did not work because it sends requests differently from Cyberduck. However, modifying the authentication part should be enough to support it fairly quickly.

In closing

Until now I had been doing scientific computing such as simulations, but after joining my company I learned about services like S3. My motivation this time was that I wanted to try implementing the server side myself.
