This article is about building S3-compatible storage with Python. It was mostly a learning exercise.
After skimming the official AWS API docs, I decided to find out what an S3 client actually sends.
-- Use Python and Flask for the dummy server app
-- Use Cyberduck as the client (it was at hand and seemed easy to use)
-- Use HTTPS (the default port when using S3 with Cyberduck was 443)
-- Assume path-style access rather than virtual-host-style access
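To make the last point concrete: with path-style access the bucket name is the first segment of the request path (as in /test-bucket/gopher.png), whereas with virtual-host-style access it is part of the host name. A minimal sketch of splitting a path-style request path is shown below. `split_path_style` is a hypothetical helper used only for illustration, not a function from the code in this article.

```python
def split_path_style(path):
    """Split a path-style request path into (bucket, key).

    Hypothetical helper for illustration. An empty key means the
    request hit the root of the bucket.
    """
    bucket, _, key = path.lstrip('/').partition('/')
    return bucket, key


print(split_path_style('/test-bucket/'))            # ('test-bucket', '')
print(split_path_style('/test-bucket/gopher.png'))  # ('test-bucket', 'gopher.png')
```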
Object Listing
-- The prefix and delimiter settings are specified in the GET query string
-- The request hits the root of the bucket
127.0.0.1 - - [29/Jan/2017 01:02:24] "GET /test-bucket/?max-keys=1000&prefix&delimiter=%2F HTTP/1.1"
[Request header]
Authorization: AWS hogehoge_user1:ysH68SkszwcudzrtZAlxlV9z8WA=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
X-Amz-Request-Payer: requester
Date: Sun, 29 Jan 2017 01:02:23 GMT
Content-Type:
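With an empty prefix and delimiter=/ (the %2F in the query above), the client is effectively asking for one directory level: keys that contain the delimiter after the prefix are rolled up into a single "common prefix" entry. The sketch below illustrates that grouping and renders a simplified listing document. `group_keys` and `build_list_xml` are hypothetical names, and the XML element names follow the ListBucketResult format as I understand it from the S3 API reference. A real client will probably also expect fields such as LastModified, ETag and Size, which are omitted here.

```python
import xml.etree.ElementTree as ET


def group_keys(keys, prefix='', delimiter='/'):
    """Split keys into direct objects and rolled-up common prefixes."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is
            # reported once, like a directory.
            common = prefix + rest.split(delimiter, 1)[0] + delimiter
            if common not in common_prefixes:
                common_prefixes.append(common)
        else:
            contents.append(key)
    return contents, common_prefixes


def build_list_xml(bucket, keys, prefix='', delimiter='/'):
    """Render a simplified ListBucketResult document (assumed layout)."""
    contents, common_prefixes = group_keys(keys, prefix, delimiter)
    root = ET.Element('ListBucketResult')
    ET.SubElement(root, 'Name').text = bucket
    ET.SubElement(root, 'Prefix').text = prefix
    ET.SubElement(root, 'Delimiter').text = delimiter
    ET.SubElement(root, 'IsTruncated').text = 'false'
    for key in contents:
        ET.SubElement(ET.SubElement(root, 'Contents'), 'Key').text = key
    for common in common_prefixes:
        ET.SubElement(ET.SubElement(root, 'CommonPrefixes'), 'Prefix').text = common
    return ET.tostring(root, encoding='unicode')


keys = ['gopher.png', 'photos/a.png', 'photos/b.png', 'docs/readme.txt']
print(group_keys(keys))  # (['gopher.png'], ['docs/', 'photos/'])
print(build_list_xml('test-bucket', keys))
```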
Upload
-- Use PUT method (not POST)
-- The path of the object you want to upload appears in the HTTP path
-- Content-Length is sent
127.0.0.1 - - [29/Jan/2017 01:04:19] "PUT /test-bucket/gopher.png HTTP/1.1"
[Request header]
Authorization: AWS hogehoge_user1:suTrxv+XQuecbq7vUMYoQ3rWBcM=
Content-Length: 114063
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
Date: Sun, 29 Jan 2017 01:04:18 GMT
Content-Type: image/png
Download
-- Use GET method
-- The path of the object you want to download appears in the HTTP path
127.0.0.1 - - [29/Jan/2017 01:05:09] "GET /test-bucket/gopher.png HTTP/1.1"
[Request header]
Authorization: AWS hogehoge_user1:rhqMjHlbcYg/APr7bv9PH7tbyy4=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Host: b.tgr.tokyo
X-Amz-Request-Payer: requester
Date: Sun, 29 Jan 2017 01:05:09 GMT
Content-Type:
Delete
-- Use DELETE method
-- The path of the object you want to delete appears in the HTTP path
127.0.0.1 - - [29/Jan/2017 01:05:59] "DELETE /test-bucket/gopher.png HTTP/1.1"
[Request header]
Authorization: AWS hogehoge_user1:U2NEDsKLvJ08mLYdPIB43R+IAu0=
Content-Length:
User-Agent: Cyberduck/4.7.2.18004 (Mac OS X/10.10.5) (x86_64)
Connection: upgrade
Date: Sun, 29 Jan 2017 01:05:59 GMT
Host: b.tgr.tokyo
Content-Type:
The requests are pretty simple, so an app server with the following functions should be enough to build an S3 clone that Cyberduck can access.
-- Check the Authorization header and authorize (common to all requests)
-- When a GET request hits the root of the bucket, list the objects and return the listing
-- When a GET request hits an object directly, return the object itself
-- When a file arrives via the PUT method, write the HTTP payload to a file
-- When an object is hit with DELETE, delete the file
I pushed the finished version to simple-s3-clone. Please see there for the detailed code. There is no concept of a bucket yet and error handling is still lax, so I'll add those later.
The code shown below has been trimmed down for the sake of explanation. I hope it conveys the concept at least.
Authentication is where I had the most trouble. Basically, I authenticate with the following procedure, but I forgot about the X-Amz headers, and for a long time I was frustrated because my signature never matched the one generated by the client.
-- Create the string to sign (it can be generated from the request method, path, and header information)
-- Hash this string with HMAC-SHA1 using the secret access key to generate a signature
-- If the AccessKeyId and the signature generated on the server side match the ones sent by the user, the request is OK (if they do not match, 403 is returned)
import base64
import hashlib
import hmac

from flask import request

# `app` (the Flask application) and `exception` (the error classes)
# are defined elsewhere in the full code.


def get_x_amz_headers():
    # Collect (lowercased name, value) pairs for every X-Amz-* header
    return [(k.lower(), v) for k, v in request.headers.items()
            if k.lower().startswith('x-amz-')]


def generate_x_amz_string():
    ret = ''
    # Sort and concatenate the X-Amz headers for the string to sign
    for k, v in sorted(get_x_amz_headers()):
        ret += '{}:{}\n'.format(k, v)
    return ret


def generate_auth_string():
    # Method, Content-MD5, Content-Type, Date, X-Amz headers, then the path
    return '{}\n{}\n{}\n{}\n{}{}'.format(
        request.method,
        request.headers.get('Content-Md5', ''),
        request.headers.get('Content-Type', ''),
        request.headers.get('Date', ''),
        generate_x_amz_string(),
        request.path
    )


def auth_check(auth_raw_string):
    auth_info = request.headers.get('Authorization', '')
    access_key_id = 'hogehoge_key_id'
    secret_access_key = 'hogehoge_secret'
    # Hash the generated string with the secret key using HMAC-SHA1
    # (signature generation); hmac needs bytes on Python 3
    hashed = hmac.new(secret_access_key.encode('utf-8'),
                      auth_raw_string.encode('utf-8'),
                      hashlib.sha1).digest()
    # Build the same form as the Authorization header sent by the user
    # (AWS AccessKeyId:Signature)
    generated_signature = 'AWS {}:{}'.format(
        access_key_id, base64.b64encode(hashed).decode('ascii'))
    # Compare with the signature sent by the user
    if auth_info != generated_signature:
        raise exception.SignatureDoesNotMatch()


@app.before_request
def before_request():
    # Authenticate before every request
    s = generate_auth_string()
    auth_check(s)
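To see why forgetting the X-Amz headers breaks the signature, here is a standalone sketch of the same signing steps applied to the listing request logged earlier, without Flask. `string_to_sign` and `sign` are illustrative names, and the secret key is a placeholder, so the resulting signature will not match the one in the log.

```python
import base64
import hashlib
import hmac


def string_to_sign(method, path, headers):
    """Build the string that gets signed for an 'AWS key:signature' header."""
    amz = sorted(
        (name.lower(), value) for name, value in headers.items()
        if name.lower().startswith('x-amz-')
    )
    amz_part = ''.join('{}:{}\n'.format(name, value) for name, value in amz)
    return '{}\n{}\n{}\n{}\n{}{}'.format(
        method,
        headers.get('Content-MD5', ''),
        headers.get('Content-Type', ''),
        headers.get('Date', ''),
        amz_part,
        path,
    )


def sign(secret_access_key, text):
    """HMAC-SHA1 the text with the secret key and base64-encode the digest."""
    digest = hmac.new(secret_access_key.encode('utf-8'),
                      text.encode('utf-8'), hashlib.sha1).digest()
    return base64.b64encode(digest).decode('ascii')


# Header values taken from the object listing request shown earlier
headers = {
    'Date': 'Sun, 29 Jan 2017 01:02:23 GMT',
    'X-Amz-Request-Payer': 'requester',
}
text = string_to_sign('GET', '/test-bucket/', headers)
print(repr(text))
print('AWS hogehoge_key_id:' + sign('hogehoge_secret', text))
```

The printed string contains the line x-amz-request-payer:requester between the date and the path. Leaving that line out produces a different string, and therefore a different signature, which is the mismatch described above.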
# g.resource_path is set elsewhere in the full code.

@app.route("/<path:path>")
def get_request_with_path(path):
    # A GET on the bucket root lists objects; any other GET downloads one
    if g.resource_path == '':
        return process_object_list()
    else:
        return download_object()


@app.route("/<path:path>", methods=['PUT'])
def put_request_with_path(path):
    # Reject uploads whose Content-Length is missing or does not match the body
    if request.content_length != len(request.data):
        raise exception.MissingContentLength()
    # A trailing slash means a prefix ("directory") rather than an object
    if g.resource_path.endswith('/'):
        return create_prefix()
    else:
        return create_object()


@app.route("/<path:path>", methods=['DELETE'])
def delete_request_with_path(path):
    if g.resource_path.endswith('/'):
        return delete_prefix()
    else:
        return delete_object()
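The handlers above delegate the actual storage work to functions that are not shown here. As a rough idea of what "write the HTTP payload to a file" can look like, the sketch below maps object keys to files under a base directory. All names in it (`object_path`, `write_object_file`, `read_object_file`, `delete_object_file`) are hypothetical and are not the implementation in simple-s3-clone. The path check is an extra precaution I am assuming is wanted, so that a key such as ../secret cannot escape the storage directory.

```python
import os


def object_path(base_dir, key):
    """Map an object key to a file path under base_dir, refusing escapes."""
    base = os.path.abspath(base_dir)
    full = os.path.abspath(os.path.join(base, key))
    if not full.startswith(base + os.sep):
        raise ValueError('key escapes the storage directory: ' + key)
    return full


def write_object_file(base_dir, key, payload):
    """Store the request body (bytes) as the object's content."""
    path = object_path(base_dir, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        f.write(payload)


def read_object_file(base_dir, key):
    """Return the object's content as bytes."""
    with open(object_path(base_dir, key), 'rb') as f:
        return f.read()


def delete_object_file(base_dir, key):
    """Remove the file that backs the object."""
    os.remove(object_path(base_dir, key))
```

In a PUT handler, the payload would be request.data, and the key would be the object path within the bucket.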
s3cmd didn't work because the way it sends requests is different from Cyberduck. However, it looks like it could be supported fairly quickly by modifying the authentication part.
Until now I had been doing scientific computing such as simulations, and I only learned about services like S3 after joining my company. My motivation this time was that I wanted to try implementing the server side.