This post is a personal opinion / memo and does not represent the views of the company I work for.
DynamoDB has a feature called **Conditional Update**, which allows you to perform atomic, condition-guarded write operations.
For example, it can express conditions like the following:

- Put only if no item with a specific key exists
- Update only if a specific item has a specific attribute
For more information, check out the DynamoDB Developer Guide (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html).
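As a rough illustration of the second case, a conditional update through boto's low-level DynamoDB connection might look like the sketch below. The table name, attribute names, and values here are made up for illustration, and the exact parameter names can differ between boto versions, so treat this as a sketch rather than a drop-in snippet.

```python
import boto.dynamodb2

# Low-level DynamoDB connection; the region is only an example.
dynamodb = boto.dynamodb2.connect_to_region('ap-northeast-1')

# Hypothetical example: set "status" to "done" only if the item
# currently has a "status" attribute whose value is "processing".
dynamodb.update_item(
    'TABLE_NAME',
    {'key': {'S': 'some-item-key'}},
    attribute_updates={
        'status': {'Action': 'PUT', 'Value': {'S': 'done'}}
    },
    expected={
        'status': {'Value': {'S': 'processing'}}
    }
)
```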
By applying this **"Put only if no item with a specific key exists"** condition and treating the presence or absence of the item as the lock state, you can implement simple lock management.

Try to put an item with key A into DynamoDB. At that point, if an item with key A:

- Exists: someone else has already taken the lock, so the Put operation fails (is made to fail).
- Doesn't exist: no one holds the lock, so the item with key A is put as a marker for the lock.
If you implement it with Python and boto, it looks like this.
```python:lock_key
import boto.dynamodb2

# Low-level DynamoDB connection; the region is only an example.
dynamodb = boto.dynamodb2.connect_to_region('ap-northeast-1')


def lock_key(key):
    try:
        # Conditional Put: succeeds only if no item with this key exists yet.
        dynamodb.put_item(
            'TABLE_NAME',
            {'key': {'S': key}},
            expected={
                'key': {'Exists': False}
            }
        )
        return True
    except Exception:
        # The conditional check failed: someone else already holds the lock.
        return False
```
The file name is used as the key when putting the item into the table. The function returns True if the Put succeeds because no item existed yet, and False otherwise.
I wrote a tool to manage file uploads to S3 using this mechanism. https://gist.github.com/imaifactory/6132f8a60461584b4613
Since a file that has already been uploaded will never be uploaded again, you can upload a large number of files in parallel, and even if the process dies partway through you can simply rerun it from the beginning. In fact, I've been using this mechanism to transfer roughly 8,000 to 10,000 log files a day to S3 for the past 3 to 4 months, and it has run with zero missing and zero duplicated files.
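The overall flow might look something like the sketch below, which combines `lock_key` from above with a plain boto S3 upload. The bucket name and local paths are hypothetical; the actual tool is the gist linked above.

```python
import glob
import os

import boto

# Hypothetical bucket and local log directory, for illustration only.
s3 = boto.connect_s3()
bucket = s3.get_bucket('my-log-bucket')

for path in glob.glob('/var/log/app/*.gz'):
    filename = os.path.basename(path)
    # Try to take the lock first; skip files another process already claimed.
    if not lock_key(filename):
        continue
    key = bucket.new_key(filename)
    key.set_contents_from_filename(path)
```

Because the lock is taken per file name, many copies of this loop can run in parallel against the same directory without uploading the same file twice.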
The AWS CLI's `s3 sync` is convenient, but if you are worried about missed or duplicated files when using it in production, it may be worth implementing something like this yourself.
Idempotent! Idempotent!