[PYTHON] Delete 1000 objects stored in AWS S3 at a time.

What I want to do

I want to delete objects by reading a file that lists the keys of the objects to delete, and to run the whole deletion with a single command.

Approach

-Prepare a file that lists the keys of the objects to delete.
-Loop over the keys and delete them with the S3 API (delete_objects).

Reference links
-About deleting objects in S3
-About Python's delete_objects API

What I did

-Create a key list of objects to be deleted.

Key list of objects to delete


hoge/fuga/hoge.jpg
hoge/fuga/hoge.png
hoge/hoge/hoge.gif
hoge/hoge/fuga.png
・ ・ ・
fuga/fuga/fuga.png

-Run the delete program. delete_objects accepts at most 1000 keys per request, so the key list that was read in is split into chunks of 1000 and the DELETE API is called once per chunk in a loop.

Remove program.py



import boto3
import json

MY_REGION = 'Region name'
MY_BUCKET = 'Bucket name'

client = boto3.client('s3', region_name=MY_REGION)
request_list = []
img_path_list = ''

def split_list(l):
    for idx in range(0, len(l), 1000):
        yield l[idx:idx + 1000]

#Read the keys of the objects to delete
with open('Key list of objects to delete.text') as f:
    img_path_list = f.readlines()

#Remove the line break at the end of each key and add it to the request list
for path in img_path_list:
    path = path.rstrip('\n')
    #Skip blank lines, which would otherwise be sent as empty keys
    if path:
        request_list.append({'Key': path})

#Divide the list into chunks of 1000 items
# divided_list = [[0,...,999],[1000,...,1999],...,[n,...,n+999]]
divided_list = list(split_list(request_list))

#Run the DELETE API once per chunk
for key_list in divided_list:
    response = client.delete_objects(
        Bucket = MY_BUCKET,
        Delete = {
            'Objects': key_list
        }
    )

    #Record the deletion result (the log directory must already exist)
    with open('log/Deletion result.txt', mode='a') as f:
        #'Deleted' may be absent when nothing in the chunk was deleted
        for res in response.get('Deleted', []):
            f.write(json.dumps(res))
            f.write('\n')
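
A delete_objects response can also carry an Errors list for keys that S3 could not delete, which the log above does not capture. The following is a minimal sketch of the same batching loop wrapped in a function that collects both lists. The names chunked and delete_in_batches are hypothetical, and client is assumed to be a boto3.client('s3').

```python
def chunked(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(client, bucket, keys, batch_size=1000):
    """Delete `keys` from `bucket` in batches and collect the results.

    Returns (deleted, errors): the dicts S3 reports under 'Deleted'
    and 'Errors'. The function names are illustrative assumptions.
    """
    deleted, errors = [], []
    for batch in chunked(keys, batch_size):
        response = client.delete_objects(
            Bucket=bucket,
            Delete={'Objects': [{'Key': key} for key in batch]},
        )
        # Either list may be missing from a response, so default to empty
        deleted.extend(response.get('Deleted', []))
        errors.extend(response.get('Errors', []))
    return deleted, errors
```

With a real client this could be called as delete_in_batches(client, MY_BUCKET, keys). A non-empty errors list means some keys were not deleted and may need a retry or a permissions check.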

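-If the deletion succeeds, the log looks like the following. The DeleteMarker and DeleteMarkerVersionId fields are returned when the bucket has versioning enabled.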

Deletion result.txt


{"Key": "hoge/fuga/hoge.jpg", "DeleteMarker": true, "DeleteMarkerVersionId": "hogehoge1"}
{"Key": "hoge/fuga/hoge.png", "DeleteMarker": true, "DeleteMarkerVersionId": "hogehoge2"}
{"Key": "hoge/hoge/hoge.gif", "DeleteMarker": true, "DeleteMarkerVersionId": "hogehoge3"}
{"Key": "hoge/hoge/fuga.png", "DeleteMarker": true, "DeleteMarkerVersionId": "hogehoge4"}
{"Key": "fuga/fuga/fuga.png", "DeleteMarker": true, "DeleteMarkerVersionId": "hogehoge5"}

Other

Prepare a list of keys to delete and a list of keys not to delete, and check whether any key appears in both lists.
-Reference: Get elements that are common to multiple lists in Python and their count

Show the elements that are in the two lists.py


#Read list data, dropping the line break at the end of each line
with open('Input list.txt') as f:
    input_urls = [line.rstrip('\n') for line in f]

with open('List not to delete.txt') as f:
    not_delete_urls = [line.rstrip('\n') for line in f]

#Keys that appear in both lists
duplicate_urls = set(input_urls) & set(not_delete_urls)

#Change from set type to a sorted list
list_duplicate_urls = sorted(duplicate_urls)

#Display the number of common elements, then the elements themselves
print(len(list_duplicate_urls))
for elem in list_duplicate_urls:
    print(elem)
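
If the check finds keys in both lists, a possible follow-up is to drop them from the delete list before running the delete program. This sketch assumes both lists are sequences of lines read from the files; the function name exclude_protected is hypothetical.

```python
def exclude_protected(delete_keys, keep_keys):
    """Return delete_keys without any key that also appears in keep_keys.

    The order of delete_keys is preserved. Trailing line breaks are
    removed before comparing, and empty lines are dropped.
    """
    protected = {key.rstrip('\n') for key in keep_keys}
    result = []
    for key in delete_keys:
        key = key.rstrip('\n')
        if key and key not in protected:
            result.append(key)
    return result
```

The returned list could then be written back to the key list file, one key per line, in place of the original input.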

Precautions
