I want to list all the file names stored under a given folder in an AWS S3 bucket, where the folder contains more than 1,000 files.
Important point: since list_objects returns at most 1,000 items per call, the retrieval has to be paginated.
The approach follows the boto3 paginator guide: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html
sample.py
import boto3

MY_REGION = 'Region name'
MY_BUCKET = 'Bucket name'
# Directory (prefix) under the bucket
TARGET_PATH = 'tmp/YYYY/MM/DD/'

client = boto3.client('s3', region_name=MY_REGION)
paginator = client.get_paginator('list_objects')

# Filtering settings
operation_parameters = {
    'Bucket': MY_BUCKET,
    'Prefix': TARGET_PATH
}
page_iterator = paginator.paginate(**operation_parameters)

# Output each S3 object key
for page in page_iterator:
    for content in page.get('Contents', []):
        print(content['Key'])
Example output of the above code:
tmp/YYYY/MM/DD/0001.txt
tmp/YYYY/MM/DD/0002.txt
tmp/YYYY/MM/DD/0003.txt
....
tmp/YYYY/MM/DD/1000.txt
tmp/YYYY/MM/DD/1001.txt
tmp/YYYY/MM/DD/1002.txt
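The paginator handles continuation tokens internally. For reference, the same traversal can be written by hand with list_objects_v2 and its ContinuationToken. The helper below is a minimal sketch (the function name list_all_keys is my own, not part of boto3); passing the client in as an argument also makes the paging logic easy to test with a stub.

```python
def list_all_keys(client, bucket, prefix):
    """Collect every object key under prefix, following continuation tokens.

    `client` is expected to expose list_objects_v2() like a boto3 S3 client.
    """
    keys = []
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page (or the whole prefix) is empty
        keys.extend(obj['Key'] for obj in response.get('Contents', []))
        if not response.get('IsTruncated'):
            break
        # Resume where the previous page left off
        kwargs['ContinuationToken'] = response['NextContinuationToken']
    return keys
```

With a real client this would be called as `list_all_keys(boto3.client('s3', region_name=MY_REGION), MY_BUCKET, TARGET_PATH)`.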
Memo
・I remembered retrieving information on more than 1,000 files as being difficult, but I was pleasantly surprised at how easy it is now, which is why I wrote this article.
・I also didn't know that filtering, such as restricting results to a specific folder, could be specified, so I noted that here as well.