[PYTHON] Get the names of files saved in AWS S3 (1,000 or more)

What I want to do

I want to list the names of all files saved under a particular folder in a particular AWS S3 bucket. In this case, the folder contains more than 1,000 files.

Important point

A single list_objects call returns at most 1,000 objects, so getting the information for every file requires some extra handling.
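To make the limit concrete, here is a minimal sketch of what following the pages by hand could look like. It assumes the list_objects_v2 operation and its ContinuationToken / NextContinuationToken fields rather than list_objects, and the helper name list_all_keys is hypothetical, not part of boto3.

```python
def list_all_keys(client, bucket, prefix):
    """Collect every key under `prefix` by following continuation tokens.

    `client` is assumed to be a boto3 S3 client (or anything exposing a
    compatible list_objects_v2 method).
    """
    keys = []
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when the response holds no objects
        for content in response.get('Contents', []):
            keys.append(content['Key'])
        # IsTruncated is true while more results remain on the server
        if not response.get('IsTruncated'):
            break
        kwargs['ContinuationToken'] = response['NextContinuationToken']
    return keys
```

With a real client this would be called as list_all_keys(boto3.client('s3'), 'Bucket name', 'tmp/YYYY/MM/DD/'). A paginator performs this token bookkeeping for you.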

Method

Use a paginator, as described in https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html.

Code

sample.py


import boto3

MY_REGION = 'Region name'
MY_BUCKET = 'Bucket name'

# Directory (key prefix) under the bucket
TARGET_PATH = 'tmp/YYYY/MM/DD/'

client = boto3.client('s3', region_name=MY_REGION)
paginator = client.get_paginator('list_objects')

# Filtering settings
operation_parameters = {
    'Bucket': MY_BUCKET,
    'Prefix': TARGET_PATH
}

# Each page holds up to 1,000 objects; the paginator fetches the next page as needed
page_iterator = paginator.paginate(**operation_parameters)

# Output each S3 object key
for page in page_iterator:
    # 'Contents' is missing from a page with no matching objects
    for content in page.get('Contents', []):
        print(content['Key'])

Example output of the above code

tmp/YYYY/MM/DD/0001.txt
tmp/YYYY/MM/DD/0002.txt
tmp/YYYY/MM/DD/0003.txt
....
tmp/YYYY/MM/DD/1000.txt
tmp/YYYY/MM/DD/1001.txt
tmp/YYYY/MM/DD/1002.txt
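The output above is the full object key, including the folder part. If only the file name is wanted, one option is to strip the prefix with posixpath.basename. The following is a sketch under that assumption; iter_file_names and its suffix parameter are hypothetical names introduced here for illustration.

```python
import posixpath


def iter_file_names(client, bucket, prefix, suffix=''):
    """Yield the file-name part of every key under `prefix`.

    `client` is assumed to be a boto3 S3 client. `suffix` optionally
    restricts the result to keys ending with it (for example '.txt').
    """
    paginator = client.get_paginator('list_objects')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page with no matching objects
        for content in page.get('Contents', []):
            key = content['Key']
            # Skip "folder" placeholder keys that end with '/'
            if key.endswith('/'):
                continue
            if key.endswith(suffix):
                yield posixpath.basename(key)
```

For example, list(iter_file_names(client, MY_BUCKET, TARGET_PATH, suffix='.txt')) would collect only the .txt file names. S3 keys always use '/' as the separator, which is why posixpath is used here instead of os.path.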

Memo

・ I remember it being difficult to get information on more than 1,000 files, but recently I was a little impressed by how easy it has become, so I wrote this article.
・ I also didn't know that the results could be filtered, for example to the objects under a specific folder, so I am noting that here as well.
