[PYTHON] Define boto3 Client API response in data class

Introduction

I was experimenting with boto3 as a learning exercise. I found it better to store the resource information obtained through boto3 in data classes and refer to it from there (the reason is described later), so I am writing it down for future reference.

What is boto3 !?

Briefly, it is an SDK for working with AWS resources in Python. It is divided into a low-level API (client) and a high-level API (resource). There are countless other articles about it, so please look them up if you want to know more.

This time we will use the low-level API.

What is a data class !?

It provides a decorator that automatically adds special methods such as __init__() to a class.

A class with the following constructor

class Wanchan_Nekochan():
    def __init__(self, cat: str, dog: str):
        self.cat = cat
        self.dog = dog

With a data class, you can write the same thing concisely, without such a constructor.

from dataclasses import dataclass

@dataclass
class Wanchan_Nekochan():
    cat: str
    dog: str

https://docs.python.org/ja/3/library/dataclasses.html
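
As a minimal sketch of what the decorator generates besides __init__(), the following uses a hypothetical Pets class (the class name and values are illustrative, not from the code in this article). It shows that __repr__() and __eq__() are also provided.

```python
from dataclasses import dataclass


@dataclass
class Pets:
    # Hypothetical example class; the field names mirror the example above.
    cat: str
    dog: str


a = Pets(cat="Tama", dog="Pochi")
b = Pets(cat="Tama", dog="Pochi")

# __repr__ is generated: the printed form lists every field.
print(a)       # Pets(cat='Tama', dog='Pochi')
# __eq__ is generated: instances are compared field by field.
print(a == b)  # True
```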

Compared with named tuples, which offer similar functionality, there are the following differences.

- A namedtuple is immutable (not editable) after instantiation. A dataclass is mutable (editable) by default, but becomes immutable if you pass the argument frozen=True.
- A data class reads data a little faster (verification required).
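
The mutability difference can be checked with a short sketch. The class names and the try_assign helper below are hypothetical and exist only for this illustration.

```python
from collections import namedtuple
from dataclasses import dataclass

PetsTuple = namedtuple("PetsTuple", ["cat", "dog"])


@dataclass
class MutablePets:
    cat: str
    dog: str


@dataclass(frozen=True)
class FrozenPets:
    cat: str
    dog: str


def try_assign(obj):
    """Try to overwrite obj.cat; return 'ok' or the name of the exception raised."""
    try:
        obj.cat = "Mike"
    except AttributeError as exc:
        # dataclasses.FrozenInstanceError is a subclass of AttributeError.
        return type(exc).__name__
    return "ok"


results = {
    "namedtuple": try_assign(PetsTuple("Tama", "Pochi")),
    "dataclass": try_assign(MutablePets("Tama", "Pochi")),
    "frozen dataclass": try_assign(FrozenPets("Tama", "Pochi")),
}
print(results)
```

Only the plain dataclass accepts the assignment; the namedtuple and the frozen dataclass both reject it.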

Exercise

This time, to get the list of S3 buckets, we use the list_buckets() method of the S3.Client class.
(Reference) Official reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_buckets

The response is defined as the following dictionary.

{
    'Buckets': [
        {
            'Name': 'string',
            'CreationDate': datetime
        },
    ],
    'Owner': {
        'DisplayName': 'string',
        'ID': 'string'
    },
    "ResponseMetadata": {
        "RequestId": str,
        "HTTPStatusCode": int,
        "HTTPHeaders": dict,
        "RetryAttempts": int
    }
}

When getting information from the returned response, you specify the keys as strings, as shown below.

s3_client = session.client('s3')
data = s3_client.list_buckets()

status_code = data["ResponseMetadata"]["HTTPStatusCode"]
display_name = data["Owner"]["DisplayName"]

Because the keys are specified as strings, IDE completion cannot be used, so there is a risk of spelling mistakes. There is also the annoyance that you cannot tell what type a value has until you look it up in the reference.

If you define the response structure as data classes in advance, you get dot access, so completion works and the types can be checked rigorously with mypy.
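
The contrast can be sketched without calling AWS at all. The dictionary below is a hand-written sample shaped like part of the response (not real data), and the OwnerInfo class name is an assumption made for this illustration.

```python
from dataclasses import dataclass

# Hand-written sample shaped like part of a list_buckets response (not real data).
sample = {"Owner": {"DisplayName": "example-owner", "ID": "example-id"}}


@dataclass
class OwnerInfo:
    DisplayName: str
    ID: str


# With string keys, a misspelling is only discovered when the line runs.
try:
    sample["Owner"]["DisplyName"]  # typo on purpose
    dict_result = "ok"
except KeyError as exc:
    dict_result = f"KeyError: {exc}"

owner = OwnerInfo(**sample["Owner"])
# With attributes, the IDE can complete owner.DisplayName, and a misspelling
# such as owner.DisplyName can be reported by mypy before the code runs.
print(dict_result)
print(owner.DisplayName)
```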

There is no reason not to use dataclass !!!

So ...

The source code, for now

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

import boto3
from dacite import from_dict


session = boto3.session.Session(profile_name='s3_test')
# Keep the client (BaseClient) in global scope so it is not created on every call;
# it is used as a singleton
s3_client = session.client('s3')

@dataclass
class Boto3_Response():
    RequestId: str
    HostId: str
    HTTPStatusCode: int
    HTTPHeaders: Dict
    RetryAttempts: int

@dataclass
class Inner_Owner():
    DisplayName: str
    ID: str

@dataclass
class Inner_Buckets():
    Name: str
    CreationDate: datetime

@dataclass
class S3_LIST():
    ResponseMetadata: Boto3_Response
    Owner: Inner_Owner
    Buckets: List[Inner_Buckets]

    @classmethod
    def make_s3_name_list(cls):
        return from_dict(data_class=cls, data=s3_client.list_buckets())


s3_list_response = S3_LIST.make_s3_name_list()

# Status code
print(s3_list_response.ResponseMetadata.HTTPStatusCode)  # 200
# Owner name
print(s3_list_response.Owner.DisplayName)  # nikujaga-kun
# ID
print(s3_list_response.Owner.ID)
# Bucket list
print(*[bucket.Name for bucket in s3_list_response.Buckets])

Explanation below

from dacite import from_dict

Here I import a third-party library called dacite. dacite (I read it as "de") is, simply put, a library for passing dictionaries to nested data classes and instantiating them. https://pypi.org/project/dacite/

@dataclass
class Boto3_Response():
    RequestId: str
    HostId: str
    HTTPStatusCode: int
    HTTPHeaders: Dict
    RetryAttempts: int

@dataclass
class Inner_Owner():
    DisplayName: str
    ID: str

@dataclass
class Inner_Buckets():
    Name: str
    CreationDate: datetime

@dataclass
class S3_LIST():
    ResponseMetadata: Boto3_Response
    Owner: Inner_Owner
    Buckets: List[Inner_Buckets]

    @classmethod
    def make_s3_name_list(cls):
        return from_dict(data_class=cls, data=s3_client.list_buckets())

In the definition of S3_LIST above, the separate data classes Boto3_Response, Inner_Owner, and Inner_Buckets are used as attribute types. Suppose you pass the response of list_buckets to S3_LIST as is, without using dacite's from_dict:

    @classmethod
    def make_s3_name_list(cls):
        return cls(**s3_client.list_buckets())

Written this way, the nested values are passed in as plain dictionaries rather than class instances, so accessing an attribute on them fails because no such attribute exists.

s3_list_response = S3_LIST.make_s3_name_list()

# AttributeError is raised here
print(s3_list_response.ResponseMetadata.HTTPStatusCode)

# 'dict' object has no attribute 'HTTPStatusCode'
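
The same failure can be reproduced offline with a small sketch. The Metadata and Response classes and the sample dictionary are hypothetical stand-ins for the classes above, not a real API response.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    HTTPStatusCode: int


@dataclass
class Response:
    ResponseMetadata: Metadata


# Hand-written sample, not a real API response.
sample = {"ResponseMetadata": {"HTTPStatusCode": 200}}

# The generated __init__ stores its arguments as given; type annotations are
# not enforced at runtime, so the nested value stays a plain dict.
response = Response(**sample)
print(type(response.ResponseMetadata).__name__)  # dict

try:
    response.ResponseMetadata.HTTPStatusCode
    outcome = "ok"
except AttributeError as exc:
    outcome = str(exc)
print(outcome)  # 'dict' object has no attribute 'HTTPStatusCode'
```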

Handling this conversion well by yourself would be difficult, so I use dacite, a library that takes care of it nicely. Stand on the shoulders of giants as much as you can.
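
For comparison, here is a rough sketch of what converting the nested dictionaries by hand could look like using only the standard library. The class names, the from_response method, and the sample data are all assumptions made for this illustration; this is not how dacite is implemented.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class OwnerInfo:
    DisplayName: str
    ID: str


@dataclass
class BucketInfo:
    Name: str


@dataclass
class BucketList:
    Owner: OwnerInfo
    Buckets: List[BucketInfo]

    @classmethod
    def from_response(cls, data: Dict[str, Any]) -> "BucketList":
        # Every nested level has to be converted explicitly, and this method
        # must be updated whenever a nested field is added.
        return cls(
            Owner=OwnerInfo(**data["Owner"]),
            Buckets=[BucketInfo(**bucket) for bucket in data["Buckets"]],
        )


# Hand-written sample shaped like part of a list_buckets response (not real data).
sample = {
    "Owner": {"DisplayName": "example-owner", "ID": "example-id"},
    "Buckets": [{"Name": "bucket-a"}, {"Name": "bucket-b"}],
}

result = BucketList.from_response(sample)
print(result.Owner.DisplayName)
print([bucket.Name for bucket in result.Buckets])
```

This works for a small structure, but the conversion code grows with every nested class, which is the boilerplate that a library like dacite saves you from writing.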

Finally

Data classes are good.
