[PYTHON] Apply IAM roles for service account to s3cmd

Introduction

s3cmd is a command-line tool for managing S3 objects. It lets you work with S3 without installing the AWS CLI, and it is often used for backup and restore.

To use s3cmd on EKS, the Pod needs access to S3. Previously, an IAM role was attached to the node, and kube2iam was used to obtain temporary credentials. IAM Roles for Service Accounts (IRSA) was released in 2019, and the AWS SDKs for each language support it. s3cmd does not use an SDK, however, so I implemented the mechanism myself.

Environment

macOS Mojave 10.14.6, Pulumi 2.1.0, AWS CLI 1.16.292, EKS 1.15, s3cmd 2.1.0

Modifying s3cmd

Modify the s3cmd source code, build it into a Docker image, and push the image to ECR.

Download s3cmd with the following command.

$ wget --no-check-certificate https://github.com/s3tools/s3cmd/releases/download/v2.1.0/s3cmd-2.1.0.tar.gz
$ tar xzvf s3cmd-2.1.0.tar.gz
$ cd s3cmd-2.1.0

The directory structure of s3cmd-2.1.0 is as follows.

├── INSTALL.md
├── LICENSE
├── MANIFEST.in
├── NEWS
├── PKG-INFO
├── README.md
├── S3/
├── s3cmd
├── s3cmd.1
├── s3cmd.egg-info/
├── setup.cfg
└── setup.py

Code changes

Only S3/Config.py needs to be modified. The flow for obtaining access to S3 is as follows.

  1. Get the values of `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` from the environment variables.
  2. Set the URL parameters and POST to the AWS STS API server.
  3. Parse the response body and get the access key, secret access key, and session token.
  4. Set the values obtained in step 3 as s3cmd configuration options.

Only the added code is shown below. Of these functions, only role_config replaces an existing one.

S3/Config.py


import urllib.request
import urllib.parse
import xml.etree.cElementTree

def _get_url():
  stsUrl = "https://sts.amazonaws.com/"
  roleArn = os.environ.get('AWS_ROLE_ARN')
  path = os.environ.get('AWS_WEB_IDENTITY_TOKEN_FILE')
  with open(path) as f:
    webIdentityToken = f.read()
  params = { 
    "Action": "AssumeRoleWithWebIdentity",
    "Version": "2011-06-15",
    "RoleArn": roleArn,
    "RoleSessionName": "s3cmd",
    "WebIdentityToken": webIdentityToken
  }
  url = '{}?{}'.format(stsUrl, urllib.parse.urlencode(params))
  return url

def _build_name_to_xml_node(parent_node):
  if isinstance(parent_node, list):
    return _build_name_to_xml_node(parent_node[0])
  xml_dict = {}
  for item in parent_node:
    key = re.compile('{.*}').sub('',item.tag)
    if key in xml_dict:
      if isinstance(xml_dict[key], list):
        xml_dict[key].append(item)
      else:
        xml_dict[key] = [xml_dict[key], item]
    else:
      xml_dict[key] = item
  return xml_dict

def _replace_nodes(parsed):
  for key, value in parsed.items():
    if list(value):
      sub_dict = _build_name_to_xml_node(value)
      parsed[key] = _replace_nodes(sub_dict)
    else:
      parsed[key] = value.text
  return parsed

def _parse_xml_to_dict(body):
  parser = xml.etree.cElementTree.XMLParser(target=xml.etree.cElementTree.TreeBuilder(), encoding='utf-8')
  parser.feed(body)
  root = parser.close()
  parsed = _build_name_to_xml_node(root)
  _replace_nodes(parsed)
  return parsed

class Config(object):
  def role_config(self):
    url = _get_url()
    req = urllib.request.Request(url, method='POST')
    with urllib.request.urlopen(req) as resp:
      body = resp.read()
    parsed = _parse_xml_to_dict(body)

    Config().update_option('access_key', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['AccessKeyId'])
    Config().update_option('secret_key', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['SecretAccessKey'])
    Config().update_option('access_token', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['SessionToken'])

Let's look at each piece in turn. The function `_get_url` builds the URL used to POST to the STS API. Applying IRSA to a Pod sets the environment variables `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. The latter is a file path; the function reads the token from that file and adds it to the URL parameters.

def _get_url():
  stsUrl = "https://sts.amazonaws.com/"
  roleArn = os.environ.get('AWS_ROLE_ARN')
  path = os.environ.get('AWS_WEB_IDENTITY_TOKEN_FILE')
  with open(path) as f:
    webIdentityToken = f.read()
  params = { 
    "Action": "AssumeRoleWithWebIdentity",
    "Version": "2011-06-15",
    "RoleArn": roleArn,
    "RoleSessionName": "s3cmd",
    "WebIdentityToken": webIdentityToken
  }
  url = '{}?{}'.format(stsUrl, urllib.parse.urlencode(params))
  return url
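
If either variable is missing, os.environ.get returns None and open(None) fails with a TypeError that does not say which variable was the problem. A minimal sketch of a more defensive variant follows. The names build_sts_url, environ, and session_name are hypothetical and not part of s3cmd; the request parameters are the same ones used in `_get_url`.

```python
import os
import urllib.parse

STS_URL = "https://sts.amazonaws.com/"


def build_sts_url(environ=os.environ, session_name="s3cmd"):
    """Build the AssumeRoleWithWebIdentity URL from the IRSA variables."""
    required = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Fail early with a readable message instead of open(None).
        raise RuntimeError("IRSA variables not set: " + ", ".join(missing))

    with open(environ["AWS_WEB_IDENTITY_TOKEN_FILE"]) as f:
        token = f.read().strip()  # drop a trailing newline, if any

    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": environ["AWS_ROLE_ARN"],
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }
    # urlencode percent-encodes the ARN and the token for use in a query string.
    return "{}?{}".format(STS_URL, urllib.parse.urlencode(params))
```

Passing a plain dict as environ makes the function easy to try outside a Pod.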

The functions `_build_name_to_xml_node` and `_replace_nodes` convert the XML into a dictionary.

def _build_name_to_xml_node(parent_node):
  if isinstance(parent_node, list):
    return _build_name_to_xml_node(parent_node[0])
  xml_dict = {}
  for item in parent_node:
    key = re.compile('{.*}').sub('',item.tag)
    if key in xml_dict:
      if isinstance(xml_dict[key], list):
        xml_dict[key].append(item)
      else:
        xml_dict[key] = [xml_dict[key], item]
    else:
      xml_dict[key] = item
  return xml_dict

def _replace_nodes(parsed):
  for key, value in parsed.items():
    if list(value):
      sub_dict = _build_name_to_xml_node(value)
      parsed[key] = _replace_nodes(sub_dict)
    else:
      parsed[key] = value.text
  return parsed

The function `_parse_xml_to_dict` parses the XML returned by the STS API server and converts it to a dictionary using the functions above.

def _parse_xml_to_dict(body):
  parser = xml.etree.cElementTree.XMLParser(target=xml.etree.cElementTree.TreeBuilder(), encoding='utf-8')
  parser.feed(body)
  root = parser.close()
  parsed = _build_name_to_xml_node(root)
  _replace_nodes(parsed)
  return parsed
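
As a cross-check on the dictionary conversion, the same three values can also be pulled out with ElementTree's built-in namespace support. This is an alternative sketch, not the code used in this article: extract_credentials is a hypothetical helper, and the credential values in SAMPLE_RESPONSE are made up. The element layout matches the keys read in role_config.

```python
import xml.etree.ElementTree as ET

# XML namespace used by STS responses for API version 2011-06-15.
STS_NS = {"sts": "https://sts.amazonaws.com/doc/2011-06-15/"}

# Trimmed, made-up response body for illustration only.
SAMPLE_RESPONSE = b"""<AssumeRoleWithWebIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <AssumeRoleWithWebIdentityResult>
    <Credentials>
      <AccessKeyId>ASIAEXAMPLEKEY</AccessKeyId>
      <SecretAccessKey>exampleSecret</SecretAccessKey>
      <SessionToken>exampleToken</SessionToken>
      <Expiration>2020-05-02T16:04:00Z</Expiration>
    </Credentials>
  </AssumeRoleWithWebIdentityResult>
</AssumeRoleWithWebIdentityResponse>"""


def extract_credentials(body):
    """Return the children of <Credentials> as a flat dict of strings."""
    root = ET.fromstring(body)
    creds = root.find("sts:AssumeRoleWithWebIdentityResult/sts:Credentials", STS_NS)
    if creds is None:
        raise ValueError("no Credentials element in STS response")
    # Tags look like "{namespace}AccessKeyId"; keep only the local name.
    return {child.tag.split("}", 1)[-1]: child.text for child in creds}


if __name__ == "__main__":
    print(sorted(extract_credentials(SAMPLE_RESPONSE)))
```

The hand-written converter in the article is more general because it handles arbitrary nesting and repeated elements, while this version only covers the one path s3cmd needs.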

The function `role_config` applies the IAM role to s3cmd. It takes the access key, secret access key, and session token from the dictionary and sets them as s3cmd options.

class Config(object):
  def role_config(self):
    url = _get_url()
    req = urllib.request.Request(url, method='POST')
    with urllib.request.urlopen(req) as resp:
      body = resp.read()
    parsed = _parse_xml_to_dict(body)

    Config().update_option('access_key', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['AccessKeyId'])
    Config().update_option('secret_key', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['SecretAccessKey'])
    Config().update_option('access_token', parsed['AssumeRoleWithWebIdentityResult']['Credentials']['SessionToken'])

Docker image creation

Repackage the modified source tree as s3cmd-2.1.0.tar.gz and place it in the same directory as the Dockerfile.

├── Dockerfile
└── s3cmd-2.1.0.tar.gz

The Dockerfile looks like this:

Dockerfile


FROM python:3.8.2-alpine3.11
ARG VERSION=2.1.0
COPY s3cmd-${VERSION}.tar.gz /tmp/
RUN tar -zxf /tmp/s3cmd-${VERSION}.tar.gz -C /tmp && \
    cd /tmp/s3cmd-${VERSION} && \
    python setup.py install && \
    mv s3cmd S3 /usr/local/bin && \
    rm -rf /tmp/*
ENTRYPOINT ["s3cmd"]
CMD ["--help"]

Build the image and push it to ECR. Replace XXXXXXXXXXXX with your AWS account ID.

$ docker build -t XXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/s3cmd:2.1.0 .
$ docker push XXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/s3cmd:2.1.0

Deploy

The whole environment is built with Pulumi. The directory structure is as follows. Only `index.ts` and `k8s/s3cmd.yaml` (marked with *) need to be edited.

├── Pulumi.dev.yaml
├── Pulumi.yaml
├── index.ts *
├── k8s
│   └── s3cmd.yaml *
├── node_modules/
├── package-lock.json
├── package.json
├── stack.json
└── tsconfig.json

Everything other than the Kubernetes manifest is defined in `index.ts`. The EKS cluster must be created with an OpenID Connect provider.

index.ts


import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";


const vpc = new awsx.ec2.Vpc("custom", {
  cidrBlock: "10.0.0.0/16",
  numberOfAvailabilityZones: 3,
});

const cluster = new eks.Cluster("pulumi-eks-cluster", {
  vpcId: vpc.id,
  subnetIds: vpc.publicSubnetIds,
  deployDashboard: false,
  createOidcProvider: true,
  instanceType: aws.ec2.T3InstanceSmall,
});

const s3PolicyDocument = pulumi.all([cluster.core.oidcProvider?.arn, cluster.core.oidcProvider?.url]).apply(([arn, url]) => {
  return aws.iam.getPolicyDocument({
    statements: [{
      effect: "Allow",
      principals: [
        {
          type: "Federated",
          identifiers: [arn]
        },
      ],
      actions: ["sts:AssumeRoleWithWebIdentity"],
      conditions: [
        {
          test: "StringEquals",
          variable: url.replace(/^https?:\/\//, '') + ":sub", // strip the scheme; OIDC issuer URLs use https
          values: [
            "system:serviceaccount:default:s3-full-access"
          ]
        },
      ],
    }]
  })
})

const s3FullAccessRole = new aws.iam.Role("s3FullAccessRole", {
  name: "s3-full-access-role",
  assumeRolePolicy: s3PolicyDocument.json,
})

new aws.s3.Bucket("pulumi-s3cmd-test", {
  bucket: "pulumi-s3cmd-test"
});

const s3FullAccessRoleAttachment = new aws.iam.RolePolicyAttachment("s3FullAccessRoleAttachment", {
  role: s3FullAccessRole,
  policyArn: aws.iam.AmazonS3FullAccess,
})

const myk8s = new k8s.Provider("myk8s", {
  kubeconfig: cluster.kubeconfig.apply(JSON.stringify),
});

const s3cmd = new k8s.yaml.ConfigFile("s3cmd", {
  file: "./k8s/s3cmd.yaml"
}, { provider: myk8s })
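
To make the role's trust relationship easier to picture, the following Python sketch builds the kind of JSON document that s3PolicyDocument is expected to produce. It is an illustration under assumptions: build_trust_policy is a hypothetical helper, and the account ID and OIDC issuer ID are placeholders, not values from this environment.

```python
import json


def build_trust_policy(oidc_provider_arn, issuer_url, namespace, service_account):
    """Return an IRSA trust policy limited to one Kubernetes service account."""
    # The condition key uses the issuer without its URL scheme.
    issuer = issuer_url.split("://", 1)[-1]
    subject = "system:serviceaccount:{}:{}".format(namespace, service_account)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": oidc_provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {"StringEquals": {issuer + ":sub": subject}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and issuer ID.
    policy = build_trust_policy(
        "arn:aws:iam::123456789012:oidc-provider/"
        "oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLE",
        "https://oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLE",
        "default",
        "s3-full-access",
    )
    print(json.dumps(policy, indent=2))
```

The StringEquals condition is what restricts the role to the s3-full-access service account in the default namespace; any other service account in the cluster fails the check.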

k8s/s3cmd.yaml defines a ServiceAccount and a Deployment. The ServiceAccount must carry the eks.amazonaws.com/role-arn annotation.

s3cmd.yaml


apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: default
  name: s3-full-access
  labels:
    app: s3cmd
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/s3-full-access-role
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: s3cmd
  labels:
    app: s3cmd
spec:
  selector:
    matchLabels:
      app: s3cmd
  replicas: 1
  template:
    metadata:
      labels:
        app: s3cmd
    spec:
      serviceAccountName: s3-full-access
      containers:
      - image: XXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/s3cmd:2.1.0
        name: s3cmd
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello; sleep 10; done"]

All you have to do is deploy with the following command.

$ pulumi up

Verification

Confirm that the s3cmd command works from inside the created s3cmd Pod. If everything is set up correctly, the S3 bucket created above is listed.

$ kubectl get pod
NAME                                                              READY   STATUS    RESTARTS   AGE
s3cmd-98985855f-h5lgl                                             1/1     Running   0          63s

$ kubectl exec -it s3cmd-98985855f-h5lgl -- s3cmd ls
2020-05-02 15:04  s3://pulumi-s3cmd-test
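
If the command is denied instead, one thing worth checking is whether the sub claim in the Pod's token matches the value in the role's trust policy condition. The token is a JWT, so its claims can be read with a few lines of Python. This is a debugging sketch only: decode_jwt_claims is a hypothetical helper, and it does not verify the signature.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the claims (second segment) of a JWT without verifying it."""
    payload = token.strip().split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside the Pod, decode_jwt_claims(open(os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"]).read())["sub"] (after import os) should give system:serviceaccount:default:s3-full-access for the manifest in this article.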

Conclusion

I confirmed that IRSA can assign an IAM role to the s3cmd Pod without using kube2iam. kube2iam requires deploying a DaemonSet, which adds resources to manage, so I think IRSA is a significant improvement.
