[PYTHON] Post the status of costly running instances to Slack with Lambda

Why you'd want to do this

During development you sometimes launch large instances for various tests, and once in a while you forget to stop them before going home. If you notice the next day, the damage is small. But what if you don't notice for a while, say over a holiday break? A decent amount of money can pile up, and in the worst case you end up having to write an incident report.

AWS does have a mechanism for alerting on the billed amount, but that isn't prevention; it only tells you the bill has already grown to a dangerous level. It doesn't help when you've burned a month's budget in three days.

Of course you should check before you go home, but can rules and checklists really protect you 100%? What if the region you checked wasn't the one the instance is running in? A programmer is a creature that writes programs so a machine does the things they don't want to do themselves, right? Right, let's do that.

Leave AWS to AWS

I know awscli can get the state of instances. So where should this run? You could cron it on a server at the office, but everything is on AWS anyway, so let's run it on Lambda once or twice a day.

Running awscli on Lambda

For the know-how on running awscli on Lambda, see "Run AWS CLI on Lambda and S3 Sync"; I'm referring to that article here.

Let's assume a Python 3 environment is available.

> mkdir check-aws
> cd check-aws
> pip install awscli -t .

The aws file is just this:

#!/usr/bin/env python3

import sys
import awscli.clidriver
def main():
    return awscli.clidriver.main()
if __name__ == '__main__':
    result = main()
    sys.exit(result)

Try it out

> chmod +x aws
> ./aws
(output omitted)

Next, in lambda_function.py, write the logic that posts to Slack when a running EC2 or RDS instance of size large or above is found.

# -*- coding: utf-8 -*-

import subprocess
import json
import urllib.request

region_name = {
    'ap-northeast-1': 'Tokyo',
    'ap-northeast-2': 'Seoul',
    'ap-southeast-1': 'Singapore'
}

check_result = []

def check_ec2(region):
    cmd = []
    cmd.append("./aws")
    cmd.append("ec2")
    cmd.append("describe-instances")
    cmd.append("--filter")
    cmd.append("Name=instance-state-name,Values=running")
    cmd.append("--region")
    cmd.append(region)

    result = subprocess.run(cmd, stdout = subprocess.PIPE)
    resjson = json.loads(result.stdout.decode('utf-8'))

    for resv in resjson['Reservations']:
        for ins in resv['Instances']:
            typ = ins['InstanceType'].split('.')[1]
            # Forgive small sizes
            if typ in ['nano', 'micro', 'small', 'medium']:
                continue
            insName = 'Anonymous'
            for tag in ins.get('Tags', []):  # some instances have no Tags key
                if tag['Key'] == 'Name':
                    insName = tag['Value']
            insTyp = ins['InstanceType']
            insLnc = ins['LaunchTime']
            check_result.append('ec2 ' + region_name[region] + ' ' + insName + '(' + insTyp + ') ' + insLnc)

def check_rds(region):
    cmd = []
    cmd.append("./aws")
    cmd.append("rds")
    cmd.append("describe-db-instances")
    cmd.append("--region")
    cmd.append(region)

    result = subprocess.run(cmd, stdout = subprocess.PIPE)
    resjson = json.loads(result.stdout.decode('utf-8'))

    for ins in resjson['DBInstances']:
        typ = ins['DBInstanceClass'].split('.')[2]
        if typ in ['nano', 'micro', 'small', 'medium']:
            continue
        if ins['DBInstanceStatus'] != 'available':
            continue
        insName = ins['DBInstanceIdentifier']
        insTyp = ins['DBInstanceClass']
        check_result.append('rds ' + region_name[region] + ' ' + insName + '(' + insTyp + ')')

def lambda_handler(event, context):
    check_ec2('ap-southeast-1') #Singapore
    check_ec2('ap-northeast-1') #Tokyo
    check_rds('ap-northeast-1') #Tokyo

    if len(check_result) > 0:
        message = '\nOperation status of instances large or above <@hogehoge>\n'
        for line in check_result:
            message += line
            message += '\n'
        print(message)
        url = 'https://hooks.slack.com/services/xxxx/yyyy/zzzzzzzz'
        method = 'POST'
        headers = {'Content-Type' : 'application/json'}
        payload = {'text' : message}
        json_data = json.dumps(payload).encode('utf-8')
        request = urllib.request.Request(url, data=json_data, method=method, headers=headers)
        with urllib.request.urlopen(request) as res:
            body = res.read()

if __name__ == '__main__':
    lambda_handler('','')

It's a good idea to mention several people in the Slack message who can understand the situation and stop the instances (people who don't ignore notifications are also surprisingly important). If you aren't using Slack, you can adapt this to Amazon SNS or similar, as sketched below. This time we only targeted EC2 and RDS instances of size large and above, but the same approach should work for the number of instances and for other services too. Probably.
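If you go the SNS route, a minimal sketch could look like the following. This is not the article's code: the topic ARN is a made-up placeholder, and boto3 is assumed to be available (it is in the Lambda runtime).

import boto3

def notify_sns(message):
    # Publish the same message to an SNS topic instead of Slack.
    # The topic ARN below is a placeholder; replace it with your own.
    sns = boto3.client('sns', region_name='ap-northeast-1')
    sns.publish(
        TopicArn='arn:aws:sns:ap-northeast-1:123456789012:check-aws-alert',
        Subject='Instances of size large or above are running',
        Message=message
    )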

Now, let's check whether this also works at hand. If it does, deploy.
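Since lambda_handler is called under if __name__ == '__main__', running the file directly should be enough for a local check (assuming your local AWS credentials are configured):

> python3 lambda_function.py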

> zip -r check-aws.zip *

I'd like to upload it as a zip file and test it, but there are a few caveats:
・Give the function a role with PowerUserAccess
・Set the timeout a little longer
・Set the memory a little higher
128MB of memory is plenty in terms of size, but the CPU allocated at that level is so weak that setting it to around 1024MB is better. Adjust as needed.
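If you'd rather set the last two from the command line than the console, something like this should work (the function name check-aws is an assumption; use whatever you named your function):

> aws lambda update-function-code --function-name check-aws --zip-file fileb://check-aws.zip
> aws lambda update-function-configuration --function-name check-aws --timeout 60 --memory-size 1024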

Test it, and hopefully it works (if it doesn't, read the error message and deal with it). Then set up the trigger and you're done. A schedule expression in EventBridge like cron(0 9 ? * MON-FRI *) should be enough.
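For reference, creating just the schedule rule from the CLI would look roughly like this (the rule name is made up, and you still need to add the Lambda function as the rule's target, which the console does for you):

> aws events put-rule --name check-aws-daily --schedule-expression "cron(0 9 ? * MON-FRI *)"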

That's all.
