[PYTHON] Try automating Qiita's like monitoring with Lambda + DynamoDB + CloudWatch

This article is part of the Link Information Systems relay "2020 New Year Advent Calendar TechConnect!". TechConnect! is a self-started Advent calendar run as a relay by a volunteer group called engineer.hanzomon. ([Link Information Systems' Facebook page is here](https://ja-jp.facebook.com/lis.co.jp/))

This article is for the 7th day, 1/15 (Wednesday).


As in my past article, I have ended up as the person in charge of tallying the likes on our Advent calendar. (I did not write an article about the calendar before that one, but I tallied it with shell one-liners.) Since I wanted to automate the tallying anyway, I thought it would be nice ~~fun~~ to get a notification, like Qiita's milestone notifications, whenever an article's like count passes a certain number, so I decided to run it on AWS Lambda.

The implementation scrapes the Advent calendar top page to collect the article URLs, then gets the number of likes for each article from the Qiita API. First, here is the Lambda function that collects the article IDs.

code
import os
import boto3
from selenium import webdriver
from bs4 import BeautifulSoup

def lambda_handler(event, context):
    # Scrape the Advent calendar page and register any new article in DynamoDB.

    try:
        dynamoDB = boto3.resource("dynamodb")
        advent_calendar = dynamoDB.Table("advent_calendar")

        options = webdriver.ChromeOptions()
        options.binary_location = "/opt/bin/headless-chromium"

        options.add_argument("--headless")
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1280x1696")
        options.add_argument("--disable-application-cache")
        options.add_argument("--disable-infobars")
        options.add_argument("--no-sandbox")
        options.add_argument("--hide-scrollbars")
        options.add_argument("--enable-logging")
        options.add_argument("--log-level=0")
        options.add_argument("--single-process")
        options.add_argument("--ignore-certificate-errors")
        options.add_argument("--homedir=/tmp")

        driver = webdriver.Chrome(executable_path="/opt/bin/chromedriver", options=options)
        driver.get(os.environ['TARGET_URL'])
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        item = soup.find('div', id='personal-public-article-body')
        tables = item.find_all('tbody')
        for table in tables:
            rows = table.find_all('tr')
            for row in rows:
                user_id = row.find_all('td')[1].text
                tmp = row.find_all('td')[2].find('a')['href']
                item_id = tmp[tmp.find('items/'):]
                response = advent_calendar.get_item(
                    Key={
                        'user_id': user_id,
                        'item_id': item_id
                    }
                )
                if 'Item' not in response:
                    advent_calendar.put_item(
                        Item = {
                            "user_id": user_id,
                            "item_id": item_id,
                            'likes': 0
                        }
                    )
    except Exception as e:
        print(e)
    finally:
        # driver is undefined if an error occurred before Chrome was launched
        if 'driver' in locals():
            driver.quit()
    return
Register the required libraries and chromedriver in Lambda Layers beforehand. The function saves each article ID it finds in DynamoDB and initializes the number of likes to 0 there. It runs every hour from a CloudWatch Event.
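The key-extraction step in the scraper can be tried in isolation. The following is a minimal sketch using only the standard library; the helper name `to_item_key` and the sample link are hypothetical and not part of the original function. It shows how slicing from `items/` yields a key that `urljoin` can later resolve against the Qiita API endpoint.

```python
from urllib.parse import urljoin

API_ENDPOINT = 'https://qiita.com/api/v2/'

def to_item_key(href):
    """Return the 'items/<id>' part of an article link (hypothetical helper)."""
    pos = href.find('items/')
    if pos == -1:
        # str.find returns -1 when the marker is missing; slicing from -1
        # would silently keep only the last character, so fail loudly instead.
        raise ValueError('not an article link: ' + href)
    return href[pos:]

# Sample link with a made-up user and article ID.
key = to_item_key('https://qiita.com/example_user/items/0123456789abcdef0123')
api_url = urljoin(API_ENDPOINT, key)
print(key)      # items/0123456789abcdef0123
print(api_url)  # https://qiita.com/api/v2/items/0123456789abcdef0123
```

Because the endpoint ends with a slash, the stored key can be joined to it as a relative path without any further string handling.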

Next, here is the function that calls the Qiita API for each collected article ID and gets the number of likes.

code
import os
import boto3
import requests
from urllib.parse import urljoin
import smtplib
from email.message import EmailMessage

def lambda_handler(event, context):
    api_endpoint = 'https://qiita.com/api/v2/'
    headers = {'Authorization': 'Bearer ' + os.environ['QIITA_AUTH']}

    dynamoDB = boto3.resource("dynamodb")
    advent_calendar = dynamoDB.Table("advent_calendar")

    try:
        smtp = smtplib.SMTP_SSL(os.environ['SMTP_HOST'], int(os.environ['SMTP_PORT']))
        smtp_user = os.environ['SMTP_USER']
        smtp_pass = os.environ['SMTP_PASS']
        message = EmailMessage()
        message['From'] = os.environ['FROM_ADDRESS']
        message['To'] = os.environ['TO_ADDRESS']
        message['Subject'] = 'Advent Calendar Like Monitoring'
        smtp.login(smtp_user, smtp_pass)

        response = advent_calendar.scan()
        for i in response['Items']:
            user_id = i['user_id']
            item_id = i['item_id']
            old_likes = int(i['likes'])
            item_url = urljoin(api_endpoint, item_id)
            item_detail = requests.get(item_url, headers=headers).json()
            
            title = item_detail['title']
            url = item_detail['url']
            new_likes = int(item_detail['likes_count'])
            comments = int(item_detail['comments_count'])
            # per_page=100 is the API maximum, so without pagination this count tops out at 100
            stockers_url = urljoin(api_endpoint, item_id + '/stockers?per_page=100')
            stockers = len(requests.get(stockers_url, headers=headers).json())
            
            # Notify only for the highest threshold crossed since the previous run
            if old_likes < 100 and new_likes >= 100:
                message.set_content(f'Article "{title}" ({url}) by {user_id} has exceeded 100 likes')
                smtp.send_message(message)
            elif old_likes < 50 and new_likes >= 50:
                message.set_content(f'Article "{title}" ({url}) by {user_id} has exceeded 50 likes')
                smtp.send_message(message)
            elif old_likes < 30 and new_likes >= 30:
                message.set_content(f'Article "{title}" ({url}) by {user_id} has exceeded 30 likes')
                smtp.send_message(message)
            elif old_likes < 10 and new_likes >= 10:
                message.set_content(f'Article "{title}" ({url}) by {user_id} has exceeded 10 likes')
                smtp.send_message(message)
            
            advent_calendar.put_item(
                Item = {
                    "user_id": user_id,
                    "item_id": item_id,
                    "likes" : new_likes,
                    "comments" : comments,
                    "stockers" : stockers
                }
            )
    except Exception as e:
        print(e)
    finally:
        # smtp is undefined if the SMTP connection itself failed
        if 'smtp' in locals():
            smtp.close()
    return
The function scans DynamoDB for the article IDs, calls the Qiita API for each one, and compares the result with the number of likes recorded on the previous run. The judgment logic is very rough, but if the count crosses a threshold, it sends an email. There is no notification beyond 100 ~~but nothing will get that many likes anyway~~. It runs every minute from a CloudWatch Event.
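The threshold comparison described above can also be written as a small pure function, which makes it easy to check without AWS or SMTP. This is only a sketch: the name `crossed_milestone` and the `MILESTONES` tuple are introduced here for illustration, with the values taken from the thresholds in the function above.

```python
MILESTONES = (10, 30, 50, 100)

def crossed_milestone(old_likes, new_likes, milestones=MILESTONES):
    """Return the highest milestone passed between two readings, or None."""
    crossed = [m for m in milestones if old_likes < m <= new_likes]
    return max(crossed) if crossed else None

print(crossed_milestone(8, 12))     # 10
print(crossed_milestone(12, 29))    # None
print(crossed_milestone(5, 55))     # 50
print(crossed_milestone(120, 130))  # None
```

As with the `if`/`elif` chain, a jump across several thresholds in one interval reports only the highest one, and nothing is reported once an article is already past 100.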

Actually, I wanted to send the notifications to Microsoft Teams, which we use as our internal chat, but I could not make it work because my Office 365 authentication uses two-step verification ... For now, the emails just go to my own address. I thought about forwarding them automatically in Outlook, but I could not set up forwarding because I lack the permissions. Oh well.


It feels a little half-finished, but I was able to automate the collection of likes. Once the calendar is over and things have settled down, I plan to pull the data from DynamoDB and produce the final tally. At first I was thinking of using Zabbix's HTTP agent, but since the EC2 free tier was no longer available to me, I went with Lambda + DynamoDB. The free tier is the best.

Tomorrow's article is by @h-yamasaki.


Update 1/17: I modified the function to also collect the number of comments and the number of stocks. This increased the number of API calls, and it looked like I would hit the limit of 1000 calls per hour, so I changed the CloudWatch Event monitoring interval from 1 minute to 5 minutes.
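The arithmetic behind this change can be sketched as follows. The article count of 15 is a hypothetical figure chosen for illustration, since the real number is not stated here; the two calls per article correspond to the item lookup and the stockers lookup in the function above.

```python
RATE_LIMIT_PER_HOUR = 1000  # limit quoted in the update note

def calls_per_hour(n_articles, calls_per_article, interval_minutes):
    """Estimate hourly API calls for a function that runs every interval_minutes."""
    runs_per_hour = 60 // interval_minutes
    return runs_per_hour * n_articles * calls_per_article

N_ARTICLES = 15  # hypothetical; the real number of articles is not given

before = calls_per_hour(N_ARTICLES, 1, 1)   # item lookup only, every minute
after = calls_per_hour(N_ARTICLES, 2, 1)    # item + stockers, every minute
relaxed = calls_per_hour(N_ARTICLES, 2, 5)  # item + stockers, every 5 minutes
print(before, after, relaxed)  # 900 1800 360
```

Under these assumptions, doubling the calls per article pushes a one-minute schedule over the limit, while a five-minute schedule leaves plenty of headroom.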
