[Python] A LINE Bot that sends me scraped IT trend information [LINE Messaging API]

This is my first Qiita post. If I've gotten anything wrong, please point it out in the comments; blunt feedback is welcome.

Finished product

Screenshots of the bot in use: 2020-09-04_19h22_23.png, 2020-09-04_19h22_04.png

How to use

  1. Add the bot as a friend from the QR code.
  2. Send the message "Hatena" or "Kita".
  3. The bot replies with the corresponding trend articles.

Caution

I don't know how long I will keep it running on Heroku. If I take it down, please run it in your local environment, on a Raspberry Pi, or on your own Heroku account.

Background

I'm in an information-related department at school. Until now, however, I had only done C and had never built anything of my own. (I didn't find programming fun.)

I'm also developing a web application with Django, but I chose a LINE bot first as something easy to build.

Also, KENTA, the "omnivorous engineer" YouTuber, mentioned that he gathers information by reading trend pages such as:

- Hatena Bookmark

However, I couldn't see myself opening my computer every morning just to check the news; I'd rather spend that time programming...

I figured it would be better to skim the trends in spare moments or while commuting, and look up anything useful later, so I built it on LINE, which is always at hand.

Who this is for

I'd like the following people to use it:

- Those who want summarized IT information to read during their commute

Since it's a LINE bot, there's no need to install yet another app.

My current situation (feel free to skip)

- 4 months of programming experience
- Can use C up to pointers (and the equivalent in C++)
- First time touching Heroku
- First time touching Git and GitHub
- Got started with Python via Django
- Have learned only the basics of Python grammar
- The code is dirty!

Development environment

This is the first time I've written up my environment.

What you need

In addition to the development environment above, you need to register with LINE Developers.

LINE Developers Link

If registration gives you trouble, see the article at https://qiita.com/kro/items/67f7510b36945eb9689b.

Code and commentary

I'll start from the template distributed on GitHub, with the fixes from the earlier Qiita article applied. (The original template seems to contain an error.)

dir_name/main.py

from flask import Flask, request, abort

from linebot import (
    LineBotApi, WebhookHandler
)
from linebot.exceptions import (
    InvalidSignatureError
)
from linebot.models import (
    MessageEvent, TextMessage, TextSendMessage,
)
import os

app = Flask(__name__)

# Get environment variables
YOUR_CHANNEL_ACCESS_TOKEN = os.environ["YOUR_CHANNEL_ACCESS_TOKEN"]
YOUR_CHANNEL_SECRET = os.environ["YOUR_CHANNEL_SECRET"]

line_bot_api = LineBotApi(YOUR_CHANNEL_ACCESS_TOKEN)
handler = WebhookHandler(YOUR_CHANNEL_SECRET)

@app.route("/callback", methods=['POST'])
def callback():
    # get X-Line-Signature header value
    signature = request.headers['X-Line-Signature']

    # get request body as text
    body = request.get_data(as_text=True)
    app.logger.info("Request body: " + body)

    # handle webhook body
    try:
        handler.handle(body, signature)
    except InvalidSignatureError:
        abort(400)

    return 'OK'


@handler.add(MessageEvent, message=TextMessage)
def handle_message(event):
    line_bot_api.reply_message(
        event.reply_token,
        TextSendMessage(text=event.message.text))


if __name__ == "__main__":
#    app.run()
    port = int(os.getenv("PORT", 5000))
    app.run(host="0.0.0.0", port=port)
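As context for `handler.handle`: per the LINE Messaging API documentation, the `X-Line-Signature` header is the base64-encoded HMAC-SHA256 of the raw request body, keyed with the channel secret. A minimal sketch of that check (the secret and body values here are made up for illustration):

```python
import base64
import hashlib
import hmac

def compute_line_signature(channel_secret: str, body: str) -> str:
    # HMAC-SHA256 over the raw request body, keyed with the channel secret
    digest = hmac.new(channel_secret.encode("utf-8"),
                      body.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")

# Hypothetical values for illustration only
secret = "my-channel-secret"
body = '{"events":[]}'
signature = compute_line_signature(secret, body)
# A request is accepted only if the X-Line-Signature header matches
print(hmac.compare_digest(signature, compute_line_signature(secret, body)))  # → True
```

This is why `abort(400)` fires on `InvalidSignatureError`: a request whose header doesn't match this value didn't come from LINE.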

Next, I'll add the scraping to this file.

Before that, import the required libraries:

import requests
from bs4 import BeautifulSoup
import os,json

About scraping

The etiquette articles on this topic are very important. Regarding server load: it would be dangerous if someone deliberately sent a large number of requests through the bot. After monitoring usage I plan to delete it from Heroku; after that, please run it privately from your own Raspberry Pi or similar.

The bot is not monetized in any way, and every result links back to the original article, so I believe there are no copyright issues; the content is effectively quoted.
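One way to keep the load on the scraped sites down (this helper and its names are my own sketch, not part of the original code) is to cache the scraped result for a while, so many LINE messages don't each trigger a fresh request:

```python
import time

# Cache scraped results for CACHE_SECONDS so that many LINE messages
# do not translate into many requests to the scraped site.
CACHE_SECONDS = 600
_cache = {}  # key -> (timestamp, value)

def cached(key, fetch, now=time.time):
    """Return a cached value for `key`, refreshing via `fetch()` when stale."""
    entry = _cache.get(key)
    if entry is not None and now() - entry[0] < CACHE_SECONDS:
        return entry[1]
    value = fetch()
    _cache[key] = (now(), value)
    return value

# Demonstration with a fake fetcher (no network involved)
calls = []
def fake_fetch():
    calls.append(1)
    return "result"

cached("hatena", fake_fetch)
cached("hatena", fake_fetch)  # served from cache; fake_fetch ran only once
```

In the bot, `fetch` would be one of the scraping functions below, e.g. `cached("hatena", create_h)`.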

Code

First, the scraping for Hatena Bookmark. This one could be handled quite simply.

def create_h():
    h_matome = []
    title_list = []
    url_list = []
    load_url = "https://b.hatena.ne.jp/hotentry/it"
    html = requests.get(load_url)
    soup = BeautifulSoup(html.content, "html.parser")
    # Each hot-entry link carries its title and URL as attributes
    topic = soup.find_all("a", class_="js-keyboard-openable")
    for element in topic[:30]:  # take at most 30 entries
        title_list.append(element.get("title"))
        url_list.append(element.get("href"))
    # Interleave titles and URLs into one message body
    for (i, j) in zip(title_list, url_list):
        h_matome.append("[title]" + i)
        h_matome.append("[URL]" + j + "\n")
    h_matome_linear = "--[Hatena Bookmark]--" + "\n" + "\n".join(h_matome)
    return h_matome_linear
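To show how the extraction in `create_h()` works without hitting Hatena's server, here is the same `find_all` / `get` pattern run on a small hand-written HTML snippet (the titles and URLs are invented):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the Hatena hot-entry page (structure only, not real data)
html = """
<a class="js-keyboard-openable" title="Article A" href="https://example.com/a">A</a>
<a class="js-keyboard-openable" title="Article B" href="https://example.com/b">B</a>
"""
soup = BeautifulSoup(html, "html.parser")
# Select the anchor tags by class, then read their attributes
pairs = [(a.get("title"), a.get("href"))
         for a in soup.find_all("a", class_="js-keyboard-openable")]
print(pairs)
# [('Article A', 'https://example.com/a'), ('Article B', 'https://example.com/b')]
```

The real page's markup can of course change at any time, which would break the `class_` selector.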

I adjusted the output format while referring to https://qiita.com/Daara_y/items/c4b01107bc6191b9fbff. It's a brute-force approach, so it may not be the best reference.

def create_q():
    load_url = "https://qiita.com/"
    html = requests.get(load_url)
    soup = BeautifulSoup(html.content, "html.parser")
    li = []
    for items in soup.find_all():
        if "data-hyperapp-props" in items.attrs:
            li.append(items["data-hyperapp-props"])
    """
The JSON structure can be inspected with pprint (output truncated below):
    pprint(li[1])
    ('{"trend":{"edges":[{"followingLikers":[],"isLikedByViewer":false,"isNewArrival":false,"hasCodeBlock":false,"node":{"createdAt":"2020-09-02T00:39:45Z","likesCount":511,"title":"JavaScript learning roadmap","uuid":"ae2dbbd34f8557d5af19","author":{"profileImageUrl":"https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/335670/profile-images/1598003375","urlName":"yukiji"}}},{"followingLikers":[],"isLikedByViewer":false,"isNewArrival":false,"hasCodeBlock":false,"node":{"createdAt":"2020-09-01T17:29:32Z","likesCount":249,"title":"Masa
    """
    datas = json.loads(li[1])
    result = []
    # I only need the 'node' value from each edge
    for edges in datas['trend']['edges']:
        result.append(edges['node'])
    q_matome = []
    for v in result[:20]:
        q_matome.append('[title]:' + v['title'] )
        q_matome.append('[URL]:' + load_url + v['author']['urlName'] + "/items/" + v['uuid'] +  '\n')
    q_matome_lenear = "--【Qiita】--" + "\n\n" + "\n".join(q_matome)
    return q_matome_lenear
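The `data-hyperapp-props` parsing in `create_q()` can be illustrated on a hand-made JSON string shaped like the snippet above (all values here are invented for the example):

```python
import json

# Invented sample mimicking the trend JSON structure embedded in the page
raw = ('{"trend":{"edges":['
       '{"node":{"title":"Post 1","uuid":"aaa111",'
       '"author":{"urlName":"alice"}}},'
       '{"node":{"title":"Post 2","uuid":"bbb222",'
       '"author":{"urlName":"bob"}}}]}}')

datas = json.loads(raw)
lines = []
for edge in datas["trend"]["edges"]:
    node = edge["node"]
    lines.append("[title]:" + node["title"])
    # Article URLs are built from the author's urlName and the post uuid
    lines.append("[URL]:https://qiita.com/" + node["author"]["urlName"]
                 + "/items/" + node["uuid"] + "\n")
print("\n".join(lines))
```

Note that this relies on Qiita's internal page structure (`li[1]`, the `trend.edges` layout), which is undocumented and may change without notice.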

Then, make the bot react when it receives specific keywords as a LINE message.

@handler.add(MessageEvent, message=TextMessage)
def handle_message(event):
    if event.type == "message":
        # Scrape only the requested source, so unrelated messages
        # don't trigger both scrapers
        if event.message.text == "Hatena":
            line_bot_api.reply_message(
                event.reply_token,
                TextSendMessage(text=create_h()))
        elif event.message.text == "Kita":
            line_bot_api.reply_message(
                event.reply_token,
                TextSendMessage(text=create_q()))
        else:
            line_bot_api.reply_message(
                event.reply_token,
                TextSendMessage(text='Please enter "Hatena" or "Kita" to receive the information.'))
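As a design note, an if/elif chain like this can be replaced with a keyword-to-function mapping, which keeps the handler short as more sources are added. A sketch (the lambdas stand in for `create_h` / `create_q` so the dispatch logic can run on its own):

```python
# Map incoming keywords to the functions that build each reply.
# In the bot these would be create_h and create_q; the lambdas here
# are stand-ins for demonstration.
responders = {
    "Hatena": lambda: "--[Hatena Bookmark]--",
    "Kita": lambda: "--[Qiita]--",
}

def build_reply(text: str) -> str:
    # Unknown keywords fall back to a usage hint
    fallback = lambda: 'Please enter "Hatena" or "Kita" to receive the information.'
    return responders.get(text, fallback)()

print(build_reply("Hatena"))  # → --[Hatena Bookmark]--
```

Adding a new source then only means adding one dictionary entry, not another elif branch.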

Deploy to Heroku with Git

If you're new to Git itself, here is a reference site: https://employment.en-japan.com/engineerhub/entry/2017/01/31/110000

Required files
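The original file listing did not survive in this copy; for a typical Flask app on Heroku, the three files below are what's needed. The version number and package list are assumptions from my own setup, so adjust them to yours:

```text
# Procfile — tells Heroku how to start the app
web: python main.py

# runtime.txt — the Python version to run (match your local version)
python-3.8.5

# requirements.txt — generate with `pip freeze > requirements.txt`
Flask
line-bot-sdk
requests
beautifulsoup4
```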

Command line

Next, using Git Bash, I deployed as follows.
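The actual commands were lost from this copy, but a typical Git Bash session for a first Heroku deploy looks like the following. `your-app-name` and the credential values are placeholders to replace with your own:

```shell
heroku login
git init
heroku git:remote -a your-app-name
# Register the LINE credentials as Heroku environment variables
heroku config:set YOUR_CHANNEL_ACCESS_TOKEN="(your token)" YOUR_CHANNEL_SECRET="(your secret)"
git add .
git commit -m "initial deploy"
git push heroku master
```

After pushing, set the webhook URL in LINE Developers to `https://your-app-name.herokuapp.com/callback` so LINE can reach the Flask route defined above.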
