[Python] LINE notification of the latest information using Twitter automatic search

As the title suggests, I wrote Python code that automatically searches Twitter through the Twitter API and sends what it finds to LINE via LINE Notify.


Motivation

With the novel coronavirus epidemic, the number of new infections is announced every day. Whether or not the number is actually worth worrying about, I had gotten into the habit of opening Twitter and searching again and again between noon and evening, when Tokyo's figures come out, because there is no telling exactly when the breaking news will appear. (Personally, I think Twitter is the fastest way to see the preliminary report of the number of infections.) ➡︎ I want to stop wasting time searching Twitter! That was the motivation this time.


What has been achieved

  1. Twitter automatic search
  2. Get numbers from tweets about the number of infected people in Tokyo
  3. Notify with LINE Notify

This implementation is specialized for the number of infected people in Tokyo, but with small changes it could be applied to many other things.


Things you need

- Python execution environment: Jupyter Notebook is used here
- Twitter API registration: I registered by following "Summary of procedures from Twitter API registration (account application method) to approval * Information as of August 2019" (/2524d21455aac111cdee)
- LINE Notify registration: I used the method and code of "Send a message to LINE with Python" as-is
- Python modules that need some environment setup:
  - tweepy: pip install tweepy should be enough
  - mecab-python3: besides the Python module, you need to install MeCab itself and a dictionary. For details, "A story of struggling to introduce mecab-python3 on Mac" should help you build the environment. On Google Colaboratory, simply running !pip install mecab-python3==0.7 installs everything you need automatically, so that may be easier.

Preparing all of this may actually be the most troublesome part. Then again, googling "{what you want to do} python" and finding all sorts of tools is fun lol
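Before moving on, it may help to confirm that the modules can actually be imported. Here is a small sketch using importlib.util.find_spec from the standard library. The module list is my assumption based on the imports used below (MeCab is the import name of mecab-python3), and it only checks that each module is importable, not that a MeCab dictionary is installed.

```python
import importlib.util

# Assumed list: "MeCab" is the import name of mecab-python3; the others are
# the remaining third-party imports used in this article.
REQUIRED_MODULES = ["tweepy", "MeCab", "requests", "pandas", "IPython"]


def check_modules(names=REQUIRED_MODULES):
    """Return {module name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:10s} {'OK' if ok else 'missing'}")
```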

Implementation

Below, I walk through the implementation using the actual code.

0. Implementation overview

  1. Various imports
  2. Create an object for Twitter access
  3. Create an object for LINE notification
  4. Create an automatic search function

1. Various imports

import requests
import datetime
import time
import pandas as pd

from IPython.display import clear_output  # clears printed output (used in waiting())

import tweepy
import MeCab

tagger = MeCab.Tagger("-Owakati")  # "-Owakati": output the words separated by spaces

The tagger on the last line is the MeCab object used to split a sentence into words.

For example, with tagger.parse:

tagger.parse("今日はいい天気ですね")  # "It's nice weather today, isn't it?"
# => '今日 は いい 天気 です ね \n'
tagger.parse("今日はいい天気ですね").split()
# => ['今日', 'は', 'いい', '天気', 'です', 'ね']  (today / topic particle / good / weather / copula / sentence-final particle)

That's how it works. As we will see later, this implementation turns each tweet into a list of words with split() and works with that list.
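To see what split() does without installing MeCab, here is a minimal sketch that hard-codes a string in the format I assume -Owakati produces (words separated by spaces, with a trailing space before the newline).

```python
# Assumed example of MeCab "-Owakati" output, hard-coded so this runs
# without MeCab installed.
wakati_output = "今日 は いい 天気 です ね \n"

# str.split() with no arguments splits on any run of whitespace and drops
# the trailing space and newline, leaving only the words.
tokens = wakati_output.split()
print(tokens)  # ['今日', 'は', 'いい', '天気', 'です', 'ね']

# Once the sentence is a list, `in` and index() work word by word.
print("天気" in tokens, tokens.index("天気"))  # True 3
```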

2. Create an object for Twitter access

Create an object that holds the keys and tokens obtained when registering for the Twitter API. Reference: "How to use Tweepy ~ Part 1 ~ [Getting Tweet]"

consumer_key = "Enter the key/token you obtained here"
consumer_secret = "Same as above"
access_token = "Same as above"
access_token_secret = "Same as above"

auth = tweepy.OAuthHandler(consumer_key,consumer_secret)
auth.set_access_token(access_token,access_token_secret)

api = tweepy.API(auth)  #I will use this later
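Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As an alternative (not what this article does), here is a sketch that reads the four values from environment variables. The variable names and the load_twitter_credentials function are hypothetical; pick whatever names you like.

```python
import os

# Hypothetical environment variable names for the four credentials.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Read the four Twitter API credentials from environment variables.

    Raises KeyError listing every variable that is not set.
    """
    missing = [env for env in ENV_NAMES.values() if env not in environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: environ[env] for key, env in ENV_NAMES.items()}


# Usage with the same Tweepy calls as above:
# creds = load_twitter_credentials()
# auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
# auth.set_access_token(creds["access_token"], creds["access_token_secret"])
```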

3. Create an object for LINE notifications

I used the code from the article I quoted earlier, "Send a message to LINE with Python" (https://qiita.com/moriita/items/5b199ac6b14ceaa4f7c9).

class LINENotifyBot:
    API_URL = 'https://notify-api.line.me/api/notify'

    def __init__(self, access_token):
        self.__headers = {'Authorization': 'Bearer ' + access_token}

    def send(
            self, message,
            image=None, sticker_package_id=None, sticker_id=None,
            ):
        payload = {
            'message': message,
            'stickerPackageId': sticker_package_id,
            'stickerId': sticker_id,
            }
        files = {}
        if image is not None:
            files = {'imageFile': open(image, 'rb')}  # optional image attachment
        try:
            r = requests.post(
                LINENotifyBot.API_URL,
                headers=self.__headers,
                data=payload,
                files=files,
                )
        finally:
            for f in files.values():
                f.close()  # close the image file, if one was opened
        return r  # return the response so the caller can check r.status_code

access_token_Notify = "Enter the token here"

bot_Notify = LINENotifyBot(access_token=access_token_Notify)

Now calling bot_Notify.send(message="xxxxx") sends a LINE message to the destination associated with the token.
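If you would rather not depend on requests just for this, the same POST can be sketched with the standard library's urllib. This sketch assumes the same endpoint and Bearer header as LINENotifyBot above and only sends a text message (no image or sticker); build_notify_request and send_notify are hypothetical names. Building the request separately lets you inspect it without sending anything.

```python
import urllib.parse
import urllib.request

API_URL = "https://notify-api.line.me/api/notify"


def build_notify_request(access_token, message):
    """Build (but do not send) a LINE Notify POST request carrying a text message."""
    body = urllib.parse.urlencode({"message": message}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def send_notify(access_token, message, timeout=10):
    """Send the request and return the HTTP status code (raises HTTPError on 4xx/5xx)."""
    request = build_notify_request(access_token, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```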

4. Create an automatic search function

The basic idea is as follows:

- Fetch a fixed number of the latest tweets containing the keywords "Tokyo", "infection", and "{date}" (the day of the month)
- If a tweet contains an expression "n people", extract that n
- Among the values of n in the fetched tweets, the most frequent one becomes the candidate for the number of infections
- Send a notification when the share of tweets containing that candidate exceeds a certain ratio

This process is repeated at regular intervals.
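The majority-vote step in the list above can be sketched as a pure function, which makes it easy to try different rate values on hand-made data. This sketch uses collections.Counter instead of the pandas mode()/groupby() used in auto_search, so when two numbers tie for most frequent, the one picked may differ. majority_candidate is a hypothetical name; like auto_search, it divides by item (the number of tweets fetched), not by the number of tweets that contained a number.

```python
from collections import Counter


def majority_candidate(numbers, item, rate):
    """Return (candidate, share): candidate is the most frequent n if it appears
    in more than item * rate tweets, otherwise None.

    numbers: values of n extracted from "n people" expressions (one per tweet)
    item:    number of tweets fetched (the denominator)
    rate:    threshold ratio, e.g. 0.5
    """
    if not numbers:
        return None, 0.0
    candidate, freq = Counter(numbers).most_common(1)[0]
    share = freq / item
    return (candidate if freq > item * rate else None), share


print(majority_candidate([188, 188, 188, 20, 188, 0], item=6, rate=0.5))
```

For example, with item = 30 and rate = 0.5 a candidate has to appear in at least 16 tweets, since the comparison is strictly greater than 15.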

Here is auto_search, the top-level function that you actually run.

def auto_search(item=100,wait_time=60,rate=0.5):

    """
    item: number of tweets to retrieve
    wait_time: interval between automatic searches, in seconds
    rate: threshold ratio of tweets containing the estimated number of infections
    """

    d = datetime.datetime.now().day
    m = datetime.datetime.now().month

    print("searching on Twitter...")
    pre_mode = 0  # number that previously exceeded rate (0 = none yet)

    while True:
        df = find_infected_num(d,item)  # DataFrame of the n values from "n人" ("n people")
        if df.empty:
            # No tweet contained a number, so there is nothing to aggregate this round
            waiting(2,wait_time,"no numbers found")
            continue
        num_mode = df.mode().values[0,0]  # mode of df = candidate for the number of infections
        count = df.groupby("num").size()  # number of tweets for each n

        # The frequency of num_mode exceeds rate, and num_mode has not been notified yet
        if count.max() > item*rate and num_mode!=pre_mode:
            # Print the result
            print("\n--RESULT--")
            print(count)

            # Send the result to LINE
            text = "{}/{}\nInfected people in Tokyo: [{}]\n* Tweet ratio: {:.2f}%".format(m,d,num_mode,count.max()/item*100)
            bot_Notify.send(message=text)

            # Let the user stop here, or continue if the result looks wrong
            if input("\ncontinue? y/n   ")=="n":
                break  # end

        waiting(2,wait_time,count)  # show a countdown while waiting

        # Update pre_mode
        if count.max() > item*rate:
            pre_mode = num_mode
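To test the notification logic of auto_search without Twitter, LINE, or input(), one option is to pass the I/O in as functions. Here is a sketch under that assumption (run_rounds, fetch_numbers, and notify are hypothetical names). It keeps the same pre_mode rule, so a number is notified only once while it stays the majority, but it runs a fixed number of rounds instead of asking "continue?".

```python
import time
from collections import Counter


def run_rounds(fetch_numbers, notify, item=100, rate=0.5, wait_time=60,
               max_rounds=None, sleep=time.sleep):
    """Sketch of the auto_search loop with the I/O passed in as functions.

    fetch_numbers(): returns the list of n extracted from the latest tweets
    notify(n, share): called when a new candidate clears the threshold
    Returns the list of candidates that were notified.
    """
    pre_mode = 0  # 0 is never notified, mirroring the original initial value
    notified = []
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        numbers = fetch_numbers()
        if numbers:
            num_mode, freq = Counter(numbers).most_common(1)[0]
            if freq > item * rate:
                if num_mode != pre_mode:
                    notify(num_mode, freq / item)
                    notified.append(num_mode)
                pre_mode = num_mode  # remember it so it is not notified again
        rounds += 1
        sleep(wait_time)
    return notified
```

To connect it to the functions in this article, something like fetch_numbers=lambda: find_infected_num(d, 100)["num"].tolist() and a notify that builds a message and calls bot_Notify.send could be used.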

find_infected_num returns the values of n found in "n人" ("n people") as a DataFrame. It uses the tagger prepared in step 1 and the api prepared in step 2.

def find_infected_num(d,item):
    num_list = []  # list to store n
    # Keywords: 感染 (infection), 東京 (Tokyo), "{d}日" (day d), joined with spaces (= AND)
    query = " ".join(["感染", "東京", "{}日".format(d)])
    # In Tweepy 4.x, api.search was renamed to api.search_tweets
    for tweet in tweepy.Cursor(api.search, q=query).items(item):
        split_tweet = tagger.parse(tweet.text).split()
        if "人" in split_tweet:  # "人" = "people", the counter in "n人"
            index = split_tweet.index("人") - 1
            n = cut_number(split_tweet,index)  # the number just before "人"
            num_list.append(n)
    return pd.DataFrame(num_list,columns=["num"])
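As an alternative to MeCab, the "n人" pattern can also be picked up with a regular expression. This is a swapped-in technique, not what find_infected_num does: it returns every match in the tweet rather than only the number before the first 人, and a comma-separated number such as 1,234人 would yield only 234. extract_people_counts is a hypothetical name.

```python
import re

# "n人": one or more decimal digits followed by 人 ("people").
# In Python 3, \d also matches full-width digits such as "１８８",
# and int() accepts those as well.
PEOPLE_PATTERN = re.compile(r"(\d+)人")


def extract_people_counts(text):
    """Return every n that appears as "n人" in text, as ints."""
    return [int(n) for n in PEOPLE_PATTERN.findall(text)]


print(extract_people_counts("東京都で新たに188人の感染を確認"))  # [188]
```

To mirror find_infected_num's one-number-per-tweet behavior, you could keep only the first element of the returned list.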

find_infected_num calls cut_number, a function that gets the number immediately before 人 ("people").

def cut_number(split_tweet,index):
    # Return 0 if there is no token before "人" or it is not a number string
    # (the upper bound 10000 is arbitrary)
    if index < 0 or split_tweet[index] not in list(map(str,range(0,10000))):
        return 0

    start_i = index  # position where the number starts in the tweet
    ans = split_tweet[start_i]  # the number just before "人"
    # Prepend to ans as long as the preceding tokens are single digits (0-9);
    # stop at the start of the list so index -1 does not wrap around
    while start_i > 0 and split_tweet[start_i-1] in list(map(str,range(0,10))):
        start_i -= 1
        ans = split_tweet[start_i] + ans
    # The number has ended: return it as an int
    return int(ans)

Here is why such a function is needed. If you split the sentence 今日の感染者は123人 ("Today there are 123 infected people") with MeCab, you get:

tagger.parse("今日の感染者は123人").split()  # "Today there are 123 infected people"
# => ['今日', 'の', '感染', '者', 'は', '1', '2', '3', '人']

The digits 1, 2, and 3 come out as separate tokens. Taking only the token just before 人 would give just the last digit of the count, so I wrote cut_number to join the digits back together and get the correct number.
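Here is a standalone sketch of the digit-joining idea that runs without MeCab, using the token list above. join_digits_before is a hypothetical helper; unlike cut_number it accepts any token made only of ASCII digits. Two details matter: the digit check has to include 9 (range(0, 9) stops at 8), and the loop has to stop at the start of the list, because index -1 would otherwise wrap around to the last token.

```python
def is_digit_token(token):
    """True if the token is a non-empty string of ASCII digits, e.g. '1' or '188'."""
    return token != "" and all(c in "0123456789" for c in token)


def join_digits_before(tokens, index):
    """Join the run of digit tokens that ends at tokens[index] into one int.

    Returns 0 when index is negative or tokens[index] is not a digit token,
    mirroring the 0 that cut_number returns.
    """
    if index < 0 or not is_digit_token(tokens[index]):
        return 0
    start = index
    # Walk left while the previous token is also digits, stopping at the list start
    while start > 0 and is_digit_token(tokens[start - 1]):
        start -= 1
    return int("".join(tokens[start:index + 1]))


tokens = ["今日", "の", "感染", "者", "は", "1", "2", "3", "人"]
print(join_digits_before(tokens, tokens.index("人") - 1))  # 123
```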


The other helper used in auto_search is waiting, which visualizes the time remaining until the next automatic search. (It's more of a bonus, since it has nothing to do with the core functionality.)

def waiting(div,wait_time,count):
    # Redraw a progress bar every div seconds until wait_time has passed
    clear_output()
    for i in range(1,wait_time//div+1):
        print("waiting: |"+"*"*i+" "*(wait_time//div-i)+"|")
        print("\n--RESULT--")
        print(count)  # keep the latest aggregate on screen
        time.sleep(div)
        clear_output()
    print("searching on Twitter...")
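waiting() relies on IPython's clear_output, so it only redraws nicely in Jupyter. For a plain terminal, a sketch that redraws a single line with a carriage return ("\r") could look like this. progress_bar and waiting_terminal are hypothetical names; the bar format copies waiting(), but this version does not reprint the RESULT table.

```python
import sys
import time


def progress_bar(done, total):
    """Return the text bar that waiting() prints, e.g. 'waiting: |**   |'."""
    return "waiting: |" + "*" * done + " " * (total - done) + "|"


def waiting_terminal(div, wait_time, out=sys.stdout, sleep=time.sleep):
    """Terminal version of waiting(): redraw one line with '\\r' instead of clear_output()."""
    steps = wait_time // div
    for i in range(1, steps + 1):
        out.write("\r" + progress_bar(i, steps))  # overwrite the previous bar
        out.flush()
        sleep(div)
    out.write("\n")
```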

Trying it out

Because of how the algorithm works, it will not always catch the preliminary report of the number of infections, so I ran it for real and tuned the parameters. (The code above already uses the tuned values.)

Below are the results from 7/19.

Result 1

(Run with item = 30)

[Screenshot: console output, 2020-07-19 17:19:04]

At this point, the following notification was sent to LINE:

[Screenshot: LINE notification]

Entering y at the "continue?" prompt resumes the search, but the number of infections Tokyo announced that day was indeed 188, so the result was correct. (Even if you keep searching, you will not get repeated notifications, because pre_mode = 188.)

Result 2

(Run with item = 100)

[Screenshot: display while waiting, 2020-07-19 21:27:13]

This is what the display looks like while waiting. As RESULT shows, no candidate exceeds 50%, so no notification is sent. It also shows that once some time has passed after the announcement, the share of tweets with the correct number drops even though the number itself is right.

Future tasks

Based on the test results, here are the remaining issues.

- The moment the number of infections is reported is not detected yet
- With rate = 0.5, an incorrect value may be detected as the preliminary figure depending on the time of day

I would like to keep running and testing this program daily to look into these issues.

Digression

It seems you could get the preliminary figure (and an accurate one) by watching the tweets of a specific official news outlet instead of taking a majority vote over miscellaneous tweets, but I don't know exactly which account tweets it first, so I went with this method. Incidentally, when I tested it on 7/18, the share of tweets containing "290" exceeded 80% right after the announcement, perhaps partly because Tokyo's count that day was 290. Based on that, I initially ran it on 7/19 with rate = 0.8, but it failed: no notification came even after the preliminary figure was out. So I lowered the rate and adjusted it. Conditions differ from day to day, which makes tuning hard, but I'm curious how accurate the notifications can get with the simple algorithm of "pick up the number that many people are tweeting". So this is really a personal project driven by curiosity rather than something practical lol
