[Python] Get popular tweets from your Twitter timeline, rank them, and graduate from Twitter addiction

Background

I'm probably a Twitter addict: my screen-time stats say I spend an average of about 3 hours a day on Twitter. That's almost one-eighth of the day on Twitter... Well, there are people even worse than me... Also, if you feel like it, please follow me → @y_a_m_a_y_a (I changed the account linked to Qiita).

Most of that time is just scrolling the timeline, reading other people's tweets and checking trends (I like to stay on top of trends), so I wanted to create a program to cut that time down.

Here is a demo of the result (posted on Twitter as a video).

What I want to do

  1. Get your own timeline
  2. Sort the tweets in descending order of likes (favorites) and retweets (RT)
  3. Export the top tweets as HTML

Environment

Google Colab (building a local environment seemed like a hassle). The code is on GitHub and Google Drive.

1. Get your own timeline

There are already plenty of explanations of this part elsewhere.

Developer registration

To use the Twitter API, you need to register as a developer. Many people have written about this part, so please refer to those articles, e.g. "Summary of procedures from Twitter API registration (account application method) to approval" (information as of August 2019). Since my purpose was academic (a student project), I was approved in about a day.

Get the timeline

Using the twitter library, you can easily tweet, follow, and so on from Python. (Install it with `pip install twitter` when using Colab.)

First of all, authentication is required. Authenticate with Twitter using the following code.

OAuth.py


import twitter
#Set the obtained key and access token(Enter your own key)
auth = twitter.OAuth(consumer_key="-------------------",
                     consumer_secret="-------------------",
                     token="-------------------",
                     token_secret="-------------------")

t = twitter.Twitter(auth=auth)

Get Tweets

Next is the acquisition of tweets. You can get tweets with the following code.

python


t.statuses.home_timeline(count=200) # count: number of tweets to fetch (up to 200 per request)

The response comes back as JSON already parsed into Python objects, so you can work with it like a list of dicts (the json library is handy for further manipulation). This page is very easy to understand as a response example.

Example


import json
last_id = get_data[len(get_data)-1]["id"]

The code above gets the ID of the last tweet in the response.
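To illustrate the shape of the response, here is a dummy list mimicking two elements of a home_timeline result (all ids and names below are made up):

```python
# dummy tweets shaped like elements of the home_timeline response
get_data = [
    {"id": 111, "text": "first tweet", "favorite_count": 3,
     "user": {"name": "Example", "screen_name": "example_user"}},
    {"id": 222, "text": "second tweet", "favorite_count": 5,
     "user": {"name": "Example Two", "screen_name": "example_user2"}},
]

last_id = get_data[len(get_data)-1]["id"]  # id of the last (oldest) tweet
print(last_id)                             # 222
print(get_data[0]["user"]["screen_name"])  # example_user
```

Nested fields like the author's screen name are reached by chaining dict keys, exactly as in the real response.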

Twitter's API restrictions are strict, so the number of tweets that can be retrieved in one request is capped at about 200 (see the official reference). You can therefore fetch more by calling this repeatedly in a while loop.

Go back through older tweets with max_id

By using max_id, you can get tweets before the specified tweet ID.

python


t.statuses.home_timeline(count=200, max_id=last_id) # max_id: id of the oldest tweet fetched so far

The point to note here is that the tweet with the specified id is included again in the response, so exclude it with an if statement.
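Since max_id is inclusive, each page after the first starts with a tweet you already have. A sketch of the exclusion, using dummy ids instead of real API calls:

```python
# dummy pages standing in for successive home_timeline responses
page1 = [{"id": 5}, {"id": 4}, {"id": 3}]
page2 = [{"id": 3}, {"id": 2}, {"id": 1}]  # first element repeats max_id=3

ids = [tw["id"] for tw in page1]
for i, tw in enumerate(page2):
    if i > 0:  # skip the duplicated first tweet
        ids.append(tw["id"])
print(ids)  # [5, 4, 3, 2, 1]
```

Alternatively, passing max_id=last_id-1 avoids the duplicate in the first place, since tweet ids are numeric and max_id is inclusive.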

Now you can get tweets !!!

However, according to the API specifications, the timeline can only be traced back about 800 tweets... (What the hell...) There is an [article on getting 3200 tweets using Tweepy](https://www.pytry3g.com/entry/python-twitter-timeline#Cursor%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E5%A4%A7%E9%87%8F%E3%81%AE%E3%83%84%E3%82%A4%E3%83%BC%E3%83%88%E3%82%92%E5%8F%96%E5%BE%97%E3%81%99%E3%82%8B), but I hit the 15-minute Twitter API rate limit as soon as I tried it, so I couldn't use it. I gave up and settled for 800 tweets...

Get the time

My original idea was to go back through 24 hours of tweets, so I use the datetime library to check timestamps. Since the tweet's created_at field is not in datetime format, I forcibly convert it with string slicing. That is the following part.

python


import datetime

date = datetime.datetime.now() # current time; its month is reused below
date_create = get_data[len(get_data)-1]["created_at"] # e.g. "Sat May 16 05:13:34 +0000 2020"
#Convert the posting time of the tweet by slicing the string
#(the month in created_at is a name like "May", so the current month is substituted,
# which only works while the tweets are from the same month)
date_create_at = datetime.datetime(year=int(date_create[26:30]), month=int(date.month),
          day=int(date_create[8:10]), hour=int(date_create[11:13]), minute=int(date_create[14:16]))
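As an aside, the created_at string can also be parsed without slicing, using datetime.strptime with Twitter's timestamp layout (the sample string below is made up):

```python
import datetime

created_at = "Sat May 16 05:13:34 +0000 2020"  # example created_at value
parsed = datetime.datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
print(parsed.year, parsed.month, parsed.day, parsed.minute)  # 2020 5 16 13
```

This also handles the month correctly, since slicing can't easily turn "May" into 5.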

I wanted to compare the times (and end when a tweet is more than 24 hours old), so I subtract the two datetimes and compare in seconds, which makes the values easy to work with. (In the end the 800-tweet cap kicked in first, so I barely used it...)

python


diff = date - date_create_at
...
if diff.total_seconds() >= 86400 or count > 15: #24 hours passed, or the loop-limit safety stop
    break
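One pitfall worth noting: timedelta.seconds holds only the leftover seconds within the last day, so it never reaches 86400; total_seconds() is the value to compare against 24 hours. A quick demonstration:

```python
import datetime

a = datetime.datetime(2020, 5, 16, 12, 0)
b = datetime.datetime(2020, 5, 14, 12, 0)
diff = a - b                 # exactly 2 days
print(diff.seconds)          # 0 (sub-day remainder only)
print(diff.total_seconds())  # 172800.0
```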

If you put these pieces together, you can get 800 tweets going back from the current time! (Yay)

Sort in descending order of likes and RTs

Then sort. The built-in list sort handles this without worrying about algorithmic efficiency. Since it is a multi-dimensional list, pass a lambda as the key.

python


data.sort(key=lambda data: data[5]) #Sort using a lambda expression (likes)
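For example, with a small dummy list in the same [..., likes] shape, a lambda key sorts on the chosen column:

```python
# each row: [text, likes] (dummy values)
rows = [["tweet A", 3], ["tweet B", 10], ["tweet C", 7]]
rows.sort(key=lambda row: row[1])  # ascending by likes
print(rows[-1])  # ['tweet B', 10], the most-liked row ends up last
```

Because the sort is ascending, the article reads the top tweets from the end of the list.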

Also, since this time we sort by both the number of likes and the number of RTs, copy the data array first. (It makes the export easier.) The copy uses the copy library.

python


import copy
data_RT = copy.copy(data)
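A shallow copy is enough here, because sorting only reorders the outer list; a quick check with dummy rows:

```python
import copy

data = [[3, "c"], [1, "a"], [2, "b"]]
data_RT = copy.copy(data)             # new outer list, same inner rows
data_RT.sort(key=lambda row: row[0])  # sorting the copy...
print(data[0])     # [3, 'c'] (...leaves the original order intact)
print(data_RT[0])  # [1, 'a']
```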

Export top tweets in HTML

Finally, the export. I created it by referring to the embed HTML officially generated by Twitter. It turns out that as long as the URL is specified properly, the tweet displays correctly, so all that's left is to substitute the values. I defined this as a function.

python


#Create HTML from tweets (function)
def create_html(text ,username ,userid, tweetid):
  html =  '<blockquote class="twitter-tweet"><p lang="ja" dir="ltr">' + text + '</p>&mdash;' + username +'(@' + userid  +') <a href="https://twitter.com/' + userid + '/status/' +tweetid + '?ref_src=twsrc%5Etfw"></a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>'
  return html
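A quick sanity check of the helper with made-up values (the user name, screen name, and tweet id below are dummies):

```python
def create_html(text, username, userid, tweetid):
    # same helper as above: wrap a tweet in Twitter's embed markup
    html = ('<blockquote class="twitter-tweet"><p lang="ja" dir="ltr">' + text +
            '</p>&mdash;' + username + '(@' + userid + ') <a href="https://twitter.com/'
            + userid + '/status/' + tweetid + '?ref_src=twsrc%5Etfw"></a></blockquote>'
            '<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>')
    return html

snippet = create_html("Hello!", "Example User", "example_user", "1234567890")
print("https://twitter.com/example_user/status/1234567890" in snippet)  # True
```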

Write out the top 10 in a for loop and you're done!!

Impressions

It's a pity that the Twitter API is so restrictive: instead of the 24 hours I originally wanted, the result only goes back about 3 hours. (This probably varies with how many accounts you follow.) Also, it can't run in someone else's environment without developer registration, and since I'm running it on Colab, file generation doesn't go as smoothly as it would locally. There's a lot of room for improvement...

It seems that graduating from Twitter addiction will take a little longer... Execution result:

(Screenshot of the generated HTML: スクリーンショット 2020-05-16 15.13.34.png)

Finally the code

The code has a bit of a legacy feel...

python


import twitter
import json
import datetime
import copy

#Create HTML from tweets (function)
def create_html(text ,username ,userid, tweetid):
  html =  '<blockquote class="twitter-tweet"><p lang="ja" dir="ltr">' + text + '</p>&mdash;' + username +'(@' + userid  +') <a href="https://twitter.com/' + userid + '/status/' +tweetid + '?ref_src=twsrc%5Etfw"></a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>'
  return html

#Set the obtained key and access token(Enter your own key)
auth = twitter.OAuth(consumer_key="-------------------",
                     consumer_secret="-------------------",
                     token="-------------------",
                     token_secret="-------------------")

t = twitter.Twitter(auth=auth)

#Get the timeline(Go back as far as possible with a while statement)
date = datetime.datetime.now() #Get the time now
date = datetime.datetime(year=date.year, month=date.month, day=date.day, hour=date.hour,minute=date.minute)
count = 0 #To count the number of tweets acquired
data =[] #Save the acquired data
#data_swap =[]
diff = date - date
print("Start time",date)

while(1): #Start data acquisition
  if diff.total_seconds() >= 86400 or count > 15: #24 hours passed, or the loop-limit safety stop
    print("End of search")
    print(count)
    break
  elif count ==0: #The first time
    get_data = t.statuses.home_timeline(count = 200)
    for i in range(len(get_data)):
      # create,"id",name,userid,text,"favorite_count","retweet_count"
      data_swap = [str(get_data[i]["created_at"]),str(get_data[i]["id"]),str(get_data[i]["user"]["name"]),str(get_data[i]["user"]["screen_name"]),str(get_data[i]["text"]),int(get_data[i]["favorite_count"]),int(get_data[i]["retweet_count"])]
      data.append(data_swap)
    last_id = get_data[len(get_data)-1]["id"] #Get earlier tweets starting from the id of the last tweet
    date_create = get_data[int(len(get_data)-1)]["created_at"]
    date_create_at = datetime.datetime(year=int(date_create[26:30]), month=int(date.month), day=int(date_create[8:10]), hour=int(date_create[11:13]),minute=int(date_create[14:16]))#Get time to post tweets
    count = count +1
  else: #At other times
    get_data = t.statuses.home_timeline(count = 200, max_id = last_id) #the API caps a single request at about 200
    if len(get_data) == 0: #When the number of tweets acquired becomes 0 due to the limit
      print("End of search")
      break
    for i in range(len(get_data)):
      if i >0: #skip the first tweet, which is the one specified by max_id
        data_swap = [str(get_data[i]["created_at"]),str(get_data[i]["id"]),str(get_data[i]["user"]["name"]),str(get_data[i]["user"]["screen_name"]),str(get_data[i]["text"]),int(get_data[i]["favorite_count"]),int(get_data[i]["retweet_count"])]
        data.append(data_swap)
    last_id = get_data[len(get_data)-1]["id"] #Try to get previous tweets from the id of the last tweet
    date_create = get_data[int(len(get_data)-1)]["created_at"]
    date_create_at = datetime.datetime(year=int(date_create[26:30]), month=int(date.month), day=int(date_create[8:10]), hour=int(date_create[11:13]),minute=int(date_create[14:16]))#Get time to post tweets
    diff = date -date_create_at
    count = count +1

print("End time",date_create_at)
print("Total number of data:",len(data))

#Sort
size = len(data)
data.sort(key=lambda data: data[5]) #Sort using a lambda expression (likes)
data_RT = copy.copy(data)
data_RT.sort(key=lambda data: data[6]) #Sort using a lambda expression(RT)

#View results(HTML)
path = '/content/drive/My Drive/Colab Notebooks/TwitterAPI/' + 'data/TimeLine' +str(date.month) +'Month' +str(date.day) + 'Day' +'.html'
with open(path, mode='w') as f:
    f.write("<H1>The 10 best tweets with the most likes!</H1>")
    f.write("<H1>"+str(date + datetime.timedelta(hours =9)) + "~" +str(date_create_at + datetime.timedelta(hours =9)) +"</H1>")
    for i in range(10):
        f.write("<H2>No."+str(i+1)+"!! "+ str(data[size-1-i][5]) + " likes</H2>")
        f.write(create_html(data[size-1-i][4],data[size-1-i][2],data[size-1-i][3],data[size-1-i][1]))
        
    #RT ranking
    f.write("<H1>The 10 best tweets with the most RTs!</H1>")
    for i in range(10):
        f.write("<H2>No."+str(i+1)+"!! "+ str(data_RT[size-1-i][6]) + " RT</H2>")
        f.write(create_html(data_RT[size-1-i][4],data_RT[size-1-i][2],data_RT[size-1-i][3],data_RT[size-1-i][1]))
print("End of writing")
