[PYTHON] Get a lot of Twitter tweets at once

Introduction

This article assumes that you already have access to the Twitter API (sorry). It is for people who want to do various things with tweets but are having trouble collecting the data.

Purpose

If you search tweets in the ordinary way with the Twitter API, a single request returns at most the latest 100 tweets. However, you can retrieve tweets going back about one week by following the ID assigned to each tweet. The search program is shown below.

tweet_search.py


# -*- coding:utf-8 -*-
import tweepy


def search_tweet(api, query=' ', pages=10):
    """Print up to pages * 100 tweets, going from newest to oldest."""
    print('Output from the latest tweets')
    params = {'q': query, 'count': 100}
    for page in range(1, pages + 1):
        print('Search page: ' + str(page))
        # Note: Tweepy 4.0 renamed API.search to API.search_tweets.
        tweet_data = api.search(**params)
        if not tweet_data:
            # No older tweets are available, so stop instead of failing on [-1].
            break
        for tweet in tweet_data:
            print(tweet.text)
            print('************************************************\n')
        # max_id is inclusive, so subtract 1 to skip the oldest tweet already printed.
        params['max_id'] = tweet_data[-1].id - 1


# Compare the variable __name__, not the string literal '__name__'.
if __name__ == '__main__':
    consumer_key = "XXXXXXXXXXXXXXXXXXXXXXX"
    consumer_secret = "XXXXXXXXXXXXXXXXXXXXXXXX"
    access_token = "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    access_token_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXX"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    search_tweet(api)

To briefly explain the mechanism:

・ Every tweet has a tweet ID (a number).
・ Newer tweets have larger IDs (for example, if the latest tweet has ID 7000, an earlier tweet might have ID 6999).
・ A normal API search returns tweets in order from the newest to the oldest.
・ The API search accepts an argument called max_id; specifying it restricts the search to tweets whose ID is less than or equal to that value.

Combining these four points makes it possible to "search only among tweets whose ID is 7000 or less".
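As a rough illustration of why the program passes the last ID minus 1, here is a small sketch that runs without the Twitter API. The list `TWEET_IDS` and the function `fake_search` are hypothetical stand-ins for the real search, assumed only to return IDs newest first and to treat max_id as inclusive.

```python
# Hypothetical in-memory stand-in for the search index: tweet IDs, newest first.
TWEET_IDS = [7000, 6999, 6990, 6985, 6970, 6950]


def fake_search(count, max_id=None):
    """Return up to `count` IDs that are <= max_id, newest first."""
    matching = [i for i in TWEET_IDS if max_id is None or i <= max_id]
    return matching[:count]


first_page = fake_search(count=3)
# max_id is inclusive, so reusing the last ID returns that tweet a second time.
overlapping = fake_search(count=3, max_id=first_page[-1])
# Subtracting 1 starts the next page just below the last tweet already seen.
second_page = fake_search(count=3, max_id=first_page[-1] - 1)

print(first_page)
print(overlapping)
print(second_page)
```

Without the subtraction, every page after the first would begin with a tweet that was already collected.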

In other words:

  1. Get 100 tweets
  2. Save the ID of the 100th (oldest) tweet
  3. Search again for tweets with IDs below that value (return to step 1)

By repeating this, you can keep retrieving older tweets indefinitely! At least, I would like to say so, but unfortunately the search API only returns tweets from about the past week, so it is not infinite. Even so, you can collect far more tweets than with a single ordinary API search.
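The three steps can also be sketched as a reusable loop that returns the tweets instead of printing them and stops as soon as a page comes back empty. This is only a sketch under stated assumptions: `collect_tweets`, `FakeTweet`, and `make_fake_search` are hypothetical names introduced here, and the fake search merely imitates a search that returns results newest first with an inclusive max_id.

```python
from collections import namedtuple


def collect_tweets(search, query, max_pages=10, per_page=100):
    """Collect tweets page by page, walking backwards through tweet IDs.

    `search` is any callable that accepts q, count and (optionally) max_id
    and returns a list of objects with an `id` attribute, newest first.
    """
    collected = []
    params = {'q': query, 'count': per_page}
    for _ in range(max_pages):
        page = search(**params)
        if not page:
            # Nothing older is available, so stop early.
            break
        collected.extend(page)
        # Step 2 and 3: remember the oldest ID and search strictly below it.
        params['max_id'] = page[-1].id - 1
    return collected


# Hypothetical stand-in for the real API so the sketch runs offline.
FakeTweet = namedtuple('FakeTweet', ['id', 'text'])


def make_fake_search(total):
    tweets = [FakeTweet(id=i, text='tweet %d' % i) for i in range(total, 0, -1)]

    def fake_search(q, count, max_id=None):
        matching = [t for t in tweets if max_id is None or t.id <= max_id]
        return matching[:count]

    return fake_search


tweets = collect_tweets(make_fake_search(250), query='python')
print(len(tweets))
```

With a real Tweepy client, the same function could presumably be called as collect_tweets(api.search, 'python') on Tweepy 3.x, or with api.search_tweets on Tweepy 4.x, subject to the API rate limits.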

That's all.
