The Twitter API is rate limited: each endpoint can be called only a fixed number of times within a given time window. The following article searches tweets with the search/tweets API, but a single request returns at most 100 tweets, and the endpoint can be called only 180 times per 15 minutes.
**Search Twitter using Python**
This is fine for small amounts of data, but it does not suit a use case such as "continuously searching for hashtags like #general election while election programs are on the air".
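To see why polling does not scale for this use case, the limits quoted above can be turned into a polling budget. This is a small sketch: the constants are the search/tweets figures stated in this article and may differ from what Twitter currently enforces, and `polling_budget` is a name made up for this example.

```python
# Figures for search/tweets as quoted in this article (they may have changed since).
MAX_TWEETS_PER_REQUEST = 100
MAX_REQUESTS_PER_WINDOW = 180
WINDOW_SECONDS = 15 * 60


def polling_budget(max_per_request, max_requests, window_seconds):
    """Return (upper bound on tweets per window, minimum seconds between requests)."""
    max_tweets = max_per_request * max_requests
    min_interval = window_seconds / float(max_requests)
    return max_tweets, min_interval


if __name__ == "__main__":
    tweets, interval = polling_budget(
        MAX_TWEETS_PER_REQUEST, MAX_REQUESTS_PER_WINDOW, WINDOW_SECONDS)
    # Prints: 18000 tweets per window at most, one request every 5.0 s
    print("%d tweets per window at most, one request every %.1f s" % (tweets, interval))
```

Even if every request returned a full page, polling would top out at 18,000 tweets per 15 minutes, and a busy hashtag during an election broadcast can easily exceed that.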
For this reason, Twitter provides the Streaming API as a way to receive data continuously.
The Streaming APIs
The Streaming APIs give developers low-latency access to Twitter's stream of tweet data.
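Instead of repeated requests, a streaming endpoint keeps one HTTP connection open and writes one JSON message per line, with occasional blank keep-alive lines when nothing matches. The sketch below shows only the consuming side, using an in-memory stand-in for the connection; `iter_messages` is a hypothetical helper written for Python 3, and no network access is involved.

```python
import io
import json


def iter_messages(lines):
    """Yield decoded JSON messages from an iterable of stream lines.

    Blank lines (keep-alive newlines sent while no tweets match) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # keep-alive, nothing to decode
        yield json.loads(line)


if __name__ == "__main__":
    # Simulated stream: two messages separated by a keep-alive line.
    fake_stream = io.StringIO(
        '{"id_str": "1", "text": "hello"}\r\n'
        '\r\n'
        '{"id_str": "2", "text": "world"}\r\n'
    )
    for message in iter_messages(fake_stream):
        print(message["id_str"], message["text"])
```

With a real connection, the same loop would run for as long as the connection stays open, which is what makes the data arrive with little delay.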
There are three main types of Streaming API.
Name | Description |
:-- | :-- |
Public streams | Public data flowing through Twitter. The filter endpoint lets you narrow it by keyword or location. |
User streams | Data and events for a single authenticated user. |
Site streams | Data for multiple users. Currently in beta; access is restricted to whitelisted accounts. |
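For reference, each stream type in the table was served from its own endpoint. The mapping below lists the v1.1 URLs as they were documented at the time (Twitter has since retired these v1.1 streaming endpoints); the dictionary keys and the `endpoint_for` helper are illustrative only and are not part of python_twitter.

```python
# v1.1 streaming endpoints as documented at the time (now retired by Twitter).
STREAM_ENDPOINTS = {
    "public_filter": "https://stream.twitter.com/1.1/statuses/filter.json",
    "public_sample": "https://stream.twitter.com/1.1/statuses/sample.json",
    "user": "https://userstream.twitter.com/1.1/user.json",
    "site": "https://sitestream.twitter.com/1.1/site.json",
}


def endpoint_for(stream_type):
    """Return the endpoint URL for a stream type, or raise ValueError if unknown."""
    try:
        return STREAM_ENDPOINTS[stream_type]
    except KeyError:
        raise ValueError("unknown stream type: %r" % (stream_type,))


if __name__ == "__main__":
    print(endpoint_for("public_filter"))
```

The script in this article uses the public filter endpoint, which is why it builds its URL from `api.stream_url` plus `/statuses/filter.json`.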
Continuously saving tweets that match specific keywords to a DB with Python
- python_twitter: a library for working with Twitter from Python.
- peewee: an ORM that supports SQLite, PostgreSQL, and MySQL.
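peewee hides the SQL, but the storage step itself is small. As a point of comparison, here is roughly the same table written with Python's built-in sqlite3 module instead of peewee. The table name `tweets`, the column names, and the helpers `open_db` and `save_tweet` are assumptions for this sketch, not what peewee actually generates for the `Twitte` model.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"
        " created_at TEXT NOT NULL,"
        " id_str TEXT NOT NULL,"
        " contents TEXT NOT NULL)"
    )
    # Same two indexed columns as in the peewee model (createAt, idStr).
    conn.execute("CREATE INDEX IF NOT EXISTS idx_created_at ON tweets(created_at)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_id_str ON tweets(id_str)")
    return conn


def save_tweet(conn, created_at, id_str, contents):
    """Insert one tweet; created_at is stored as an ISO 8601 string."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tweets (created_at, id_str, contents) VALUES (?, ?, ?)",
            (created_at.isoformat(), id_str, contents),
        )


if __name__ == "__main__":
    conn = open_db()
    save_tweet(conn, datetime(2020, 1, 1, tzinfo=timezone.utc), "1", "#election test")
    print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # 1
```

The ORM version below trades this explicit SQL for a model class, which also makes it easier to switch to PostgreSQL or MySQL later.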
```python
# -*- coding: utf-8 -*-
# easy_install python_twitter
# Python 2.7: streams tweets matching the given keywords and stores them in SQLite.
import twitter
import sys
import codecs
import dateutil.parser
from peewee import *

db = SqliteDatabase('twitter_stream.sqlite')


class Twitte(Model):
    createAt = DateTimeField(index=True)  # tweet creation time (UTC)
    idStr = CharField(index=True)         # tweet ID as a string
    contents = CharField()                # tweet text

    class Meta:
        database = db


# If the python_twitter installed by easy_install has no Api.GetStreamFilter,
# add the following function, which implements the same logic.
def GetStreamFilter(api,
                    follow=None,
                    track=None,
                    locations=None,
                    delimited=None,
                    stall_warnings=None):
    '''Returns a filtered view of public statuses.

    Args:
      api: An authenticated twitter.Api instance.
      follow:
        A list of user IDs to track. [Optional]
      track:
        A list of expressions to track. [Optional]
      locations:
        A list of Latitude,Longitude pairs (as strings) specifying
        bounding boxes for the tweets' origin. [Optional]
      delimited:
        Specifies a message length. [Optional]
      stall_warnings:
        Set to True to have Twitter deliver stall warnings. [Optional]

    Returns:
      A twitter stream
    '''
    if all((follow is None, track is None, locations is None)):
        raise ValueError({'message': "No filter parameters specified."})
    url = '%s/statuses/filter.json' % api.stream_url
    data = {}
    if follow is not None:
        data['follow'] = ','.join(follow)
    if track is not None:
        data['track'] = ','.join(track)
    if locations is not None:
        data['locations'] = ','.join(locations)
    if delimited is not None:
        data['delimited'] = str(delimited)
    if stall_warnings is not None:
        data['stall_warnings'] = str(stall_warnings)

    # Keep one streaming connection open and yield each non-empty line as parsed JSON.
    resp = api._RequestStream(url, 'POST', data=data)
    for line in resp.iter_lines():
        if line:  # skip keep-alive newlines
            yield api._ParseAndCheckTwitter(line)


def main(argvs, argc):
    if argc != 6:
        print("Usage #python %s consumer_key consumer_secret access_token_key access_token_secret #tag1,#tag2 " % argvs[0])
        return 1
    consumer_key = argvs[1]
    consumer_secret = argvs[2]
    access_token_key = argvs[3]
    access_token_secret = argvs[4]
    # Decode the keywords to Unicode using the terminal's character encoding.
    track = argvs[5].decode('cp932').split(',')

    db.create_tables([Twitte], safe=True)  # no-op if the table already exists

    # base_url is left at the library default.
    api = twitter.Api(consumer_key=consumer_key,
                      consumer_secret=consumer_secret,
                      access_token_key=access_token_key,
                      access_token_secret=access_token_secret)

    for item in GetStreamFilter(api, track=track):
        print('---------------------')
        if 'text' in item:  # skip non-tweet messages such as limit notices
            print(item['id_str'])
            print(dateutil.parser.parse(item['created_at']))
            print(item['text'])
            print(item['place'])
            row = Twitte(createAt=dateutil.parser.parse(item['created_at']),
                         idStr=item['id_str'],
                         contents=item['text'])
            row.save()
            row = None


if __name__ == '__main__':
    sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout, errors='backslashreplace')
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
```
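Not every message on the stream is a tweet: the stream can also carry notices such as limit notices, which is why the loop checks for a `text` key before saving. The sketch below extracts the three fields stored in `Twitte` and parses `created_at` with the standard library (Python 3) instead of dateutil. `to_row` is a hypothetical helper, and the sample messages are made up for illustration.

```python
from datetime import datetime

# Twitter's created_at format, e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_row(message):
    """Return (created_at, id_str, text) for a tweet, or None for any other message."""
    if "text" not in message:
        return None  # e.g. a limit notice such as {"limit": {"track": 10}}
    created_at = datetime.strptime(message["created_at"], CREATED_AT_FORMAT)
    return created_at, message["id_str"], message["text"]


if __name__ == "__main__":
    samples = [
        {"created_at": "Wed Aug 27 13:08:45 +0000 2008", "id_str": "123", "text": "hi"},
        {"limit": {"track": 10}},
    ]
    for sample in samples:
        print(to_row(sample))
```

Because `%z` keeps the `+0000` offset, the parsed value is timezone-aware UTC, like the value dateutil returns in the script.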
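Run the script as follows. `<script>.py` stands for the file name you saved the code under, and the keyword list is quoted because it contains spaces.

```
python <script>.py consumer_key consumer_secret access_token_key access_token_secret "#election,#House of Representatives election,election"
```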
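The comma-separated keyword list in this command is passed as the `track` parameter. Twitter documented its matching rule as: commas separate alternative phrases (OR), and all space-separated terms within one phrase must appear in the tweet, in any order and ignoring case (AND). So the phrase `#House of Representatives election` matches only tweets containing all four of its terms. The function below is a simplified, hypothetical approximation for checking keywords offline; the real matcher also treats punctuation, hashtags, and URLs specially (for example, `election` also matches `#election`).

```python
def matches_track(text, track):
    """Approximate the Streaming API `track` rule: OR across phrases, AND within one.

    Simplification: terms are compared case-insensitively against the
    whitespace-separated words of the tweet; punctuation is not handled.
    """
    words = set(text.lower().split())
    for phrase in track.split(","):
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False


if __name__ == "__main__":
    track = "#election,#House of Representatives election,election"
    print(matches_track("Polls close at 8pm #election", track))  # True
    print(matches_track("Representatives vote on the House bill", track))  # False
```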
When you pass the access token information and keywords, a SQLite database named twitter_stream.sqlite is created in the current directory and matching tweets are saved to it.
- The latest python_twitter source provides a GetStreamFilter method, but the version installed with easy_install on Python 2.7 does not have it, so the same code is implemented here as a standalone function.
- Because this runs on Windows, the track argument is decoded as cp932; change this to match your terminal's character encoding.
- This is not noticeable when many tweets are flowing, but there can be a delay of several minutes between when a matching tweet is posted and when it is received.
- In general, it is better to handle database writes and other time-consuming processing in a separate process.
- Past tweets cannot be obtained from the stream, so the script must run as a resident (long-running) process.
- created_at is in UTC, which is 9 hours behind Japan Standard Time. Add 9 hours to the stored time to get Japan time.
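Rather than adding 9 hours by hand, the stored UTC value can be converted with a fixed +09:00 offset; Japan does not observe daylight saving time, so a fixed offset is enough. A minimal standard-library sketch (Python 3); `JST` and `to_jst` are names made up for this example, and treating naive values as UTC is an assumption about how the timestamps come back from the DB.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time: UTC+9, no daylight saving time.
JST = timezone(timedelta(hours=9), "JST")


def to_jst(created_at):
    """Convert a UTC datetime (aware, or naive and assumed UTC) to JST."""
    if created_at.tzinfo is None:
        # Assumption: naive values read back from the DB are UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    return created_at.astimezone(JST)


if __name__ == "__main__":
    utc_time = datetime(2020, 1, 1, 20, 30, tzinfo=timezone.utc)
    print(to_jst(utc_time).isoformat())  # 2020-01-02T05:30:00+09:00
```

Note that the conversion can change the date as well as the hour, which matters when grouping tweets by day.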