The Twitter API is rate-limited: each endpoint can only be called a fixed number of times within a given time window. The following article searches for tweets with the search/tweets API, but a single call returns at most 100 tweets, and the endpoint can be called only 180 times per 15 minutes, i.e. at most 100 × 180 = 18,000 tweets per window.
**Search Twitter using Python** http://qiita.com/mima_ita/items/ba59a18440790b12d97e
This is fine when there is little data, but it does not suit a use case like "keep searching for hashtags such as #general-election while an election program is on the air".
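For reference, the polling approach that runs into this limit looks roughly like the sketch below (assuming the `GetSearch` method of python-twitter's `twitter.Api`; the credential strings are placeholders):

```python
# -*- coding: utf-8 -*-
# Sketch of the rate-limited polling approach with python-twitter.
# The credential strings are placeholders.
import twitter

api = twitter.Api(consumer_key='...',
                  consumer_secret='...',
                  access_token_key='...',
                  access_token_secret='...')

# One call returns at most 100 tweets, and search/tweets allows
# only 180 calls per 15 minutes.
for status in api.GetSearch(term=u'#general-election', count=100):
    print(status.text)
```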
For such cases, Twitter provides the Streaming API as a way to receive data continuously.
The Streaming APIs https://dev.twitter.com/streaming/overview
**Streaming API**
The Streaming API lets developers receive Twitter data with very little delay.
There are three main types of Streaming API.
Name | Description |
---|---|
Public streams | Public Twitter data. Can be filtered by keyword or location via the `filter` endpoint. |
User streams | The data and events for a single authenticated user. |
Site streams | Data for multiple users. Currently in beta; access is restricted to whitelisted accounts. |
**Continuously registering tweets for specific keywords into a DB with Python**
The script below uses:
- python_twitter (https://code.google.com/p/python-twitter/): a library for working with Twitter from Python.
- peewee (https://github.com/coleifer/peewee): an ORM that supports SQLite, PostgreSQL, and MySQL.
```python
# -*- coding: utf-8 -*-
# easy_install python_twitter
import twitter
import sys
import codecs
import dateutil.parser
import datetime
import time
from peewee import *

db = SqliteDatabase('twitter_stream.sqlite')


class Twitte(Model):
    createAt = DateTimeField(index=True)
    idStr = CharField(index=True)
    contents = CharField()

    class Meta:
        database = db


# If the version obtained with easy_install does not provide
# GetStreamFilter, add the following code, taken from:
# https://github.com/bear/python-twitter/blob/master/twitter/api.py
def GetStreamFilter(api,
                    follow=None,
                    track=None,
                    locations=None,
                    delimited=None,
                    stall_warnings=None):
    '''Returns a filtered view of public statuses.

    Args:
      follow:
        A list of user IDs to track. [Optional]
      track:
        A list of expressions to track. [Optional]
      locations:
        A list of Latitude,Longitude pairs (as strings) specifying
        bounding boxes for the tweets' origin. [Optional]
      delimited:
        Specifies a message length. [Optional]
      stall_warnings:
        Set to True to have Twitter deliver stall warnings. [Optional]

    Returns:
      A twitter stream
    '''
    if all((follow is None, track is None, locations is None)):
        raise ValueError({'message': "No filter parameters specified."})

    url = '%s/statuses/filter.json' % api.stream_url
    data = {}
    if follow is not None:
        data['follow'] = ','.join(follow)
    if track is not None:
        data['track'] = ','.join(track)
    if locations is not None:
        data['locations'] = ','.join(locations)
    if delimited is not None:
        data['delimited'] = str(delimited)
    if stall_warnings is not None:
        data['stall_warnings'] = str(stall_warnings)

    json = api._RequestStream(url, 'POST', data=data)
    for line in json.iter_lines():
        if line:
            data = api._ParseAndCheckTwitter(line)
            yield data


def main(argvs, argc):
    if argc != 6:
        print("Usage: python %s consumer_key consumer_secret access_token_key access_token_secret #tag1,#tag2" % argvs[0])
        return 1
    consumer_key = argvs[1]
    consumer_secret = argvs[2]
    access_token_key = argvs[3]
    access_token_secret = argvs[4]

    # Decode the keyword argument to unicode; match the codec
    # to the character encoding of your terminal.
    track = argvs[5].decode('cp932').split(',')

    db.create_tables([Twitte], True)

    api = twitter.Api(base_url="https://api.twitter.com/1.1",
                      consumer_key=consumer_key,
                      consumer_secret=consumer_secret,
                      access_token_key=access_token_key,
                      access_token_secret=access_token_secret)

    # Consume the filtered stream and store each tweet.
    for item in GetStreamFilter(api, track=track):
        print('---------------------')
        if 'text' in item:
            print(item['id_str'])
            print(dateutil.parser.parse(item['created_at']))
            print(item['text'])
            print(item['place'])
            row = Twitte(createAt=dateutil.parser.parse(item['created_at']),
                         idStr=item['id_str'],
                         contents=item['text'])
            row.save()
            row = None


if __name__ == '__main__':
    sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout, errors='backslashreplace')
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
```
```
python twitter_stream.py consumer_key consumer_secret access_token_key access_token_secret #election,#House of Representatives election,election
```
Run it with your access-token information and the keywords to track, and a SQLite database named twitter_stream.sqlite is created in the current directory.
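To check what has been collected, the rows can be read back with peewee. A minimal sketch, reusing the `Twitte` model from the script above:

```python
# -*- coding: utf-8 -*-
# Minimal sketch: read the saved tweets back with peewee,
# reusing the Twitte model defined in the script above.
from peewee import *

db = SqliteDatabase('twitter_stream.sqlite')


class Twitte(Model):
    createAt = DateTimeField(index=True)
    idStr = CharField(index=True)
    contents = CharField()

    class Meta:
        database = db


# Print the ten most recent tweets (timestamps are UTC; see the notes below).
for row in Twitte.select().order_by(Twitte.createAt.desc()).limit(10):
    print('%s %s' % (row.createAt, row.contents))
```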
-The latest code provides the GetStreamFilter method, but it is missing from the version that easy_install fetches for Python 2.7, so the same code is inlined here.
-Because this runs on Windows, the track argument is decoded with cp932; match this to the character encoding of your terminal.
-While a large amount of matching data is flowing this does not matter, but when traffic is low the latest matching tweets can arrive with a delay of several minutes.
-Basically, it is better to do the database writes and any other time-consuming processing in a separate process (see the sketch after this list).
-Past data cannot be obtained, so the script has to run as a resident (always-on) process.
-Since created_at is UTC, it is nine hours behind Japanese time; add nine hours to the stored time to get the time in Japan.
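As a rough illustration of the separate-process point, one possible shape is to have the streaming loop only enqueue tweets and let a worker process drain the queue. This is a sketch only; `heavy_processing` is a hypothetical placeholder for your own logic:

```python
# -*- coding: utf-8 -*-
# Sketch: decouple stream reading from slow processing with a queue.
from multiprocessing import Process, Queue


def heavy_processing(item):
    # Hypothetical placeholder: DB writes, text analysis, etc.
    pass


def worker(queue):
    while True:
        item = queue.get()  # blocks until a tweet arrives
        heavy_processing(item)


if __name__ == '__main__':
    queue = Queue()
    Process(target=worker, args=(queue,)).start()

    # In the streaming loop, only enqueue so the stream is never blocked:
    # for item in GetStreamFilter(api, track=track):
    #     if 'text' in item:
    #         queue.put(item)
```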