I fetched tweets with the Streaming API and wrote them out to a JSON file, so here are my notes. When I looked around, most examples fetch tweets with tweepy, so maybe that is the mainstream approach? I didn't use tweepy, though, because I wanted to reuse what I had already built for the REST API as it is. By the way, this is Python 3.5.
I referred to these pages:
https://dev.twitter.com/streaming/reference/post/statuses/filter
http://qiita.com/kenmatsu4/items/23768cbe32fe381d54a2
http://qiita.com/yubais/items/dd143fe608ccad8e9f85
http://qiita.com/Gen6/items/ce83a77217a86a26d907
# -*- coding:utf-8 -*-
from requests_oauthlib import OAuth1Session
import json, datetime, time, sys

KEYS = {  # Fill in the keys issued for your account below
    'consumer_key': '*******',
    'consumer_secret': '*******',
    'access_token': '*******',
    'access_secret': '*******'
}

# Connect to the Twitter API
twitter = OAuth1Session(KEYS['consumer_key'], KEYS['consumer_secret'],
                        KEYS['access_token'], KEYS['access_secret'])

nRstTm = 15  # Wait time in seconds when rate limited
stKeyWrd = 'twitter'  # Keyword to track
url = 'https://stream.twitter.com/1.1/statuses/filter.json'

# Convert a tweet's UTC timestamp to Japan Standard Time (UTC+9)
def str_to_datetime_jp(datetimes):
    dts = datetime.datetime.strptime(datetimes, '%a %b %d %H:%M:%S +0000 %Y')
    return (dts + datetime.timedelta(hours=9)).strftime("%Y-%m-%d %H:%M:%S")

f = open("teststream.json", "w", encoding='utf-8')
while True:
    try:
        req = twitter.post(url,
                           data={"track": stKeyWrd},
                           stream=True)
        if req.status_code == 200:
            for line in req.iter_lines():
                work = json.loads(line.decode('utf-8'))
                workuser = work['user']
                tweet_json = {"tweetId": work["id"],  # the tweet's own id, not the user's
                              "name": workuser["name"],
                              "screen_name": workuser["screen_name"],
                              "text": work["text"],
                              "created_at": str_to_datetime_jp(work["created_at"])
                              }
                print('write')
                json.dump(tweet_json, f, ensure_ascii=False, indent=2, sort_keys=True)
        # Connecting too often in a short time returns a 420 error, so wait
        elif req.status_code == 420:
            print('Rate Limited: wait', nRstTm, 'seconds')
            time.sleep(nRstTm)
        else:
            # Other status codes are not handled for now
            print('req.status_code', req.status_code)
            f.close()
            break
    # If a line from the stream cannot be parsed, the following exception is raised, so reconnect
    except json.JSONDecodeError as e:
        print('Reacquire')
        pass
    except KeyboardInterrupt:
        print('Processing Exit')
        f.close()
        break
    except Exception:
        print('except', sys.exc_info())
        pass
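The script starts streaming here, and if the status is 200 it loops over the incoming lines and writes each tweet out as JSON. My skill level is low, so the implementation is as plain as it gets.
The part I got stuck on was req.iter_lines(), which hands over the stream one line at a time.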
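One thing to be aware of with the output: calling json.dump repeatedly on the same file writes the objects back to back, so the file as a whole is not one valid JSON document. A minimal sketch of one alternative, writing one object per line (often called JSON Lines), might look like this. The helper names write_tweet and read_tweets are made up for illustration, the sample values are not real tweets, and an in-memory io.StringIO stands in for the real file.

```python
import io
import json


def write_tweet(fp, tweet):
    """Write one tweet dict as a single JSON line (hypothetical helper)."""
    fp.write(json.dumps(tweet, ensure_ascii=False, sort_keys=True) + "\n")


def read_tweets(fp):
    """Read back every non-empty line as one JSON object (hypothetical helper)."""
    return [json.loads(line) for line in fp if line.strip()]


# io.StringIO stands in for open("teststream.json", "w", encoding="utf-8")
buf = io.StringIO()
write_tweet(buf, {"screen_name": "example_user", "text": "first"})
write_tweet(buf, {"screen_name": "example_user", "text": "second"})

# Rewind and parse the lines back into dicts.
buf.seek(0)
tweets = read_tweets(buf)
print(len(tweets))
```

The trade-off is that indent=2 has to go, since each object must stay on a single line, but the file can then be read back line by line.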
for line in req.iter_lines():
    work = json.loads(line.decode('utf-8'))
The variable line that comes out of this loop is of type bytes, so if you simply pass it to json.loads, Python 3.5 complains that it wants a str. I didn't know that it has to be converted with decode('utf-8') first.
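The conversion on its own can be sketched as follows. The payload is made up and only stands in for one line from req.iter_lines(). Note that from Python 3.6 on, json.loads also accepts bytes directly, so the explicit decode mainly matters on 3.5.

```python
import json

# Stand-in for one line yielded by req.iter_lines(); the payload is made up.
line = '{"text": "こんにちは"}'.encode('utf-8')
print(type(line))  # the stream hands over bytes, not str

# Decode the bytes to str first, then parse the JSON text.
work = json.loads(line.decode('utf-8'))
print(work['text'])
```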
Also, if you stream a keyword that nobody is tweeting about, the stream still sends blank keep-alive lines, so the variable line contains nothing that json.loads can parse. I caught that case as follows.
except json.JSONDecodeError as e:
    print('Reacquire')
    pass
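Since a blank line is not really an error, another option is to skip it inside the loop instead of letting json.loads raise and then reconnecting. The following is a minimal sketch under that assumption, not a tested fix for the script above: parse_stream_lines is a hypothetical helper, and the sample list only imitates what req.iter_lines() would yield.

```python
import json


def parse_stream_lines(lines):
    """Yield one dict per non-empty line and skip blank keep-alive lines.

    `lines` is any iterable of bytes, for example req.iter_lines().
    (Hypothetical helper, not part of the original script.)
    """
    for line in lines:
        if not line.strip():
            # Blank line: nothing to parse, so stay connected and move on.
            continue
        yield json.loads(line.decode('utf-8'))


# Made-up sample standing in for the stream: a tweet, a blank line, another tweet.
sample = [b'{"text": "a"}', b'', b'{"text": "b"}']
texts = [tweet['text'] for tweet in parse_stream_lines(sample)]
print(texts)
```

Because the connection is kept open, this should also cut down on the reconnects that lead to the 420 error described further down.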
There are two things I still don't understand about this implementation, and I haven't been able to handle them well.
First, it seems that a tweet is not picked up if other characters are attached directly before or after the tracked keyword. https://github.com/tsupo/Twitter-API-Specification--written-in-Japanese-/blob/master/twitterAPI.txt
For example, if you track the keyword "Aikatsu!", you do get tweets that use it as a hashtag such as "#Aikatsu!" (with a half-width space between it and the next word), but a tweet like "Aoi Kiriya is the cutest in Aikatsu!", where Japanese particles are attached directly to the keyword, does not come through.
Honestly, this is a problem for me, because it makes it hard to collect the live-commentary tweets for Aikatsu Stars! on Twitter.
Is there any way around this? If anyone knowledgeable knows, please tell me!
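As far as I understand the documentation for the track parameter, matching works on whole terms separated by spaces and punctuation, and languages written without spaces (such as Japanese) are described as unsupported, which would explain the behaviour above. The sketch below is only a rough local model of that idea and not Twitter's actual tokenizer: matches_track is a hypothetical function, the regular expression is an assumption, and the Japanese strings are sample data.

```python
import re


def matches_track(keyword, text):
    """Rough model: the keyword must equal a whole token of the text.

    Tokens are runs of word characters, which is an assumption made
    for illustration only.
    """
    tokens = re.findall(r"\w+", text.lower())
    return keyword.lower() in tokens


# The hashtag is separated from the next word by a space, so it is its own token.
hashtag_hit = matches_track("アイカツ", "#アイカツ 最高")
# Particles follow the keyword directly, so the whole sentence is one token.
sentence_hit = matches_track("アイカツ", "アイカツで霧矢あおいが一番可愛い")
print(hashtag_hit, sentence_hit)
```

If this model is close to what the server does, the only reliable hits for Japanese text are hashtags or keywords that happen to be surrounded by spaces or punctuation.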
Second, if you connect or reconnect many times in a short period, you are told that you are connecting too often and get a 420 error. https://dev.twitter.com/overview/api/response-codes https://dev.twitter.com/rest/public/rate-limiting
Waiting before reconnecting after a 420 error is fine, but I have no idea how long to actually wait. Why is the limit stated explicitly for REST but not for streaming?
To be honest, tweets keep flowing past while the script waits to reconnect after this error, so I would like to avoid triggering it at all, but I don't know how. Is that even possible in the first place?
I suspect that my implementation is simply bad. Hitting the endpoint over and over, the way you would with REST, is probably the wrong approach.
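On the waiting time: as far as I can tell, Twitter's guide on connecting to streaming endpoints suggests backing off exponentially on HTTP 420, starting at one minute and doubling on each further attempt. The sketch below only generates such a wait schedule. backoff_delays is a hypothetical helper, and the 60-second start, the factor and the cap are assumptions that should be checked against the current documentation.

```python
import itertools


def backoff_delays(initial=60, factor=2, max_delay=960):
    """Yield wait times in seconds: initial, then multiplied by factor, capped.

    All three defaults are assumptions for illustration.
    """
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


# First six waits that a reconnect loop would pass to time.sleep().
schedule = list(itertools.islice(backoff_delays(), 6))
print(schedule)
```

In the script above, the idea would be to take the next value from the generator each time a 420 comes back, and to create a fresh generator once a connection returns 200 again.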