[PYTHON] Export tweets fetched with the Twitter Streaming API as JSON

I fetched tweets with the Streaming API and exported them to a JSON file, so I'm leaving a note here. When I looked around, most examples fetch tweets with tweepy, so maybe that is the mainstream approach? But I'm not using it! I skipped tweepy because I couldn't reuse what I had already built for the REST API as it was. By the way, this is Python 3.5.

I referred to the following pages:
https://dev.twitter.com/streaming/reference/post/statuses/filter
http://qiita.com/kenmatsu4/items/23768cbe32fe381d54a2
http://qiita.com/yubais/items/dd143fe608ccad8e9f85
http://qiita.com/Gen6/items/ce83a77217a86a26d907

Source

# -*- coding:utf-8 -*-

from requests_oauthlib import OAuth1Session
import json, datetime, time, sys

KEYS = { #Enter the keys issued for your own account below
    'consumer_key':'*******',
    'consumer_secret':'*******',
    'access_token':'*******',
    'access_secret':'*******'
   }

#Connect to Twitter API
twitter = OAuth1Session(KEYS['consumer_key'],KEYS['consumer_secret'],
                        KEYS['access_token'],KEYS['access_secret'])

nRstTm = 15 #Wait time in seconds when rate limited (passed to time.sleep)
stKeyWrd = 'twitter' #Keyword to track

url = 'https://stream.twitter.com/1.1/statuses/filter.json'

#Convert a tweet's created_at (UTC) to Japan Standard Time (UTC+9)
def str_to_datetime_jp(datetimes):
    dts = datetime.datetime.strptime(datetimes,'%a %b %d %H:%M:%S +0000 %Y')
    return (dts + datetime.timedelta(hours=9)).strftime("%Y-%m-%d %H:%M:%S")

f = open("teststream.json", "w",encoding='utf-8')
while True:
    try:
        req = twitter.post(url,
                    data= {"track":stKeyWrd},
                    stream = True)

        if req.status_code == 200: 
            for line in req.iter_lines():
                work = json.loads(line.decode('utf-8'))
                workuser = work['user']
                tweet_json = {"tweetId":work["id"], #ID of the tweet itself (workuser["id"] is the author's user ID)
                              "name":workuser["name"],
                              "screen_name":workuser["screen_name"],
                              "text":work["text"],
                              "created_at":str_to_datetime_jp(work["created_at"])
                              } 
                print('write')
                json.dump(tweet_json, f, ensure_ascii=False, indent=2,sort_keys=True)

        #Connecting too often in a short time returns a 420 error, so wait before retrying
        elif req.status_code == 420:
            print('Rate Limited: waiting',nRstTm,'seconds')
            time.sleep(nRstTm)
        else:
            #Other status codes are not handled yet; report the code and stop
            print('req.status_code',req.status_code)
            f.close()
            break

    #When the stream delivers a line that json.loads cannot parse, this exception is raised, so connect again
    except json.JSONDecodeError as e:
        print('Reacquire')
        pass
    except KeyboardInterrupt:
        print('Processing Exit')
        f.close()
        break
    except:
        print('except', sys.exc_info())
        pass
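
The str_to_datetime_jp helper in the source above adds nine hours by hand. As a rough alternative sketch, the standard library can parse the "+0000" offset itself and convert to a fixed UTC+9 zone. The names JST and created_at_to_jst are made up for this example, and the sample string is only an illustration of the created_at format.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")

def created_at_to_jst(created_at):
    """Convert a created_at string to a 'YYYY-MM-DD HH:MM:SS' string in JST."""
    # %z parses the '+0000' offset, so the parsed datetime is timezone-aware.
    utc_time = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return utc_time.astimezone(JST).strftime("%Y-%m-%d %H:%M:%S")

print(created_at_to_jst("Wed Aug 27 13:08:45 +0000 2008"))  # 2008-08-27 22:08:45
```

Because the offset is read from the string rather than hard-coded in the format, this version would still give the right answer if the timestamp ever arrived with a different offset.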

Where I got stuck

Streaming starts here, and if the status is 200, the loop writes out JSON. I'm still a beginner, so the code is about as straightforward as it gets.
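
One thing to watch with this loop: calling json.dump over and over with indent=2 writes the objects back to back, so the finished file cannot be read back with a single json.load. A minimal sketch of one way around it, assuming one JSON object per line is acceptable as the output format. The helper names append_record and read_records and the sample records are made up for this example, and io.StringIO stands in for the real output file.

```python
import io
import json

def append_record(fp, record):
    """Write one record as a single line of JSON, followed by a newline."""
    fp.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")

def read_records(fp):
    """Read back every non-empty line as one JSON object."""
    return [json.loads(line) for line in fp if line.strip()]

# io.StringIO stands in for open("teststream.json", "w", encoding="utf-8").
buffer = io.StringIO()
append_record(buffer, {"screen_name": "example_user", "text": "first tweet"})
append_record(buffer, {"screen_name": "example_user", "text": "second tweet"})

buffer.seek(0)
records = read_records(buffer)
print(len(records))  # 2
```

Each line is a complete JSON document on its own, so the file stays readable even if the script is interrupted partway through.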

What tripped me up was req.iter_lines(), which hands over the stream one line at a time.

            for line in req.iter_lines():
                work = json.loads(line.decode('utf-8'))

Because of how the data is delivered here, the variable line is of type bytes, so handing it straight to json.loads raises an error demanding str (json.loads in Python 3.5 only accepts str; bytes input is accepted from Python 3.6). I didn't know I had to convert it with decode('utf-8').
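
A tiny sketch of the conversion on its own, without any network access. The payload is invented and simply stands in for one line yielded by req.iter_lines().

```python
import json

# Made-up payload standing in for one line from req.iter_lines().
line = '{"text": "こんにちは", "user": {"screen_name": "example_user"}}'.encode("utf-8")

print(type(line).__name__)  # bytes

# Decode the bytes to str first, then parse the JSON text.
work = json.loads(line.decode("utf-8"))
print(work["user"]["screen_name"])  # example_user
```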

Also, if you stream a keyword that nobody is tweeting about, a line arrives with nothing in it for json.loads to parse! So I caught the resulting exception as shown below.

    except json.JSONDecodeError as e:
        print('Reacquire')
        pass
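
A rough sketch of another way to deal with such lines, assuming the empty lines are the blank keep-alive lines that the stream sends while no tweets match: skip them inside the loop instead of letting json.loads raise, so the connection stays open. The helper name parse_stream_line and the sample lines are made up for this example.

```python
import json

def parse_stream_line(line):
    """Return the decoded JSON object, or None for a line that carries no usable JSON."""
    if not line.strip():
        # Blank keep-alive line: nothing to parse, but the connection is still healthy.
        return None
    try:
        return json.loads(line.decode("utf-8"))
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None

# Simulated stream: a keep-alive line, a tweet-like message, and a truncated line.
sample_lines = [b"", b'{"text": "hello"}', b'{"text": ']
parsed = [parse_stream_line(line) for line in sample_lines]
print(parsed)  # [None, {'text': 'hello'}, None]
```

In the loop over req.iter_lines(), a None result would simply be skipped with continue, so the stream is not reopened for every blank line.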

What I don't understand

There are two things I didn't understand while implementing this, so I couldn't handle them properly.

Keyword search misses words with particles attached
420 errors occur

Keyword search misses words with particles attached

Apparently, a tweet is not picked up when other characters are attached directly before or after the keyword. https://github.com/tsupo/Twitter-API-Specification--written-in-Japanese-/blob/master/twitterAPI.txt

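For example, tracking the keyword "Aikatsu!" picks up tweets where it stands alone or appears as a hashtag such as "#Aikatsu!" (separated from the neighbouring words by half-width spaces), but a tweet like "Aoi Kiriya is the cutest in Aikatsu!", where in Japanese a particle is written directly after the keyword with no space, does not come through.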
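
To make that behaviour concrete, here is a deliberately simplified toy model of matching on whole space-separated tokens. It is only an illustration of the idea described above, not Twitter's actual matching rules; the function name matches_keyword and the Japanese sample strings are made up for this example.

```python
def matches_keyword(tweet_text, keyword):
    """Toy model: the keyword must equal a whole whitespace-separated token."""
    wanted = keyword.casefold()
    for token in tweet_text.split():
        # Treat "#アイカツ!" the same as "アイカツ!".
        if token.lstrip("#").casefold() == wanted:
            return True
    return False

# Keyword set off by spaces (here as a hashtag): matched.
print(matches_keyword("今日も #アイカツ! 見た", "アイカツ!"))  # True

# Particle written directly after the keyword, no space: one long token, not matched.
print(matches_keyword("アイカツ!で一番かわいい", "アイカツ!"))  # False
```

Under this model the second tweet is a single token that merely contains the keyword, which would explain why it never arrives.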

Honestly, this is a problem for me, because it makes it hard to collect the live commentary on Aikatsu Stars! from Twitter.

Is there any way around this? If anyone knows, please teach me!

A flood of 420 errors

If you connect or reconnect too many times in a short period, you are told that you are connecting too often and you get a 420 error. https://dev.twitter.com/overview/api/response-codes https://dev.twitter.com/rest/public/rate-limiting

Waiting before reconnecting after a 420 error is fine, but I have no idea how long I actually need to wait. Why isn't this stated explicitly for streaming, when it is for REST?
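
When the right wait time is unknown, one common approach is exponential backoff: wait a little, and wait longer after each consecutive failure. A minimal sketch follows; the 60-second starting delay, the doubling factor, and the 960-second cap are assumptions chosen for illustration, not values confirmed anywhere in this article, and the name backoff_delays is made up.

```python
def backoff_delays(first_delay=60, factor=2, max_delay=960):
    """Yield wait times in seconds that grow after each consecutive 420 response."""
    delay = first_delay
    while True:
        # Never wait longer than max_delay, however many failures pile up.
        yield min(delay, max_delay)
        delay *= factor

delays = backoff_delays()
first_five = [next(delays) for _ in range(5)]
print(first_five)  # [60, 120, 240, 480, 960]
```

In the 420 branch, the next value would be passed to time.sleep, and a fresh generator would be created once a connection succeeds again.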

To be honest, tweets keep flowing past while I wait to reconnect after this error, so I'd like to stop the error from happening at all, but I don't know how. Is that even possible in the first place?

I suspect there is a good chance my implementation is at fault. Hitting the endpoint over and over, the way you would with REST, is probably the wrong approach.