As you know, when you get tweets with the Twitter API, both the number of requests you can make in a given period and the number of tweets you can get per request are limited.
Here are some tips for working within these limits so that you can keep fetching tweets continuously. Note that this post does not cover POST (although it works almost the same way).
Be sure to check the primary sources for the actual numbers. The sample code is published on Gist. (There seems to be a Twitter package, but I do not use it.)
The limits are as follows.
GET endpoints
The standard API rate limits described in this table refer to GET (read) endpoints. Note that endpoints not listed in the chart default to 15 requests per allotted user. All request windows are 15 minutes in length. These rate limits apply to the standard API endpoints only, does not apply to premium APIs.
Excerpt from Rate limits — Twitter Developers
The number of requests is limited per 15-minute window.
According to this page, the limits for `search/tweets`, for example, are as follows (for the Standard API).
Excerpt from [Rate limits — Twitter Developers](https://developer.twitter.com/en/docs/basics/rate-limits)
Endpoint | Resource family | Requests / window (user auth) | Requests / window (app auth) |
---|---|---|---|
GET search/tweets | search | 180 | 450 |
You can get the current limit status from the endpoint https://api.twitter.com/1.1/application/rate_limit_status.json.
The `resources` parameter is optional and specifies the resource family.
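As a minimal sketch of how the optional `resources` parameter ends up in the request, the snippet below only builds the URL and sends nothing. The helper name `build_status_url` is hypothetical, and passing several families as a comma-separated value follows the command line help of the sample later in this post.

```python
from urllib.parse import urlencode

STATUS_ENDPOINT = "https://api.twitter.com/1.1/application/rate_limit_status.json"


def build_status_url(families=None):
    """Return the status URL (hypothetical helper, not part of the sample).

    families: optional list of resource family names such as "search".
    """
    if not families:
        # The parameter is optional, so the bare endpoint is also valid
        return STATUS_ENDPOINT
    # Multiple families are joined with commas into one parameter value
    return STATUS_ENDPOINT + "?" + urlencode({"resources": ",".join(families)})


print(build_status_url())
print(build_status_url(["search", "users"]))
```

In practice you would normally hand the parameters to `requests` as a dict and let it do the encoding, as the sample class below does.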
user auth (OAuth v1.1)
{
  "rate_limit_context": {
    "access_token": "*******************"
  },
  "resources": {
    "search": {
      "/search/tweets": {
        "limit": 180,
        "remaining": 180,
        "reset": 1591016735
      }
    }
  }
}
app auth (OAuth v2.0)
{
  "rate_limit_context": {
    "application": "dummykey"
  },
  "resources": {
    "search": {
      "/search/tweets": {
        "limit": 450,
        "remaining": 450,
        "reset": 1591016736
      }
    }
  }
}
Each `reset` value is an epoch time indicating when the limit will be reset.
In the above example, the limit is reset at epoch time `1591016735` = 2020-06-01 22:05:35 for user auth (OAuth v1.1), and at `1591016736` = 2020-06-01 22:05:36 for app auth (OAuth v2.0). (Both are shown in local time, here JST.)
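To make the conversion concrete, here is a small sketch that turns the `reset` value from the example into a timezone-aware datetime. The helper name `reset_to_datetime` and the `JST` constant are assumptions for illustration (UTC+9 matches the local times quoted above).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local times quoted in this article are JST (UTC+9)
JST = timezone(timedelta(hours=9))


def reset_to_datetime(reset_epoch, tz=timezone.utc):
    """Convert the epoch value of `reset` to an aware datetime (hypothetical helper)."""
    return datetime.fromtimestamp(reset_epoch, tz=tz)


print(reset_to_datetime(1591016735))       # in UTC
print(reset_to_datetime(1591016735, JST))  # in JST
```

Using an explicit time zone avoids depending on the machine's local settings, which is why the result is the same wherever the code runs.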
When the limit has been used up, `remaining` becomes 0.
The response contains the following items (example for the `search` family).
Category | Family | Endpoint | Key | Value |
---|---|---|---|---|
rate_limit_context | | | access_token (user auth (v1.1)) | Contents of the access token |
rate_limit_context | | | application (app auth (v2.0)) | dummykey (seems to be fixed) |
resources | search | /search/tweets | limit | Maximum number of requests within the time window |
resources | search | /search/tweets | remaining | Number of requests remaining within the time window |
resources | search | /search/tweets | reset | Time when the window is reset (epoch time) |
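The nesting in the table can be walked with a few dictionary lookups. The following is only a sketch: `lookup_limit` is a hypothetical helper, and it returns `None` when any level is missing (for instance when the response has no `resources` at all) rather than raising `KeyError`.

```python
def lookup_limit(status, family, endpoint):
    """Return the {limit, remaining, reset} dict for one endpoint, or None.

    status: parsed JSON of rate_limit_status (hypothetical helper).
    """
    return status.get("resources", {}).get(family, {}).get(endpoint)


ok = {
    "rate_limit_context": {"application": "dummykey"},
    "resources": {
        "search": {
            "/search/tweets": {"limit": 450, "remaining": 450, "reset": 1591016736}
        }
    },
}
no_resources = {"rate_limit_context": {"application": "dummykey"}}

print(lookup_limit(ok, "search", "/search/tweets"))
print(lookup_limit(no_resources, "search", "/search/tweets"))
```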
The example below shows the response when `user` is mistakenly specified as the resource family in `resources`. (The correct name is `users`, with an s.)
user auth
{
  "rate_limit_context": {
    "access_token": "*******************"
  }
}
app auth
{
  "rate_limit_context": {
    "application": "dummykey"
  }
}
Both responses contain `rate_limit_context` but no `resources`.
On a rate limit error, `res.status_code` (the HTTP status code) is 429. (420 may also be returned[^1].)
[^1]: Generally, 429 "Too Many Requests: Returned when a request cannot be served due to the app's rate limit having been exhausted for the resource." is returned, but very rarely 420 "Enhance Your Calm: Returned when an app is being rate limited for making too many requests." may be returned. The latter may happen if you accidentally make multiple requests at the same time (unverified). → There is an explanation in "Connecting to a streaming endpoint — Twitter Developers", so I added it to the text (2020/06/03).
Excerpt from [Response codes — Twitter Developers](https://developer.twitter.com/en/docs/basics/response-codes)
Code | Text | Description |
---|---|---|
420 | Enhance Your Calm | Returned when an app is being rate limited for making too many requests. |
429 | Too Many Requests | Returned when a request cannot be served due to the app's rate limit having been exhausted for the resource. See Rate Limiting. |

[Updated on 06/03/2020] A more detailed explanation of 420 is given in the following page.
Excerpt from [Connecting to a streaming endpoint — Twitter Developers](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/connecting)
420 Rate Limited The client has connected too frequently. For example, an endpoint returns this status if:
- A client makes too many login attempts in a short period of time.
- Too many copies of an application attempt to authenticate with the same credentials.
In the JSON body of the error response, `code` inside `errors` is set to 88.
{
  "errors": [
    {
      "code": 88,
      "message": "Rate limit exceeded"
    }
  ]
}
Excerpt from [Response codes — Twitter Developers](https://developer.twitter.com/en/docs/basics/response-codes)
Code | Text | Description |
---|---|---|
88 | Rate limit exceeded | Corresponds with HTTP 429. The request limit for this resource has been reached for the current rate limit window. |
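Putting the HTTP status codes and the error code together, a rate limit check might look like the sketch below. The function name `is_rate_limited` and the constants are hypothetical; the values 420, 429 and 88 are the ones quoted above.

```python
RATE_LIMIT_STATUS_CODES = (420, 429)  # HTTP status codes quoted above
RATE_LIMIT_ERROR_CODE = 88            # "Rate limit exceeded" in the JSON body


def is_rate_limited(status_code, body):
    """Return True if the response looks like a rate limit error (hypothetical helper).

    status_code: HTTP status code; body: parsed JSON body (dict) or None.
    """
    if status_code in RATE_LIMIT_STATUS_CODES:
        return True
    errors = body.get("errors", []) if isinstance(body, dict) else []
    return any(err.get("code") == RATE_LIMIT_ERROR_CODE for err in errors)


sample_body = {"errors": [{"code": 88, "message": "Rate limit exceeded"}]}
print(is_rate_limited(429, sample_body))
print(is_rate_limited(200, {"statuses": []}))
```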
For exceptions raised by libraries such as `requests`, see their respective documentation.
A processing flow that takes the rate limit into account would look something like the following.
while True:
    try:
        res = request to the API (get/post)
        res.raise_for_status()
    except requests.exceptions.HTTPError:
        # 429 (or 420) is returned when the rate limit is reached
        if res.status_code in (420, 429):
            get the rate limit information  ← here
            wait quietly until the reset time
            continue
        handle HTTP errors other than 420/429
    except OtherException:
        exception handling
    processing when the data was acquired successfully
    break or return or yield etc.
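The flow above can be sketched as runnable code if the network parts are passed in as callables. Everything named here (`fetch_with_rate_limit`, `do_request`, `get_reset_epoch`, `max_waits`) is hypothetical, and the one-second margin after the reset time is my own assumption, not something the Twitter documentation prescribes.

```python
import time


def fetch_with_rate_limit(do_request, get_reset_epoch,
                          sleep=time.sleep, now=time.time, max_waits=3):
    """Call do_request() until it is no longer rate limited (a sketch).

    do_request() -> (status_code, payload) and get_reset_epoch() -> epoch seconds
    are hypothetical callables supplied by the caller.
    """
    for _ in range(max_waits + 1):
        status_code, payload = do_request()
        if status_code not in (420, 429):
            return status_code, payload
        # Wait quietly until the reset time, plus a 1 second margin (assumption)
        wait = max(0, get_reset_epoch() - int(now())) + 1
        sleep(wait)
    raise RuntimeError("still rate limited after waiting")


# Usage with fake callables: the first call is rate limited, the second succeeds
responses = iter([(429, None), (200, {"statuses": []})])
slept = []
result = fetch_with_rate_limit(lambda: next(responses), lambda: 1010,
                               sleep=slept.append, now=lambda: 1000)
print(result, slept)
```

Injecting `sleep` and `now` keeps the loop testable without actually waiting 15 minutes; bounding the number of waits avoids looping forever if something other than the rate limit is wrong.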
The following is a concrete implementation of the "get the rate limit information" part.
For a sample that only fetches information there is little merit in using a class, but with actual integration into a program in mind, I think it is better to write it in a form that is easy to modularize, so I made it a class called GetTweetStatus. (I also want to avoid access from outside to things like the API key and the Bearer token as much as possible...)
class GetTweetStatus:
    """Get the rate limit status of the Twitter API"""

    def __init__(self, apikey, apisec, access_token="", access_secret=""):
        self._apikey = apikey
        self._apisec = apisec
        self._access_token = access_token
        self._access_secret = access_secret
        # Pattern used to mask the access_token value when the response is displayed
        self._access_token_mask = re.compile(r'(?P<access_token>"access_token":)\s*".*"')
The last line, `re.compile()`, is for masking the received `access_token` when it is displayed.
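To see what the mask does, here is a small standalone sketch that applies the same pattern to a pretty-printed JSON string. The token value is a made-up dummy, and the variable names are only for illustration.

```python
import json
import re

# Same pattern as in the class: keep the key, replace the quoted value
mask = re.compile(r'(?P<access_token>"access_token":)\s*".*"')

# Dummy data (the token value is made up for this example)
resj = {"rate_limit_context": {"access_token": "1234-abcdEFGH"}}
text = json.dumps(resj, indent=2)

masked = mask.sub(r'\g<access_token> "*******************"', text)
print(masked)
```

Because `.` does not match a newline, the greedy `.*` stays within the single line that `json.dumps(..., indent=2)` produces for the key, so the rest of the JSON is left untouched.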
user auth (OAuth v1.1)
GetTweetStatus.get_limit_status_v1( )
    def get_limit_status_v1(self, resource_family="search"):
        """Get status using OAuth v1.1"""
        # Use OAuth1Session because OAuth signing is complicated
        oauth1 = OAuth1Session(self._apikey, self._apisec, self._access_token, self._access_secret)
        params = {
            'resources': resource_family  # help, users, search, statuses etc.
        }
        try:
            res = oauth1.get(STATUS_ENDPOINT, params=params, timeout=5.0)
            res.raise_for_status()
        except (requests.Timeout, requests.ConnectionError) as err:
            # requests raises its own Timeout, not the built-in TimeoutError
            raise requests.ConnectionError("Cannot get Limit Status") from err
        except Exception as err:
            raise Exception("Cannot get Limit Status") from err
        return res.json()
This uses `OAuth1Session`, so you need to install `requests_oauthlib` and add `from requests_oauthlib import OAuth1Session`.
app auth (OAuth v2.0)
GetTweetStatus.get_limit_status_v2( )
    def get_limit_status_v2(self, resource_family="search"):
        """Get status using OAuth v2.0 (Bearer)"""
        bearer = self._get_bearer()  # Get Bearer
        headers = {
            'Authorization': 'Bearer {}'.format(bearer),
            'User-Agent': USER_AGENT
        }
        params = {
            'resources': resource_family  # help, users, search, statuses etc.
        }
        try:
            res = requests.get(STATUS_ENDPOINT, headers=headers, params=params, timeout=5.0)
            res.raise_for_status()
        except (requests.Timeout, requests.ConnectionError) as err:
            # requests raises its own Timeout, not the built-in TimeoutError
            raise requests.ConnectionError("Cannot get Limit Status") from err
        except Exception as err:
            raise Exception("Cannot get Limit Status") from err
        return res.json()
This uses `requests`, so you need to install `requests` and add `import requests`.
The following is the `_get_bearer()` part called at `bearer = self._get_bearer()  # Get Bearer`.
GetTweetStatus._get_bearer( ), _get_credential( )
    def _get_bearer(self):
        """Get Bearer"""
        cred = self._get_credential()
        headers = {
            'Authorization': 'Basic ' + cred,
            'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
            'User-Agent': USER_AGENT
        }
        data = {
            'grant_type': 'client_credentials',
        }
        try:
            res = requests.post(TOKEN_ENDPOINT, data=data, headers=headers, timeout=5.0)
            res.raise_for_status()
        except (requests.Timeout, requests.ConnectionError) as err:
            raise Exception("Cannot get Bearer") from err
        except requests.exceptions.HTTPError as err:
            if res.status_code == 403:
                raise requests.exceptions.HTTPError("Auth Error") from err
            raise requests.exceptions.HTTPError("Other Exception") from err
        except Exception as err:
            raise Exception("Cannot get Bearer") from err
        rjson = res.json()
        return rjson['access_token']

    def _get_credential(self):
        """Credential generation"""
        # Basic credential: base64 of "API key:API secret"
        pair = self._apikey + ':' + self._apisec
        bcred = b64encode(pair.encode('utf-8'))
        return bcred.decode()
The Bearer token is requested with `grant_type="client_credentials"`.
The display part is implemented as a method. In actual use, it would probably be implemented as something that returns `reset`.
GetTweetStatus.disp_limit_status( )
    def disp_limit_status(self, version=2, resource_family="search"):
        """Display Rate Limit by version"""
        if version == 2:
            resj = self.get_limit_status_v2(resource_family=resource_family)
        elif version == 1:
            resj = self.get_limit_status_v1(resource_family=resource_family)
        else:
            raise Exception(f"Version error: {version}")
        # JSON display (with the access_token masked)
        print(self._access_token_mask.sub(r'\g<access_token> "*******************"',
                                          json.dumps(resj, indent=2, ensure_ascii=False)))
        # Itemized display (example of getting remaining/reset)
        print("resources:")
        if 'resources' in resj:
            resources = resj['resources']
            for family in resources:
                print(f" family: {family}")
                endpoints = resources[family]
                for endpoint in endpoints:
                    items = endpoints[endpoint]
                    print(f" endpoint: {endpoint}")
                    limit = items['limit']
                    remaining = items['remaining']
                    reset = items['reset']
                    e2d = epoch2datetime(reset)
                    duration = get_delta(reset)
                    print(f" limit: {limit}")
                    print(f" remaining: {remaining}")
                    print(f" reset: {reset}")  # ← in actual use, return this value
                    print(f" reset(epoch2datetime): {e2d}")
                    print(f" duration: {duration} sec")
        else:
            print(" Not Available")
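Instead of printing, a variant for real use could return how long to wait. The following is a minimal sketch under that assumption; `seconds_to_wait` is a hypothetical helper that takes one `{limit, remaining, reset}` entry from the response.

```python
import time


def seconds_to_wait(entry, now=None):
    """Return how many seconds to wait before the next request (a sketch).

    entry: one {limit, remaining, reset} dict from rate_limit_status.
    now: current epoch time; defaults to the system clock.
    """
    if entry["remaining"] > 0:
        return 0  # requests are still available in this window
    now = int(time.time()) if now is None else now
    # Never return a negative wait if the reset time has already passed
    return max(0, entry["reset"] - now)


exhausted = {"limit": 180, "remaining": 0, "reset": 1591016735}
print(seconds_to_wait(exhausted, now=1591016000))
```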
In actual use, you would use `remaining` or `reset` to handle the limit.
Below are the time manipulation utilities and the beginning of the file.
getTwitterStatus.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Twitter Rate Limit Information acquisition sample"""
import os
import sys
import json
from base64 import b64encode
import datetime
import time
import re
import argparse
# pip install requests
import requests
# pip install requests_oauthlib
from requests_oauthlib import OAuth1Session

USER_AGENT = "Get Twitter Status Application/1.0"
TOKEN_ENDPOINT = 'https://api.twitter.com/oauth2/token'
STATUS_ENDPOINT = 'https://api.twitter.com/1.1/application/rate_limit_status.json'


def epoch2datetime(epoch):
    """Convert epoch time (UNIX time) to a datetime (localtime)"""
    return datetime.datetime(*(time.localtime(epoch)[:6]))


def datetime2epoch(d_utc):
    """Convert a datetime (UTC) to epoch time (UNIX time)"""
    # Mark the naive datetime as UTC, then take its POSIX timestamp
    return int(d_utc.replace(tzinfo=datetime.timezone.utc).timestamp())


def get_delta(target_epoch_time):
    """Return the difference between target_epoch_time and the current time"""
    return target_epoch_time - int(round(time.time(), 0))
While I was at it, I made it possible to specify the OAuth version and the resource family with command line arguments.
main( )
def main():
    """main()"""
    # Check environment variables such as API_KEY and API_SEC
    apikey = os.getenv('API_KEY', default="")
    apisec = os.getenv('API_SEC', default="")
    access_token = os.getenv('ACCESS_TOKEN', default="")
    access_secret = os.getenv('ACCESS_SECRET', default="")
    if apikey == "" or apisec == "":  # If the environment variables cannot be obtained
        print("Please set the environment variables API_KEY and API_SEC.", file=sys.stderr)
        print("If you use OAuth v1.1, also set the environment variables ACCESS_TOKEN and ACCESS_SECRET.",
              file=sys.stderr)
        sys.exit(255)
    # Argument settings
    parser = argparse.ArgumentParser()
    parser.add_argument('-a', '--oauthversion', type=int, default=0,
                        metavar='N', choices=(0, 1, 2),
                        help='OAuth version specification [1|2]')
    parser.add_argument('-f', '--family', type=str, default='search',
                        metavar='Family',
                        help='API family specification. Separated by commas for multiple')
    args = parser.parse_args()
    oauthversion = args.oauthversion
    family = args.family
    # Create the GetTweetStatus object
    gts = GetTweetStatus(apikey, apisec, access_token=access_token, access_secret=access_secret)
    # Get and display the rate limit with user auth (OAuth v1.1)
    if (oauthversion in (0, 1)) and (access_token != "" and access_secret != ""):
        print("<<user auth (OAuth v1)>>")
        gts.disp_limit_status(version=1, resource_family=family)
    # Get and display the rate limit with app auth (OAuth v2.0)
    if oauthversion in (0, 2):
        print("<<app auth (OAuth v2)>>")
        gts.disp_limit_status(version=2, resource_family=family)


if __name__ == "__main__":
    main()
The complete sample is published on Gist as getTwitterStatus.py[^2].
[^2]: This is my first time using Gist, so I am not sure whether I am using it correctly.
$ python3 getTwitterStatus.py
<<user auth (OAuth v1)>>
{
  "rate_limit_context": {
    "access_token": "*******************"
  },
  "resources": {
    "search": {
      "/search/tweets": {
        "limit": 180,
        "remaining": 180,
        "reset": 1591016735
      }
    }
  }
}
resources:
 family: search
 endpoint: /search/tweets
 limit: 180
 remaining: 180
 reset: 1591016735
 reset(epoch2datetime): 2020-06-01 22:05:35
 duration: 899 sec
<<app auth (OAuth v2)>>
{
  "rate_limit_context": {
    "application": "dummykey"
  },
  "resources": {
    "search": {
      "/search/tweets": {
        "limit": 450,
        "remaining": 450,
        "reset": 1591016736
      }
    }
  }
}
resources:
 family: search
 endpoint: /search/tweets
 limit: 450
 remaining: 450
 reset: 1591016736
 reset(epoch2datetime): 2020-06-01 22:05:36
 duration: 900 sec
$
Apart from the time-window limit, there is also a limit on the number of tweets you can get in a single request.
For example, the maximum is 200 for `statuses/user_timeline` ($count \leq 200$) and 100 for `search/tweets` ($count \leq 100$).
There are various other restrictions, but in the case of `search/tweets`, the item `next_results` is included in `search_metadata` so that the results can be retrieved continuously.
{
  "statuses": [
    ...
  ],
  "search_metadata": {
    "completed_in": 0.047,
    "max_id": 1125490788736032770,
    "max_id_str": "1125490788736032770",
    "next_results": "?max_id=1124690280777699327&q=from%3Atwitterdev&count=2&include_entities=1&result_type=mixed",
    "query": "from%3Atwitterdev",
    "refresh_url": "?since_id=1125490788736032770&q=from%3Atwitterdev&result_type=mixed&include_entities=1",
    "count": 2,
    "since_id": 0,
    "since_id_str": "0"
  }
}
`search_metadata` contains `next_results`, so if you send it as the parameters of a new request, you get the next part of the search results (in units of the specified `count`).
As long as you do not hit the rate limit, you can repeat this to keep getting results. That is, within one rate limit window you can get $count$ (up to 100) $\times$ $limit$ (180 for user auth) $= 18{,}000$ tweets.
In the sample above $count = 2$, so if you continue as is, you can get $2$ tweets/request $\times$ $180$ requests/15 minutes $= 360$ tweets/15 minutes before you reach the limit (if you actually make that many requests, of course).
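The same arithmetic as a tiny sketch; `max_tweets_per_window` is a hypothetical helper name, and the app auth figure simply applies the 450 requests per window from the table earlier in this post.

```python
def max_tweets_per_window(count, limit):
    """Upper bound of tweets per 15-minute window: count per request x requests."""
    return count * limit


print(max_tweets_per_window(100, 180))  # user auth, count at its maximum
print(max_tweets_per_window(100, 450))  # app auth, count at its maximum
print(max_tweets_per_window(2, 180))    # the sample above (count=2)
```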
When all the search results have been retrieved, `next_results` disappears from `search_metadata`.
Also, `next_results` sometimes reappears when you fetch again, so you may want to wait a while and retry.
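One way to reuse `next_results` is to parse it back into a parameter dict for the next request. This is only a sketch: `next_params` is a hypothetical helper, and it returns `None` when `next_results` is absent so that the caller can stop (or wait and retry).

```python
from urllib.parse import parse_qsl


def next_params(search_metadata):
    """Turn search_metadata['next_results'] into request parameters (a sketch).

    Returns None when next_results is missing, i.e. nothing more to fetch for now.
    """
    next_results = search_metadata.get("next_results")
    if not next_results:
        return None
    # Drop the leading "?" and decode the query string (e.g. %3A becomes ":")
    return dict(parse_qsl(next_results.lstrip("?")))


metadata = {
    "next_results": "?max_id=1124690280777699327&q=from%3Atwitterdev"
                    "&count=2&include_entities=1&result_type=mixed",
}
params = next_params(metadata)
print(params)
```

The decoded dict can be passed as `params=` to the next GET request, which re-encodes the values, so the query is not double-encoded.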
In the case of `statuses/user_timeline` etc., `*_metadata` is not included, so you need to make good use of the `max_id` parameter and generate the equivalent of search's `next_results` yourself. (Actually, I have not used anything other than search, so I am not sure, but I do not think this is far off.)
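A common way to do this is to take the smallest tweet ID in the page just received and ask for anything older. The sketch below assumes each tweet is a dict with an integer `id`, and that `max_id` is inclusive (so 1 is subtracted to avoid getting the same tweet twice); `next_max_id` is a hypothetical helper.

```python
def next_max_id(statuses):
    """Return the max_id for the next page, or None if the page was empty (a sketch).

    statuses: list of tweet dicts, each assumed to have an integer "id".
    """
    if not statuses:
        return None  # nothing returned, so there is nothing older to ask for
    # max_id is inclusive, so step just below the oldest tweet already received
    return min(tweet["id"] for tweet in statuses) - 1


page = [{"id": 105}, {"id": 101}, {"id": 103}]
print(next_max_id(page))
```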
Search covers the past 7 days, whereas `user_timeline` covers, as far as I understand, the past 24 hours, so I think the two serve different purposes in the first place...