Spoiler: except for RTs, these tweets cannot be filtered out when fetching them through the API (at least, I couldn't find a way), so this post shows how to exclude them locally from the fetched results.
I want to use tweepy to fetch only tweets that are simple comments. Here, a "simple comment" tweet means a tweet that does not contain a URL.
import tweepy

# CK, CS, AT, AS are your consumer key/secret and access token/secret
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)

# n = number of tweets to fetch
results = api.user_timeline(screen_name="screen_name", count=n)
for result in results:
    print(result.entities)
    print(result.text)
For example, when you fetch a quote RT, `result.text` looks like this:
Quote RT https://t.co/ksg6oW95Wo
That is, the text includes the URL of the quoted tweet. Similarly, tweets with attached media such as images or videos also contain the URL of that media in their text.
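To catch all of these cases at once, I use the following condition (each part is explained below):
has_link = result.entities["urls"] != [] or "media" in result.entities or result.is_quote_status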
The `entities` of the quote RT above are as follows.
{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/ksg6oW95Wo', 'expanded_url': 'https://twitter.com/T3ahat/status/1242458821550690304', 'display_url': 'twitter.com/T3ahat/status/…', 'indices': [5, 28]}]}
As you can see, `urls` contains the URL of the quoted tweet, so you can detect tweets that contain a quote RT or a URL by checking whether `result.entities["urls"]` is `[]`.
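A minimal sketch of the `urls` check, run against a dictionary shaped like the quote RT `entities` above. The function name `has_url` is hypothetical; in real code you would pass `result.entities` from a tweepy status.

```python
def has_url(entities: dict) -> bool:
    """Return True if the entities list at least one URL (a link or a quoted tweet)."""
    # .get() guards against a missing "urls" key; an empty list means no URL
    return bool(entities.get("urls", []))


# entities of the quote RT shown above
quote_rt_entities = {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "url": "https://t.co/ksg6oW95Wo",
        "expanded_url": "https://twitter.com/T3ahat/status/1242458821550690304",
        "display_url": "twitter.com/T3ahat/status/…",
        "indices": [5, 28],
    }],
}

print(has_url(quote_rt_entities))  # True
print(has_url({"urls": []}))       # False
```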
Also, for a quote RT `result.is_quote_status` is `True`, so checking whether `result.is_quote_status` is `True` works as well.
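The `is_quote_status` check can be sketched as below. To keep it runnable without credentials, `types.SimpleNamespace` objects stand in for tweepy `Status` objects; they are hypothetical and carry only the fields used here.

```python
from types import SimpleNamespace


def is_quote(status) -> bool:
    """Return True if the status is a quote RT, based on its is_quote_status flag."""
    # getattr with a default avoids AttributeError on objects lacking the field
    return getattr(status, "is_quote_status", False) is True


# Hypothetical stand-ins for tweepy Status objects
quote_rt = SimpleNamespace(text="Quote RT https://t.co/ksg6oW95Wo", is_quote_status=True)
plain = SimpleNamespace(text="hello", is_quote_status=False)

print(is_quote(quote_rt), is_quote(plain))  # True False
```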
Here is an example of the `entities` of a plain RT:
{'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'T3ahat', 'name': 'teahat', 'id': 890647790048509952, 'id_str': '890647790048509952', 'indices': [3, 10]}], 'urls': []}
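Thus, for an RT, `urls` is `[]`, so the URL check does not catch it. However, `result.text` returns the text of the retweeted tweet rather than the user's own comment, so exclude RTs by adding `-filter:hoge` (for retweets, `-filter:retweets`) to the search query.
This is efficient: RTs are excluded on the API side when you make the request, so you don't fetch extra tweets.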
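A sketch of both sides of the RT case, assuming `-filter:retweets` is the operator meant for RTs. `build_query` and `is_retweet` are hypothetical helper names. With tweepy the query would go to `api.search_tweets` (4.x) or `api.search` (3.x); for `api.user_timeline`, passing `include_rts=False` should have a similar API-side effect. As a local fallback, a retweet status carries a `retweeted_status` attribute. `SimpleNamespace` objects stand in for real statuses.

```python
from types import SimpleNamespace


def build_query(keyword: str) -> str:
    """Append the retweet-exclusion operator to a search keyword."""
    # Assumption: -filter:retweets is the concrete operator for excluding RTs
    return f"{keyword} -filter:retweets"


def is_retweet(status) -> bool:
    """Local fallback: retweets carry a retweeted_status attribute."""
    return hasattr(status, "retweeted_status")


print(build_query("tweepy"))  # tweepy -filter:retweets

# Hypothetical stand-ins for tweepy Status objects
rt = SimpleNamespace(text="RT @T3ahat: hello", retweeted_status=SimpleNamespace(text="hello"))
original = SimpleNamespace(text="hello")
print(is_retweet(rt), is_retweet(original))  # True False
```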
The `entities` of a tweet that includes image media look like this:
{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1242466345960144898, 'id_str': '1242466345960144898', 'indices': [6, 29], 'media_url': 'http://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'media_url_https': 'https://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'url': 'https://t.co/4IZF0jmTZy', 'display_url': 'pic.twitter.com/4IZF0jmTZy', 'expanded_url': 'https://twitter.com/T3ahat/status/1242466350351540225/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 400, 'h': 400, 'resize': 'fit'}, 'medium': {'w': 400, 'h': 400, 'resize': 'fit'}, 'small': {'w': 400, 'h': 400, 'resize': 'fit'}}}]}
So for tweets that include media, `urls` is `[]`, yet `result.text` still contains the URL of the image. What is noteworthy here is that a `media` key is added to `entities`. Tweets without media do not have a `media` key at all, so you can detect tweets that include media by checking whether `"media" in result.entities` is `True`.
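A minimal sketch of the `media` key check, using a trimmed copy of the image tweet's `entities` above. `has_media` is a hypothetical name. Note that `urls` is empty here, so a URL check alone would miss this tweet.

```python
def has_media(entities: dict) -> bool:
    """Return True if the tweet has attached media."""
    # The "media" key only exists when the tweet has media attached
    return "media" in entities


# entities of the image tweet shown above, trimmed to a few fields
photo_entities = {
    "hashtags": [], "symbols": [], "user_mentions": [], "urls": [],
    "media": [{"type": "photo", "url": "https://t.co/4IZF0jmTZy", "indices": [6, 29]}],
}

print(has_media(photo_entities))  # True
print(has_media({"urls": []}))    # False
```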
In summary:
- URL or quote RT: `result.entities["urls"] != []` or `result.is_quote_status`
- RT: exclude with `-filter:hoge` in the search query
- Media: `"media" in result.entities`
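Putting the checks together, a local filter might look like the sketch below. `is_plain_text` is a hypothetical name, and the `SimpleNamespace` statuses are made-up stand-ins; with real data you would iterate over the list returned by `api.user_timeline(...)`. The RT check is kept only as a fallback in case RTs were not already excluded on the API side.

```python
from types import SimpleNamespace


def is_plain_text(status) -> bool:
    """True if the tweet has no URL, no media, is not a quote RT, and is not an RT."""
    entities = status.entities
    if entities.get("urls"):                        # links or quoted-tweet URL
        return False
    if "media" in entities:                         # attached images/videos
        return False
    if getattr(status, "is_quote_status", False):   # quote RT
        return False
    # RTs are normally removed API-side; this is a local fallback
    return not hasattr(status, "retweeted_status")


# Hypothetical stand-ins for fetched statuses
statuses = [
    SimpleNamespace(text="just text", entities={"urls": []}, is_quote_status=False),
    SimpleNamespace(text="Quote RT https://t.co/ksg6oW95Wo",
                    entities={"urls": [{"url": "https://t.co/ksg6oW95Wo"}]},
                    is_quote_status=True),
    SimpleNamespace(text="photo https://t.co/4IZF0jmTZy",
                    entities={"urls": [], "media": [{"type": "photo"}]},
                    is_quote_status=False),
]

plain = [s.text for s in statuses if is_plain_text(s)]
print(plain)  # ['just text']
```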