Exclude tweets containing URLs with tweepy [Python]

To give the conclusion first: apart from retweets, these tweets cannot be filtered out when fetching them through the API (at least, I could not find a way). This article shows how to exclude them locally from the fetched results.

I want to use tweepy to get only tweets that consist of a plain comment. Here, a tweet with only a plain comment means a tweet that does not include a URL.

Sample code

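import tweepy

# CK, CS: consumer key and secret; AT, AS: access token and secret (use your own)
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)
n = 20  # number of tweets to fetch
results = api.user_timeline(screen_name="screen_name", count=n)
for result in results:
    print(result.entities)
    print(result.text)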

For example, when you fetch a quote RT, result.text looks like this:

Quote RT https://t.co/ksg6oW95Wo

As shown, the text includes the URL of the quoted tweet. Similarly, for tweets that include media such as images and videos, the text includes the URL of that media.

Solution


if result.entities["urls"] or "media" in result.entities or result.is_quote_status:
    continue  # skip tweets that contain a URL, media, or a quote
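
The condition above can be wrapped in a small helper so that it can be tried without calling the API. This is only a minimal sketch: should_exclude and the SimpleNamespace stand-ins are hypothetical names introduced for illustration, and the stand-ins mimic only the text, entities and is_quote_status attributes of a tweepy status.

```python
from types import SimpleNamespace


def should_exclude(status):
    """Return True if the tweet has a URL, attached media, or is a quote RT."""
    entities = status.entities
    return bool(
        entities.get("urls")                          # non-empty list of URLs
        or "media" in entities                        # key exists only with media
        or getattr(status, "is_quote_status", False)  # quote RT flag
    )


# Stand-ins for tweepy status objects (hypothetical data, not fetched tweets).
plain = SimpleNamespace(text="just a comment",
                        entities={"urls": []}, is_quote_status=False)
quote = SimpleNamespace(text="Quote RT https://t.co/ksg6oW95Wo",
                        entities={"urls": [{"url": "https://t.co/ksg6oW95Wo"}]},
                        is_quote_status=True)
image = SimpleNamespace(text="photo https://t.co/4IZF0jmTZy",
                        entities={"urls": [], "media": [{"type": "photo"}]},
                        is_quote_status=False)

# Keep only the tweets that pass the filter.
comments_only = [s.text for s in (plain, quote, image) if not should_exclude(s)]
print(comments_only)
```

With real data, the same helper could be applied to each item returned by api.user_timeline.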

Commentary

・ Tweet with URL and quote RT

The entities of the quote RT above are as follows.

{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/ksg6oW95Wo', 'expanded_url': 'https://twitter.com/T3ahat/status/1242458821550690304', 'display_url': 'twitter.com/T3ahat/status/…', 
'indices': [5, 28]}]}

As you can see, urls contains the URL of the quoted tweet, so tweets that contain a URL, including quote RTs, can be detected by checking whether result.entities["urls"] is [] or not.

Also, result.is_quote_status is True for a quote RT, so a quote RT can also be detected by checking whether result.is_quote_status is True.
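
To see which links a tweet actually carries, the urls list can be read directly. The sketch below uses the entities dictionary shown above; linked_urls is a hypothetical helper name, not part of tweepy.

```python
def linked_urls(entities):
    """Return the expanded URLs listed under entities["urls"]."""
    return [u["expanded_url"] for u in entities.get("urls", [])]


# The entities of the quote RT shown above.
quote_entities = {
    "hashtags": [], "symbols": [], "user_mentions": [],
    "urls": [{
        "url": "https://t.co/ksg6oW95Wo",
        "expanded_url": "https://twitter.com/T3ahat/status/1242458821550690304",
        "display_url": "twitter.com/T3ahat/status/…",
        "indices": [5, 28],
    }],
}

print(linked_urls(quote_entities))   # the quoted tweet's URL
print(bool(quote_entities["urls"]))  # non-empty list, so the tweet has a URL
```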

・ Retweet

An example of the entities of a RT is shown below.

{'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'T3ahat', 'name': 'teahat', 'id': 890647790048509952, 'id_str': '890647790048509952', 'indices': [3, 10]}], 'urls': []}

Thus, for a RT, urls is []. However, result.text returns the body of the retweeted tweet, so add -filter:hoge to the search query to exclude it (for retweets, the standard search operator is -filter:retweets). This is efficient because the exclusion happens when you call the API, so you do not fetch tweets you do not need.
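
As a rough sketch of the retweet case, the query can be built before calling the search API, assuming the standard search operator -filter:retweets. The helper names and the keyword are hypothetical. The local fallback relies on retweets carrying a retweeted_status attribute, which is an addition not covered in this article, and the stand-in objects below only imitate that attribute.

```python
from types import SimpleNamespace


def without_retweets(keyword):
    """Append the search operator that drops retweets from the results."""
    return f"{keyword} -filter:retweets"


def is_retweet(status):
    """Local fallback: a retweet carries a retweeted_status attribute."""
    return hasattr(status, "retweeted_status")


query = without_retweets("python")
print(query)
# With an authenticated client, the query would be passed to the search
# method: api.search(q=query) in tweepy 3.x, api.search_tweets(q=query)
# in tweepy 4.x (not executed here).

# Stand-ins for status objects (hypothetical data).
retweet = SimpleNamespace(text="RT @T3ahat: hello",
                          retweeted_status=SimpleNamespace())
original = SimpleNamespace(text="just a comment")
print(is_retweet(retweet), is_retweet(original))
```

Search operators apply only to search queries. For a timeline, the local check is one option.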

・ Tweets including media

The entities of a tweet that includes image media are as follows.

{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1242466345960144898, 'id_str': '1242466345960144898', 'indices': [6, 29], 'media_url': 'http://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'media_url_https': 'https://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'url': 'https://t.co/4IZF0jmTZy', 'display_url': 'pic.twitter.com/4IZF0jmTZy', 'expanded_url': 'https://twitter.com/T3ahat/status/1242466350351540225/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 400, 'h': 400, 'resize': 'fit'}, 'medium': {'w': 400, 'h': 400, 'resize': 'fit'}, 'small': {'w': 400, 'h': 400, 'resize': 'fit'}}}]}

In this way, urls is [] for tweets that include media. However, result.text contains the URL of the image. What is noteworthy here is that a media key is added to entities. Tweets without media do not have the media key at all, so tweets that include media can be detected by checking whether "media" in result.entities is True.
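
The difference between the two kinds of entities can be checked on plain dictionaries. This sketch uses a trimmed copy of the media example above; media_types is a hypothetical helper name.

```python
def media_types(entities):
    """Return the type of each attached media item, or [] if there is none."""
    return [m["type"] for m in entities.get("media", [])]


# Trimmed copy of the media example above (only the keys used here).
media_entities = {
    "hashtags": [], "symbols": [], "user_mentions": [], "urls": [],
    "media": [{"url": "https://t.co/4IZF0jmTZy", "type": "photo"}],
}
# A tweet without media has no "media" key at all.
text_only_entities = {"hashtags": [], "symbols": [], "user_mentions": [], "urls": []}

print("media" in media_entities, media_types(media_entities))
print("media" in text_only_entities, media_types(text_only_entities))
```

Using entities.get("media", []) avoids a KeyError for tweets without media.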

Summary

Tweets with a URL, quote RT: result.entities["urls"] != [] or result.is_quote_status

RT: exclude with -filter:hoge

Media: "media" in result.entities
