Get media timeline images and videos with Python + Tweepy

Overview

I wanted to fetch a user's media timeline (the tab shown when you click "Media" on the user's profile page), so I am writing this article as a reminder to myself.

What to use

(Incidentally, this was my first time using Python.)

0. Preparations

For the tokens and the Python setup needed to connect to the Twitter API, I simply followed other people's articles, so I do not cover them here.

1. First try using Tweepy

Following the referenced article, I first tried it on my own timeline.

Printing the text of each fetched tweet shows the result.

The tweets were fetched, and the images were saved in the img folder next to the source code. However, retweets are mixed in. When you view a user's media tab on Twitter, you do not see what that person retweeted, so I want to exclude retweets here as well.

2. Set include_rts=False to exclude retweets

According to the official documentation (the API specification for user_timeline), there is a parameter for exactly this purpose called include_rts; set it to False. With Tweepy, it can apparently be set as follows:

tweetpytest.py


search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)

This excludes retweets from the results. One caveat: the count parameter does not match the number of tweets actually returned. According to the official documentation, the timeline is first fetched including retweets, and the tweets are then filtered by the parameter conditions, so fewer tweets than requested may come back.
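The mismatch can be illustrated without calling the API. The sketch below is a toy model, not Tweepy code: fetch_timeline and the tweet dicts are hypothetical stand-ins that take the newest count tweets first and only then drop retweets, which is the order the official documentation describes.

```python
# Toy model of "count is applied before the include_rts filter".
# fetch_timeline and the tweet dicts are hypothetical stand-ins, not Tweepy APIs.

def fetch_timeline(tweets, count, include_rts=True):
    """Take the newest `count` tweets, then drop retweets if requested."""
    page = tweets[:count]  # count is applied first
    if not include_rts:
        # Filtering happens afterwards, so the page can shrink
        page = [t for t in page if not t["is_retweet"]]
    return page


# Fake timeline, newest first
timeline = [
    {"id": 5, "is_retweet": False},
    {"id": 4, "is_retweet": True},
    {"id": 3, "is_retweet": True},
    {"id": 2, "is_retweet": False},
    {"id": 1, "is_retweet": False},
]

result = fetch_timeline(timeline, count=4, include_rts=False)
print([t["id"] for t in result])  # fewer than the 4 tweets requested
```

In this model, asking for 4 tweets returns only the non-retweets among the newest 4, which is why the requested and returned counts differ.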

3. Parse the results returned by Tweepy

Next, loop over the value returned by Tweepy with for ... in, extract the image or video URL from each tweet, and pass it to the save function. The procedure is as follows.

1. Check whether 'extended_entities' exists; if it does not, the tweet is not a media tweet and is skipped

The return value from Tweepy appears to be an ItemIterator whose elements are Status objects (confirmed by debugging). If each element were a dict, you could check with dict.get('key name') and so on, but here the value is an attribute of a class, so that approach does not work. Instead, the check uses the built-in hasattr function.

tweetpytest.py


if hasattr(result, 'extended_entities'):

If the attribute is missing, the tweet is not a media tweet, so skip it and move on to the next one.
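The difference between an attribute check and a dict lookup can be tried without the API. The following is a minimal sketch in which SimpleNamespace objects stand in for Tweepy Status objects; has_media and get_media are hypothetical helper names introduced only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Status objects: the data lives in attributes, not dict keys
media_tweet = SimpleNamespace(id=1, extended_entities={"media": [{"type": "photo"}]})
text_tweet = SimpleNamespace(id=2)  # no extended_entities attribute at all


def has_media(status):
    """Attribute check, as in the article."""
    return hasattr(status, "extended_entities")


def get_media(status):
    """Alternative: getattr with a default avoids a separate hasattr call."""
    entities = getattr(status, "extended_entities", {})
    return entities.get("media", [])


print(has_media(media_tweet), has_media(text_tweet))
print(get_media(text_tweet))
```

getattr with a default is one possible alternative to hasattr when you want the value and the existence check in a single step.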

2. Check whether 'video_info' exists in result.extended_entities['media'][0]; if it does, the media is a video or GIF, otherwise it is an image

Image tweets do not have video_info, so you can judge by this.

3. If result.extended_entities['media'][0]['type'] is 'animated_gif', the media is a GIF. Save the first variant ([0]) as mp4

For now, GIFs are also saved as mp4.
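Steps 2 and 3 can be combined into one small function. This is a sketch under the assumption that each media entry is a plain dict shaped like the ones described above; classify_media is a hypothetical name, and the sample entries are made up.

```python
def classify_media(media):
    """Return 'image', 'gif', or 'video' for one media entry (a dict)."""
    if "video_info" not in media:
        return "image"  # image tweets have no video_info
    if media.get("type") == "animated_gif":
        return "gif"
    return "video"


# Made-up sample entries with only the keys the check looks at
samples = [
    {"type": "photo"},
    {"type": "animated_gif", "video_info": {"variants": []}},
    {"type": "video", "video_info": {"variants": []}},
]

print([classify_media(m) for m in samples])
```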

4. Loop over result.extended_entities['media'][0]['video_info']['variants'] and save the mp4 with the highest bitrate

tweetpytest.py


bitrate_array = []
for movie in ex_media_video_variants:
    bitrate_array.append(movie.get('bitrate', 0))
max_index = bitrate_array.index(max(bitrate_array))
movie_url = ex_media_video_variants[max_index]['url']

The variants come in different sizes and formats, and there seems to be no guarantee about which one sits at which index. With a fixed index, you may get a small size or a video URL whose format is not mp4 (this actually happened). Therefore, inspect the contents in a for loop so that the mp4 with the highest bitrate is downloaded. Variants with no bitrate key are counted as 0 by movie.get('bitrate', 0).
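The same selection can also be written with max() and an explicit format check. The sketch below assumes each variant dict carries a content_type key next to url and bitrate; best_mp4_url is a hypothetical helper name and the URLs are made up.

```python
def best_mp4_url(variants):
    """Return the URL of the mp4 variant with the highest bitrate, or None."""
    mp4s = [v for v in variants if v.get("content_type") == "video/mp4"]
    if not mp4s:
        return None
    # A missing bitrate counts as 0, as in the article's loop
    best = max(mp4s, key=lambda v: v.get("bitrate", 0))
    return best["url"]


# Made-up variants in arbitrary order, including one non-mp4 entry
variants = [
    {"content_type": "video/mp4", "bitrate": 832000, "url": "https://example.com/mid.mp4"},
    {"content_type": "application/x-mpegURL", "url": "https://example.com/playlist.m3u8"},
    {"content_type": "video/mp4", "bitrate": 2176000, "url": "https://example.com/high.mp4"},
    {"content_type": "video/mp4", "bitrate": 256000, "url": "https://example.com/low.mp4"},
]

print(best_mp4_url(variants))
```

Filtering on the format first means the result is an mp4 even if another format happened to report a bitrate.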

4. Source

Putting the above together, the source is as follows.

tweetpytest.py


#!/usr/bin/python3
# coding: UTF-8

import os
import urllib.request

import tweepy

import config  # config.py holds the API keys and tokens

CONSUMER_KEY = config.CONSUMER_KEY
CONSUMER_SECRET = config.CONSUMER_SECRET
ACCESS_TOKEN = config.ACCESS_TOKEN
ACCESS_SECRET = config.ACCESS_TOKEN_SECRET

FOLDER_PASS = 'img/'

# Authentication
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)

def download_file(url, file_name):
    # Create the save folder if it does not exist yet, then download the file
    os.makedirs(FOLDER_PASS, exist_ok=True)
    urllib.request.urlretrieve(url, FOLDER_PASS + file_name)

key_account = input('Enter account name:')
count_no = int(input('Set search count:'))
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)

for result in search_results:
    if hasattr(result, 'extended_entities'):  # Check whether result has an 'extended_entities' attribute
        ex_media = result.extended_entities['media']
        tweet_id = result.id
        if 'video_info' in ex_media[0]:
            ex_media_video_variants = ex_media[0]['video_info']['variants']
            media_name = '%s-%s.mp4' % (key_account, tweet_id)
            if 'animated_gif' == ex_media[0]['type']:
                #GIF file
                gif_url = ex_media_video_variants[0]['url']
                download_file(gif_url, media_name)
            else:
                #Video file
                bitrate_array = []
                for movie in ex_media_video_variants:
                    bitrate_array.append(movie.get('bitrate',0))
                max_index = bitrate_array.index(max(bitrate_array))
                movie_url = ex_media_video_variants[max_index]['url']
                download_file(movie_url, media_name)
        else:
            #Image file
            for image in ex_media:
                image_url = image['media_url']
                image_name = image_url.split("/")[-1]
                download_file(image_url + ':orig', image_name)

print('End')

Execution result

Images and videos are saved in the img folder located in the same folder as the source. For now, this achieves what I set out to do.
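The file names in the img folder come from the last path segment of each media URL. As a minimal sketch, the naming can be isolated into helpers using the standard library; image_file_name and original_size_url are hypothetical names, and the URL is a made-up example in the usual media_url shape.

```python
import posixpath
from urllib.parse import urlparse


def image_file_name(image_url):
    """Return the last path segment of the URL, used as the local file name."""
    return posixpath.basename(urlparse(image_url).path)


def original_size_url(image_url):
    """Append ':orig', the suffix the source above adds when downloading images."""
    return image_url + ":orig"


url = "http://pbs.twimg.com/media/EXAMPLE123.jpg"  # made-up URL
print(image_file_name(url))
print(original_size_url(url))
```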

5. (Bonus) About since_id

The above works for now, but it searches all tweets every time. To search only tweets newer than the previous run the next time you start it, set the following parameter.

search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id='ID of the last tweet fetched previously').items(count_no)

This adds the condition "since_id < ID of the tweets to fetch", so only tweets newer than since_id are returned.

Print the last ID at the end of the previous source, then pass it as since_id on the next run:

search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id=pre_last_tweet_id-1).items(count_no)

If you keep the last tweet ID at the end of the loop and save it to a text file or similar, keyed by user ID, the total number of tweets searched can apparently be reduced.
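One possible way to keep that ID between runs is a small JSON file. The sketch below is an assumption about how this could be done, not part of the original source: load_last_ids, save_last_id, and the file name last_ids.json are hypothetical, and the account name is used as the key for simplicity.

```python
import json
import os
import tempfile


def load_last_ids(path):
    """Return the saved {account: last_tweet_id} mapping, or {} if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_last_id(path, screen_name, tweet_id):
    """Record the last tweet ID for one account, keeping the other entries."""
    ids = load_last_ids(path)
    ids[screen_name] = tweet_id
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ids, f)


# Demo in a temporary folder so nothing is written next to the script
state_path = os.path.join(tempfile.mkdtemp(), "last_ids.json")
save_last_id(state_path, "example_user", 1234567890)
print(load_last_ids(state_path).get("example_user"))
```

On the next run, the value read from the file could be passed as since_id; when no entry exists for an account, the parameter would simply be left out.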

6. Digression

At first I wrote the source to call the API directly without Tweepy, but I ran into the limit of at most 200 tweets per request. I tried building a loop that tracked since_id and max_id between requests, but since Tweepy handles this for you, I switched to Tweepy partway through.
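For reference, the kind of manual paging loop described here looks roughly like the sketch below. It runs against a fake in-memory timeline rather than the real API: fetch_page is a hypothetical stand-in for one request that returns at most 200 tweet IDs no newer than max_id.

```python
PAGE_LIMIT = 200
ALL_IDS = list(range(450, 0, -1))  # fake timeline of tweet IDs, newest first


def fetch_page(max_id=None, count=PAGE_LIMIT):
    """Hypothetical stand-in for one request: up to 200 IDs with id <= max_id."""
    candidates = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return candidates[:min(count, PAGE_LIMIT)]


def fetch_all():
    """Walk back through the timeline one page at a time."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id=max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        max_id = page[-1] - 1  # next request starts just below the oldest ID seen
    return collected


ids = fetch_all()
print(len(ids))
```

tweepy.Cursor hides this bookkeeping, which is why the source in this article does not need such a loop.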

Based on this source, I plan to build an application that makes saving images a little more convenient.

Reference URL

I tried to get Twitter images "batch" with python
Official documentation, API specifications for user_timeline
