[Python] I tried to make a script that goes back through a specific user's tweets on Twitter and saves all the posted images at once

Introduction

I thought it would be great to have a program that goes back through a user's tweets on Twitter and saves all the posted images at once, so I made one. It uses Python 2.7, which I am currently studying, so I think it will work in any Python 2 environment.

Specification

Before running the program, you set three things in it: the API keys of your Twitter account, the id (screen name) of the user whose images you want to collect, and how many tweets to go back. Because of API restrictions, only a user's latest 3,200 tweets can be retrieved. Tweets with multiple images are supported, but GIFs, videos, etc. are not saved. If the target user has a protected (keyed) account, you need the API keys of an account that follows that user.
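The "photos only" rule can be sketched as a small filter over one decoded tweet. It reads `extended_entities`, which lists every attached image rather than just the first, and keeps only items whose `type` is `"photo"`, so animated GIFs and videos are skipped. The function name `photo_urls` is hypothetical and not part of the original script.

```python
def photo_urls(tweet):
    """Return the media_url_https of every photo attached to one tweet dict.

    Items of other types (e.g. "animated_gif", "video") are ignored.
    """
    media_list = tweet.get("extended_entities", {}).get("media", [])
    return [m["media_url_https"] for m in media_list if m.get("type") == "photo"]


if __name__ == "__main__":
    sample = {"extended_entities": {"media": [
        {"type": "photo", "media_url_https": "https://example.com/a.jpg"},
        {"type": "animated_gif", "media_url_https": "https://example.com/b.jpg"},
    ]}}
    print(photo_urls(sample))
```

A tweet without `extended_entities` simply yields an empty list, which corresponds to the "No Image" case in the script.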

Code

imgcrawler_twi.py


#coding: UTF-8
from requests_oauthlib import OAuth1Session
import json
import time
import requests
import twitkey

# Read each required key from the separate file twitkey.py.
twitter = OAuth1Session(twitkey.twkey["CONSUMER_KEY"],
                        twitkey.twkey["CONSUMER_SECRET"],
                        twitkey.twkey["ACCESS_TOKEN"],
                        twitkey.twkey["ACCESS_TOKEN_SECRET"])

Get_Count = 17     # the timeline is fetched Get_Count - 1 times
Get_At_Once = 200  # number of tweets fetched per request (max. 200)
User_Id = ""       # id (screen name) of the target user, e.g. "github"
Path = ""          # folder to save the images in, e.g. "./Images/"

TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"

max_id = None  # id of the oldest tweet seen so far
counter = 1    # running number of the tweet being processed

for page in range(1, Get_Count):
    params = {"screen_name": User_Id, "include_rts": "false", "count": Get_At_Once}
    if max_id is not None:
        # max_id is inclusive, so subtract 1 to avoid fetching the same tweet twice
        params["max_id"] = max_id - 1
    req = twitter.get(TIMELINE_URL, params=params)
    if req.status_code != 200:
        # Error handling: print the status, wait, then retry from the same max_id
        print(req.status_code)
        time.sleep(240)
        continue
    timeline = json.loads(req.text)
    if not timeline:
        break  # no older tweets are available
    for tweet in timeline:
        print(counter)
        print(tweet["text"])
        max_id = tweet["id"]
        counter += 1
        media_list = tweet.get("extended_entities", {}).get("media", [])
        if not media_list:
            print("No Image")
            continue
        print(len(media_list))
        # created_at looks like "Wed Aug 27 13:08:45 +0000 2008"
        created_at = tweet["created_at"]
        img_name = "_".join([tweet["user"]["name"],
                             created_at[26:],     # year
                             created_at[4:7],     # month
                             created_at[8:10],    # day
                             created_at[11:13],   # hour
                             created_at[14:16],   # minute
                             created_at[17:19]])  # second
        for j, media in enumerate(media_list):
            if media.get("type") != "photo":
                continue  # GIFs and videos are not saved
            img = requests.get(media["media_url_https"])
            with open(Path + img_name + "_" + str(j) + ".jpg", "wb") as localfile:
                localfile.write(img.content)

twitkey.py


#coding: UTF-8
twkey = {
    "CONSUMER_KEY": "",
    "CONSUMER_SECRET": "",
    "ACCESS_TOKEN": "",
    "ACCESS_TOKEN_SECRET": ""
}

#Please enter each parameter

Code description

About twitkey.py

In twitkey.py, enter the API keys of your Twitter account. The account can be either a throwaway account or your main account. Four values are required, and they are easy to look up.

The four values required this time are CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET.

The following page explains how to check your API keys: http://phiary.me/twitter-api-key-get-how-to/
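If any of the four values is left empty, the timeline request fails with an authentication error rather than a clear message. A minimal pre-flight check might look like this; `missing_keys` is a hypothetical helper, and it only checks that the values are present, not that they are valid.

```python
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET",
                 "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def missing_keys(twkey):
    """Return the names of required keys that are absent or left as ""."""
    return [name for name in REQUIRED_KEYS if not twkey.get(name)]


if __name__ == "__main__":
    # With the real file this would be: import twitkey; twkey = twitkey.twkey
    twkey = {"CONSUMER_KEY": "abc", "CONSUMER_SECRET": "",
             "ACCESS_TOKEN": "def", "ACCESS_TOKEN_SECRET": ""}
    print(missing_keys(twkey))
```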

About imgcrawler_twi.py

The program has four parameters. Set each of them to the value you need.

Get_Count =  
Get_At_Once =  
User_Id = "" 
Path = "" 

About "Get_Count" and "Get_At_Once"

Use "Get_Count" and "Get_At_Once" to specify how many tweets to go back. Concretely, the program fetches "Get_At_Once" tweets at a time and repeats this "Get_Count - 1" times, each time continuing from the oldest tweet fetched so far. A single request can return at most 200 tweets, so the maximum value of "Get_At_Once" is 200. Also, because the API only lets you go back through a user's latest 3,200 tweets, make sure that ("Get_Count" - 1) × "Get_At_Once" is 3,200 or less. If you want to go back as far as possible, we recommend setting "Get_Count" to 17 and "Get_At_Once" to 200 (16 × 200 = 3,200).
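The "go back Get_At_Once tweets, Get_Count - 1 times" loop can be sketched independently of the network. In this sketch `fetch_page` is a hypothetical callable standing in for one `user_timeline` request that returns the decoded list of tweets, newest first; the two limits above are checked before the first request. Because `max_id` is inclusive, the next page starts at the oldest id minus 1.

```python
MAX_PER_REQUEST = 200  # per-request limit described above
MAX_TWEETS = 3200      # only the latest 3,200 tweets per user are reachable


def iter_timeline(fetch_page, get_count, get_at_once):
    """Yield tweets newest-first, requesting get_at_once tweets get_count - 1 times.

    fetch_page(params) is a hypothetical stand-in for one timeline request.
    """
    if not 1 <= get_at_once <= MAX_PER_REQUEST:
        raise ValueError("Get_At_Once must be between 1 and 200")
    if (get_count - 1) * get_at_once > MAX_TWEETS:
        raise ValueError("(Get_Count - 1) * Get_At_Once must be 3200 or less")
    max_id = None
    for _ in range(get_count - 1):
        params = {"count": get_at_once}
        if max_id is not None:
            params["max_id"] = max_id - 1  # max_id is inclusive
        page = fetch_page(params)
        if not page:
            return  # nothing older is left
        for tweet in page:
            yield tweet
        max_id = page[-1]["id"]  # oldest tweet on this page
```

With a fake `fetch_page` that serves ids 1000 down to 1, `iter_timeline(fake, 4, 200)` yields ids 1000 to 401 without repeating any tweet at page boundaries. Since it is a generator, a limit violation is raised when iteration starts, not when the function is called.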

About "User_Id"

For "User_Id", specify the id (screen name) of the user whose images you want to collect, for example "github".
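If you copy the id from a profile page, it often comes with a leading "@". A hedged sketch of normalizing it before the request is shown below; `normalize_user_id` is hypothetical, and the format check assumes the usual screen-name rule of 1 to 15 letters, digits or underscores.

```python
import re

# Assumption: screen names are 1-15 ASCII letters, digits or underscores
SCREEN_NAME_RE = re.compile(r"^[A-Za-z0-9_]{1,15}$")


def normalize_user_id(user_id):
    """Strip whitespace and a leading "@", then check the screen-name format."""
    name = user_id.strip()
    if name.startswith("@"):
        name = name[1:]
    if not SCREEN_NAME_RE.match(name):
        raise ValueError("not a valid screen name: %r" % user_id)
    return name
```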

About "Path"

For "Path", specify the path of the folder where you want to save the images. You can save them anywhere; for example, create a folder called "Images" in the same directory as the code and set "Path" to "./Images/". Because the file name is appended directly to "Path", keep the trailing "/".
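To avoid creating the folder by hand or forgetting the trailing "/", the path could be prepared once before the loop. This is a sketch using only `os`; `prepare_save_dir` is a hypothetical helper, and the `isdir` check is used instead of `exist_ok` so it also works on Python 2.7.

```python
import os


def prepare_save_dir(path):
    """Create the folder if needed and return it with a trailing separator."""
    if path and not os.path.isdir(path):
        os.makedirs(path)
    # Joining with "" appends the separator, e.g. "./Images" -> "./Images/"
    return os.path.join(path, "")


if __name__ == "__main__":
    print(prepare_save_dir("./Images"))
```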

About the saved name of the image

Each image is saved as "user's display name"_"tweet date and time (year_month_day_hour_minute_second)"_"0 to 3 (the image number when a tweet has multiple images attached)".jpg.
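The naming rule can be isolated into a function that slices Twitter's `created_at` string the same way the script does. The name `image_file_name` is hypothetical; it assumes `created_at` has the fixed-width form "Wed Mar 22 11:19:29 +0000 2017".

```python
def image_file_name(user_name, created_at, index):
    """Build name_year_month_day_hour_minute_second_index.jpg.

    created_at is assumed to look like "Wed Mar 22 11:19:29 +0000 2017".
    """
    parts = [user_name,
             created_at[26:],     # year
             created_at[4:7],     # month (abbreviated name)
             created_at[8:10],    # day
             created_at[11:13],   # hour
             created_at[14:16],   # minute
             created_at[17:19],   # second
             str(index)]          # image number within the tweet
    return "_".join(parts) + ".jpg"


if __name__ == "__main__":
    print(image_file_name("GitHub", "Wed Mar 22 11:19:29 +0000 2017", 0))
    # GitHub_2017_Mar_22_11_19_29_0.jpg
```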

The tweet date and time are returned in UTC (+0000), so they are 9 hours behind Japan Standard Time.
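If you prefer file names in Japan time, `created_at` can be parsed with `datetime.strptime` and shifted by 9 hours, since JST is UTC+9 with no daylight saving time. This is a sketch, not part of the original script; the format string matches the literal "+0000" because the timestamps are always UTC, and `%a`/`%b` assume the default C (English) locale.

```python
from datetime import datetime, timedelta

JST_OFFSET = timedelta(hours=9)  # JST is UTC+9


def created_at_to_jst(created_at):
    """Parse Twitter's created_at (UTC) and return a naive datetime in JST."""
    utc = datetime.strptime(created_at, "%a %b %d %H:%M:%S +0000 %Y")
    return utc + JST_OFFSET


def jst_stamp(created_at):
    """Format the JST time like the file names: year_month_day_hour_minute_second."""
    return created_at_to_jst(created_at).strftime("%Y_%b_%d_%H_%M_%S")


if __name__ == "__main__":
    print(jst_stamp("Wed Mar 22 11:19:29 +0000 2017"))  # 2017_Mar_22_20_19_29
```

Using `timedelta` rather than adding 9 to the hour string also rolls the date over correctly, e.g. 20:00 UTC on December 31 becomes 05:00 JST on January 1.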

Run

Save twitkey.py and imgcrawler_twi.py in the same directory and run imgcrawler_twi.py. If the images are saved in the folder you specified, it worked.


Other

When you run it, you may get an error saying that a module is missing. In that case, install modules such as "requests" and "requests-oauthlib" with pip and then run it again.
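To see which packages are missing before running the crawler, you could try importing each one. This sketch uses `importlib.import_module` (available in Python 2.7 and 3); the mapping from import name to pip package name and the function `missing_packages` are assumptions for illustration.

```python
import importlib

# import name -> name to pass to pip
REQUIRED = {"requests": "requests", "requests_oauthlib": "requests-oauthlib"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be imported."""
    missing = []
    for module_name, pip_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
```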

I have also published this crawler's repository on GitHub. Please take a look! https://github.com/tyokuyoku/Twitter_Images_Crawler
