[Python] How to save all Instagram photos at once

Overview

I wrote code that downloads all of an Instagram account's images at once. Because the login and the page scrolling are done by hand, it runs fairly reliably.

Demo video

It works like this.

instaimgdownload.gif

⚠️ Instagram's Terms of Service prohibit using automated means to collect information. Do not build automation tools based on the content of this article.

What is Selenium

Selenium is a suite of tools specialized for automating tests of web applications. For more details, see the article "Instagram follower pulling out all the big strategy-Mastering Selenium and PyAutoGUI with Python !!-".

Preparation

Please prepare an environment where Python 3 can be used.

Install Selenium with `pip install selenium`.

From "ChromeDriver - WebDriver for Chrome", download the ChromeDriver that matches the version of Chrome you are currently using. You can check your Chrome version by entering chrome://version/ in Chrome's address bar. It's convenient!
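A common reason Selenium fails to start is a ChromeDriver whose major version differs from Chrome's. As a small sketch, the hypothetical helpers below compare the major version numbers taken from two version strings, for example the one shown at chrome://version/ and the output of `chromedriver --version`. The assumption is that matching major versions is what matters; the exact string formats are only examples.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version from text such as '114.0.5735.198'
    or 'ChromeDriver 114.0.5735.90 (...)'."""
    # Chrome and ChromeDriver versions have four dot-separated numbers.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Hypothetical helper: True if both strings share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    print(versions_match("Google Chrome 114.0.5735.198 (Official Build)",
                         "ChromeDriver 114.0.5735.90"))
```

Only the major number is compared, so patch-level differences between the browser and the driver are ignored.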

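If you installed ChromeDriver with Homebrew, update it to the latest version with `brew reinstall --cask chromedriver` (the older `brew cask reinstall chromedriver` form has been removed from current Homebrew).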

Code flow

1. Login to Instagram
2. Get image URL
3. Download image

If you don't care about the details and just want to get it running, the whole code is listed at the bottom under "Whole code".

The GitHub repository is here: https://github.com/ekkyu/InstaImgCollector

1. Login to Instagram

You log in manually, so get an Instagram account ready in advance.


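from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains

self.login_time = 20  # seconds allowed for the manual login

self.options = webdriver.ChromeOptions()
self.options.add_argument('--no-sandbox')

# Selenium 4 passes the driver path through a Service object
# (the executable_path argument was removed in Selenium 4.10)
self.driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=self.options)
self.action = ActionChains(self.driver)

def login(self):
    url = "https://www.instagram.com/"
    self.driver.get(url)

    sleep(self.login_time)  # log in by hand in the browser window during this time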

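Because Chrome is launched without the `--headless` option, the Instagram login screen opens in a separate, visible browser window where you log in by hand. Note that `options.add_argument('--no-sandbox')` only disables Chrome's process sandbox (it is mostly needed when running as root or inside containers); it is not what makes the window appear.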

The time allowed for logging in is set by `login_time = 20`, in seconds. The script simply sleeps for this long, so finish logging in within 20 seconds or raise the value.
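A fixed `sleep` always waits the full 20 seconds, even if you finish logging in sooner. One alternative is to poll a condition and continue as soon as it holds; Selenium provides `WebDriverWait` for this, but the sketch below shows the same idea in plain Python so it runs without a browser. `wait_until` is a hypothetical helper, not part of the original code.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Call `condition` repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # pause between checks instead of busy-waiting
```

In the collector, the condition could be something like `lambda: self.driver.get_cookie("sessionid") is not None` with `timeout=self.login_time`; the assumption that Instagram sets a `sessionid` cookie after login is not verified here.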

2. Get image URL

Next, collect the image URLs. While the script is running, scroll manually to the bottom of the target profile page.

Instagram only loads posts into the page as you scroll, so the script can pick up only the images that have been displayed while scrolling.

If this is hard to picture, open the Elements panel of the Developer Tools (Command + Option + I on a Mac) and watch it while scrolling the profile page.

If you're not sure what that means, try pressing Command + Option + I on the page you are reading right now. You should see the HTML source of this page.


self.img_url_list = []

self.window_width = 0
self.window_height = 0

self.scroll_time = 30

def return_img_pattern(self):
    html_source = self.driver.page_source
    pattern = '><a href="/p/(.*?)/"><div class="'
    results = re.findall(pattern, html_source, re.S)
    return results

def fetch_img_url(self, target_username) -> list:
    url = "https://www.instagram.com/{}".format(target_username)
    self.driver.get(url)
    self.driver.maximize_window()

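    for _ in range(self.scroll_time):
        sleep(1)  # you scroll by hand; the HTML is re-read once per second
        for code in self.return_img_pattern():
            # keep only codes not collected yet (also skips repeats within one page)
            if code not in self.img_url_list:
                self.img_url_list.append(code)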

    return self.img_url_list

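The time allowed for scrolling is set by `scroll_time = 30`. Each loop iteration waits 1 second, so this gives you about 30 seconds.

Once per second, the script reads the current HTML, which changes as you scroll, and uses a regular expression to extract the part of each post URL that identifies the post (the `xxx` in `/p/xxx/`). Codes not seen before are appended to `img_url_list`. The pattern matches Instagram's markup as it was when this article was written, so it may stop matching if that markup changes.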
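The extraction and de-duplication step can be tried offline on a static HTML string. This sketch reuses the article's regular expression; `collect_new_codes` is a hypothetical helper that also keeps a `set` of seen codes, so each membership check is constant-time instead of scanning the whole list.

```python
import re
from typing import List, Set

# Pattern from the article; it relies on Instagram's markup at the time of writing.
POST_PATTERN = re.compile(r'><a href="/p/(.*?)/"><div class="', re.S)


def collect_new_codes(html_source: str, seen: Set[str], collected: List[str]) -> None:
    """Append post codes found in `html_source` that are not yet in `seen`,
    preserving the order in which they first appear."""
    for code in POST_PATTERN.findall(html_source):
        if code not in seen:
            seen.add(code)
            collected.append(code)


if __name__ == "__main__":
    seen, collected = set(), []
    page = '<div><a href="/p/ABC123/"><div class="x"></div></a></div>'
    collect_new_codes(page, seen, collected)
    collect_new_codes(page, seen, collected)  # second pass adds nothing new
    print(collected)
```

Calling it once per loop iteration with `self.driver.page_source` would mirror what `fetch_img_url` does.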

3. Download image

Finally, use the collected image URLs to download all the images at once.


self.download_img_size = "l"
#                         l: 640×640px
#                         m: 306×306px
#                         t: 150×150px

def download_img(self, url, save_file_path):
    full_url = "https://www.instagram.com/p/" + str(url) +  "/media/?size=" + self.download_img_size
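    r = requests.get(full_url, stream=True, timeout=30)

    if r.status_code == 200:
        with open(save_file_path, 'wb') as f:
            # stream=True is set, so write the body in chunks instead of loading r.content at once
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)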
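Writing the file can fail if the target folder (for example `img/`) does not exist yet. The sketch below separates the file-writing side from the network so it can be tested offline: `write_chunks` is a hypothetical helper that creates the parent folder, skips empty keep-alive chunks, and returns the number of bytes written.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], save_file_path: str) -> int:
    """Write byte chunks to `save_file_path` and return the number of bytes written."""
    path = Path(save_file_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. img/ if it is missing
    written = 0
    with path.open("wb") as f:
        for chunk in chunks:
            if chunk:  # empty chunks carry no data
                f.write(chunk)
                written += len(chunk)
    return written
```

With `requests`, it would be called as `write_chunks(r.iter_content(chunk_size=8192), save_file_path)` after checking `r.status_code`.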

The size of the image to download can be chosen from the following. Specify it with `download_img_size`.

l: 640×640px
m: 306×306px
t: 150×150px
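Because an unsupported size letter would silently produce a bad URL, it can help to validate `download_img_size` before building the URL. This is a minimal sketch using the URL format and the three sizes from the article; `build_media_url` is a hypothetical name, and whether Instagram still serves this `/media/?size=` endpoint is not verified here.

```python
# Sizes listed in the article: l = 640x640px, m = 306x306px, t = 150x150px
VALID_SIZES = {"l", "m", "t"}


def build_media_url(shortcode: str, size: str = "l") -> str:
    """Build the media URL used by download_img, rejecting unknown sizes."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}, got {size!r}")
    return f"https://www.instagram.com/p/{shortcode}/media/?size={size}"
```

Raising early makes a typo such as `"L"` fail immediately instead of producing a request that returns nothing useful.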

Whole code


import os
import re
import requests
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains

class InstaImgCollector:

    def __init__(self):

        self.img_url_list = []
        self.window_width = 0
        self.window_height = 0
        
        self.login_time = 20   # seconds allowed for the manual login
        self.scroll_time = 30  # seconds (loop iterations) allowed for manual scrolling
        
        self.download_img_size = "l"
        #                         l: 640×640px
        #                         m: 306×306px
        #                         t: 150×150px
        
        self.options = webdriver.ChromeOptions()
        self.options.add_argument('--no-sandbox')

        # Selenium 4 passes the driver path through a Service object
        # (the executable_path argument was removed in Selenium 4.10)
        self.driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=self.options)
        self.action = ActionChains(self.driver)
    
    def return_img_pattern(self):
        html_source = self.driver.page_source

        # extracts the post code xxx from links of the form /p/xxx/
        pattern = '><a href="/p/(.*?)/"><div class="'
        results = re.findall(pattern, html_source, re.S)
        return results

    def login(self):
        url = "https://www.instagram.com/"
        self.driver.get(url)

        sleep(self.login_time)  # log in by hand during this time

    def fetch_img_url(self, target_username) -> list:
        url = "https://www.instagram.com/{}".format(target_username)
        self.driver.get(url)
        self.driver.maximize_window()
        
        for _ in range(self.scroll_time):
            sleep(1)  # scroll by hand; the HTML is re-read once per second
            for code in self.return_img_pattern():
                if code not in self.img_url_list:
                    self.img_url_list.append(code)
        
        return self.img_url_list
        
    def download_img(self, url, save_file_path):
        full_url = "https://www.instagram.com/p/" + str(url) + "/media/?size=" + self.download_img_size
        r = requests.get(full_url, stream=True, timeout=30)
        
        if r.status_code == 200:
            with open(save_file_path, 'wb') as f:
                # stream=True is set, so write the body in chunks
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
    
    def get_post_url_from_id(self, id_):
        self.login()
        
        self.img_url_list = self.fetch_img_url(target_username=id_)

        self.driver.quit()
        return self.img_url_list
    
    def flatten(self, alist):
        return [item for inner in alist for item in inner]


if __name__ == '__main__':
    id_ = "Specify the Instagram ID here"
    iic = InstaImgCollector()
    url_list = iic.get_post_url_from_id(id_)

    os.makedirs("img", exist_ok=True)  # open() fails if the folder does not exist
    for url_i in url_list:
        # no trailing space in the file name; Instagram typically serves JPEG images
        iic.download_img(url_i, "img/{}.jpg".format(url_i))
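The script chooses the file extension itself, but the actual image format is decided by the server. As an optional sketch, the hypothetical helper below picks an extension from the file's first bytes (its "magic number"), which could be applied to the start of the downloaded data before naming the file. Only a few common formats are checked; anything else falls back to `.bin`.

```python
def guess_image_extension(data: bytes) -> str:
    """Guess an image file extension from the first bytes of the data."""
    if data.startswith(b"\xff\xd8\xff"):  # JPEG
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):  # GIF
        return ".gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # WebP (RIFF container)
        return ".webp"
    return ".bin"  # unknown format
```

Twelve bytes are enough for every check above, so reading only the first chunk of the response would suffice.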

Reference

- InstaImgCollector
- What is Selenium
- Summary of frequently used operation methods of Selenium webdriver
- How to check the version of Google Chrome [Common to Mac / Windows]
