Overview

I wrote code that downloads all the images from an Instagram profile at once. Because the login and screen scrolling steps are done by hand, it can be expected to run reliably.

Demo video

It works like this.

⚠️ Instagram's Terms of Service prohibit the use of automated means to obtain information. Do not build automation tools based on the content of this article.

What is Selenium

Selenium is a suite of tools specialized for automating tests of web applications. For more details, please see the article "Instagram follower pulling out all the big strategy-Mastering Selenium and PyAutoGUI with Python !!-".

Preparation

Please prepare an environment where Python 3 can be used.

Install Selenium with `pip install selenium`.

From the "ChromeDriver - WebDriver for Chrome" page, download the ChromeDriver that corresponds to the version of Chrome you are currently using. You can check the Chrome version by entering `chrome://version/` in Chrome's address bar. It's convenient!

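To update ChromeDriver to the latest version, run `brew cask reinstall chromedriver`. On recent Homebrew versions, where the `brew cask` subcommand has been removed, the equivalent is `brew reinstall --cask chromedriver`.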
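As a rough illustration of the version matching described above, the sketch below compares the major version numbers of Chrome and ChromeDriver. The function names `major_version` and `versions_match` and the sample version strings are hypothetical, and treating "same major version" as a match is an assumption made for this example.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumption: the two are compatible when their major versions are equal."""
    return major_version(chrome_version) == major_version(driver_version)


# Hypothetical version numbers, as they might be read from chrome://version/
# and from the output of `chromedriver --version`.
print(versions_match("114.0.5735.198", "114.0.5735.90"))  # True
print(versions_match("115.0.5790.102", "114.0.5735.90"))  # False
```

If the check fails, downloading the ChromeDriver release for your Chrome version is the fix.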

Code flow

1. Login to Instagram
2. Get image URL
3. Download image

If you don't care about the flow and just want to run it for now, the whole code is listed at the bottom under "Whole code", so please go ahead!

Click here for the GitHub repository (https://github.com/ekkyu/InstaImgCollector)

1. Login to Instagram

You log in manually in the browser window that opens. Please create an Instagram account in advance.


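# Excerpt from __init__
self.login_time = 20  # seconds allowed for logging in by hand

self.options = webdriver.ChromeOptions()
self.options.add_argument('--no-sandbox')

# Selenium 4: the driver path is passed through a Service object
# (from selenium.webdriver.chrome.service import Service).
self.driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=self.options)
self.action = ActionChains(self.driver)

def login(self):
    url = "https://www.instagram.com/"
    self.driver.get(url)

    # Log in manually in the opened window while the script waits.
    sleep(self.login_time)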

`options.add_argument('--no-sandbox')` starts Chrome with its sandbox disabled. Because headless mode is not enabled, the login screen actually opens in a separate browser window.

The time allowed for logging in is set by `login_time = 20`; here it is 20 seconds.
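A fixed sleep is simple, but it always waits the full time even if you finish logging in sooner. As one possible alternative (not what the original code does), the sketch below polls a condition until it becomes true or a time limit passes. `wait_until`, its parameters, and the stand-in `logged_in` check are all hypothetical names; what a real login check would look at is left open here.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for demonstration: becomes true on the third check.
checks = {"count": 0}


def logged_in() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(logged_in, timeout=5, interval=0.01))  # True
```

The trade-off is that you need a condition you can trust; the fixed sleep needs none, which is part of why the manual approach is stable.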

2. Get image URL

Next, get the URLs of the images. Manually scroll to the bottom of the target profile page.

Because of how Instagram works, images are added to the page only as you scroll, so the script collects the images that appear in response to scrolling.

If this behavior is hard to picture, open the Developer Tools by pressing `Command` + `Option` + `I`, and watch the `Elements` panel while you scroll the profile page.

If you are not sure what that means, try pressing `Command` + `Option` + `I` right now on this page. You should be able to see the HTML source for this page.


self.img_url_list = []

self.window_width = 0
self.window_height = 0

self.scroll_time = 30

def return_img_pattern(self):
    html_source = self.driver.page_source
    pattern = '><a href="/p/(.*?)/"><div class="'
    results = re.findall(pattern, html_source, re.S)
    return results

def fetch_img_url(self, target_username) -> list:
    url = "https://www.instagram.com/{}".format(target_username)
    self.driver.get(url)
    self.driver.maximize_window()

    # Re-read the page source once per second and keep only unseen post IDs.
    for _ in range(self.scroll_time):
        sleep(1)
        for post_id in self.return_img_pattern():
            if post_id not in self.img_url_list:
                self.img_url_list.append(post_id)

    return self.img_url_list

The time allowed for scrolling is set by `scroll_time = 30`; here it is 30 seconds.

While you scroll, the script re-reads the changed HTML once per second. Each time, it uses the regular expression pattern to extract only (a part of) the image URL, namely the post ID that follows `/p/`, from the HTML.
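To make the extraction step concrete, here is a minimal sketch that applies the same regular expression to small hand-written HTML fragments and merges the results without duplicates. The HTML, the post IDs, and the helper names `extract_post_ids` and `merge_new` are made up for illustration; Instagram's real markup may differ and may change.

```python
import re

# Same pattern as in return_img_pattern(): captures the text between /p/ and /.
PATTERN = '><a href="/p/(.*?)/"><div class="'


def extract_post_ids(html_source: str) -> list:
    return re.findall(PATTERN, html_source, re.S)


def merge_new(known: list, found: list) -> list:
    """Append IDs from `found` that are not yet in `known`, keeping order."""
    for post_id in found:
        if post_id not in known:
            known.append(post_id)
    return known


# Hypothetical snapshots of the page source at two moments during scrolling.
first = (
    '<div><a href="/p/AAA111/"><div class="x"></div></a></div>'
    '<div><a href="/p/BBB222/"><div class="x"></div></a></div>'
)
second = (
    '<div><a href="/p/BBB222/"><div class="x"></div></a></div>'
    '<div><a href="/p/CCC333/"><div class="x"></div></a></div>'
)

collected = []
merge_new(collected, extract_post_ids(first))
merge_new(collected, extract_post_ids(second))
print(collected)  # ['AAA111', 'BBB222', 'CCC333']
```

The second snapshot repeats `BBB222`, which is why each new batch is checked against what has already been collected.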

3. Download image

Finally, use the collected image URLs to download all the images at once.


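self.download_img_size = "l"
# l: 640×640px
# m: 306×306px
# t: 150×150px

def download_img(self, url, save_file_path):
    # `url` is the post ID collected in step 2, not a full URL.
    full_url = "https://www.instagram.com/p/" + str(url) + "/media/?size=" + self.download_img_size
    # The whole body is read via r.content, so streaming is not needed;
    # the timeout keeps a stalled request from hanging forever.
    r = requests.get(full_url, timeout=10)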

    if r.status_code == 200:
        with open(save_file_path, 'wb') as f:
            f.write(r.content)

You can choose the size of the downloaded image from the following by setting `download_img_size`:

l: 640×640px
m: 306×306px
t: 150×150px
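The URL construction and the file saving can be sketched separately from the network request. In the sketch below, `build_media_url` and `save_bytes` are hypothetical helper names; the URL format is the one used in `download_img` above, and whether that endpoint still responds is not something this sketch checks. Creating the parent directory before writing is an addition for this example.

```python
from pathlib import Path

VALID_SIZES = {"l", "m", "t"}  # l: 640x640px, m: 306x306px, t: 150x150px


def build_media_url(post_id: str, size: str = "l") -> str:
    """Build the media URL for a post ID, rejecting unknown size codes."""
    if size not in VALID_SIZES:
        raise ValueError("size must be one of 'l', 'm', 't'")
    return "https://www.instagram.com/p/{}/media/?size={}".format(post_id, size)


def save_bytes(data: bytes, save_file_path: str) -> Path:
    """Write `data` to `save_file_path`, creating parent directories if needed."""
    path = Path(save_file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


print(build_media_url("AAA111", "m"))
# https://www.instagram.com/p/AAA111/media/?size=m
```

Validating the size up front turns a typo in `download_img_size` into an immediate error instead of a failed request.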

Whole code


import re
import requests
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains

class InstaImgCollector:

    def __init__(self):

        self.img_url_list = []
        self.window_width = 0
        self.window_height = 0

        self.login_time = 20   # seconds allowed for manual login
        self.scroll_time = 30  # seconds allowed for manual scrolling

        self.download_img_size = "l"
        # l: 640×640px
        # m: 306×306px
        # t: 150×150px

        self.options = webdriver.ChromeOptions()
        self.options.add_argument('--no-sandbox')

        # Selenium 4: the driver path is passed through a Service object.
        self.driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=self.options)
        self.action = ActionChains(self.driver)
    
    def return_img_pattern(self):
        html_source = self.driver.page_source

        pattern = '><a href="/p/(.*?)/"><div class="'
        results = re.findall(pattern, html_source, re.S)
        return results

    def login(self):
        url = "https://www.instagram.com/"
        self.driver.get(url)

        sleep(self.login_time)

    def fetch_img_url(self, target_username) -> list:
        url = "https://www.instagram.com/{}".format(target_username)
        self.driver.get(url)
        self.driver.maximize_window()
        
        # Re-read the page source once per second and keep only unseen post IDs.
        for _ in range(self.scroll_time):
            sleep(1)
            for post_id in self.return_img_pattern():
                if post_id not in self.img_url_list:
                    self.img_url_list.append(post_id)
        
        return self.img_url_list
        
    def download_img(self, url, save_file_path):
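        # `url` is the post ID collected by fetch_img_url(), not a full URL.
        full_url = "https://www.instagram.com/p/" + str(url) + "/media/?size=" + self.download_img_size
        # The whole body is read via r.content, so streaming is not needed.
        r = requests.get(full_url, timeout=10)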
        
        if r.status_code == 200:
            with open(save_file_path, 'wb') as f:
                f.write(r.content)
    
    def get_post_url_from_id(self, id_):
        self.login()
        
        self.img_url_list = self.fetch_img_url(target_username=id_)

        self.driver.quit()
        return self.img_url_list
    
    def flatten(self, alist):
        # Flatten a list of lists by one level.
        return [item for inner in alist for item in inner]


if __name__ == '__main__':
    id_ = "Specify the Instagram ID here"
    iic = InstaImgCollector()
    url_list = iic.get_post_url_from_id(id_)
    
    # The img/ directory must already exist.
    for url_i in url_list:
        iic.download_img(url_i, "img/{}.png".format(url_i))

Reference

InstaImgCollector
What is Selenium
Summary of frequently used operation methods of Selenium webdriver
How to check the version of Google Chrome [Common to Mac / Windows]

[PYTHON] How to save all Instagram photos at once