I wrote code that downloads all the images from an Instagram profile in one go. Because the login and the page scrolling are done by hand, it can be expected to run stably.
It works like this.
⚠️ Instagram's Terms of Service prohibits the use of automated means to obtain information. Do not build automation tools based on the content of this article.
Selenium is a suite of tools specialized for automating tests of web applications. For more details, see the article "Instagram follower pulling out all the big strategy-Mastering Selenium and PyAutoGUI with Python !!-".
Please prepare an environment where Python 3 can be used.
Install Selenium with `pip install selenium`.
From "ChromeDriver - WebDriver for Chrome", download the ChromeDriver that corresponds to the version of Chrome you are currently using.
You can check the Chrome version by entering `chrome://version/` in Chrome's address bar. It's convenient!
To update ChromeDriver to the latest version, run `brew cask reinstall chromedriver` (on recent Homebrew versions, where the `brew cask` subcommand is no longer available, use `brew reinstall --cask chromedriver`).
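ChromeDriver generally needs to match the major version of the installed Chrome. As a minimal sketch, assuming you copy both version strings by hand (from `chrome://version/` and from `chromedriver --version`), the check could look like this. The helper names `major_version` and `versions_match` are hypothetical and not part of Selenium, and the version numbers are example values only.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string such as '87.0.4280.88'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Example values only; replace them with the versions on your machine.
print(versions_match("87.0.4280.88", "87.0.4280.20"))
print(versions_match("87.0.4280.88", "86.0.4240.22"))
```

If the two major versions differ, downloading the matching ChromeDriver is usually the first thing to try.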
1. Log in to Instagram
2. Get the image URLs
3. Download the images
If you don't care about the walkthrough and just want to run it for now, the whole code is listed at the bottom of this article.
The GitHub repository is here: https://github.com/ekkyu/InstaImgCollector
First, log in to Instagram. You log in manually, so create an Instagram account in advance.
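
    # Excerpt from __init__
    self.login_time = 20  # seconds to wait while you log in by hand
    self.options = webdriver.ChromeOptions()
    self.options.add_argument('--no-sandbox')
    # executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead
    self.driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=self.options)
    self.action = ActionChains(self.driver)

    def login(self):
        url = "https://www.instagram.com/"
        self.driver.get(url)
        # No automated input: just wait so the login can be done manually
        sleep(self.login_time)
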
Chrome is started with `options.add_argument('--no-sandbox')` and without headless mode, so the login screen actually opens in a separate window.
The time allowed for logging in is set with `login_time = 20`. This time it was 20 seconds.
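A silent 20-second `sleep` gives no hint of how much time is left to log in. As a minimal sketch, assuming you would like a countdown in the terminal, the wait could be written as below. The function `wait_for_manual_login` and its `sleep_fn` and `report` parameters are hypothetical additions, not part of the original class.

```python
import time


def wait_for_manual_login(seconds, sleep_fn=time.sleep, report=print):
    """Wait `seconds` seconds, reporting the remaining time once per second."""
    for remaining in range(seconds, 0, -1):
        report("Log in now: {} s left".format(remaining))
        sleep_fn(1)
    report("Continuing")
```

Calling `wait_for_manual_login(20)` in place of `sleep(self.login_time)` waits the same total time. The injectable `sleep_fn` and `report` arguments are only there so the function can be exercised without actually waiting.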
Next, get the image URLs. Manually scroll to the bottom of the profile page.
Because of how Instagram works, images are loaded as the page is scrolled, so the script picks up the images that appear while you scroll.
If it is hard to picture what is happening, open the `Elements` panel of Developer Tools (press `Command` + `Option` + `I`) and watch it while you scroll the profile page.
If you are not sure what that means, try pressing `Command` + `Option` + `I` on the page you are viewing now.
You should be able to see the HTML source for this page.

    # Excerpt from __init__
    self.img_url_list = []
    self.window_width = 0
    self.window_height = 0
    self.scroll_time = 30  # seconds during which you scroll by hand

    def return_img_pattern(self):
        html_source = self.driver.page_source
        # Capture the post ID between "/p/" and the next "/"
        pattern = '><a href="/p/(.*?)/"><div class="'
        results = re.findall(pattern, html_source, re.S)
        return results

    def fetch_img_url(self, target_username) -> list:
        url = "https://www.instagram.com/{}".format(target_username)
        self.driver.get(url)
        self.driver.maximize_window()

        # Re-read the HTML once per second and keep only IDs not seen before
        for _ in range(self.scroll_time):
            sleep(1)
            url_list = self.return_img_pattern()
            new_url = [i for i in url_list if i not in self.img_url_list]
            self.img_url_list.extend(new_url)

        return self.img_url_list

The time allowed for scrolling is set with `scroll_time = 30`. This time it was 30 seconds.
While you scroll, the script re-reads the changed HTML once per second. Each time, the regular expression `pattern` extracts only a part of each image URL from the HTML: the post ID that follows `/p/`.
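To see what the regular expression picks up without opening a browser, here is a small sketch that runs the same pattern over two made-up HTML fragments, standing in for two successive reads of `page_source`. The fragments, the post IDs, and the helper names `extract_post_ids` and `merge_new_ids` are assumptions for illustration. Real Instagram markup may differ and can change at any time.

```python
import re

# The pattern from the article: captures the post ID between "/p/" and the next "/".
PATTERN = '><a href="/p/(.*?)/"><div class="'


def extract_post_ids(html_source):
    """Return every post ID found in one snapshot of the page HTML."""
    return re.findall(PATTERN, html_source, re.S)


def merge_new_ids(known_ids, snapshot_ids):
    """Append IDs that were not seen before, keeping first-seen order."""
    seen = set(known_ids)
    for post_id in snapshot_ids:
        if post_id not in seen:
            seen.add(post_id)
            known_ids.append(post_id)
    return known_ids


# Made-up fragments: the second snapshot repeats one post and reveals a new one.
SNAPSHOT_1 = (
    '<div><a href="/p/AAA111/"><div class="x"></div></a></div>'
    '<div><a href="/p/BBB222/"><div class="x"></div></a></div>'
)
SNAPSHOT_2 = (
    '<div><a href="/p/BBB222/"><div class="x"></div></a></div>'
    '<div><a href="/p/CCC333/"><div class="x"></div></a></div>'
)

collected = []
for snapshot in (SNAPSHOT_1, SNAPSHOT_2):
    merge_new_ids(collected, extract_post_ids(snapshot))
print(collected)  # ['AAA111', 'BBB222', 'CCC333']
```

The set-based merge is a swapped-in alternative to the list membership test in `fetch_img_url`. It gives the same first-seen order and avoids rescanning the whole list for every ID.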
Finally, use the collected image URLs to download all the images at once.

    # Excerpt from __init__
    self.download_img_size = "l"
    # l: 640×640px
    # m: 306×306px
    # t: 150×150px

    def download_img(self, url, save_file_path):
        full_url = "https://www.instagram.com/p/" + str(url) + "/media/?size=" + self.download_img_size
        r = requests.get(full_url, stream=True)
        if r.status_code == 200:
            with open(save_file_path, 'wb') as f:
                f.write(r.content)

You can choose the size of the image to download from `l` (640×640px), `m` (306×306px), and `t` (150×150px). Specify it with `download_img_size`.
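A mistyped size code would only show up as a failed request. As a minimal sketch, assuming you want to catch that earlier, the URL can be built in a small helper that rejects unknown sizes. The names `IMG_SIZES` and `build_media_url` are hypothetical. The URL format is copied from `download_img` above and is not verified against the live service.

```python
# Size codes and pixel dimensions as listed in the article.
IMG_SIZES = {"l": "640×640px", "m": "306×306px", "t": "150×150px"}


def build_media_url(post_id, size="l"):
    """Build the media URL the same way download_img does, rejecting unknown sizes."""
    if size not in IMG_SIZES:
        raise ValueError("size must be one of {}".format(sorted(IMG_SIZES)))
    return "https://www.instagram.com/p/" + str(post_id) + "/media/?size=" + size


# "AAA111" is a made-up post ID used only to show the resulting string.
print(build_media_url("AAA111", "m"))
```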
Whole code:

    import re
    import json
    import requests
    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys


    class InstaImgCollector:
        def __init__(self):
            self.img_url_list = []
            self.window_width = 0
            self.window_height = 0
            self.login_time = 20   # seconds to wait for the manual login
            self.scroll_time = 30  # seconds during which you scroll by hand
            self.download_img_size = "l"
            # l: 640×640px
            # m: 306×306px
            # t: 150×150px
            self.options = webdriver.ChromeOptions()
            self.options.add_argument('--no-sandbox')
            # executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead
            self.driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=self.options)
            self.action = ActionChains(self.driver)

        def return_img_pattern(self):
            html_source = self.driver.page_source
            # Capture the post ID between "/p/" and the next "/"
            pattern = '><a href="/p/(.*?)/"><div class="'
            results = re.findall(pattern, html_source, re.S)
            return results

        def login(self):
            url = "https://www.instagram.com/"
            self.driver.get(url)
            sleep(self.login_time)

        def fetch_img_url(self, target_username) -> list:
            url = "https://www.instagram.com/{}".format(target_username)
            self.driver.get(url)
            self.driver.maximize_window()

            # Re-read the HTML once per second and keep only IDs not seen before
            for _ in range(self.scroll_time):
                sleep(1)
                url_list = self.return_img_pattern()
                new_url = [i for i in url_list if i not in self.img_url_list]
                self.img_url_list.extend(new_url)

            return self.img_url_list

        def download_img(self, url, save_file_path):
            full_url = "https://www.instagram.com/p/" + str(url) + "/media/?size=" + self.download_img_size
            r = requests.get(full_url, stream=True)
            if r.status_code == 200:
                with open(save_file_path, 'wb') as f:
                    f.write(r.content)

        def get_post_url_from_id(self, id_):
            self.login()
            self.img_url_list = self.fetch_img_url(target_username=id_)
            self.driver.quit()
            return self.img_url_list

        def flatten(self, alist):
            return [item for inner in alist for item in inner]


    if __name__ == '__main__':
        id_ = "Specify the Instagram ID here"
        iic = InstaImgCollector()
        url_list = iic.get_post_url_from_id(id_)
        for url_i in url_list:
            # The img/ directory must already exist
            iic.download_img(url_i, "img/{}.png".format(url_i))

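The script writes into an `img` directory, and `open(save_file_path, 'wb')` fails if that directory does not exist. As a minimal sketch, assuming you would rather have the directory created automatically, the save path could be built with a helper like this. The name `save_path_for` is hypothetical, and the `.png` extension simply follows the original code.

```python
from pathlib import Path


def save_path_for(post_id, out_dir="img"):
    """Create out_dir if needed and return the file path for one post ID."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory / "{}.png".format(post_id))
```

In the main block, `iic.download_img(url_i, save_path_for(url_i))` would then replace the hard-coded path.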