[Python] How to save images on the Web at once with Beautiful Soup

Introduction

This article shows how to download images from a website in bulk using web scraping.

Warning: If the images are protected by copyright, or if copyright is not an issue but the site's terms of use prohibit scraping, you may be held liable for damages. Make sure you understand the relevant copyright law and the site's terms of use before you scrape.

Table of contents

  1. How to do web scraping
  2. Actually save the image
  3. Extraction flow
  4. Summary
  5. Bonus
  6. Reference

1. How to do web scraping

Web scraping can be done in many languages, such as Ruby, PHP, and JavaScript. This article uses Python's Beautiful Soup library.

2. Actually save the image

① Install beautifulsoup4 with pip (the download step below also uses requests, so install it as well)

pip install beautifulsoup4 requests

② Decide on a site for web scraping

③ Get the URL of each image link page from the list page

import urllib.request
from bs4 import BeautifulSoup

url = "https://www.irasutoya.com/search/label/%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9"
# Prepare a list to store the URLs of the image pages
link_list = []
response = urllib.request.urlopen(url)
soup = BeautifulSoup(response, "html.parser")
# Get all the link tags that point to image pages
image_list = soup.select('div.boxmeta.clearfix > h2 > a')
# Extract the link URLs one by one
for image_link in image_list:
    link_url = image_link.attrs['href']
    link_list.append(link_url)
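
To see what the selector in step ③ picks up without touching the real site, here is a minimal sketch that runs on a small HTML string. The HTML and the example.com URLs are made up for illustration; only the selector comes from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical list-page HTML, shaped to match the selector used in step 3
sample_html = """
<div class="boxmeta clearfix">
  <h2><a href="https://example.com/page1.html">Illustration 1</a></h2>
</div>
<div class="boxmeta clearfix">
  <h2><a href="https://example.com/page2.html">Illustration 2</a></h2>
</div>
<div class="other">
  <h2><a href="https://example.com/ignored.html">Not matched</a></h2>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# The selector matches <a> tags that are direct children of an <h2>,
# which is itself a direct child of a <div> carrying both classes
link_list = [a.attrs["href"] for a in soup.select("div.boxmeta.clearfix > h2 > a")]
print(link_list)
```

The link inside `div.other` is left out because that div does not have the `boxmeta` and `clearfix` classes.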

④ Get all the img tags on each image page

for page_url in link_list:
    page_html = urllib.request.urlopen(page_url)
    page_soup = BeautifulSoup(page_html, "html.parser")
    # Get all the img tags for the image files on this page
    img_list = page_soup.select('div.separator > a > img')

⑤ Take out the img tags one by one and get the URL of the image file.

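import re
for img in img_list:
    # Get the URL of the image file
    img_url = img.attrs['src']
    # Extract the file name; skip URLs that do not end in png or jpg
    file_name = re.search(r".*/(.*png|.*jpg)$", img_url)
    if file_name is None:
        continue
    # output_folder is the save directory (a pathlib.Path, judging by joinpath)
    save_path = output_folder.joinpath(file_name.group(1))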
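
The file name extraction in step ⑤ can be tried on its own. This is a minimal sketch: the `build_save_path` helper, the `images` folder, and the sample URLs are hypothetical names for illustration, and `output_folder` is assumed to be a `pathlib.Path`.

```python
import re
from pathlib import Path

# Hypothetical save directory; assumed to be a Path like the article's output_folder
output_folder = Path("images")


def build_save_path(img_url):
    """Return the save path for a png/jpg URL, or None for anything else."""
    # The greedy ".*/" consumes everything up to the last slash,
    # so group(1) is just the file name
    match = re.search(r".*/(.*png|.*jpg)$", img_url)
    if match is None:
        return None
    return output_folder.joinpath(match.group(1))


print(build_save_path("https://example.com/s400/business_man.png"))
print(build_save_path("https://example.com/icons/logo.gif"))
```

Returning None for URLs that do not match lets the caller skip them instead of failing on `group(1)`.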

⑥ Download the data from the URL of the image file

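import requests

try:
    # Get the data from the image file URL
    image = requests.get(img_url)
    # Save the data to the destination file path
    with open(save_path, 'wb') as f:
        f.write(image.content)
    # Show the saved file name
    print(save_path)
except ValueError:
    # requests raises a ValueError subclass for a malformed URL (e.g. a missing scheme)
    print("ValueError!")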
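
If you want the download step to keep going when one image fails, something like the following might work. Note that this sketch swaps in `urllib.request` from the standard library instead of `requests`, and the `download_image` function and its `timeout` parameter are my own hypothetical names, not part of the original code.

```python
import urllib.error
import urllib.request
from pathlib import Path


def download_image(img_url, save_path, timeout=10):
    """Download img_url to save_path. Return True on success, False on failure."""
    try:
        with urllib.request.urlopen(img_url, timeout=timeout) as response:
            data = response.read()
    except (urllib.error.URLError, ValueError) as error:
        # URLError covers network and HTTP errors; ValueError covers malformed URLs
        print(f"Skipped {img_url}: {error}")
        return False
    # Write the file only after the whole response has been read
    Path(save_path).write_bytes(data)
    return True


# Usage inside the loop from step 5 (not executed here):
# if download_image(img_url, save_path):
#     print(save_path)
```

Because the function returns a flag instead of raising, a single bad URL does not stop the rest of the batch.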

That's all for the procedure.

Execution result (image: result1.png)

3. Extraction flow

Steps ③ to ⑤ can be hard to picture, so I drew a rough diagram of the extraction flow (image: process.png).
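
The same flow (list page, then each image page, then each img tag) can also be traced in code. This is only a sketch under assumptions: the `PAGES` dictionary stands in for the real site so that it runs offline, and `collect_image_urls` and `fetch_html` are hypothetical names. The two selectors are the ones used in steps ③ and ④.

```python
from bs4 import BeautifulSoup

# Hypothetical pages standing in for the real site: URL -> HTML
PAGES = {
    "https://example.com/list": """
        <div class="boxmeta clearfix"><h2><a href="https://example.com/a">A</a></h2></div>
        <div class="boxmeta clearfix"><h2><a href="https://example.com/b">B</a></h2></div>
    """,
    "https://example.com/a": """
        <div class="separator"><a href="#"><img src="https://example.com/img/a1.png"></a></div>
        <div class="separator"><a href="#"><img src="https://example.com/img/a2.jpg"></a></div>
    """,
    "https://example.com/b": """
        <div class="separator"><a href="#"><img src="https://example.com/img/b1.png"></a></div>
    """,
}


def collect_image_urls(list_url, fetch_html):
    """Walk list page -> image pages -> img tags and return every image URL."""
    list_soup = BeautifulSoup(fetch_html(list_url), "html.parser")
    image_urls = []
    # Step 3: links from the list page to each image page
    for link in list_soup.select("div.boxmeta.clearfix > h2 > a"):
        page_soup = BeautifulSoup(fetch_html(link.attrs["href"]), "html.parser")
        # Steps 4 and 5: img tags on that page, then their src attributes
        for img in page_soup.select("div.separator > a > img"):
            image_urls.append(img.attrs["src"])
    return image_urls


image_urls = collect_image_urls("https://example.com/list", PAGES.__getitem__)
print(image_urls)
```

Passing the fetch function as an argument keeps the parsing logic separate from the network access, so the real version would only need a function that returns the HTML for a URL.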

The source code for this article is also available on GitHub, so please refer to it: https://github.com/miyazakikna/SaveLocalImageWebScraping.git

4. Summary

This article explained how to save images in bulk using Python's Beautiful Soup. I used Irasutoya as the example this time, but I think you can download images from other sites in the same way once you adjust the selectors to each site's HTML, so please give it a try.

5. Bonus

For how to rename the downloaded files in bulk, see: [[Work efficiency] How to change file names in Python](https://qiita.com/miyazakikna/items/b9c6d6d83ebcd529afd7)

6. Reference

Let's scrape images with Python
Image collection by web scraping
