Introduction

I used Python to scrape a Nogizaka46 blog and save its images. This article scrapes the first page of Manatsu Akimoto's blog.

Code

`scraping.py`


import requests
import urllib.request
import os
from bs4 import BeautifulSoup


def scraping():
    #Member URL
    member_name = "manatsu.akimoto"
    url = "http://blog.nogizaka46.com/" + member_name + "/"

    #Create folder
    if not os.path.isdir(member_name):  # if no folder named after member_name exists yet
        print("Create folder")
        os.mkdir(member_name)

    #For counting the number of saved images
    cnt = 0

    #BeautifulSoup object generation
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(
        url, headers=headers).content, 'html.parser')

    #Find the html where the image is located
    for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
        for img in entry.find_all("img"):  #Get all img
            cnt += 1
            urllib.request.urlretrieve(
                img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
    print("I have saved " + str(cnt) + " images.")


if __name__ == '__main__':
    scraping()
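
The script always saves with a .jpeg extension and passes each src value to urlretrieve unchanged. As a minimal sketch of one way to make that step less fragile, the hypothetical helper below (resolve_image is not part of the original script) resolves a possibly relative src against the page URL and takes the extension from the URL path when there is one. The example.com URLs are placeholders, not real blog paths.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image(page_url, src, member_name, index):
    """Return (absolute_url, local_path) for one img src value (hypothetical helper)."""
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    absolute_url = urljoin(page_url, src)
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(absolute_url).path)[1]
    if not ext:
        ext = ".jpeg"  # fallback: the extension the script always uses
    filename = member_name + "-" + str(index) + ext
    return absolute_url, os.path.join(member_name, filename)


# Placeholder URLs for illustration only.
url, path = resolve_image(
    "http://example.com/blog/", "/img/photo.png?w=100", "manatsu.akimoto", 3)
print(url)
print(path)
```

Inside the loop, the two return values could be passed to urllib.request.urlretrieve in place of img.attrs["src"] and the hand-built path.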

Member URL

[Image: スクリーンショット (1).png]
[Image: スクリーンショット (2).png]

Each member's blog URL contains the member's name, so I set member_name to the name of the member whose images I want to get.

member_name = "manatsu.akimoto"
url = "http://blog.nogizaka46.com/" + member_name + "/"
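
If the script is later pointed at other members, the URL construction can be pulled into a small function. This is only a sketch: build_blog_url and BLOG_ROOT are names introduced here for illustration, and it assumes every member's blog follows the same URL pattern as the one above.

```python
from urllib.parse import urljoin

BLOG_ROOT = "http://blog.nogizaka46.com/"  # base URL taken from the script


def build_blog_url(member_name):
    """Return the blog top-page URL for one member (hypothetical helper)."""
    # The trailing slash matches the URL used in the script.
    return urljoin(BLOG_ROOT, member_name + "/")


print(build_blog_url("manatsu.akimoto"))
# prints http://blog.nogizaka46.com/manatsu.akimoto/
```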

BeautifulSoup object generation

The following site explains this step in an easy-to-understand way.
Reference site: https://python.civic-apps.com/beautifulsoup4-selector/

Find the html where the image is located

Looking at the HTML that makes up the blog:

[Image: スクリーンショット (7).png]

The body of each post is inside a div tag with the class name "entrybody".

[Image: スクリーンショット (8).png]

The images are in img tags inside that div, so the script saves each one to the folder as soon as it finds it.

for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
    for img in entry.find_all("img"):  #Get all img
        cnt += 1
        urllib.request.urlretrieve(img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
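
The nested lookup above (every img inside a div with the class "entrybody") can be checked offline on a small HTML string before hitting the real site. The sketch below deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The EntryImageParser class and the sample HTML are made up for illustration and are not taken from the blog.

```python
from html.parser import HTMLParser


class EntryImageParser(HTMLParser):
    """Collect src values of img tags located inside div.entrybody (illustrative)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside an entrybody (0 = outside)
        self.sources = []   # collected src attribute values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # nested div inside an entry body
            elif "entrybody" in (attrs.get("class") or "").split():
                self.depth = 1   # entering an entry body
        elif tag == "img" and self.depth > 0 and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


# Made-up sample: one image outside any entry body, two inside one.
SAMPLE_HTML = """
<div class="header"><img src="logo.png"></div>
<div class="entrybody"><div><img src="a.jpg"></div><img src="b.jpg"></div>
<div class="entrybody"><p>text only</p></div>
"""

parser = EntryImageParser()
parser.feed(SAMPLE_HTML)
print(parser.sources)  # prints ['a.jpg', 'b.jpg']
```

The image in the header div is skipped, which is the same reason the script searches inside "entrybody" rather than collecting every img on the page.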

Execution result

Page at the time of execution

[Image: screencapture-blog-nogizaka46-manatsu-akimoto-2020-02-19-12_42_35.jpg]

Created folder

[Image: スクリーンショット (12).png]

Command line display

Create folder
I have saved 22 images.
