I saved the images from a Nogizaka46 blog by scraping with Python. Specifically, I scraped the first page of Manatsu Akimoto's blog.
scraping.py
import requests
import urllib.request
import os
from bs4 import BeautifulSoup


def scraping():
    # Member name as it appears in the blog URL
    member_name = "manatsu.akimoto"
    url = "http://blog.nogizaka46.com/" + member_name + "/"

    # Create the folder if there is no "member_name" folder yet
    if not os.path.isdir(member_name):
        print("Create folder")
        os.mkdir(member_name)

    # Counter for the number of saved images
    cnt = 0

    # Generate the BeautifulSoup object
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(
        url, headers=headers).content, 'html.parser')

    # Find the HTML elements where the images are located
    for entry in soup.find_all("div", class_="entrybody"):  # Get all entry bodies
        for img in entry.find_all("img"):  # Get all img tags
            cnt += 1
            urllib.request.urlretrieve(
                img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
    print("I have saved " + str(cnt) + " images.")


if __name__ == '__main__':
    scraping()
Since the member's name is part of the blog URL, I set member_name to the name of the member whose images I want to get.
member_name = "manatsu.akimoto"
url = "http://blog.nogizaka46.com/" + member_name + "/"
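The URL construction above can be wrapped in a small helper so the member name is the only thing that changes. This is a minimal sketch, not part of the original script: the names BASE_URL and member_blog_url are hypothetical, and it assumes every member's blog follows the same URL pattern as "manatsu.akimoto".

```python
# Base address of the blog, taken from the script above
BASE_URL = "http://blog.nogizaka46.com/"


def member_blog_url(member_name):
    """Build the first-page blog URL for a member name such as "manatsu.akimoto"."""
    # Strip stray whitespace and slashes so the result ends with exactly one "/"
    return BASE_URL + member_name.strip().strip("/") + "/"


print(member_blog_url("manatsu.akimoto"))
# prints: http://blog.nogizaka46.com/manatsu.akimoto/
```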
For how to select elements with BeautifulSoup, there is an easy-to-understand explanation on the following site. Reference site: https://python.civic-apps.com/beautifulsoup4-selector/
Looking at the HTML that makes up the blog, the body of each entry is in a div tag with the class name "entrybody", and the images are in img tags inside it. The script saves each image to the folder as soon as it finds it.
for entry in soup.find_all("div", class_="entrybody"):  # Get all entry bodies
    for img in entry.find_all("img"):  # Get all img tags
        cnt += 1
        urllib.request.urlretrieve(img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
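The loop above passes img.attrs["src"] straight to urlretrieve and always uses the .jpeg extension. As a hedged variation, the sketch below shows one way to prepare both arguments first. The helper names absolute_image_url and build_save_path are hypothetical, the example.com addresses are placeholders rather than real image URLs, and whether the blog actually uses relative src values or other image formats is an assumption I have not checked.

```python
import os
from urllib.parse import urljoin, urlparse


def absolute_image_url(page_url, img_src):
    """Resolve an img src against the page URL (absolute URLs pass through unchanged)."""
    return urljoin(page_url, img_src)


def build_save_path(member_name, index, img_src, default_ext=".jpeg"):
    """Return a path like 'manatsu.akimoto/manatsu.akimoto-1.jpeg'."""
    # Take the extension from the URL path; fall back to .jpeg if there is none
    ext = os.path.splitext(urlparse(img_src).path)[1] or default_ext
    return os.path.join(member_name, member_name + "-" + str(index) + ext)


page_url = "http://blog.nogizaka46.com/manatsu.akimoto/"
# Placeholder src values for illustration only
print(absolute_image_url(page_url, "/img/sample.png"))
print(build_save_path("manatsu.akimoto", 1, "http://example.com/img/sample.png"))
```

Inside the loop, the two results would replace the first and second arguments of urllib.request.urlretrieve.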
Running the script printed the following:
Create folder
I have saved 22 images.