[PYTHON] Scraping 100 Fortnite images

I scraped 100 Fortnite images from Yahoo! JAPAN image search.

Environment:
・Mac
・Python 3

(1) Environment setup and directory structure

Create a directory named fortnite on your Desktop. Inside it, create an images folder (where the images will be saved) and a file named scraping.py.

fortnite
├scraping.py
└images
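
If you would rather create this layout from Python than from Finder, a minimal sketch with pathlib could look like the following. The helper name make_project is hypothetical, and the usage below assumes the Desktop lives at ~/Desktop, as on a default Mac setup.

```python
from pathlib import Path


def make_project(base: Path) -> Path:
    """Hypothetical helper: create fortnite/, fortnite/images/ and an empty fortnite/scraping.py under base."""
    project = base / "fortnite"
    # parents=True creates any missing parent folders; exist_ok=True makes reruns harmless
    (project / "images").mkdir(parents=True, exist_ok=True)
    # touch() creates the file if it is missing and leaves an existing file's contents alone
    (project / "scraping.py").touch()
    return project
```

For example, make_project(Path.home() / "Desktop") creates the tree shown above.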

Create a virtual environment inside that directory and activate it:

python3 -m venv .
source bin/activate
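
To double-check from Python that the virtual environment is active, one common approach is to compare sys.prefix with sys.base_prefix, which differ inside a venv. A minimal sketch (the function name in_virtualenv is just for illustration):

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
```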

Install the required packages:

pip install beautifulsoup4
pip install requests
pip install lxml
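
A quick way to confirm the installs worked is to ask importlib whether each module can be found. This is a minimal sketch; note that beautifulsoup4 is imported as bs4, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the packages installed above (beautifulsoup4 is imported as bs4)
REQUIRED = ["bs4", "requests", "lxml"]


def missing_modules(names):
    # find_spec returns None when a top-level module cannot be found
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", missing if missing else "none")
```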

(2) Write scraping.py

The images come from Yahoo! JAPAN's image search results for "フォートナイト" (Fortnite):

https://search.yahoo.co.jp/image/search?p=%E3%83%95%E3%82%A9%E3%83%BC%E3%83%88%E3%83%8A%E3%82%A4%E3%83%88&ei=UTF-8&b=1

The results are split over several pages, and together with the following pages there are well over 100 images. The script below assumes 20 images per page and requests the next page by increasing the b parameter by 20. It scrapes the image URLs from these pages and saves the images into the images folder.
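
The long p=%E3%83%95... part of the URL is just "フォートナイト" percent-encoded as UTF-8, so the page URLs can also be built with urllib.parse.urlencode instead of being hard-coded. A minimal sketch; the meanings of ei (character encoding) and b (1-based index of the first result on the page) are inferred from the URL and from how the script pages through results, not from any Yahoo documentation.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://search.yahoo.co.jp/image/search"


def page_url(query: str, start: int) -> str:
    # p = search keyword, ei = encoding (inferred), b = index of the first result (inferred)
    return SEARCH_URL + "?" + urlencode({"p": query, "ei": "UTF-8", "b": start})


# Assuming 20 results per page, pages start at b=1, 21, 41, ...
for start in range(1, 121, 20):
    print(page_url("フォートナイト", start))
```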

scraping.py


from bs4 import BeautifulSoup
import requests
import os
import time


def main():
    # Save into the images folder next to this script (absolute path)
    save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")

    # 20 images per page; offset used to request the next page
    page_key = 0

    # Counter used to number the saved images (0.jpeg, 1.jpeg, ...)
    num_m = 0

    for _ in range(6):
        URL = "https://search.yahoo.co.jp/image/search?p=%E3%83%95%E3%82%A9%E3%83%BC%E3%83%88%E3%83%8A%E3%82%A4%E3%83%88&ei=UTF-8&b={}".format(page_key + 1)
        res = requests.get(URL)
        res.encoding = res.apparent_encoding
        # The "lxml" parser only has to be installed, not imported
        soup = BeautifulSoup(res.text, "lxml")

        # Collect the src of every img tag inside div.gridmodule
        img_urls = []
        for div in soup.find_all("div", class_="gridmodule"):
            for img in div.find_all("img"):
                src = img.get("src")
                if src:  # skip img tags without a src attribute
                    img_urls.append(src)

        for img_url in img_urls:
            img_res = requests.get(img_url)
            # Write the binary response body to an absolute path
            with open(os.path.join(save_dir, "{}.jpeg".format(num_m)), "wb") as f:
                f.write(img_res.content)
            num_m += 1
            # Stop saving once 100 images (0.jpeg to 99.jpeg) have been saved
            if num_m == 100:
                break
        else:
            # The inner loop finished without break: wait 1 second to reduce
            # server load, then move on to the next page
            time.sleep(1)
            page_key += 20
            continue
        # The inner loop was stopped by break: stop the outer loop as well
        break


if __name__ == '__main__':
    main()
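
The nested-loop exit at the end of main() relies on Python's for ... else: the else block runs only when the inner loop finishes without hitting break. Here is a stripped-down sketch with dummy data in place of HTTP requests; the function collect, the page lists and the limit are made up for illustration.

```python
def collect(pages, limit):
    saved = []
    for page in pages:
        for item in page:
            saved.append(item)
            if len(saved) == limit:
                break  # leaves the inner loop only
        else:
            # Inner loop ran to completion without break: go on to the next page
            continue
        # Reached only when the inner loop hit break: leave the outer loop too
        break
    return saved


pages = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
print(collect(pages, 5))  # ['a1', 'a2', 'a3', 'b1', 'b2']
```

Any statement written after that final break in the outer loop body can never execute, so per-page work such as sleeping or advancing the page offset has to go before the inner loop or inside the else block.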

Supplementary explanation

・Using Google Chrome's Inspect (developer tools) to look for where the image URLs are, I found that they sit inside div tags whose class is gridmodule. The script scrapes the img tags inside those divs.
・get('src') returns the value of each img tag's src attribute.
・That src value is only a URL string (str), so the image itself has to be fetched with requests, which returns a Response object holding the response information. A Response object has attributes such as text, encoding, status_code and content; content holds the response body as binary data, which is what an image file needs. (Reference) How to use Requests (Python Library)
・Each file is written to an absolute path in wb (binary write) mode. (Reference) About Python and os operations
・Once 100 images have been saved, the script breaks out of the inner for loop and then out of the outer for loop as well. (Reference) Python for loop break (break condition)
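
The div.gridmodule / img / src extraction can be tried offline on a small HTML string. In this minimal sketch the HTML is made up to mirror the structure described above (the real Yahoo markup may differ), and it uses Python's built-in "html.parser" so it runs even without lxml; the script itself uses "lxml". The helper name extract_image_urls is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML shaped like the structure described above
SAMPLE_HTML = """
<div class="gridmodule">
  <img src="https://example.com/a.jpg">
  <img src="https://example.com/b.jpg">
</div>
<div class="other">
  <img src="https://example.com/ignored.jpg">
</div>
<div class="gridmodule"><img alt="no src"></div>
"""


def extract_image_urls(html: str) -> list:
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_="gridmodule"):
        for img in div.find_all("img"):
            src = img.get("src")  # None when the attribute is missing
            if src:
                urls.append(src)
    return urls


print(extract_image_urls(SAMPLE_HTML))  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```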

Run the script with python3 scraping.py, and the images are saved in the images folder.
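
The script names every file .jpeg regardless of what the server actually sent, so a file could really be a PNG, a GIF or an error page. This hypothetical helper, check_images, counts the saved .jpeg files and flags any whose first bytes are not the JPEG start-of-image signature (FF D8 FF). It is a rough sanity check, not a full image validator.

```python
import os

# JPEG files begin with the SOI marker FF D8, followed by the FF of the next marker
JPEG_MAGIC = b"\xff\xd8\xff"


def check_images(folder: str):
    """Return (number of .jpeg files, names whose bytes do not start like a JPEG)."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(".jpeg"))
    suspicious = []
    for name in names:
        with open(os.path.join(folder, name), "rb") as f:
            if f.read(3) != JPEG_MAGIC:
                suspicious.append(name)
    return len(names), suspicious
```

Run from the fortnite directory, print(check_images("images")) shows the count and any suspicious files.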
