When a child attends nursery school, the staff often take photos of the children. One of the ways these photos are shared with parents is the "egao School Photo Service", run by Studio Alice. It is a fairly good system: you browse the photos of your child on the web, select the ones you want, purchase them, and download them later.
https://egao.photo/store/
However, most parents end up selecting a large number of photos (in my case, over a hundred), and the web service has no bulk-download option. Clicking the photos one by one, you gradually lose track of which ones you have already downloaded. It is painful.
Since I will surely face the same situation again, I am writing this up as a memorandum for myself.
**This article is based on the egao website as of March 2020, and the steps may stop working if the site's specifications change.**
(If the site's specifications do change, I hope a bulk-download feature gets added at the same time.)
For now, I planned to proceed with the download according to the following flow.

The preparations needed before starting are as follows:
- Install Selenium and Beautiful Soup (on the PC side, be careful that the WebDriver version matches your browser).
- Your login ID (email address) and password.
- The URL of the list page containing the photos you want to download.
The articles I referred to (listed at the end of this article) cover the initial setup in detail, so I will omit it here.
First, import the necessary libraries.

```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
```
Next, launch Chrome via the WebDriver for automated operation.

```python
driver = webdriver.Chrome()
driver.implicitly_wait(3)
```
After launching the browser, use the following commands to open the site and log in. Note that if you shrink the automatically opened browser window, the page structure seems to change and errors may occur; I have not looked into a workaround for this, so be careful.
```python
url = "https://egao.photo/store/"  # page with the login button
user = "[email protected]"      # your registered email address
password = "hogehoge"             # the password you have set

driver.get(url)
elem = driver.find_element_by_id("btn-login")  # click the login button on the top page
elem.click()
elem = driver.find_element_by_id("inputEmail")  # enter the email address
elem.clear()
elem.send_keys(user)
elem = driver.find_element_by_id("inputPassword")  # enter the password
elem.clear()
elem.send_keys(password)
elem = driver.find_element_by_xpath("//*[@id='login-modal']/div/div/div[2]/form/div/div[3]/div[1]/button")  # click the login button in the modal
elem.click()
```
The sequence of `elem` operations follows the screens in order. For the final login button I would have preferred to locate it by id, but I could not find one, so I specified the element with an XPath instead.
Next, specify the page containing the photos you want to download in bulk, and navigate there with the WebDriver.
```python
url_target = "https://egao.photo/store/EventPhoto/Download?Model=hogehogehogehogehoge-1"
driver.get(url_target)
```
That is most of the Selenium work for now; next, Beautiful Soup comes into play (be careful not to close the browser window opened by the WebDriver). Load the page currently open in the WebDriver and parse it with Beautiful Soup.
```python
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
```
Every image to be downloaded had a button whose `name` attribute was `photoId`. First, extract the elements containing `photoId` and store them in a list; then extract each element's `id` (a unique id per image).
```python
linklist = soup.find_all('button', attrs={'name': 'photoId'})

linklist_2 = []
for a in linklist:
    b = a.attrs['id']
    linklist_2.append(b)
```
If the contents of `linklist_2` look like the following, you are all set.

```python
['Download_XYXYXYXYXYYYY', 'Download_YYYYYYYYYYYYY', 'Download_XXXXXXXXXXXYY', 'Download_XXXXXXXXXXXXY']
```
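To sanity-check the extraction logic without hitting the real site, you can run the same Beautiful Soup query against a hand-made HTML snippet. The button markup below is my guess at the page structure, not copied from egao, and I use the built-in `html.parser` here so the snippet does not require lxml:

```python
from bs4 import BeautifulSoup

# Invented markup mimicking the download buttons described above.
html = """
<div>
  <button name="photoId" id="Download_AAA111">DL</button>
  <button name="photoId" id="Download_BBB222">DL</button>
  <button name="otherBtn" id="ignore_me">x</button>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Only buttons with name="photoId" should be picked up.
ids = [btn.attrs["id"] for btn in soup.find_all("button", attrs={"name": "photoId"})]
print(ids)  # ['Download_AAA111', 'Download_BBB222']
```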
Finally, go back to Selenium and click the download button for each id.

```python
for a in linklist_2:
    elem = driver.find_element_by_id(a)
    elem.click()
```
With the method so far, you can download everything shown on one page at once, up to the maximum number of photos the page displays. For the remaining photos, simply navigate to the next page and run the same commands again.
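If you end up repeating this for many pages, the steps above can be wrapped into small helpers. This is only a rough sketch: `extract_download_ids` and `download_pages` are names I made up, and it assumes every list page uses the same button structure.

```python
from bs4 import BeautifulSoup

def extract_download_ids(page_source):
    """Pull the id of every photoId button out of raw page HTML."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [b.attrs["id"] for b in soup.find_all("button", attrs={"name": "photoId"})]

def download_pages(driver, urls):
    # For each list page: load it, parse it, click every download button.
    for url in urls:
        driver.get(url)
        for button_id in extract_download_ids(driver.page_source):
            driver.find_element_by_id(button_id).click()
```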
Next time I need a large download, I plan to automate a bit more, including the redundant parts. In any case, I am glad this made things much easier when I needed to download the same large number of images again.
Almost everything I needed was covered by the following two articles. Thanks.
- [Selenium] Log in and write data to CSV [Beautiful Soup]
- Download images from Irasutoya at once with Python scraping