When your child attends a nursery school, the staff sometimes take photos of the children.
There are several ways these photos are shared with parents, and one of them is the "egao School Photo Service". It is run by Studio Alice, and I think it is a pretty good system: you select and purchase photos of your child, then download them from the web later.
https://egao.photo/store/
However, most parents end up choosing a lot of photos whether they mean to or not (in my household it was over a hundred), and this web service has no bulk download option. If you click the download buttons one by one, you gradually lose track of which photos you have already downloaded. That is miserable.
I'm sure I will be in the same situation again, so I am writing this down as a memo for myself.
**This article is based on the egao website as of March 2020, and the method may stop working if the specifications of the egao website change.**
(If the website is ever updated, I would love to see a bulk download option added.)
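For the time being, I planned to download the photos in the following flow: log in with Selenium, open the list page containing the photos, parse that page with Beautiful Soup to collect the ids of the download buttons, then click each button with Selenium.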
The preparations needed before starting are as follows.
- Install Selenium and Beautiful Soup. (Be careful about versions on the PC side, especially the WebDriver version.)
- Have your login ID (email address) and password ready.
- Copy the URL of the list page containing the photos you want to download.
The articles I referred to (listed at the end of this article) cover the setup in detail, so I will omit it here.
First, I imported the necessary libraries.
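Since the preparations include a login ID and password, one option is to keep them out of the script itself. The following is only a minimal sketch under my own assumptions: the environment variable names `EGAO_USER` and `EGAO_PASSWORD` and the helper `load_credentials` are hypothetical and have nothing to do with the egao service.

```python
import os


def load_credentials(env=os.environ):
    """Return (user, password) read from environment variables.

    EGAO_USER and EGAO_PASSWORD are hypothetical names chosen for this
    sketch; they are not defined by the egao service.
    """
    try:
        return env["EGAO_USER"], env["EGAO_PASSWORD"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

With the two variables set in the shell, `user, password = load_credentials()` could stand in for the hard-coded values used at the login step.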
python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
Next, I launched Chrome through WebDriver for automated operation.
python
driver = webdriver.Chrome()
driver.implicitly_wait(3)
After launching, use the following code to access the website and log in. Note that if you shrink the browser window that opens automatically, the page's HTML structure seems to change, which can cause errors. I have not worked out how to handle that case.
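One possible way to sidestep the window-size problem is to fix the window to a generous size right after launching the browser. This is an untested sketch, not something verified against the egao site: `set_window_size` is a standard WebDriver method, but the helper name `use_fixed_window` and the 1280x1024 default are assumptions of mine.

```python
def use_fixed_window(driver, width=1280, height=1024):
    """Resize the browser window to a fixed size and return the driver.

    The default size is an arbitrary assumption; the point is only to keep
    the layout the same on every run.
    """
    driver.set_window_size(width, height)
    return driver
```

It would be called once, for example `use_fixed_window(driver)`, before opening the login page.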
python
# find_element(By...) works on Selenium 3 and 4; find_element_by_* was removed in Selenium 4.3
from selenium.webdriver.common.by import By

url = "https://egao.photo/store/"  # Top page, which contains the login button
user = "hoge@gmail.com"  # Your login email address
password = "hogehoge"  # The password you have set
driver.get(url)
elem = driver.find_element(By.ID, "btn-login")  # Press the login button on the top page
elem.click()
elem = driver.find_element(By.ID, "inputEmail")  # Enter the email address
elem.clear()
elem.send_keys(user)
elem = driver.find_element(By.ID, "inputPassword")  # Enter the password
elem.clear()
elem.send_keys(password)
elem = driver.find_element(By.XPATH, "//*[@id='login-modal']/div/div/div[2]/form/div/div[3]/div[1]/button")  # Press the login button in the modal
elem.click()
The elem steps follow the login flow on screen: click the login button on the top page, enter the email address and password, then press the login button in the modal. I was hoping that last login button would have an id, but I couldn't find one, so I specified it with XPath.
Next, specify the URL of the list page you want to download in bulk and navigate to it with WebDriver.
python
url_target = "https://egao.photo/store/EventPhoto/Download?Model=hogehogehogehogehoge-1"
driver.get(url_target)
That is the main work done with Selenium for now; from here Beautiful Soup comes into play (note that you must not close the browser window opened by WebDriver). Beautiful Soup takes the source of the page currently open in WebDriver and parses it.
python
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
Every download button on the page had the name photoId in common. First, extract the buttons with that name and store them in a list. Then extract the id of each button (a separate id for each image).
python
# Collect every download button (each one has name="photoId")
linklist = soup.find_all('button', attrs={'name': 'photoId'})
# Extract the id attribute of each button
linklist_2 = []
for a in linklist:
    b = a.attrs['id']
    linklist_2.append(b)
It is OK if the contents of linklist_2 are as follows.
['Download_XYXYXYXYXYYYY', 'Download_YYYYYYYYYYYYY', 'Download_XXXXXXXXXXXYY', 'Download_XXXXXXXXXXXXY']
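To see what this extraction does without opening a browser, here is a small sketch that reproduces it on hand-written sample HTML. Note that it uses the standard library's `html.parser` instead of Beautiful Soup, so it is a swapped-in technique rather than the method used in this article; the sample markup and the class name `PhotoButtonParser` are assumptions, and only the `name="photoId"` and `id` attributes mirror the real page.

```python
from html.parser import HTMLParser


class PhotoButtonParser(HTMLParser):
    """Collect the id of every <button name="photoId"> element."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Keep only buttons named photoId that actually carry an id.
        if tag == "button" and attributes.get("name") == "photoId" and "id" in attributes:
            self.ids.append(attributes["id"])


# Hypothetical markup: only the name/id attributes mirror the article.
SAMPLE_HTML = """
<div>
  <button name="photoId" id="Download_AAAA">Download</button>
  <button name="other" id="Cancel">Cancel</button>
  <button name="photoId" id="Download_BBBB">Download</button>
</div>
"""

parser = PhotoButtonParser()
parser.feed(SAMPLE_HTML)
print(parser.ids)  # ['Download_AAAA', 'Download_BBBB']
```

The button with a different name is skipped, which is the same filtering that `find_all('button', attrs={'name': 'photoId'})` performs.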
Finally, I went back to Selenium and downloaded the images for each id.
python
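from selenium.webdriver.common.by import By  # locator strategies for find_element

for a in linklist_2:
    elem = driver.find_element(By.ID, a)  # Find the download button by its id
    elem.click()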
With the method so far, you can download everything displayed on one web page in one go. For another page, navigate to it in the same way and run the same commands again to collect the rest.
The next time I need to download a lot of photos, I'm thinking of automating a bit more, including the repetitive parts. In any case, I'm glad this will make things easier when I need to download a large number of images again.
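The repetition over several list pages could be wrapped in a loop. The following is a rough sketch under assumptions, not something I ran against the egao site: `download_pages` and its parameters are hypothetical names, `collect_ids` stands for any function that turns a page source into a list of button ids (for example the Beautiful Soup extraction from this article), and the pause after each click is an arbitrary guess to avoid clicking too fast.

```python
import time


def download_pages(driver, page_urls, collect_ids, pause=1.0):
    """Open each list page and click every download button found on it.

    driver      -- a logged-in Selenium WebDriver
    page_urls   -- list-page URLs copied from the browser
    collect_ids -- callable taking the page source, returning button ids
    pause       -- seconds to wait after each click (an arbitrary default)
    Returns the number of buttons clicked.
    """
    clicked = 0
    for url in page_urls:
        driver.get(url)
        for element_id in collect_ids(driver.page_source):
            # "id" is the value of Selenium's By.ID constant.
            driver.find_element("id", element_id).click()
            time.sleep(pause)
            clicked += 1
    return clicked
```

The locator is written as the plain string `"id"` so the function needs no Selenium import of its own; passing `By.ID` instead is equivalent.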
Most of what I needed was written in the following two articles. Thank you.
- [Selenium] Log in and write data to csv [Beautiful Soup]
- Download images of Irasutoya at once with Python scraping