[PYTHON] I investigated why I could not get 101 or more images with google images download
By following the article below, I was able to collect a good number of sample images:
"google images download did not work, so I dealt with it"
I'm building training data of middle-aged men ("uncle" images), and 100 images does not feel like enough, so I want about 1000.
So I started debugging.
No good
I debugged it reasonably thoroughly for now, and it turns out this problem is a combination of several separate issues:
- Fetching the page with a normal HTTP request and fetching it through chromedriver return HTML in different encoding states, which causes an error.
  - Solved by branching the handling between the case of 100 or fewer images and the case of 101 or more.
- When chromedriver drives Chrome automatically, it scrolls the page and loads new images one batch after another, but a "Show more results" button appears at the third lazy load. There is code to press this button, but it stopped working because of a specification change on the Google side.
  - Solved by specifying the right DOM element.
- The image search HTML stores the search result information as a JavaScript array, and google images download decodes it as JSON. However, on the Google side (I do not know whether this is a recent change), the original image URLs for lazily loaded results are managed in JavaScript and only thumbnails are rendered into the HTML. So even after lazy loading as far as it will go, parsing the HTML source alone yields original images for only the first 100 results.
  - It feels like it should be possible to work around, but it looks quite difficult.
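The scroll-and-click step from the second item can be sketched roughly like this. It is only a sketch of the idea, not the actual code in google images download: `scroll_and_expand`, `rounds`, and `pause` are names I made up, and `button_selector` is deliberately left as a parameter because the right selector has to be looked up in the page as Google currently serves it. The only Selenium calls it relies on are `execute_script`, `find_elements`, `is_displayed`, and `click`.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def scroll_and_expand(driver, button_selector, rounds=10, pause=1.0):
    """Scroll to the page bottom `rounds` times, clicking the button when shown.

    `driver` is a Selenium WebDriver. `button_selector` is a CSS selector for
    the "Show more results" button; it is an assumption you must supply.
    Returns how many times the button was clicked.
    """
    clicks = 0
    for _ in range(rounds):
        # Scrolling to the bottom triggers the next lazy load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        # find_elements returns an empty list when nothing matches,
        # so no exception handling is needed while the button is absent.
        for button in driver.find_elements(CSS_SELECTOR, button_selector):
            if button.is_displayed():
                button.click()
                clicks += 1
                time.sleep(pause)
    return clicks
```

With a real browser you would create the driver with `webdriver.Chrome()`, open the search URL with `driver.get(...)`, and then call `scroll_and_expand(driver, <your selector>)`. Keeping the selector out of the function means only one argument changes the next time the page layout does.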
So I came to the conclusion that this looks infeasible, and I am putting it on hold for now. If you want 101 or more images, it seems better to look elsewhere.
Well, at least now I know the details.
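One of those details, the gap between rendered thumbnails and original image URLs, can be checked on a saved page source with a rough diagnostic like the one below. This is a heuristic sketch under my own assumptions: it treats any direct image-file link found inside a `<script>` block as a candidate original URL, which I have not confirmed matches how Google actually lays out its data. `count_thumbnails_and_originals` is a hypothetical helper name.

```python
import re
from html.parser import HTMLParser

# Heuristic: anything that looks like a direct link to an image file.
IMAGE_URL = re.compile(
    r"https?://[^\s\"'<>\\]+?\.(?:jpe?g|png|gif|webp)", re.IGNORECASE
)


class _Counter(HTMLParser):
    """Counts <img> tags and collects image-like URLs seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.img_tags = 0
        self.in_script = False
        self.script_urls = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.img_tags += 1
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_urls.update(IMAGE_URL.findall(data))


def count_thumbnails_and_originals(html):
    """Return (number of <img> tags, number of distinct image URLs in scripts)."""
    parser = _Counter()
    parser.feed(html)
    parser.close()
    return parser.img_tags, len(parser.script_urls)
```

If the first number keeps growing as you lazy load while the second stays near 100, that is consistent with the behavior described above.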