[Python] Scraping Google search (images)

This article is based on the following page: https://qiita.com/taedookim/items/63759e79426514c8a729

Clone from GitHub

Use google-images-download. The original repository seems to have stopped being updated, so clone a fork instead.

git clone https://github.com/Joeclinton1/google-images-download.git

Copy the following file to the folder where you want to use it, then edit it slightly

<download folder>\google-images-download\google_images_download\google_images_download.py

google_images_download.py line 935


NG: json_file = json.load(open(arguments['config_file']))
OK: json_file = json.load(open(arguments['config_file'], encoding='utf-8'))

Now you can search with Japanese keywords, because the config file is read as UTF-8 instead of the platform's default encoding.
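The effect of the explicit encoding can be tried without the library. The following is a minimal sketch, not the library's own code: the helper name load_config and the file name demo_config.json are hypothetical. It writes a config containing a Japanese keyword as UTF-8 and reads it back with encoding='utf-8'. Without that argument, open() falls back to the platform's default encoding, which on Japanese Windows is typically cp932 and can fail on UTF-8 text.

```python
import json
import os
import tempfile


def load_config(path):
    # Hypothetical helper: read the JSON config as UTF-8,
    # regardless of the platform's default encoding.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical demo file in a temporary folder; the real config.json
# would sit next to image_scraper.py.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_config.json")
config = {"Records": [{"keywords": "りんご", "limit": 5}]}

with open(demo_path, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False)

loaded = load_config(demo_path)
print(loaded == config)  # the Japanese keyword survives the round trip
```

Using a with block also closes the file handle, which the one-line json.load(open(...)) form leaves to the garbage collector.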

Create the following python file in the copy destination folder

image_scraper.py


import os

from google_images_download import googleimagesdownload  # import the library

# Change the current directory to the folder containing this script,
# so that config.json is found by its relative path
os.chdir(os.path.dirname(os.path.abspath(__file__)))
print('Changed current working directory')

response = googleimagesdownload()  # instantiate the downloader class
paths = response.download({"config_file": "config.json"})  # pass the arguments to the function
print(paths)  # print the absolute paths of the downloaded images

Copy the ChromeDriver version that matches your installed Chrome to the same folder

Download it from: https://chromedriver.chromium.org/downloads
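Before running the scraper, it may help to confirm that the driver really sits in the expected folder. The following is a minimal sketch; the helper name find_chromedriver is hypothetical, and the file name chromedriver.exe is the Windows name used in the config examples of this article.

```python
import os
import tempfile


def find_chromedriver(folder, name="chromedriver.exe"):
    # Hypothetical helper: return the full path if the driver file
    # exists in the given folder, otherwise None.
    candidate = os.path.join(folder, name)
    return candidate if os.path.isfile(candidate) else None


# Demonstration in a temporary folder; in practice, pass the folder
# that contains image_scraper.py.
demo_dir = tempfile.mkdtemp()
before = find_chromedriver(demo_dir)

# Create an empty placeholder file standing in for the real driver.
open(os.path.join(demo_dir, "chromedriver.exe"), "w").close()
after = find_chromedriver(demo_dir)

print(before)
print(after is not None)
```

This only checks that the file is present; it does not verify that the driver version matches the installed Chrome.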

Rename sample_config.json to config.json and edit it

Once this is done, the script is ready to run. As the example below shows, multiple search queries can be set, one per entry in "Records".

Renaming is not strictly necessary, but the name "sample" would be misleading for your own settings.

config.json


{
    "Records": [
        {
            "keywords": "apple",
            "limit": 5,
            "color": "green",
            "print_urls": true
        },
        {
            "keywords": "universe",
            "limit": 15,
            "size": "large",
            "print_urls": true
        }
    ]
}
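Because config.json is plain JSON, it can also be generated from Python instead of being edited by hand. The following is a minimal sketch; the helper name build_config is hypothetical, and the keys are the ones used in the example above.

```python
import json


def build_config(records):
    # Hypothetical helper: wrap a list of per-query option dicts
    # in the top-level "Records" key.
    return {"Records": list(records)}


records = [
    {"keywords": "apple", "limit": 5, "color": "green", "print_urls": True},
    {"keywords": "universe", "limit": 15, "size": "large", "print_urls": True},
]

# ensure_ascii=False keeps non-ASCII keywords (such as Japanese) readable.
config_text = json.dumps(build_config(records), indent=4, ensure_ascii=False)
print(config_text)

# To write the file next to image_scraper.py:
# with open("config.json", "w", encoding="utf-8") as f:
#     f.write(config_text)
```

Python's True is serialized as JSON true, so the output has the same shape as the hand-written file.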

Run image_scraper.py

Run it from VS Code or from the console.

How to use config.json

Details (English) https://google-images-download.readthedocs.io/en/latest/arguments.html

Basic structure

Follow the structure below. If "limit" is 100 or more, the "chromedriver" key is required.

{
    "Records": [
        {
            "keywords": "hoge",
            "limit": 777,
            "format": "png",
            "print_urls": true,
            "chromedriver": "chromedriver.exe"
        }
    ]
}
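The rule about "limit" and "chromedriver" can be checked before starting a long download. The following is a minimal sketch, not part of the library: the helper name records_missing_chromedriver is hypothetical, and the threshold of 100 is taken from the rule stated above.

```python
def records_missing_chromedriver(config, threshold=100):
    # Hypothetical helper: return the keywords of records whose "limit"
    # reaches the threshold but which do not set "chromedriver".
    missing = []
    for record in config.get("Records", []):
        needs_driver = record.get("limit", 0) >= threshold
        if needs_driver and not record.get("chromedriver"):
            missing.append(record.get("keywords"))
    return missing


config = {
    "Records": [
        {"keywords": "hoge", "limit": 777, "chromedriver": "chromedriver.exe"},
        {"keywords": "fuga", "limit": 150},  # large limit, no driver set
        {"keywords": "piyo", "limit": 5},    # small limit, no driver needed
    ]
}

result = records_missing_chromedriver(config)
print(result)
```

Only the second record is reported, since it asks for 150 images without naming a driver.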

Valid options, choices, etc.

Setting | Key | Values
Maximum number of images | "limit" | Integer, such as 200
Image format | "format" | "jpg", "gif", "png", "bmp", "svg", "webp", "ico", "raw"
Related images | "related_images" | true, false
Size | "size" | "large", "medium", "icon", ">400*300", ">640*480", ">800*600", ">1024*768", ">2MP", ">4MP", ">6MP", ">8MP", ">10MP", ">12MP", ">15MP", ">20MP", ">40MP", ">70MP"
Aspect ratio | "aspect_ratio" | "tall", "square", "wide", "panoramic"
Color | "color" | "red", "orange", "yellow", "green", "teal", "blue", "purple", "pink", "white", "gray", "black", "brown"
Color type | "color_type" | "full-color", "black-and-white", "transparent"
Type | "type" | "face", "photo", "clip-art", "line-drawing", "animated"
Time | "time" | "past-24-hours", "past-7-days", "past-month", "past-year"
Period | "time_range" | '{"time_min":"MM/DD/YYYY","time_max":"MM/DD/YYYY"}'
License | "usage_rights" | "labeled-for-reuse-with-modifications", "labeled-for-reuse", "labeled-for-noncommercial-reuse-with-modification", "labeled-for-nocommercial-reuse"
Console output | "print_urls" | true, false
ChromeDriver | "chromedriver" | "chromedriver.exe"
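A typo in an option value is easy to make, so a config record can be checked against the choices listed above before it is used. The following is a minimal sketch covering only a few of the keys; the names ALLOWED and invalid_options are hypothetical, and the allowed values are copied from the list above rather than read from the library.

```python
# Allowed values for a subset of the options, copied from the list above.
ALLOWED = {
    "format": {"jpg", "gif", "png", "bmp", "svg", "webp", "ico", "raw"},
    "aspect_ratio": {"tall", "square", "wide", "panoramic"},
    "color_type": {"full-color", "black-and-white", "transparent"},
    "time": {"past-24-hours", "past-7-days", "past-month", "past-year"},
}


def invalid_options(record):
    # Hypothetical helper: return (key, value) pairs whose value
    # is not among the allowed choices for that key.
    problems = []
    for key, allowed in ALLOWED.items():
        if key in record and record[key] not in allowed:
            problems.append((key, record[key]))
    return problems


record = {"keywords": "hoge", "limit": 10, "format": "jpeg", "time": "past-month"}
problems = invalid_options(record)
print(problems)
```

Here "jpeg" is flagged because the accepted spelling is "jpg", while "past-month" passes. Keys that are not in ALLOWED, such as "keywords" and "limit", are ignored by this check.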
