CSV output from Google search with [Python]! 【Easy】

Original article

Background

When researching a topic, I found that the summary Google shows for each result usually tells me what I need to know, so I wanted to pull the title, summary, and URL straight from the Google search results.

Purpose

Automate a Google search, collect the search results, and output them as CSV. Automating this reduces the time spent searching.

Requirements

  1. Automatic browser operation
  2. Get the title, summary, and URL from the search results
  3. Output the obtained result to CSV

Environment

Setup

The program was written against Chrome version 81, so please set Chrome to version 81 before running it. The following guide is easy to follow: "How to check the version of Google Chrome".

pip install selenium
pip install chromedriver_binary
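
Note that `pip install selenium` without a version pin installs the latest release, and Selenium 4.3 and later no longer provide the `find_element_by_*` / `find_elements_by_*` helpers used in the code below. As a minimal sketch, the title and link extraction could be written with the generic `find_elements(by, value)` form instead. The function name `collect_titles_and_links` is hypothetical, and the locator strings are written out literally (they are the values of Selenium's `By.CLASS_NAME` and `By.TAG_NAME`) so that the sketch itself has no imports.

```python
# Locator strategy strings, identical to the values of
# selenium.webdriver.common.by.By.CLASS_NAME and By.TAG_NAME
CLASS_NAME = "class name"
TAG_NAME = "tag name"


def collect_titles_and_links(driver):
    """Collect titles and URLs from result blocks (class="r", as in this article)."""
    titles, links = [], []
    for elem in driver.find_elements(CLASS_NAME, "r"):
        # Title text is in class="LC20lb"; the URL is the href of the <a> tag
        titles.append(elem.find_element(CLASS_NAME, "LC20lb").text)
        links.append(elem.find_element(TAG_NAME, "a").get_attribute("href"))
    return titles, links
```

With a real browser session this would be called as `collect_titles_and_links(driver)` after the search has been submitted. The class names are the ones from this article and may no longer match Google's current HTML.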

The actual code

import csv
import time  # Needed for sleep
from selenium import webdriver  # Automates the web browser (python -m pip install selenium)
import chromedriver_binary  # Adds the bundled chromedriver to PATH

def ranking(driver):
    i = 1  # Page counter (starts at the first page)

    title_list = []  # Titles
    link_list = []  # URLs
    summary_list = []  # Summaries
    RelatedKeywords = []  # Related keywords

    # Loop until the current page exceeds the maximum number of pages to analyze
    while i <= i_max:
        # Titles and links are inside elements with class="r"
        class_group = driver.find_elements_by_class_name('r')
        # Summaries are inside elements with class="s"
        class_group1 = driver.find_elements_by_class_name('s')
        # Related keywords have class="nVcaUb"
        class_group2 = driver.find_elements_by_class_name('nVcaUb')
        # Extract titles and links and add them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  #title(class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  #Link(href attribute of a tag)

        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # Summary (class="st")

        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # Related keyword text

        # There is only one "Next" link, but find_elements (plural) returns an empty list
        # instead of raising when it is missing. An empty list means this is the last page.
        if not driver.find_elements_by_id('pnnext'):
            break
        # The URL of the next page is the href attribute of id="pnnext"
        next_page = driver.find_element_by_id('pnnext').get_attribute('href')
        driver.get(next_page)  # Move to the next page
        i += 1  # Advance the page counter
        time.sleep(3)  # Wait 3 seconds

    return title_list, link_list, summary_list, RelatedKeywords  # Return the title, link, summary, and related-keyword lists



driver = webdriver.Chrome()  # Start Chrome

# Open the Google top page
driver.get('https://www.google.com/')
i_max = 5  # Maximum number of result pages to analyze
search = driver.find_element_by_name('q')  # Locate the search box (name='q')
search.send_keys('Scraping automation"Python"')  # Type the search words
search.submit()  # Run the search
time.sleep(1.5)  # Wait 1.5 seconds

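# Run the ranking function to get the titles, URLs, summaries, and related keywords
title, link, summary, RelatedKeywords = ranking(driver)


# Header row; each row appended below has the same five columns
csv_list = [["Ranking", "Title", "Summary", "Link", "Related keywords"]]

for i in range(len(title)):
    # summary and RelatedKeywords can be shorter than title, so pad with an empty string
    summary_text = summary[i] if i < len(summary) else ""
    keyword = RelatedKeywords[i] if i < len(RelatedKeywords) else ""
    add_list = [i + 1, title[i], summary_text, link[i], keyword]
    csv_list.append(add_list)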

# Save the result rows to a CSV file

with open('Search_word.csv','w',encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)

driver.quit()
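
The titles, summaries, and links are collected into separate lists, so nothing guarantees that they end up the same length (for example, when a result has no summary). One way to build the rows without an index error is `itertools.zip_longest`, sketched below. The function name `build_rows` and the sample data are made up for illustration, and the padding only keeps the script from failing: it cannot tell which result was actually missing a summary.

```python
import csv
import io
from itertools import zip_longest


def build_rows(titles, summaries, links):
    """Build CSV rows, padding shorter lists with empty strings."""
    rows = [["Ranking", "Title", "Summary", "Link"]]
    padded = zip_longest(titles, summaries, links, fillvalue="")
    for rank, (title, summary, link) in enumerate(padded, start=1):
        rows.append([rank, title, summary, link])
    return rows


# Sample data (hypothetical): the second result has no summary
rows = build_rows(
    ["A", "B"],
    ["about A"],
    ["https://example.com/a", "https://example.com/b"],
)

# Write to an in-memory buffer here; a real script would pass a file object instead
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```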


Impressions

In total, it took about 4 hours to build. I am satisfied because I can now operate the browser automatically with Selenium and retrieve the titles as well. It should be handy when writing blog posts!

What I can do now

  1. Read documentation without reluctance
  2. Google search automation
  3. Output to CSV

Remaining tasks

  1. Review how to work with lists
  2. Link Google Colab and spreadsheets so this can run there
  3. Search directly from a spreadsheet, as shown in the last two reference URLs
  4. If the HTML class names change, the script stops working, so I want to retrieve the data flexibly even when the class names change
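
For task 4, one simple idea is to keep a list of candidate class names and use the first one that matches anything on the page. The following is only a sketch of that idea: the helper `find_with_fallback` and the class name `new-title-class` are hypothetical, and a dictionary stands in for the browser so the example runs on its own.

```python
def find_with_fallback(find_elements, candidates):
    """Try each candidate in order; return (candidate, elements) for the first hit."""
    for name in candidates:
        elements = find_elements(name)
        if elements:
            return name, elements
    return None, []  # Nothing matched any candidate


# Stand-in for a lookup such as driver.find_elements_by_class_name,
# so the sketch runs without a browser
fake_page = {"new-title-class": ["<title element>"]}

matched, elements = find_with_fallback(
    lambda name: fake_page.get(name, []),
    ["LC20lb", "new-title-class"],  # Old class name first, then a hypothetical new one
)
print(matched)
```

This still requires updating the candidate list by hand when Google changes its HTML, but the script keeps working as long as any one candidate matches.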

References

  1. I tried to extract the Google search title and URL list with Python
  2. Locating Elements
  3. A story about having a hard time opening a file other than CP932 (Shift-JIS) encoded on Windows
  4. I want to be able to search from the spreadsheet as shown in the last two reference URLs.
  5. [Python] What to do if you can't scrape Google search results [with commentary]
  6. [Python] CSV output from Google search! [Easy] (original article) https://acfoapon.hatenablog.com/entry/2020/04/16/120000?_ga=2.113898116.2051319045.1587005144-1011829840.1582693178

Next implementation

Prepare a CSV list of search words and write a program that reads them. Combined with the program created this time, this makes it easy to run searches for multiple search words. I also plan to store the data in a spreadsheet.
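
As a rough sketch of the first step, reading search words from a CSV could look like the following. It assumes one search word per row in the first column, which is my own assumption about the file layout. The function name `read_search_words` and the sample words are made up for illustration.

```python
import csv
import io


def read_search_words(f):
    """Read one search word per row from the first column, skipping blank rows."""
    words = []
    for row in csv.reader(f):
        if row and row[0].strip():
            words.append(row[0].strip())
    return words


# In-memory sample; a real script would open a CSV file instead
sample = io.StringIO("Scraping automation Python\n\nselenium csv output\n")
search_words = read_search_words(sample)
print(search_words)
```

Each word in `search_words` could then be sent to the search box in turn, with the results of each search saved to its own CSV.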
