A Python script that runs a Google search and saves the search results page in one step

Purpose

Running Google searches with whatever keywords come to mind tends to be inefficient: unless you record what you searched for and how, you end up repeating the same searches.

Taking notes for every search is tedious, though, so I suspect many people save Google's search results page as page source or as a web archive and review it later at their leisure.

That, too, is a bit of a hassle.

So I combined everything from entering the search keywords to saving the HTML into a single step.

Features

- Given (1) search keywords and (2) the number of results to display per page, the script saves the search results page as an HTML file in the current working directory (CWD).
-- The links to the second and later pages, and the "Next" link at the bottom of the page, do not work in the saved file (because they are relative paths).
-- To work around this, increase (2) the number of results displayed per page (maximum 100 results per page).
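
To illustrate why those links break, here is a minimal sketch using `urllib.parse.urljoin`. The relative link and the local file path are hypothetical examples, not taken from an actual results page, and the base URL is an assumption about where the page was fetched from.

```python
from urllib.parse import urljoin

# Assumed base: the URL the results page was originally fetched from.
BASE_URL = 'https://www.google.com/search'

# Hypothetical relative link of the kind found in a saved results page.
relative_link = '/search?q=python&start=10'

# Opened from a local file, the link is resolved against the file's own
# location, so it no longer points at the search site.
broken = urljoin('file:///home/user/python.html', relative_link)

# Resolved against the original site, the same link is usable again.
absolute_link = urljoin(BASE_URL, relative_link)

print(broken)
print(absolute_link)
```

Rewriting every link this way is possible, but raising the number of results per page is the simpler workaround for this script.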

How to use

You will be prompted twice:

- At the first prompt, enter the query (search keywords).
-- To add search options (site:go.jp, filetype:pdf, etc.), enter them together with the keywords at this stage. (cf. "Improve the accuracy of web search")
- At the second prompt, enter the number of results to display per search results page.
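
As a rough sketch of what happens to those two inputs, the snippet below builds the search URL the same way the script's constructor does. The helper name `build_search_url` is hypothetical and exists only for this illustration.

```python
from urllib.parse import quote_plus, urlunsplit


def build_search_url(query, results_per_page):
    """Return the Google search URL for a query and a per-page result count."""
    # quote_plus turns spaces into '+' and percent-encodes characters such as ':'.
    query_string = 'q=' + quote_plus(query) + '&num=' + str(results_per_page)
    return urlunsplit(('https', 'www.google.com', '/search', query_string, ''))


url = build_search_url('python site:go.jp filetype:pdf', 100)
print(url)
# https://www.google.com/search?q=python+site%3Ago.jp+filetype%3Apdf&num=100
```

Search options need no special handling: they are simply part of the query text and are encoded along with the keywords.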

About the script

- For various reasons the code is written as a class even though it does not need to be; please bear with that.
- Here html_text is written to an HTML file as soon as it is obtained, but of course you can also use it directly without writing it to a file.
- Setting my_headers is not mandatory, but the returned HTML differs slightly depending on whether it is present. This matters when you use html_text directly, as mentioned above.
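
As one possible way to use html_text directly, the sketch below collects link targets with the standard library's `html.parser`. The class name `LinkCollector` is hypothetical, and the sample markup is an invented stand-in for the text returned by page_fetcher(), not real Google output.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


# Invented stand-in for html_text; replace it with the fetched text.
html_text = (
    '<html><body>'
    '<a href="https://example.com/">Example</a>'
    '<a href="/search?q=python">Next</a>'
    '</body></html>'
)

collector = LinkCollector()
collector.feed(html_text)
print(collector.links)
```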

Search result page structure

- Looking at the HTML, the link destination of each result appears four times per result, each time in a different form.
- As you might expect from the headquarters of structured data (?), it is really well put together.
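
One simple way to check this on a saved page is to count how often a given result URL occurs in the text. The markup below is invented purely for illustration and does not reproduce Google's actual HTML; the attribute names in it are assumptions.

```python
# Invented fragment standing in for part of html_text.
sample_html = (
    '<a href="https://example.com/page">'
    '<cite>https://example.com/page</cite></a>'
    '<a data-url="https://example.com/page" '
    'ping="/url?url=https://example.com/page">x</a>'
)

# URL of the result whose occurrences we want to count.
target = 'https://example.com/page'

# str.count returns the number of non-overlapping occurrences.
occurrences = sample_html.count(target)
print(occurrences)
```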

google_fetcher.py


import os
from urllib.parse import quote_plus, urlunsplit
import requests
import re
PROJECT_ROOT_PATH = '.'


class GoogleResultsPage:
    '''Query text, Results number per page -> search results response'''
    def __init__(self, query, rslts_num):
        self.__qry = query
        self.__num = rslts_num

        query_string  = 'q='+quote_plus(self.__qry)+'&num='+str(self.__num)
        search_string = urlunsplit(
            ('https', 'www.google.com', '/search', query_string, ''))
        self.__sstr   = search_string

    def page_fetcher(self):
        '''Fetch the result page and return as a text response'''
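        # Adjacent string literals are joined at compile time, so the
        # header value contains no stray indentation whitespace.
        my_headers = {'user-agent':
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/84.0.4147.105 Safari/537.36'}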
        response = requests.get(self.__sstr,
                                headers=my_headers, timeout=(3.05, 27))
        return response.text


################################
# Output to a file.
def html_to_file(html_text, query):
    '''Write the text response to an HTML file named after the query.'''
    # Replace characters that are unsafe in file names, and whitespace, with '_'.
    output_file_name = re.sub(r'[\\/.:;*?"<>|\s]', '_', query) + '.html'
    output_file_path = os.path.join(PROJECT_ROOT_PATH, output_file_name)
    # Write as UTF-8 explicitly instead of relying on the platform default.
    with open(output_file_path, 'w', encoding='utf-8') as f:
        f.write(html_text)

    print('Done! File path:', output_file_path)


if __name__ == '__main__':
    query     = input('Query? >>  ')
    rslts_num = input('Results per page (up to 100)? >>  ')

    html_text = GoogleResultsPage(query, rslts_num).page_fetcher()
    html_to_file(html_text, query)
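
The output file name is derived from the query. As a minimal sketch of that step, the snippet below applies a substitution along the lines of the one in html_to_file. The helper name `query_to_filename` is hypothetical, and the character class is an assumption that also covers backslashes and any whitespace.

```python
import re


def query_to_filename(query):
    """Turn a search query into a file name by replacing unsafe characters."""
    # Path separators, punctuation that file systems reject, and whitespace
    # are all replaced with underscores.
    return re.sub(r'[\\/.:;*?"<>|\s]', '_', query) + '.html'


print(query_to_filename('python site:go.jp filetype:pdf'))
# python_site_go_jp_filetype_pdf.html
```

Note that different queries can map to the same file name (for example, `a.b` and `a b`), in which case the later search overwrites the earlier file.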
