A story about a Python beginner trying to get Google search results using the API

Overview

I've only been using Python for two weeks, but I wanted to get Google search results for my seminar research, so I gave it a try by following the article "[Get Google search results using Custom Search API](https://qiita.com/zak_y/items/42ca0f1ea14f7046108c#1-api%E3%82%AD%E3%83%BC%E3%81%AE%E5%8F%96%E5%BE%97)".

Much of this overlaps with the reference article, but I'd like to share how I built it.

Environment: Windows 10, Python 3.7, Anaconda Navigator

**Goal**: Collect previous research on my seminar research theme, "What factors influence increases and decreases in the number of foreign visitors to Japan?", and create a file listing the titles and URLs of the retrieved articles.

Procedure

  1. Get the API key
  2. Enable Custom Search API
  3. Get Custom Search Engine
  4. Library installation
  5. Get Google search results with the API and save them as a JSON file
  6. Format the file obtained in step 5 and save it in TSV format

1. Get the API key

Open the navigation menu in Google Cloud Platform and click "APIs and Services" → "Credentials".
[Screenshot: API authentication]

Click "Create Credentials" and create an API key.
[Screenshot: API key]

You will need this API key later, so copy it and save it somewhere.
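Instead of pasting the key directly into the script, one option is to keep it in an environment variable and read it at runtime. A minimal sketch, assuming hypothetical variable names `GOOGLE_API_KEY` and `CUSTOM_SEARCH_ENGINE_ID` that you set yourself (Google does not require these names):

```python
import os


def load_credentials():
    """Read the API key and search engine ID from environment variables.

    The variable names are hypothetical choices for this sketch.
    """
    api_key = os.environ.get("GOOGLE_API_KEY")
    engine_id = os.environ.get("CUSTOM_SEARCH_ENGINE_ID")
    if not api_key or not engine_id:
        raise RuntimeError(
            "Set GOOGLE_API_KEY and CUSTOM_SEARCH_ENGINE_ID before running")
    return api_key, engine_id
```

For example, after running `set GOOGLE_API_KEY=...` and `set CUSTOM_SEARCH_ENGINE_ID=...` in the Anaconda Prompt, `GOOGLE_API_KEY, CUSTOM_SEARCH_ENGINE_ID = load_credentials()` could replace the two hard-coded strings in scrape.py, so the key never ends up in a shared file.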

2. Enable Custom Search API

Open the navigation menu in Google Cloud Platform and click "APIs and Services" → "Library".
[Screenshot: API menu]

Under "Other" at the bottom of the page, select "Custom Search API" to open its details page, then click "Enable".
[Screenshot: APIs & Services - spreadsheet-test - Google Cloud Platform]

3. Get Custom Search Engine

① Go to the Custom Search Engine page and click "Add".

[Screenshot: Custom Search - Edit search engine]

②
  ・ Under "Sites to search", enter the URL of any site (anything is fine)
  ・ Set the language to "Japanese"
  ・ Enter a name for the search engine
  ・ Click "Create"
[Screenshot: Custom search engine settings]

③ Under "Edit search engine", select the search engine you just created and edit it.
[Screenshot: Custom search engine settings 2]
On this page:
  ・ Copy the "Search engine ID" and save it somewhere
  ・ Select "Japanese" for "Language"
  ・ Delete the site listed under "Sites to search"
  ・ Turn on "Search the entire web"
  ・ Click "Update"
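To see how the API key and the search engine ID fit together, here is a sketch that builds the raw request URL for the Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1`) using the `key`, `cx`, `q`, and `start` query parameters. It only builds the URL and sends nothing; the placeholder values are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, start=1):
    """Return the GET URL for one page of Custom Search results."""
    params = {
        "key": api_key,    # API key from step 1
        "cx": engine_id,   # search engine ID from step 3
        "q": query,        # search keywords
        "start": start,    # 1-based index of the first result
    }
    # urlencode percent-encodes the values and joins them with "&"
    return ENDPOINT + "?" + urlencode(params)


url = build_search_url("MY_KEY", "MY_CX", "foreign visitors Japan")
print(url)
```

With real values, opening such a URL in a browser should return the same kind of JSON that the Python client library fetches for you.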

4. Library installation

Install the Google API Python Client (the `google-api-python-client` package) by following the "Google API Client Library for Python" documentation.

I created a virtual environment with virtualenv and then installed the library into it.
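After installing, one quick way to confirm that the packages are visible from the active virtual environment is `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. A small sketch (the helper name `check_modules` is hypothetical); `googleapiclient` is the import name of the Google API Python Client, and `pandas` is used later in this article:

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules(["googleapiclient", "pandas"]).items():
    print(name, "OK" if found else "NOT FOUND")
```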

5. Get search results with the API and save them as a JSON file

I wrote the code and ran it... and got an error!
[Screenshot: error message]

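**Cause**
[Screenshot: cause of the error]
Reference article: "Causes and workarounds of UnicodeEncodeError (cp932, Shift-JIS encoding) when using Python3 on Windows". In short, on Japanese Windows, `open()` without an `encoding` argument uses cp932 (Shift-JIS), which cannot represent some characters that appear in the search results.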

**Workaround** Pass `encoding='utf-8'` as an argument to the `open()` function.

scrape.py


# Write the JSON string as UTF-8 instead of Windows' default cp932
with open(os.path.join(save_response_dir, 'response_' + today + '.json'), mode='w', encoding='utf-8') as response_file:
    response_file.write(jsonstr)
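The error can be reproduced without calling the API at all. A sketch, assuming a made-up sample title that contains an emoji (a character outside cp932): encoding it as cp932 fails with the same `UnicodeEncodeError`, while a file written with `encoding='utf-8'` reads back unchanged.

```python
import json
import os
import tempfile

# Made-up data standing in for a search result; the emoji is not in cp932
sample = {"title": "訪日外国人 \U0001F600"}
jsonstr = json.dumps(sample, ensure_ascii=False)

# cp932 (Shift-JIS) cannot encode the emoji, so this raises UnicodeEncodeError
try:
    jsonstr.encode("cp932")
    cp932_ok = True
except UnicodeEncodeError:
    cp932_ok = False

# Writing with an explicit UTF-8 encoding works on any platform
path = os.path.join(tempfile.mkdtemp(), "response_test.json")
with open(path, mode="w", encoding="utf-8") as f:
    f.write(jsonstr)
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print("cp932 can encode:", cp932_ok)
print("UTF-8 round trip ok:", restored == sample)
```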

6. Get search results with the API and save them as a JSON file (fixed version)

After a few tweaks, the final code looks like this:

scrape.py


import os
import datetime
import json

from time import sleep
from googleapiclient.discovery import build

GOOGLE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CUSTOM_SEARCH_ENGINE_ID = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

DATA_DIR = 'data'


def makeDir(path):
    # Create the directory (and any missing parents) if it does not exist yet
    os.makedirs(path, exist_ok=True)


def getSearchResponse(keyword):
    today = datetime.datetime.today().strftime("%Y%m%d")
    timestamp = datetime.datetime.today().strftime("%Y/%m/%d %H:%M:%S")

    makeDir(DATA_DIR)

    # Client for the Custom Search JSON API (customsearch v1)
    service = build("customsearch", "v1", developerKey=GOOGLE_API_KEY)

    page_limit = 10  # maximum number of result pages to request
    start_index = 1  # 1-based index of the first result on the current page
    response = []
    for _ in range(page_limit):
        try:
            sleep(1)  # short pause between consecutive requests
            response.append(service.cse().list(
                q=keyword,
                cx=CUSTOM_SEARCH_ENGINE_ID,
                lr='lang_ja',  # only documents written in Japanese
                num=10,        # results per page (the API allows at most 10)
                start=start_index
            ).execute())
            # Continue from the next page's start index; stop if there is none
            next_page = response[-1].get("queries", {}).get("nextPage")
            if not next_page:
                break
            start_index = next_page[0].get("startIndex")
        except Exception as e:
            print(e)
            break

    # Save the response in JSON format
    save_response_dir = os.path.join(DATA_DIR, 'response')
    makeDir(save_response_dir)
    out = {'snapshot_ymd': today, 'snapshot_timestamp': timestamp, 'response': response}
    jsonstr = json.dumps(out, ensure_ascii=False)
    # encoding='utf-8' avoids the cp932 UnicodeEncodeError on Windows
    with open(os.path.join(save_response_dir, 'response_' + today + '.json'), mode='w', encoding='utf-8') as response_file:
        response_file.write(jsonstr)


if __name__ == '__main__':

    target_keyword = 'Foreign Visitors in Japan Factor Research'

    getSearchResponse(target_keyword)

This time it ran without errors: a "response" folder was created under the "data" folder, with a JSON file inside it!
[Screenshot: data file]
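The loop in `getSearchResponse` moves from page to page using the `queries` → `nextPage` → `startIndex` field of each response. Below is a sketch of that step as a standalone function (the name `next_start_index` is hypothetical), checked against hand-made dictionaries that only mimic the shape of that part of a response; they are not real API output.

```python
def next_start_index(page):
    """Return the start index of the next page, or None on the last page."""
    next_page = page.get("queries", {}).get("nextPage")
    if not next_page:
        return None
    return next_page[0].get("startIndex")


# Hand-made examples shaped like the relevant part of a response
middle_page = {"queries": {"request": [{"startIndex": 1}],
                           "nextPage": [{"startIndex": 11}]}}
last_page = {"queries": {"request": [{"startIndex": 91}]}}

print(next_start_index(middle_page))
print(next_start_index(last_page))
```

Returning `None` lets the loop stop cleanly instead of relying on an exception from indexing into a missing `nextPage`.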

7. Format the file obtained in step 6 and save it in TSV format

The code is below.

prettier.py


import os
import datetime
import json
import pandas as pd

DATA_DIR = 'data'
COLUMNS = ['ymd', 'no', 'display_link', 'title', 'link', 'snippet']


def makeDir(path):
    # Create the directory (and any missing parents) if it does not exist yet
    os.makedirs(path, exist_ok=True)


def makeSearchResults():
    today = datetime.datetime.today().strftime("%Y%m%d")

    # Load today's JSON file saved by scrape.py (the with block closes it)
    response_filename = os.path.join(
        DATA_DIR, 'response', 'response_' + today + '.json')
    with open(response_filename, 'r', encoding='utf-8') as response_file:
        response_tmp = json.load(response_file)
    ymd = response_tmp['snapshot_ymd']
    response = response_tmp['response']

    # Collect the items of every page into one numbered list of rows
    results = []
    cnt = 0
    for one_res in response:
        for item in one_res.get('items', []):  # pages without hits have no 'items'
            cnt += 1
            results.append({
                'ymd': ymd,
                'no': cnt,
                'display_link': item['displayLink'],
                'title': item['title'],
                'link': item['link'],
                # Remove line breaks so each result stays on one TSV row
                'snippet': item.get('snippet', '').replace('\n', ''),
            })

    # Save as TSV; passing columns keeps the header even when there are no results
    save_results_dir = os.path.join(DATA_DIR, 'results')
    makeDir(save_results_dir)
    df_results = pd.DataFrame(results, columns=COLUMNS)
    df_results.to_csv(os.path.join(save_results_dir, 'results_' + ymd + '.tsv'),
                      sep='\t', index=False)


if __name__ == '__main__':

    makeSearchResults()
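The core of `makeSearchResults` is flattening the `items` of every saved page into one numbered list of rows. A sketch of just that step (the function name `flatten_items` is hypothetical), run on two hand-made pages and one empty page instead of a real response file:

```python
def flatten_items(pages, ymd):
    """Turn a list of response pages into numbered result rows."""
    rows = []
    for page in pages:
        # Pages without results have no "items" key
        for item in page.get("items", []):
            rows.append({
                "ymd": ymd,
                "no": len(rows) + 1,
                "display_link": item["displayLink"],
                "title": item["title"],
                "link": item["link"],
                # Drop line breaks so the snippet fits on one row
                "snippet": item.get("snippet", "").replace("\n", ""),
            })
    return rows


pages = [
    {"items": [{"displayLink": "example.com", "title": "A",
                "link": "https://example.com/a", "snippet": "line1\nline2"}]},
    {},  # a page with no results
    {"items": [{"displayLink": "example.org", "title": "B",
                "link": "https://example.org/b", "snippet": "text"}]},
]
for row in flatten_items(pages, "20200101"):
    print(row["no"], row["title"], row["snippet"])
```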


When I ran it, the results were organized into columns in the order date, number, site URL, title, article URL, and snippet!
[Screenshot: result]

Opened in Excel, it looks like this:
[Screenshot: result in Excel]
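If pandas is not available, the standard-library `csv` module can write and read the same kind of tab-separated file. A sketch with one hand-made row; note that this swaps in `csv.DictWriter`/`csv.DictReader` for pandas and is not what the code above uses:

```python
import csv
import os
import tempfile

COLUMNS = ["ymd", "no", "display_link", "title", "link", "snippet"]
rows = [{"ymd": "20200101", "no": 1, "display_link": "example.com",
         "title": "A", "link": "https://example.com/a", "snippet": "text"}]

path = os.path.join(tempfile.mkdtemp(), "results_20200101.tsv")

# newline="" lets the csv module control line endings itself
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Read it back; every value comes back as a string
with open(path, encoding="utf-8", newline="") as f:
    loaded = list(csv.DictReader(f, delimiter="\t"))

print(loaded[0]["title"], loaded[0]["link"])
```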

Impressions

The article I referred to this time ([Get Google search results using Custom Search API](https://qiita.com/zak_y/items/42ca0f1ea14f7046108c#1-api%E3%82%AD%E3%83%BC%E3%81%AE%E5%8F%96%E5%BE%97)) was so clear and easy to follow that even a beginner like me could implement it without trouble! I still need to properly understand what each part of the code does, but for now I'm happy to have built a program I can actually use day to day. However, the free tier of the Custom Search API (Google Custom Search JSON API) seems to come with various restrictions, so I'll need to be careful when I use it again in the future.
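To get a feel for the free-tier limit, here is some quick arithmetic. It assumes the free quota of 100 search queries per day that Google documents for the Custom Search JSON API (the figure may change, so treat it as an assumption). Each run of scrape.py sends one query per page, up to `page_limit = 10`:

```python
FREE_QUERIES_PER_DAY = 100  # assumption: documented free quota at time of writing
PAGE_LIMIT = 10             # page_limit in scrape.py: one API query per page


def runs_per_day(free_queries, queries_per_run):
    """How many full runs of the script fit into the daily free quota."""
    return free_queries // queries_per_run


print(runs_per_day(FREE_QUERIES_PER_DAY, PAGE_LIMIT))
```

Under that assumption, the script can be run in full about ten times a day before the free quota runs out.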
