Scraping with Selenium in Python

Introduction

In a university course, I was given the task of collecting the invention names returned for a search term on the Patent Information Platform and analyzing them with natural language processing. Other students copied the entire HTML source of the page and used the search function of Excel or a text editor to extract only what they needed. I used Python to automate the whole thing: fetching only the items I needed and writing them to a text file. I am publishing that code here, partly as a memo for myself.

Table of contents

1. Introduction
2. What I wanted to do
3. Prerequisite knowledge
4. Preparation
5. Actual code
6. Summary
7. Reference document

What I wanted to do

Get the necessary items from the Patent Information Platform and save them to a text file.

Prerequisite knowledge

To read this code, you should only need the following:

- Basic Python grammar
- Minimal HTML knowledge

Preparation

If you do not have the libraries the program needs, please install them.

>> pip install requests
>> pip install selenium

You will also need chromedriver. If you don't have it, download the version that matches your installed Chrome and put it in the same directory as your program.
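
As a rough sketch of the "same directory" requirement, the helper below looks for the driver in a given directory and fails early with a clear message if it is missing. The function name `chromedriver_path` and its parameters are made up for this illustration and are not part of Selenium; the only assumption about the file is that it is called `chromedriver`, or `chromedriver.exe` on Windows.

```python
from pathlib import Path


def chromedriver_path(base_dir, name="chromedriver"):
    """Return the driver path inside base_dir, also trying the '.exe' name.

    Hypothetical helper for illustration; not a Selenium function.
    """
    base = Path(base_dir)
    for candidate in (base / name, base / (name + ".exe")):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("{} not found in {}".format(name, base))
```

Calling `chromedriver_path(Path.cwd())` before starting the browser would report a missing driver immediately, instead of failing later inside Selenium.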

Actual code

The code is also available on GitHub.

To get all the information on the target page, the page has to be scrolled to the bottom so that every result is loaded, so I wrote a scroll function that keeps scrolling to the bottom until nothing new appears. The main function reads the number of the last result at the bottom of the page and then fetches that many invention names. Values are read from the HTML by looking up elements by their name or id attribute. After all the names are collected, they are saved to a text file.
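
The scrolling idea can also be sketched as a small standalone helper. This is only a sketch under assumptions: the name `scroll_to_bottom` and the `pause` and `max_rounds` parameters are made up for illustration, and it compares `document.body.scrollHeight` between rounds instead of the whole page source, which is a different check from the one used in main.py. It only needs a driver object that has an `execute_script` method.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls.

    Hypothetical helper: max_rounds caps the loop so that a page which keeps
    growing cannot make it run forever.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded rows time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded by the last scroll
        last_height = new_height
    return max_rounds
```

Comparing a single number is cheaper than comparing two copies of the page source, but it can stop too early if the next batch of results takes longer than `pause` seconds to load.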

main.py


"""
A program that fetches patent invention names from the Japan Platform for Patent Information
"""
import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def scroll(driver):
    """
    Scroll down until the page source stops changing.
    """
    html01 = driver.page_source
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Wait for more results to load
        html02 = driver.page_source
        if html01 == html02:
            break
        html01 = html02


def main():
    """
    Run a search on the patent information platform and
    save the invention names of the matching patents.
    """
    # chromedriver must be in the current directory (chromedriver.exe on Windows)
    driver_name = 'chromedriver.exe' if os.name == 'nt' else 'chromedriver'
    driver_path = os.path.join(os.getcwd(), driver_name)
    # Set driver (Selenium 4 style; Selenium 3 took the path as the first argument)
    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    # Access the Patent Information Platform
    driver.get('https://www.j-platpat.inpit.go.jp/')

    # Set the word to search for
    print('What do you want to search for?')
    search_word = input()
    # Set the name of the file to create
    print('Please type a file name')
    file_name = input()

    time.sleep(2)
    search_box = driver.find_element(By.NAME, 's01_srchCondtn_txtSimpleSearch')
    search_box.click()
    search_box.send_keys(search_word)
    driver.find_element(By.NAME, 's01_srchBtn_btnSearch').click()
    time.sleep(5)

    # Scroll the page so that every result is loaded
    scroll(driver)

    # Get the largest result number, which is shown in the last row
    id_str = driver.find_elements(By.ID, 'patentUtltyIntnlSimpleBibLst_tableView_numberArea')[-1].text
    id_num = int(id_str)

    words = []
    for i in range(id_num):
        word = driver.find_element(By.ID, 'patentUtltyIntnlSimpleBibLst_tableView_invenName{}'.format(i)).text
        words.append(word)
        print(word)
    print(words)

    # Create the text file (UTF-8 so that Japanese titles are written correctly)
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write('\n'.join(words))

    # Close the browser
    driver.quit()


if __name__ == "__main__":
    main()
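
The fixed `time.sleep` calls in main.py wait the full time even when the page is ready sooner, and are too short when the site is slow. One possible alternative, sketched here in plain Python, is to poll for a condition until it holds or a timeout expires. The helper name `wait_until` and its parameters are assumptions made for this example; Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose, and this hand-rolled version is only meant to show the idea.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper: returns that value, or raises TimeoutError once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)  # short pause before checking again
```

With a running driver, something like `wait_until(lambda: driver.find_elements('name', 's01_srchCondtn_txtSimpleSearch'))` would return as soon as the search box exists, because `find_elements` returns an empty list while the element is still missing.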

Summary

In this post, I showed how to get data from a web page and save it to a text file using Selenium in Python. I hope it is helpful for anyone who wants to start using Selenium.

Reference document
