[Selenium/Python] Displaying court case PDFs all at once

Self-introduction

Nice to meet you. This is my first article. I am currently a second-year master's student in information technology, and I plan to join an IT company this April. I haven't published anything publicly until now, but I intend to start doing so little by little, so thank you for your support!

Background

My memorable first post is a very basic introduction to scraping. I have long been interested in natural language processing, and I wondered whether I could apply it to law. Then I came across a research report and thought, maybe I can do this too! So, as a first step, I decided to pull the text of judgments from the courts' website.

Environment

・ Python 3.7.7
・ Windows 10 Pro
・ PyCharm 2019.3.3 (IDE)

Contents

Note: This time, the script simply displays the PDFs linked from the search results page in the browser. I will keep improving it to make the legal field more approachable.

law.py


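from selenium import webdriver
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes.
# Note: recent Selenium 4 releases no longer accept the driver path as a
# positional argument; there it has to be passed via a Service object.
driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver')
driver.get('https://www.courts.go.jp/app/hanrei_jp/search1')

# Find the search box by its name attribute, type the keyword, and submit.
search_bar = driver.find_element(By.NAME, "filter[text1]")
search_bar.send_keys("GPS")
search_bar.submit()

# Build the XPath of each result row by substituting the tr[] index
# with format(), then click the link to open that row's PDF.
for i in range(1, 11):
    x_path = "//*[@id='main-contents']/div[2]/div/div[3]/div[5]/table/tbody/tr[{0}]/td[2]/a".format(i)
    driver.find_element(By.XPATH, x_path).click()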

① Access the court case search page.
② Locate the HTML tag of the search box and search with the keyword "GPS".
③ Build the XPath of the link in each of the first 10 result rows and click it to display the PDF.
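The XPath construction inside the loop of law.py can also be pulled out into a small helper, which makes it easier to check the generated paths without opening a browser. This is only a sketch: the names `ROW_LINK_XPATH` and `build_row_xpaths` are hypothetical, and the XPath itself is copied from the script above, so it will stop matching if the page layout changes.

```python
# XPath of the PDF link in one result row, copied from law.py.
# {row} is the 1-based index of the <tr> inside the results table.
ROW_LINK_XPATH = (
    "//*[@id='main-contents']/div[2]/div/div[3]/div[5]"
    "/table/tbody/tr[{row}]/td[2]/a"
)


def build_row_xpaths(count, first_row=1):
    """Return the link XPaths for `count` consecutive result rows."""
    if count < 0:
        raise ValueError("count must be non-negative")
    last_row = first_row + count
    return [ROW_LINK_XPATH.format(row=row) for row in range(first_row, last_row)]


if __name__ == "__main__":
    # Print the XPaths for the first three rows as a quick visual check.
    for xpath in build_row_xpaths(3):
        print(xpath)
```

In the script, each returned XPath would then be passed to the driver's element lookup and clicked, exactly as the loop does now.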

What I learned

This was my first time using Selenium, and even this small amount of code took about an hour. (Is that inefficient?) The last part was especially hard: the HTML structure is deeply nested, and I struggled to work out how to extract the tags that link to the PDF files.

From here, I would like to add the following:
① Move through the result pages to display all PDFs
② Download the PDFs
③ Let the user freely set the search keyword
If these work, it should be possible to apply natural language processing to the results, so I will keep taking on the challenge.
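As a rough idea of how the download and free-keyword plans above might look, here is a minimal sketch using only the standard library. Everything in it is an assumption rather than tested code against the court site: the function names `parse_args`, `pdf_filename`, and `download_pdfs` are hypothetical, and it assumes the link URLs point directly at PDF files.

```python
import argparse
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def parse_args(argv=None):
    """Read the search keyword from the command line (default: "GPS")."""
    parser = argparse.ArgumentParser(description="Search court cases by keyword.")
    parser.add_argument("keyword", nargs="?", default="GPS")
    return parser.parse_args(argv)


def pdf_filename(url):
    """Derive a local file name from the last path segment of a PDF URL."""
    name = os.path.basename(urlparse(url).path)
    if not name.lower().endswith(".pdf"):
        raise ValueError("not a PDF link: " + url)
    return name


def download_pdfs(urls, dest_dir="pdfs"):
    """Save each PDF URL into dest_dir and return the saved file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, pdf_filename(url))
        urlretrieve(url, path)  # performs the actual network download
        saved.append(path)
    return saved
```

The keyword from `parse_args()` would replace the hard-coded "GPS" passed to `send_keys`, and the URLs could be collected by reading the `href` attribute of each result link with Selenium's `get_attribute("href")` instead of clicking it.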

Thank you for reading my first post!

References

・ https://stackoverrun.com/ja/q/11884507
・ https://ai-inter1.com/python-selenium/
・ https://www.seleniumqref.com/api/python/element_get/Python_find_element_by_xpath.html
