[Python] I tried to get information from a paginated .aspx site using Selenium IDE with as little programming as possible.

On a normal site there are many cases where you can get the information you want just through HTML (URL) parameters, so the Selenium IDE Firefox extension alone is enough. However, .aspx sites carry their state inside the page source, so you cannot get at the information through HTML parameters alone. So I decided to handle (!) .aspx sites with as little programming as possible by tweaking the exported source a little.

Execution environment

The assumed execution environment is macOS Sierra.

Preparation

Execute the following command from the terminal.

  1. pip install selenium bs4
  2. brew install geckodriver

Generating the underlying source

Here we use Selenium IDE. Record the repetitive actions you want to perform on the target site with Selenium IDE; the recorded commands are displayed in the table.

Once you have confirmed the commands, select the language you want to export to and export; the source for the recording is displayed. Save it as python2 / unittest / webdriver, and when you open the saved source,

	def test_<Saved title>(self):

I think you will find a part like this; that method is where the recorded actions are actually performed.
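For reference, the exported file has roughly this shape. This is a stripped-down sketch: the real export imports selenium.webdriver and opens Firefox in setUp, which I have left as comments here so the skeleton runs without a browser, and the class and method names are only illustrative.

```python
import unittest

class SavedTitle(unittest.TestCase):
    def setUp(self):
        # In the real export: self.driver = webdriver.Firefox()
        self.clicked = []

    def test_saved_title(self):
        # The commands recorded in Selenium IDE land here;
        # this is the method you edit in the next section.
        self.clicked.append("page 1")

    def tearDown(self):
        # In the real export: self.driver.quit()
        pass
```
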

Creating an iterative process

Python has several ways to write an iteration, but since I simply want to repeat over pages 1 to 30, I use range. Note that range excludes its end value, so pages 1 through 30 correspond to range(1, 31). If the part you want to repeat is as follows:

-        driver.find_element_by_link_text("page 1").click()
+        for i in range(1, 31):
+            driver.find_element_by_link_text("page 1").click()

In other words, put for i in range(1, 31): in front of the statement you want to repeat, and indent the repeated part one level.

After that, find the literal page number, such as "page 1", and change it to "page " + str(i).
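The two edits above can be checked without a browser. This is a minimal sketch: "page " is the hypothetical link text from the example, and the driver call is commented out so the snippet runs on its own.

```python
# Build the link texts "page 1" ... "page 30" that the loop will click.
page_links = []
for i in range(1, 31):  # range(1, 31) covers pages 1-30 inclusive
    link_text = "page " + str(i)
    # driver.find_element_by_link_text(link_text).click()
    page_links.append(link_text)

print(page_links[0], page_links[-1])  # → page 1 page 30
```
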

Get page information

There are several tools for extracting data from HTML, but this time I will use BeautifulSoup4.

At the top of the source, add:

import unittest, time, re
+from bs4 import BeautifulSoup

Then, at the point in the script where you want to dump the page source, paste:

    data = driver.page_source.encode('utf-8')
    html = BeautifulSoup(data, 'html.parser')
    print(html.select("<selector>"))

To find the selector to use, right-click the target element on the page and choose "Inspect Element"; in the inspector that opens, right-click the highlighted HTML tag, choose "Copy CSS Path", and replace <selector> with the copied path (screenshot: css-selector.png).
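As a concrete sketch of what select() does: the HTML string and the selector "#results td" below are made-up stand-ins for the real page source and the copied CSS path.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source.
html_text = """
<table id="results">
  <tr><td>item 1</td></tr>
  <tr><td>item 2</td></tr>
</table>
"""

soup = BeautifulSoup(html_text, "html.parser")
# select() takes a CSS selector and returns the matching tags as a list.
cells = [td.get_text() for td in soup.select("#results td")]
print(cells)  # → ['item 1', 'item 2']
```
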

Run

You are now ready to run. Open a terminal and execute:


python <file name>.py > test.html

If test.html is created, it is successful.
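Alternatively, instead of relying on shell redirection, the script can write the extracted data to test.html itself. This is a small sketch in which `results` stands in for the strings the print() calls would have produced.

```python
# Collected markup from the scraping loop (stand-in values).
results = ["<td>item 1</td>", "<td>item 2</td>"]

# Writing the file from Python replaces the `> test.html` redirection.
with open("test.html", "w", encoding="utf-8") as f:
    f.write("\n".join(results))
```
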

Summary

In the end it wasn't non-programming at all, so I'd be happy if anyone who knows a way with even less programming could tell me.
