[PYTHON] I tried scraping a paginated .aspx site using Selenium IDE with as little programming as possible.

On an ordinary site you can often get the information you want just by changing the URL parameters, and Selenium IDE, a Firefox extension, makes that easy. However, .aspx sites keep their paging state inside the page source, so the URL parameters alone are not enough. So I decided to go (almost!) non-programming on .aspx sites by tinkering with the exported source a little.

Execution environment

macOS Sierra is assumed as the execution environment.

Preparation

Execute the following commands from the terminal.

pip install selenium bs4
brew install geckodriver

Generating the underlying source

Here we use Selenium IDE. Record the repetitive actions you expect to perform on the target site with Selenium IDE. The recorded commands are then displayed in the command table.

Once you have confirmed the commands, choose the language you want from the export menu and export; the generated source is then displayed. After saving in the python2 / unittest / webdriver format, when you open the source,

	def test_<Saved title>(self):

you should find a part like this; that is where the actual processing is performed.

Creating an iterative process

Python has several ways to loop, but here I simply want to repeat over pages 1-30, so I use range. Suppose the part you want to repeat looks like this:

-        driver.find_element_by_link_text("page 1").click()
+        for i in range(1, 31):
+            driver.find_element_by_link_text("page 1").click()

Put for i in range(1, 31): in front of the statement you want to repeat, and indent only the part you want to repeat. The end value of range is excluded, so range(1, 31) covers pages 1-30.

After that, find the number that corresponds to the page number, such as the 1 in "page 1", and change the string to "page " + str(i).
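
The loop described above can be sketched end to end without opening a browser. This is only an illustration: FakeDriver, FakeElement, and click_through_pages are hypothetical names that do not appear in the exported source, and the "page N" link text is assumed from the example above. In the real script, driver is the Firefox webdriver created by the exported code.

```python
class FakeElement(object):
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, text, log):
        self.text = text
        self.log = log

    def click(self):
        # Record the click instead of driving a real browser.
        self.log.append(self.text)


class FakeDriver(object):
    """Hypothetical stand-in for the real webdriver, for illustration only."""

    def __init__(self):
        self.clicked = []

    def find_element_by_link_text(self, text):
        return FakeElement(text, self.clicked)


def click_through_pages(driver, first_page, last_page):
    """Click the pager links "page <first_page>" to "page <last_page>" in order."""
    # range() excludes its end value, so add 1 to include last_page.
    for i in range(first_page, last_page + 1):
        driver.find_element_by_link_text("page " + str(i)).click()


driver = FakeDriver()
click_through_pages(driver, 1, 30)
print(driver.clicked[0])
print(driver.clicked[-1])
print(len(driver.clicked))
```

Note that find_element_by_link_text is what the python2 / webdriver export produces; it was removed in Selenium 4.3, where the equivalent call is driver.find_element(By.LINK_TEXT, "page " + str(i)).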

Get page information

An HTML parsing tool is needed to get the page information; this time I will use BeautifulSoup4.

At the top of the source:

import unittest, time, re
+from bs4 import BeautifulSoup

And at the point where you want to output the page source:

    data = driver.page_source.encode('utf-8')
    html = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid the bs4 warning
    print(html.select("<selector>"))

Paste the lines above there. To fill in <selector>, right-click on the page and choose "Inspect Element"; in the window that appears, right-click on the target HTML tag, choose the CSS path option, and replace <selector> with the copied path.
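
To see what select returns before wiring it into the Selenium script, it can be tried on a small piece of HTML. The following is a minimal sketch: SAMPLE_PAGE, its table layout, the selector, and the helper name extract_text are all made up for illustration, and in the real script the input is driver.page_source.

```python
from bs4 import BeautifulSoup

# Hypothetical page source; in the real script this comes from driver.page_source.
SAMPLE_PAGE = u"""
<html><body>
  <table id="result">
    <tr><td class="name">Alice</td><td class="city">Tokyo</td></tr>
    <tr><td class="name">Bob</td><td class="city">Osaka</td></tr>
  </table>
</body></html>
"""


def extract_text(page_source, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(page_source, "html.parser")
    # select() returns a list of matching tags; keep only their text.
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


names = extract_text(SAMPLE_PAGE, "table#result td.name")
print(names)
```

Printing the result of select directly, as in the snippet above, outputs the matching tags including their markup; taking get_text is one way to keep only the cell contents.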

Run

You are now ready to run. Open the terminal and execute:


python <file name>.py > test.html

If test.html is created, it is successful.
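
The redirect works because only the output of print goes to standard output; unittest writes its own report to standard error by default, so it does not end up in test.html. If you would rather write the file from Python instead of redirecting, one possible sketch is below. The helper name save_fragments and the sample fragments are hypothetical, and a temporary directory is used here so the example does not overwrite anything.

```python
import io
import os
import tempfile


def save_fragments(fragments, path):
    """Write each extracted HTML fragment to path as UTF-8, one per line."""
    with io.open(path, "w", encoding="utf-8") as f:
        for fragment in fragments:
            f.write(fragment + u"\n")


# Hypothetical fragments standing in for the text of each selected tag.
fragments = [u"<td>page 1 data</td>", u"<td>page 2 data</td>"]
out_path = os.path.join(tempfile.mkdtemp(), "test.html")
save_fragments(fragments, out_path)

# Read the file back to confirm what was written.
with io.open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```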

Summary

It wasn't non-programming at all, so I'd be happy if anyone who knows a more non-programming method could tell me.