On ordinary sites, much of the information can be reached just by changing URL (query) parameters, so scraping is easy with Selenium IDE, a Firefox extension. ASPX sites, however, keep their information inside the page source, so URL parameters alone will not get you there. So I tried no-programming (!) scraping of ASPX sites by tweaking the exported source a little.
The execution environment is assumed to be macOS Sierra.
Run the following commands in a terminal.
pip install selenium bs4
brew install geckodriver
Next, open Selenium IDE and record the repetitive actions you want to perform on the target site. The recorded commands appear in the table.
Once the commands look right, choose the language to export to, and the exported source is displayed. Save it as Python 2 / unittest / WebDriver and open the file; you will find a method like
def test_<Saved title>(self):
That method is where the actual processing is performed.
Python has several ways to loop, but here I simply want to repeat the same steps for pages 1 through 30, so I use a for loop over range. If the part you want to repeat is the click below, change it like this:
- driver.find_element_by_link_text("page 1").click()
+ for i in range(1, 31):
+     driver.find_element_by_link_text("page 1").click()
Put
for i in range(1, 31):
in front of the statements you want to repeat, and indent only those statements. Because range stops just before its second argument, range(1, 31) makes i run from 1 to 30.
Then find the literal that carries the page number, such as "page 1", and change it to "page " + str(i), so the repeated line becomes
    driver.find_element_by_link_text("page " + str(i)).click()
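The same loop can be sketched outside the exported test, assuming the pager links are literally labelled "page 1", "page 2", and so on (that label is the example used above; your site's will differ). This sketch assumes Python 3 and calls find_element with the "link text" locator string instead of find_element_by_link_text, which Selenium removed in version 4.3. click_through_pages and RecordingDriver are hypothetical names; RecordingDriver only lets you dry-run the loop without launching Firefox.

```python
# "link text" is the string value of selenium's By.LINK_TEXT, so
# find_element(LINK_TEXT, ...) works in Selenium 3 and 4 without importing By.
LINK_TEXT = "link text"


def click_through_pages(driver, first=1, last=30, on_page=None):
    """Click the link labelled "page N" for every N from first to last inclusive.

    range() stops before its second argument, hence last + 1.
    If on_page is given it is called as on_page(driver, n) after each click,
    e.g. to read driver.page_source.
    """
    visited = []
    for n in range(first, last + 1):
        driver.find_element(LINK_TEXT, "page " + str(n)).click()
        visited.append(n)
        if on_page is not None:
            on_page(driver, n)
    return visited


class _RecordedLink:
    """Element stand-in whose click() just appends a log entry."""

    def __init__(self, log, entry):
        self.log = log
        self.entry = entry

    def click(self):
        self.log.append(self.entry)


class RecordingDriver:
    """Hypothetical stand-in for a WebDriver, for dry runs without Firefox."""

    def __init__(self):
        self.clicked = []

    def find_element(self, by, value):
        return _RecordedLink(self.clicked, (by, value))


fake = RecordingDriver()
pages = click_through_pages(fake)
print(pages[0], pages[-1], len(pages))  # 1 30 30
print(fake.clicked[:2])  # [('link text', 'page 1'), ('link text', 'page 2')]
```

With a real session you would pass the driver object the exported test already uses instead of RecordingDriver(), and do the page extraction inside on_page.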
To extract information from the page you need an HTML parser; this time I use BeautifulSoup4.
Add the import at the top of the source:
import unittest, time, re
+from bs4 import BeautifulSoup
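Then, at the point where you want to dump the page contents, paste the following (passing "html.parser" explicitly avoids BeautifulSoup's warning about guessing a parser):
data = driver.page_source
html = BeautifulSoup(data, "html.parser")
print(html.select("<selector>"))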
To get the selector, right-click the target element on the page and choose "Inspect Element". In the developer tools window that opens, right-click the highlighted HTML tag, choose Copy > CSS Path, and paste the result in place of <selector>.
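To see what select returns for a given selector, here is a minimal sketch (Python 3) that runs it against a small hand-written page instead of driver.page_source. SAMPLE_SOURCE, select_elements and select_texts are hypothetical names for illustration; select() returns a list of Tag objects, and get_text(strip=True) pulls out just their text.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; in the real script this string comes
# from the Selenium driver after the page has loaded.
SAMPLE_SOURCE = """
<html><body>
  <table id="results">
    <tr><td class="name">Alpha</td><td class="price">100</td></tr>
    <tr><td class="name">Beta</td><td class="price">200</td></tr>
  </table>
</body></html>
"""


def select_elements(page_source, selector):
    """Parse with the built-in html.parser and return the tags matching a
    CSS selector (for example one copied via Copy > CSS Path)."""
    soup = BeautifulSoup(page_source, "html.parser")
    return soup.select(selector)


def select_texts(page_source, selector):
    """Like select_elements, but return only the stripped text of each tag."""
    return [tag.get_text(strip=True) for tag in select_elements(page_source, selector)]


print(select_texts(SAMPLE_SOURCE, "table#results td.name"))  # ['Alpha', 'Beta']
```

One caveat when trying a copied path against hand-written HTML like this: browsers insert tbody into tables, so a copied CSS path usually contains tbody, while html.parser does not add it to markup that lacks it. driver.page_source comes from the browser's DOM, so it normally includes tbody already.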
You are now ready to run it. Open a terminal and run the following (the > redirects the printed output into test.html):
python <file name>.py > test.html
If test.html is created and contains the selected elements, it worked.
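Redirecting stdout works, but print writes the Python list representation (square brackets and commas) into the file. A minimal alternative, assuming Python 3 and that you collect [str(tag) for tag in html.select("<selector>")] on each page yourself, is to write the file from Python. write_fragments is a hypothetical helper, not part of Selenium or BeautifulSoup.

```python
import os
import tempfile


def write_fragments(fragments, path="test.html"):
    """Write HTML fragments (e.g. str(tag) for each selected tag) to one
    UTF-8 file, wrapped in a minimal page a browser can open directly."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head><body>\n'
        + "\n".join(fragments)
        + "\n</body></html>\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return path


# Demo: write two made-up fragments into a temporary directory.
out = write_fragments(
    ["<p>page 1</p>", "<p>page 2</p>"],
    os.path.join(tempfile.mkdtemp(), "test.html"),
)
with open(out, encoding="utf-8") as f:
    print(f.read())
```

Writing with an explicit encoding keeps non-ASCII text intact regardless of the terminal's locale.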
In the end this wasn't really no-programming at all, so if anyone knows a more programming-free approach, I'd be happy to hear about it.