Yesterday I wrote an article, "Scraping community cycle usage history". This article is a continuation of that.
Yesterday's version launched Firefox to do the scraping, but this time, as promised in the "[Future work](http://qiita.com/nuwaa/items/48c7e1588fb0409659ac#%E4%BB%8A%E5%BE%8C%E3%81%AE%E3%81%93%E3%81%A8)" section of that article, I switched to PhantomJS. Once I actually tried it, it turned out to be very easy. With that, the combination of Python 3 + BeautifulSoup4 + Selenium + PhantomJS is confirmed to work in a Windows environment.
#### **`docomo-cycle-pjs.py`**
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
import csv

MEMBERID = "(My user ID)"
PASSWORD = "(My password)"

# Launch headless PhantomJS instead of Firefox
driver = webdriver.PhantomJS()
driver.get("https://tcc.docomo-cycle.jp/cycle/TYO/cs_web_main.php?AreaID=2")

# Log in with member ID and password
mid = driver.find_element_by_name('MemberID')
mid.send_keys(MEMBERID)
password = driver.find_element_by_name('Password')
password.send_keys(PASSWORD)
password.send_keys(Keys.RETURN)

# Wait (up to 5 seconds) for the "Billing" link to appear, then open it
obj1 = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Billing")))
obj1.click()
time.sleep(3)

# Parse the usage-history table out of the rendered page
data = driver.page_source.encode('utf-8')
soup = BeautifulSoup(data, "html.parser")
table = soup.findAll("table", {"class": "rnt_ref_table"})[0]
rows = table.findAll("tr")

# Write each table row out as one CSV line
csvFile = open("docomo-cycle.csv", 'wt', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text().replace('\n', ''))
        writer.writerow(csvRow)
finally:
    csvFile.close()
    driver.close()
```
So the only difference from yesterday's script is the web driver line. Yesterday it was

```python
driver = webdriver.Firefox()
```

and now it has simply become

```python
driver = webdriver.PhantomJS()
```
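Because nothing else changes, the browser could even be made switchable with a small helper. A minimal sketch, assuming the same Selenium setup as above (`make_driver` is a hypothetical name of my own, not part of the original script):

```python
from selenium import webdriver

def make_driver(headless=True):
    # headless=True  -> PhantomJS (no browser window opens)
    # headless=False -> Firefox (visible browser, handy for debugging)
    if headless:
        return webdriver.PhantomJS()
    return webdriver.Firefox()

driver = make_driver(headless=True)
```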
The one point to note is that PhantomJS must be properly installed and on your PATH.
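If PhantomJS lives somewhere that is not on your PATH, Selenium also lets you point at the binary explicitly. A minimal sketch, where the install path is purely an example:

```python
# Hypothetical install path; adjust to wherever phantomjs.exe actually lives
driver = webdriver.PhantomJS(executable_path=r'C:\tools\phantomjs\bin\phantomjs.exe')
```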
By the way, the result of running the script above is as follows.
(Compared to yesterday's CSV file, the history has grown by two entries!)
#### **`docomo-cycle.csv`**
```csv
1,2016/5/2 07:22,B3-01.Chuo Ward Office B3-01.Chuo City Office,→,2016/5/2 07:35,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
2,2016/5/2 18:29,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building),→,2016/5/2 18:50,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
3,2016/5/5 21:32,B3-03.Ginza 6-chome-SQUARE (Kobikicho-dori) B3-03.Ginza 6-chome SQUARE(Kobikicho Dori),→,2016/5/5 21:48,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
4,2016/5/6 07:28,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/6 07:41,B2-02.Yanagi-dori (in front of Tokyo Square Garden) B2-02.Yanagi-dori St. (In front of TOKYO SQUARE GARDEN)
5,2016/5/8 05:00,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/8 05:08,H1-02.Toyosu Station H1-02.Toyosu Station
6,2016/5/9 07:25,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/9 07:48,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
7,2016/5/10 08:18,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/10 08:40,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
8,2016/5/10 19:26,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building),→,2016/5/10 19:48,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
9,2016/5/11 07:25,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/11 07:45,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
```
Just changing Firefox to PhantomJS made the whole run feel much smoother, since no browser window pops up. Next it would be nice to append the data automatically to a Google Spreadsheet, store it in a database such as MySQL, and so on.
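As a taste of the MySQL idea, the CSV produced above could be loaded roughly like this. This is only a sketch, assuming PyMySQL (`pip install pymysql`) and a local database named `cycle`; the `rides` table name and its column layout are my own invention:

```python
import csv
import pymysql  # assumption: PyMySQL is installed

conn = pymysql.connect(host='localhost', user='user', password='pass',
                       db='cycle', charset='utf8mb4')
try:
    with conn.cursor() as cur:
        # Hypothetical table layout matching the CSV columns
        cur.execute(
            "CREATE TABLE IF NOT EXISTS rides ("
            " no INT, start_time VARCHAR(20), start_port TEXT,"
            " end_time VARCHAR(20), end_port TEXT)")
        with open('docomo-cycle.csv', newline='', encoding='utf-8') as f:
            for row in csv.reader(f):
                # CSV columns: no, start time, start port, arrow, end time, end port
                cur.execute("INSERT INTO rides VALUES (%s, %s, %s, %s, %s)",
                            (row[0], row[1], row[2], row[4], row[5]))
    conn.commit()
finally:
    conn.close()
```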
・ "Web scraping with Python" O'Reilly Japan, ISBN978-4-87311-761-4 -For CSV file conversion, I referred to this. (Same as yesterday)