[Python] Scraping the community cycle usage history (PhantomJS version)

Introduction

Yesterday, I wrote the article "Scraping the community cycle usage history". This article is a continuation of that.

What I tried

Yesterday's version launched Firefox to do the scraping. This time, as promised in the "[Future](http://qiita.com/nuwaa/items/48c7e1588fb0409659ac#%E4%BB%8A%E5%BE%8C%E3%81%AE%E3%81%93%E3%81%A8)" section of that article, I switched to PhantomJS. It turned out to be surprisingly easy. With this, the combination of Python 3 + BeautifulSoup4 + Selenium + PhantomJS is confirmed to work in a Windows environment.
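As a quick sanity check that the whole stack is wired up, something like the following should print a page title without any browser window appearing (a minimal sketch; it assumes the packages are installed with pip and the phantomjs executable is on the PATH):

```python
from bs4 import BeautifulSoup
from selenium import webdriver

# PhantomJS renders the page headlessly; no window is opened
driver = webdriver.PhantomJS()
driver.get("https://example.com/")
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.title.get_text())  # e.g. "Example Domain"
driver.quit()
```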

#### **`docomo-cycle-pjs.py`**

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

MEMBERID = "(My user ID)"
PASSWORD = "(My password)"

# Launch headless PhantomJS (instead of Firefox) and open the login page
driver = webdriver.PhantomJS()
driver.get("https://tcc.docomo-cycle.jp/cycle/TYO/cs_web_main.php?AreaID=2")

# Fill in the login form and submit it
mid = driver.find_element_by_name('MemberID')
mid.send_keys(MEMBERID)
password = driver.find_element_by_name('Password')
password.send_keys(PASSWORD)
password.send_keys(Keys.RETURN)

# Wait up to 5 seconds for the "Billing" link to appear, then follow it
obj1 = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Billing")))
obj1.click()

time.sleep(3)

# Hand the rendered page over to BeautifulSoup
data = driver.page_source.encode('utf-8')
soup = BeautifulSoup(data, "html.parser")

# The usage history is the first table with class "rnt_ref_table"
table = soup.findAll("table", {"class": "rnt_ref_table"})[0]
rows = table.findAll("tr")

# Write each table row out as one line of CSV
csvFile = open("docomo-cycle.csv", 'wt', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text().replace('\n', ''))
        writer.writerow(csvRow)
finally:
    csvFile.close()

driver.close()
```

So the only difference from yesterday's script is the line that creates the web driver. Yesterday's

```python
driver = webdriver.Firefox()
```

simply became

```python
driver = webdriver.PhantomJS()
```

That's all.
The one point to note is that PhantomJS must be properly installed and its executable must be on your PATH.
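If PhantomJS is not on the PATH, the old Selenium API also accepts an explicit path to the binary. A minimal sketch (the path below is just an example; adjust it to your installation):

```python
# Point Selenium at the PhantomJS binary directly (example path, not yours)
driver = webdriver.PhantomJS(executable_path=r"C:\phantomjs\bin\phantomjs.exe")
```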

By the way, the output of the script above is as follows.
(Compared with yesterday's CSV file, the history has grown by two trips!)


#### **`docomo-cycle.csv`**

```csv
1,2016/5/2 07:22,B3-01.Chuo Ward Office B3-01.Chuo City Office,→,2016/5/2 07:35,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
2,2016/5/2 18:29,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building),→,2016/5/2 18:50,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
3,2016/5/5 21:32,B3-03.Ginza 6-chome-SQUARE (Kobikicho-dori) B3-03.Ginza 6-chome SQUARE(Kobikicho Dori),→,2016/5/5 21:48,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
4,2016/5/6 07:28,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/6 07:41,B2-02.Yanagi-dori (in front of Tokyo Square Garden) B2-02.Yanagi-dori St. (In front of TOKYO SQUARE GARDEN)
5,2016/5/8 05:00,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/8 05:08,H1-02.Toyosu Station H1-02.Toyosu Station
6,2016/5/9 07:25,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/9 07:48,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
7,2016/5/10 08:18,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/10 08:40,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
8,2016/5/10 19:26,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building),→,2016/5/10 19:48,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square)
9,2016/5/11 07:25,B4-03.Sakura no Sanpomchi (in front of Harumi Triton Square) B4-03.Sakura no sanpomichi(In front of Harumi Triton Square),→,2016/5/11 07:45,A3-02.Casa Nova Shop (Kaede Building) A3-02.CASA NOUVA SHOP(Kaede Building)
```

Future

Simply switching from Firefox to PhantomJS made the whole run feel much lighter, since no browser window pops up. Next, it would be nice to append the results to a Google Spreadsheet automatically, store the data in a database such as MySQL, and so on.
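For the MySQL idea, a minimal sketch using the mysql-connector-python package (the connection settings, table name, and column layout are all assumptions for illustration, matching the six columns of the CSV above):

```python
import csv

import mysql.connector  # pip install mysql-connector-python

# Connection settings and schema here are illustrative assumptions
conn = mysql.connector.connect(
    host="localhost", user="cycle", password="secret", database="cycle_log")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS trips (
        seq INT, start_time VARCHAR(32), start_port TEXT,
        arrow VARCHAR(8), end_time VARCHAR(32), end_port TEXT
    )""")

# Load the CSV produced by docomo-cycle-pjs.py and bulk-insert it
with open("docomo-cycle.csv", newline="", encoding="utf-8") as f:
    rows = [row for row in csv.reader(f) if row]

cur.executemany("INSERT INTO trips VALUES (%s, %s, %s, %s, %s, %s)", rows)
conn.commit()
conn.close()
```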

Reference

"Web scraping with Python" O'Reilly Japan, ISBN978-4-87311-761-4 -For CSV file conversion, I referred to this. (Same as yesterday)
