Get a list of purchased DMM eBooks with Python + Selenium

I wanted to see a list of the e-books I had bought in the past, but my first attempts got nowhere despite a lot of trial and error.

"Python Crawling & Scraping [Augmented and Revised Edition]: Practical Development Guide for Data Collection and Analysis"

I read it, studied, and tried again [^1].

[^1]: I did read it properly, but the reference links turned out to be more useful. The book is still not a waste, since it also covers laws and regulations.


Actual code and execution result

```Python
from selenium import webdriver
import time
import dmm_user_pass

# Simple way to keep the user name and password from leaking
# (never publish dmm_user_pass.py, which lives in the same directory)
USER = dmm_user_pass.USER
PASS = dmm_user_pass.PASS

# Get a PhantomJS driver
browser = webdriver.PhantomJS()
browser.implicitly_wait(5)

# Open the login page (you should be taken to the purchased page right after logging in)
url_login = "https://accounts.dmm.com/service/login/password"
browser.get(url_login)
print("I visited the login page")

# Fill in the text boxes
e = browser.find_element_by_id("login_id")
e.clear()
e.send_keys(USER)
e = browser.find_element_by_id("password")
e.clear()
e.send_keys(PASS)
# Submit the form
browser.find_element_by_xpath("//form[@name='loginForm']//input[@type='submit']").click()
print("I entered the information and pressed the login button")
time.sleep(10)

# Open the purchased pages for DMM e-books
# If page_i points to a page that does not exist, the URL automatically becomes
# https://book.dmm.com/library/?age_limit=all&expired=1
# (the page requested and the current URL no longer match)
url_purchased = 'https://book.dmm.com/library/?age_limit=all&expired=1&sort=old&page='
for i in range(1, 100):
    page_i = url_purchased + str(i)
    browser.get(page_i)
    time.sleep(5)
    if page_i != browser.current_url:
        break  # Exit the loop, as described in the comment above
    # List the purchased titles
    links = browser.find_elements_by_css_selector(
        ".m-boxListBookProductBlock__main__info__ttl > a")
    for a in links:
        title = a.text
        print("-", title)
```
```
/usr/local/lib/python3.7/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
I visited the login page
I entered the information and pressed the login button
- Mob Psycho 100
- tell me! Please Tell Me!
(rest omitted)
```
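
The script imports `USER` and `PASS` from `dmm_user_pass.py`, which is deliberately not shown. As a minimal sketch of one alternative, assuming you would rather keep the credentials out of the source tree altogether, they could be read from environment variables. The variable names `DMM_USER` and `DMM_PASS` and the helper `load_credentials` are made up for this example.

```python
import os


def load_credentials(env=None):
    """Return (user, password) read from environment variables.

    DMM_USER and DMM_PASS are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them
    missing = [name for name in ("DMM_USER", "DMM_PASS") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["DMM_USER"], env["DMM_PASS"]
```

With this in place, `USER, PASS = load_credentials()` would replace the two assignments from `dmm_user_pass`.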

The warning is discussed in the reference links below.

I was able to get a list of the books I had purchased on DMM e-books, in order of purchase.
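
The loop knows when to stop by comparing the URL it requested with the URL the browser actually ended up on. Here is a minimal sketch of that check in isolation, assuming only what the comment in the code says (a page number past the end is redirected to a different URL). `iter_existing_pages` and `FakeBrowser` are names invented here, and the fake stands in for the Selenium driver so the logic can be tried without logging in.

```python
def iter_existing_pages(browser, base_url, max_pages=99):
    """Yield (page_number, url) for each page that loads without being redirected."""
    for i in range(1, max_pages + 1):
        url = base_url + str(i)
        browser.get(url)
        if browser.current_url != url:
            # Redirected: the requested page does not exist, so stop paging
            return
        yield i, url


class FakeBrowser:
    """Stand-in for a Selenium driver: pages beyond last_page redirect elsewhere."""

    def __init__(self, base_url, last_page):
        self.base_url = base_url
        self.last_page = last_page
        self.current_url = None

    def get(self, url):
        page = int(url[len(self.base_url):])
        if page <= self.last_page:
            self.current_url = url
        else:
            self.current_url = "https://example.com/redirected"


base = "https://example.com/library/?page="  # placeholder, not the DMM URL
pages = [i for i, _ in iter_existing_pages(FakeBrowser(base, last_page=3), base)]
print(pages)  # prints [1, 2, 3]
```

With a real driver the page needs time to load before `current_url` is compared, which is what the `time.sleep(5)` in the script is for.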

With a little more work it should be possible to collect each book's link or price as well, but I skipped that this time because it would be time-consuming and tedious. [^2]

[^2]: DMM e-books groups multi-volume titles together, so a multi-volume title links to its series page, while a single-volume title links directly to its work details page. Collecting links or prices would therefore need a conditional branch for the two cases.


At first I planned to crawl the site with the requests module, but I gave up because I couldn't log in. I'm glad that switching to Selenium worked.

Selenium simply performs the same operations a user would, driven from code. So as long as you know the user name, the password, and where the login fields and button are (identified by the class or id of the HTML tags), you should be able to log in regardless of what the site does internally. I feel Selenium is a reasonable choice for crawling any site that requires a login.
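
The idea that the element ids and the submit button are all you need can be sketched as a single function. It is written in the `find_element(by, value)` style, because the `find_element_by_*` helpers used in the script were deprecated in Selenium 4 and later removed. The function name `login` and the recording classes are mine; the ids and the XPath are the ones from the script. The strategy strings `"id"` and `"xpath"` are the values of `By.ID` and `By.XPATH`, written out directly so the sketch runs without Selenium installed.

```python
def login(browser, user, password):
    """Fill in the login form and submit it, using find_element(by, value)."""
    for element_id, value in (("login_id", user), ("password", password)):
        field = browser.find_element("id", element_id)  # "id" is the value of By.ID
        field.clear()
        field.send_keys(value)
    submit_xpath = "//form[@name='loginForm']//input[@type='submit']"
    browser.find_element("xpath", submit_xpath).click()  # "xpath" is the value of By.XPATH


class RecordingElement:
    """Fake element that logs every operation instead of touching a page."""

    def __init__(self, log, name):
        self.log = log
        self.name = name

    def clear(self):
        self.log.append((self.name, "clear"))

    def send_keys(self, text):
        self.log.append((self.name, "send_keys", text))

    def click(self):
        self.log.append((self.name, "click"))


class RecordingBrowser:
    """Fake driver exposing only the one method that login() uses."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return RecordingElement(self.log, value)


fake = RecordingBrowser()
login(fake, "alice", "secret")
for entry in fake.log:
    print(entry)
```

With a real driver you would pass the `browser` object from the script and, more conventionally, import `By` from `selenium.webdriver.common.by` and write `By.ID` and `By.XPATH` instead of the raw strings.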


Reference links

- [Python3] Scraping on site with login function [requests] [Beautiful Soup]
  - I also referred to the follow-up article, "[Python3] Scraping via browser (dynamic page etc.) [Selenium]".

- Story of making a container to connect to daily pachinko
  - I borrowed the DMM login part.

- I tried using Headless Chrome from Selenium
  - I got the warning "UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead", and it seems that PhantomJS development has ended. It worked this time, but I will use Chrome from now on.
  - I used this article as a reference when switching from PhantomJS to Chrome, especially its Preparation section.
  - The actual code is almost unchanged:

```Python
# # Get a PhantomJS driver
# browser = webdriver.PhantomJS()
# browser.implicitly_wait(5)
# ↓

# Get a Chrome driver
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run without showing a browser window
browser = webdriver.Chrome(options=options)
browser.implicitly_wait(5)
```
