I want to graph the number of Photo AC downloads [Scraping implementation] ~ Coding 10 lines a day ~

Third day

This time, I will make something that graphs the number of photo downloads.

Details: Photo AC is a site for posting copyright-free images. It shows the previous day's download count, but on the following day the count from two days earlier is no longer visible. So I want to fetch the count once a day, store it in a DB, and eventually graph it. https://www.photo-ac.com/ (If your account gets closed for doing this, that is at your own risk, lol)

[Technical theme]

Scraping

language

Target deliverable

For now, I would like to implement the part that gets information from the site once.

Code and documentation

1. Get the required information from the HTML

Unlike last time, I will use Mechanize this time because the site requires a login. The Gemfile looks like this:

source 'https://rubygems.org/'

gem 'nokogiri'
gem 'mechanize'

With this Gemfile in place, run:

bundle install --path .bundle

I pass --path so that the gems are installed under the project directory. Without it, they are installed into the shared local environment, which causes trouble later.
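On recent Bundler versions (2.1 and later) the --path flag is deprecated in favor of a config setting. A rough equivalent of the command above, assuming Bundler 2.x is installed, might look like this:

```shell
# Remember the install path in the project-local Bundler config (.bundle/config)
bundle config set --local path '.bundle'

# Install the gems listed in the Gemfile into that path
bundle install

# Run a script with those gems (crawler.rb is the script from this post)
bundle exec ruby crawler.rb
```

The setting is stored per project, so later bundle install runs in the same directory pick it up without repeating the path.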

Here is the code that actually fetches the data.

crawler.rb


require 'nokogiri'
require 'mechanize'

agent = Mechanize.new
# Open the top page before logging in.
agent.get("https://www.photo-ac.com")

# Log in. Replace the placeholder email and password with your own.
agent.post("https://www.photo-ac.com/auth/login", {
  acc_type: 'cr',
  email: 'mail address',
  password: 'password',
  remember_me: '1'
})

# Fetch the photo list page and parse its HTML.
page = agent.get("https://www.photo-ac.com/creator/list/?pl_q=&pl_order=-releasedate&pl_pp=200&pl_disp=all&pl_ntagsec=&pl_tags50over=&pl_chkpsd=")
doc = Nokogiri::HTML.parse(page.body, nil, 'utf-8')

# For each photo block, print the image ID and then its download count.
doc.css(".photo-list").each do |div|
  p div.css(".sectiondata li")[0].text
  p div.css(".sectionimg .preview")[0].text
end

Running this prints the following:

"ID:2875969"
"0"
"ID:2875964"
"0"
"ID:2875028"
"0"
"ID:2875022"
"0"
"ID:2874964"
"0"
"ID:2871884"
"0"
"ID:2871883"
"0"
"ID:2871879"
"0"
"ID:2871873"
"0"
"ID:2871870"
"0"
"ID:2837286"
"0"
"ID:2837285"
"0"
"ID:2837282"
"0"
"ID:2837281"
"0"
"ID:2837280"
"0"
"ID:2837277"
"0"
"ID:2837276"
"0"
"ID:2836745"
"0"
"ID:2836741"
"3"
"ID:2836737"
"1"
"ID:2836735"
"2"
"ID:2836730"
"1"
"ID:2836723"
"0"
"ID:2836718"
"1"
"ID:2746521"
"6"
"ID:2746517"
"11"
"ID:2746513"
"1"
"ID:2746505"
"1"
"ID:2746086"
"1"
"ID:2746084"
"4"
"ID:2746070"
"15"
"ID:2746066"
"16"
"ID:2742664"
"10"
"ID:2742530"
"17"
"ID:2742522"
"6"
"ID:2742517"
"3"
"ID:2741719"
"4"
"ID:2741715"
"16"
"ID:2741708"
"2"
"ID:2741705"
"0"
"ID:2741700"
"0"
"ID:2741699"
"0"
"ID:2741675"
"21"
"ID:2741674"
"2"
"ID:2741653"
"0"
"ID:2741629"
"1"
"ID:2741567"
"0"
"ID:2741381"
"22"
"ID:2741336"
"7"
"ID:2733068"
"14"
"ID:2733060"
"0"
"ID:2733050"
"1"
"ID:2690326"
"2"
"ID:2690291"
"7"
"ID:2690259"
"1"

As shown above, this gives the ID of each image and its current download count. If I store this in the DB once a day, graphing it should be easy.
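Before the values go into a DB, the scraped strings probably need to be turned into numbers and tagged with a date. The following is only a minimal sketch of that step: the helper name build_row and the row layout (date, photo_id, downloads) are my own assumptions, not part of the crawler above, and the "ID:<digits>" format is taken from the printed output.

```ruby
require 'date'

# Hypothetical helper: convert the two scraped strings for one photo into a
# row that could later be stored in a DB. The row layout is an assumption.
def build_row(id_text, count_text, date = Date.today)
  # Pull the digits out of a string such as "ID:2836741".
  photo_id = id_text.strip[/\AID:(\d+)\z/, 1]
  raise ArgumentError, "unexpected ID text: #{id_text.inspect}" if photo_id.nil?

  {
    date: date.iso8601,                        # e.g. "2020-01-01"
    photo_id: Integer(photo_id, 10),
    downloads: Integer(count_text.strip, 10)   # raises if the text is not a number
  }
end

# Example with one ID/count pair from the output above and an arbitrary date.
row = build_row("ID:2836741", "3", Date.new(2020, 1, 1))
p row
```

Raising on unexpected text, rather than silently storing a zero, should make it easier to notice when the site's HTML changes and the selectors stop matching.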
