Coding 10 lines a day ~ Scraping implementation ~

Overview

Continuing from yesterday, I will write 10 lines of code today as well.

Day 2

Theme

Scraping

Language

Ruby

Target deliverable

I would like to complete the main body of scraping, which I could not complete yesterday.

Code and documentation

1. Re-run bundle install

Yesterday bundle install got stuck, but after I reinstalled Xcode it worked fine lol

$ bundle install --path .bundle

Fetching gem metadata from https://rubygems.org/.................
Resolving dependencies...
Using bundler 1.17.2
Using mini_portile2 2.4.0
Fetching nokogiri 1.10.9
Installing nokogiri 1.10.9 with native extensions
Bundle complete! 1 Gemfile dependency, 3 gems now installed.
Bundled gems are installed into `./.bundle`

2. Parse with Nokogiri

Now, let's write the parsing part using Nokogiri. This time I just want to see the anime titles, so I will extract only the title.

I want to use Nokogiri, so I add the following at the top of the file:

require 'nokogiri'

Then, continuing from last time's code:


# Parse the received HTML
doc = Nokogiri::HTML.parse(response.body, nil, nil)

# Extract the needed information from the parsed document
doc.css(".l-searchPageRanking_unit_title").each do |div|
  puts div.text.split("\n")[2].gsub("              ", "")
end

This is the code that parses the received HTML.

3. Execute

You can run the Ruby script with the following command.

bundle exec ruby crawler.rb

Then

Do you want Kaguya to tell you? ~ Love brain battle of geniuses ~ (TV anime video)
Dropkick on My Devil'(dash) (TV anime video)
BNA BNA (TV anime video)
Kakushigoto (TV anime video)
I have reincarnated as a villain daughter who has only the ruin flag of the maiden game ... (TV anime video)
Wave, Listen to Me! (TV Anime Video)
Singing Yesterday (TV Anime Video)
Fruit basket 2nd season (TV anime video)
Appare-Ran! (TV anime video)
Princess Connect! Re: Dive (TV anime video)
Major 2nd 2nd series (TV anime video)
Book lover Shimogami-I can't choose the means to become a librarian-Part 2 (TV anime video)
Diary of Our Days after School (TV Anime Video)
Kingdom 3rd series (TV anime video)
Gleipnir (TV anime video)
Troublesome grandfather (TV anime video)
After all my youth romantic comedy is wrong. Complete (TV anime video)
Shokugeki no Soma Gono Dish (TV Anime Video)
Arte (TV anime video)
Millionaire Detective Balance: UNLIMITED (TV Anime Video)
Digimon Adventure: (TV Anime Video)
Re: Life in a different world starting from zero(Second stage)(TV anime video)
LISTENERS (TV anime video)
Hakushon Daimaou 2020 (TV anime video)
Tsugumomo (TV anime video)

The list should be displayed like this. To be honest, I feel the split and gsub part could be done a little more properly, but... I'll leave it as is for now lol
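As one possible way to tidy up that split and gsub part, the title could be taken as the first non-blank line of the text and trimmed, so the result no longer depends on a fixed line index or indentation width. This is a sketch under assumptions: RAW_TEXT is a made-up string shaped like what div.text might return, and clean_title is a hypothetical helper, not what the published code does.

```ruby
# Hypothetical raw text: the title sits on its own line,
# surrounded by indentation and blank lines.
RAW_TEXT = "\n    \n        Kakushigoto (TV anime video)\n    \n"

# Hypothetical helper: trim every line, drop the blank ones,
# and return the first remaining line (nil if there is none).
def clean_title(raw)
  raw.each_line.map(&:strip).reject(&:empty?).first
end

puts clean_title(RAW_TEXT)
```

If the element ever contains more than one non-blank line, this picks only the first, so it is worth checking against the real markup before relying on it.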

However, there is one problem: when I counted the lines of code added this time, it came to only 9 lines even including the comments... So I would like to also fetch numeric values such as the rating.

4. Get ratings and comments

Let's tweak the earlier parsing code so that we can also see the rating and the number of comments.

# Extract the needed information from the parsed document
doc.css(".l-searchPageRanking_unit").each do |div|
  puts "title:" + div.css(".l-searchPageRanking_unit_title")[0].text.split("\n")[2].gsub("              ", "")
  puts "Evaluation:" + div.css(".l-searchPageRanking_unit_mainBlock_starPoint strong")[0].text
  puts "Number of comments:" + div.css(".l-searchPageRanking_unit_mainBlock_starPoint span")[0].text + "\n\n"
end

To explain briefly: the previous code looped only over the title elements, but to get the rating as well you need to loop over the element that encloses each anime's information. That is why I changed the selector in front of each, as shown above.

5. Run again

Run it again with the following command:


bundle exec ruby crawler.rb

Then

title:Do you want Kaguya to tell you? ~ Love brain battle of geniuses ~ (TV anime video)
Evaluation:3.8
Number of comments:120

title:Dropkick on My Devil'(dash) (TV anime video)
Evaluation:3.9
Number of comments:54

title:BNA BNA (TV anime video)
Evaluation:3.7
Number of comments:88

title:Kakushigoto (TV anime video)
Evaluation:3.6
Number of comments:105

title:I have reincarnated as a villain daughter who has only the ruin flag of the maiden game ... (TV anime video)
Evaluation:3.6
Number of comments:114

title:Wave, Listen to Me! (TV Anime Video)
Evaluation:3.6
Number of comments:77

title:Singing Yesterday (TV Anime Video)
Evaluation:3.8
Number of comments:115

title:Fruit basket 2nd season (TV anime video)
Evaluation:3.5
Number of comments:23

title:Appare-Ran! (TV anime video)
Evaluation:3.4
Number of comments:46

title:Princess Connect! Re: Dive (TV anime video)
Evaluation:3.5
Number of comments:55

title:Major 2nd 2nd series (TV anime video)
Evaluation:3.5
Number of comments:15

title:Book lover Shimogami-I can't choose the means to become a librarian-Part 2 (TV anime video)
Evaluation:3.3
Number of comments:41

title:Diary of Our Days after School (TV Anime Video)
Evaluation:3.4
Number of comments:53

title:Kingdom 3rd series (TV anime video)
Evaluation:3.4
Number of comments:20

title:Gleipnir (TV anime video)
Evaluation:3.5
Number of comments:74

title:Troublesome grandfather (TV anime video)
Evaluation:3.4
Number of comments:11

title:After all my youth romantic comedy is wrong. Complete (TV anime video)
Evaluation:3.3
Number of comments:23

title:Shokugeki no Soma Gono Dish (TV Anime Video)
Evaluation:3.3
Number of comments:30

title:Arte (TV anime video)
Evaluation:3.4
Number of comments:51

title:Millionaire Detective Balance: UNLIMITED (TV Anime Video)
Evaluation:3.2
Number of comments:37

title:Digimon Adventure: (TV Anime Video)
Evaluation:3.2
Number of comments:15

title:Re: Life in a different world starting from zero(Second stage)(TV anime video)
Evaluation:3.2
Number of comments:13

title:LISTENERS (TV anime video)
Evaluation:3.2
Number of comments:57

title:Hakushon Daimaou 2020 (TV anime video)
Evaluation:3.2
Number of comments:14

title:Tsugumomo (TV anime video)
Evaluation:3.1
Number of comments:26

As shown above, you can now get the rating and the number of comments for each anime, so you can decide which one to watch!!
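If the goal is to pick what to watch, the scraped values could also be converted to numbers and sorted. The following is a rough sketch, not part of the published code: Entry, build_entry and rank are hypothetical names, and the rows are a few entries copied from the output above.

```ruby
# Hypothetical container for one scraped row.
Entry = Struct.new(:title, :rating, :comments)

# Hypothetical helper: the scraped values are strings,
# so convert them to numbers before comparing.
def build_entry(title, rating_text, comments_text)
  Entry.new(title, Float(rating_text), Integer(comments_text))
end

# Hypothetical helper: highest rating first, ties broken by comment count.
def rank(entries)
  entries.sort_by { |e| [-e.rating, -e.comments] }
end

# A few rows copied from the output above.
ENTRIES = [
  build_entry('Kakushigoto (TV anime video)', '3.6', '105'),
  build_entry("Dropkick on My Devil'(dash) (TV anime video)", '3.9', '54'),
  build_entry('Wave, Listen to Me! (TV Anime Video)', '3.6', '77'),
  build_entry('Singing Yesterday (TV Anime Video)', '3.8', '115')
].freeze

rank(ENTRIES).each do |e|
  puts format('%.1f  %4d  %s', e.rating, e.comments, e.title)
end
```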

P.S. Personally, I was rooting for Soma, but recently its momentum has clearly slowed down... So I think I will watch either the highly rated "Kaguya" or "Fruit Basket", whose first season made me cry.

I am also publishing the code I wrote today on GitHub. (I don't know if it's worth it) https://github.com/itayayuichiro/anikore_crawler
