Coding 10 lines a day ~ Scraping environment setup ~

Overview

From now on, I would like to post about 10 lines of source code every day.

First day

Theme

Scraping

Language

Ruby

Target deliverable

Since it's the first day, I'll keep things modest and write the most familiar kind of scraping code. The idea came from wanting to know which anime are recommended this season, so I'd like to get this season's top anime from https://www.anikore.jp/. (Don't tell me to just look it up...)

Code and documentation

  1. First, go to this season's list page.
  2. Looking at the HTTP traffic, the page is returned as plain HTML rather than through an API, so copy that request as a curl command.
  3. Use the fantastic curl-to-ruby site to convert that curl command into Ruby code.
  4. The result is below (file name: crawler.rb).

crawler.rb


require 'net/http'
require 'uri'

uri = URI.parse("https://www.anikore.jp/chronicle/2020/spring/ac:tv/")
request = Net::HTTP::Get.new(uri)
request["Authority"] = "www.anikore.jp"
request["Cache-Control"] = "max-age=0"
request["Upgrade-Insecure-Requests"] = "1"
request["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36"
request["Sec-Fetch-Dest"] = "document"
request["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
request["Sec-Fetch-Site"] = "same-origin"
request["Sec-Fetch-Mode"] = "navigate"
request["Sec-Fetch-User"] = "?1"
request["Referer"] = "https://www.anikore.jp/"
request["Accept-Language"] = "ja,en-US;q=0.9,en;q=0.8,zh-TW;q=0.7,zh;q=0.6"
request["Cookie"] = "anikore=vr4e4jp9u83qpe76nb5jf2dm35; optimizelyEndUserId=oeu1591020303990r0.9278880352532264; optimizelySegments=%7B%225639900384%22%3A%22gc%22%2C%225644680362%22%3A%22direct%22%2C%225653460252%22%3A%22false%22%7D; optimizelyBuckets=%7B%7D; _ga=GA1.2.1594135381.1591020306; __gads=ID=8dec67eec678ab98:T=1591020306:S=ALNI_Mam9k84TCb2IJVyBUucjbUoYYIgsQ; _gid=GA1.2.1570502140.1591280281; _gali=page-top; _gat=1"

req_options = {
  use_ssl: uri.scheme == "https",
}

response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
  http.request(request)
end
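
crawler.rb stops once it has the response. As a minimal sketch of a possible next step, assuming the goal is to keep the HTML on disk for parsing later, something like the following could check the status and save the body. The helper name save_html and the output file name are hypothetical and not part of the original script.

```ruby
# Hypothetical helper (not in the original crawler.rb): write the response
# body to a file, but only when the server answered with a 2xx status.
def save_html(response, path)
  code = response.code.to_i
  unless (200..299).cover?(code)
    raise "unexpected HTTP status #{response.code}"
  end
  # binwrite avoids any encoding conversion of the raw bytes.
  File.binwrite(path, response.body)
  path
end

# Usage, assuming `response` is the value returned by Net::HTTP.start above
# (the file name is an assumption):
#   save_html(response, "anikore_2020_spring.html")
```

Only response.code and response.body are used, so anything that answers to those two methods can stand in for a real response when trying the helper out.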

At this point it's already slightly over 10 lines...

  5. I want to parse this HTML with Ruby's nokogiri, so create a Gemfile with the following contents.

source 'https://rubygems.org'

gem 'nokogiri'

Then open a terminal and run:

bundle install --path .bundle
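
Whether the install actually worked can be confirmed with a small check like the one below. This is only a sketch: the helper name gem_loadable? is hypothetical, and because the gems were installed under .bundle, it assumes the script is run through Bundler (for example with bundle exec ruby followed by the script name).

```ruby
# Hypothetical helper: returns true if the named library can be required,
# false if Ruby raises LoadError (for example, because the gem is missing).
def gem_loadable?(name)
  require name
  true
rescue LoadError
  false
end

puts "nokogiri loadable: #{gem_loadable?('nokogiri')}"
```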

  6. Sure enough, an error. When I tried to install nokogiri for the first time in a long while, it failed, as it usually does... Not everything in the world goes the way you want.

Fetching gem metadata from https://rubygems.org/.................
Resolving dependencies...
Using bundler 1.17.2
Fetching mini_portile2 2.4.0
Installing mini_portile2 2.4.0
Fetching nokogiri 1.10.9
Installing nokogiri 1.10.9 with native extensions
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /Users/itayayuichiro/Documents/src/crawler/qiita_crawler/.bundle/ruby/2.6.0/gems/nokogiri-1.10.9/ext/nokogiri
/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/bin/ruby -I /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0 -r
./siteconf20200604-4813-yjwd2f.rb extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/include/ruby.h

You might have to install separate package for the ruby development
environment, ruby-dev or ruby-devel for example.

extconf failed, exit code 1

Gem files will remain installed in /Users/itayayuichiro/Documents/src/crawler/qiita_crawler/.bundle/ruby/2.6.0/gems/nokogiri-1.10.9 for inspection.
Results logged to /Users/itayayuichiro/Documents/src/crawler/qiita_crawler/.bundle/ruby/2.6.0/extensions/universal-darwin-19/2.6.0/nokogiri-1.10.9/gem_make.out

An error occurred while installing nokogiri (1.10.9), and Bundler cannot continue.
Make sure that `gem install nokogiri -v '1.10.9' --source 'https://rubygems.org/'` succeeds before bundling.

In Gemfile:
  nokogiri
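
The key line in the log is the mkmf.rb message saying it cannot find the Ruby header files. A rough way to see whether the headers are visible to the Ruby in use is sketched below. It assumes that RbConfig::CONFIG["rubyhdrdir"] names the directory that should contain ruby.h; the helper name is hypothetical, and mkmf's own lookup may differ in detail.

```ruby
require 'rbconfig'

# Hypothetical helper: report where ruby.h is expected and whether it exists.
# `config` defaults to the running Ruby's build configuration.
def ruby_header_status(config = RbConfig::CONFIG)
  hdrdir = config["rubyhdrdir"]
  return { dir: nil, header: nil, found: false } if hdrdir.nil? || hdrdir.empty?

  header = File.join(hdrdir, "ruby.h")
  { dir: hdrdir, header: header, found: File.exist?(header) }
end

status = ruby_header_status
puts "ruby.h expected at: #{status[:header].inspect}"
puts "found: #{status[:found]}"
```

If this reports false for the Ruby that Bundler is using, building native extensions such as nokogiri's is likely to fail in the same way as in the log above.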

For now, I'll try running the following four commands, with this site as a reference.

$ brew tap homebrew/dupes
$ brew install libxml2 libxslt libiconv
$ brew link --force libxml2    # link with the --force option
$ brew link --force libxslt    # link with the --force option

Or so I thought, but running the first line gave an error...

Updating Homebrew...
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

The rest of the processing did finish, so I tried ignoring the error and moving on. → That turned out not to work: the next command failed as well. So I ran this:

xcode-select --install
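
Once the installer has finished, the path from the xcrun error message can be checked directly. The sketch below only tests whether an executable xcrun exists under the developer directory named in that message. The helper name and the constant are hypothetical, and a different active developer directory (for example a full Xcode install) would need a different path.

```ruby
# Path taken from the error message above; other setups may use another one.
DEFAULT_DEVELOPER_DIR = "/Library/Developer/CommandLineTools"

# Hypothetical helper: true if <developer_dir>/usr/bin/xcrun exists and is
# executable, which is the file the error message reported as missing.
def xcrun_present?(developer_dir = DEFAULT_DEVELOPER_DIR)
  File.executable?(File.join(developer_dir, "usr", "bin", "xcrun"))
end

puts "xcrun present: #{xcrun_present?}"
```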

The install takes a while, so I'll continue tomorrow... lol
