Hello! In this article, I will summarize how to parse RSS news in Ruby using a gem called Nokogiri.
Once you can handle RSS news in Ruby, you can build your own curation media, so let's give it a try.
We will automatically fetch the news titles from the following RSS feed, which distributes game information, and put them into an array. https://automaton-media.com/feed/
For now, I will build a simple version and write it up. (If I feel like it, I may write a bigger article such as "Make curation media with Rails"!)
Before dealing with RSS news data, let's take a brief look at XML.
The part written as `<?xml version="1.0" ...` at the beginning is the XML declaration. It indicates that this file is an XML file, and it must appear at the very beginning of the file.
The block following the XML declaration that begins with ```
The channel logo etc. are set.
First, let's install Nokogiri.
gem install nokogiri
For use with Rails, add nokogiri to your Gemfile.
gem 'nokogiri'
After adding it to your Gemfile, run bundle install.
bundle install
Now that Nokogiri is installed, let's actually create a program.
First, create a file called nokogiri.rb and add the following two lines at the beginning.
nokogiri.rb
require 'open-uri' # Load open-uri so that the data at a URL can be fetched by passing the URL as an argument.
require 'nokogiri' # Load Nokogiri so that the fetched data can be parsed.
nokogiri.rb
require 'open-uri'
require 'nokogiri'
url = 'https://automaton-media.com/feed/' # The feed to read this time.
charset = nil # Start with nil; the feed's character encoding is stored here so that the loaded news is not garbled.
titles = URI.open(url) do |file| # Fetch the data with URI.open (Kernel#open no longer accepts URLs since Ruby 3.0) and pass it to the block.
  charset = file.charset # Store the character encoding of the fetched file in charset.
end
nokogiri.rb
require 'open-uri'
require 'nokogiri'
url = 'https://automaton-media.com/feed/'
charset = nil
titles = URI.open(url) do |file| # URI.open is required on Ruby 3.0 and later.
  charset = file.charset
  doc = Nokogiri::XML(file, nil, charset) # Turn the fetched file into a Nokogiri document, passing the charset as the encoding.
  channel = doc.at_xpath('//channel') # Get the channel element in the document.
  title_nodes = channel.xpath('.//title') # Get every title under the channel (the leading "." keeps the search relative to channel).
  title_nodes.map { |node| node.text } # Collect only the text of each title node into an array; this becomes the value of the block.
end
puts titles # Print the titles.
**Nokogiri method description**
- at_xpath: Returns the first element that matches the specified XPath, as a Node (or nil if nothing matches).
- xpath: Returns all elements that match the specified XPath, as a NodeSet.
Let's execute the created file.
ruby nokogiri.rb
If the news titles are printed one per line, it worked.
You can look up Nokogiri's search methods each time, depending on what you want to extract from the news, but the ones that are used most often are summarized below.
at
doc.at('//title') # Returns the first Node that matches; accepts either an XPath or a CSS selector.
at_xpath
doc.at_xpath('//title') # Searches by XPath and returns the first Node that matches.
xpath
doc.xpath('//title') # Searches by XPath and returns a NodeSet of all matches.
at_css
doc.at_css('title') # Searches by CSS selector and returns the first Node that matches.
css
doc.css('title') # Searches by CSS selector and returns a NodeSet of all matches.
This time, I only collected RSS news titles into an array with Ruby, but from here you could also store them in a database or send notifications to Slack or LINE. It might be fun to build a summary site for your own use.