Here are five errors that often occur when using Selenium, a tool that is useful for web scraping.
Selenium is a gem that is useful for web scraping with complex operations.
Gems that can be used for web scraping in Ruby
- Nokogiri (basic)
- Mechanize (easy)
- Selenium (you can do various things by operating the browser)
With Nokogiri and Mechanize, you specify HTML tags and CSS selectors to get the content of the page. With Selenium, you can open a browser programmatically and operate it the way a user would, so you can handle more complicated processing, for example:
- If you need to log in
- If the page is rendered with JavaScript
- If you want to enter data
Selenium is useful when you cannot get the necessary information using only HTML tags and CSS selectors.
Many people have already written up how to use Selenium, so please refer to those articles for the basics.
[Ruby] Web scraping with Chrome on selenium
# Load the gem needed to use Selenium
require 'selenium-webdriver'
# Start Selenium
driver = Selenium::WebDriver.for :chrome
Selenium::WebDriver::Error::SessionNotCreatedError (session not created: This version of ChromeDriver only supports Chrome version 75)
With Selenium, you specify which browser to use and then operate it programmatically.
# Start Selenium
driver = Selenium::WebDriver.for :chrome
If you want to use Chrome, install the ChromeDriver version that matches your installed version of Chrome. The browser you want to use, such as Firefox or Chrome, must also be installed in your development environment.
How to check the version of Google Chrome
Check the version in Google Chrome under Settings > Help > About Google Chrome.
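Once you know both version numbers, the comparison itself can be sketched in plain Ruby. This is only an illustration: the helper names `major_version` and `same_major_version?` are hypothetical, the version strings are sample values rather than output from the article, and it assumes that Chrome and ChromeDriver are compatible when their major versions (the first number) match.

```ruby
# Extract the major version (the first number before a dot) from a version string.
def major_version(version_string)
  match = version_string.match(/(\d+)\./)
  raise ArgumentError, "no version number in #{version_string.inspect}" unless match

  match[1].to_i
end

# True when Chrome and ChromeDriver share the same major version.
def same_major_version?(chrome_version, chromedriver_version)
  major_version(chrome_version) == major_version(chromedriver_version)
end

# Sample strings for illustration only (hypothetical values).
chrome = 'Google Chrome 76.0.3809.100'
chromedriver = 'ChromeDriver 75.0.3770.140'
puts same_major_version?(chrome, chromedriver) # prints "false"
```

A `false` result here corresponds to the situation behind SessionNotCreatedError: the driver and the browser need to be brought to the same major version.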
How to install ChromeDriver
[Ruby] Web scraping with Chrome on selenium
If you want to deploy and use it on Heroku, you need to install Google Chrome and ChromeDriver on Heroku as well. You also need to add an option to run Chrome in headless mode.
[Free] Chrome headless + selenium regularly run on heroku
Take a screenshot of a web page using headless chrome on Heroku's rails app
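As a rough sketch of the headless option mentioned above, the Chrome flags can be collected in one place and then handed to Selenium. The helper name `headless_chrome_args` is hypothetical, and `--no-sandbox` and `--disable-dev-shm-usage` are assumptions about what a container environment such as Heroku tends to need; the linked articles describe the actual setup.

```ruby
# Build the list of Chrome command-line flags for a server without a display.
# Only '--headless' comes from the article; the other flags are assumptions.
def headless_chrome_args(extra_args = [])
  base = ['--headless', '--disable-gpu', '--no-sandbox', '--disable-dev-shm-usage']
  (base + extra_args).uniq
end

# With the selenium-webdriver gem loaded, the flags could be applied like this:
#   options = Selenium::WebDriver::Chrome::Options.new
#   headless_chrome_args.each { |arg| options.add_argument(arg) }
#   driver = Selenium::WebDriver.for :chrome, options: options
puts headless_chrome_args(['--window-size=1280,800']).join(' ')
```

Keeping the flags in a plain array makes it easy to use the same list locally and on the server, and to drop flags that turn out to be unnecessary.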
Selenium::WebDriver::Error::InvalidArgumentError (invalid argument: 'url' must be a string)
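# Open URL
@url = 'https://www...'
driver.get(@url)
driver.get("#{@url}")
Does it fail when the argument inside () is a Ruby variable? The error told me to pass a string. A variable that holds a String is accepted, so the likely cause is that the variable was nil when driver.get ran (for example, an instance variable assigned in a different scope). Passing the URL as a string literal worked.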
#Open URL
driver.get('https://www...')
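One way to catch this earlier is to validate the value before handing it to `driver.get`. The following is a minimal sketch; `checked_url` is a hypothetical helper, not part of Selenium, and it only checks that the value is a non-empty String.

```ruby
# Return the URL unchanged if it is a non-empty String; otherwise raise
# with a message that shows what was actually passed.
def checked_url(url)
  unless url.is_a?(String) && !url.strip.empty?
    raise ArgumentError, "url must be a non-empty String, got #{url.inspect}"
  end

  url
end

# Usage with a driver (requires selenium-webdriver and a browser):
#   driver.get(checked_url(@url))

begin
  checked_url(nil) # simulates an instance variable that was never assigned
rescue ArgumentError => e
  puts e.message # prints "url must be a non-empty String, got nil"
end
```

The error raised here points at the Ruby side (the variable is nil) rather than at the browser driver, which makes the cause easier to spot.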
Selenium::WebDriver::Error::NoSuchElementError: no such element: Unable to locate element: {"method":"id","selector":"#entryBtn"}
This error occurs when no element on the page matches the locator you specified. Check that the id, class name, and so on are specified correctly.
With Nokogiri, you specify the element you want to get with a CSS selector.
# Load the libraries needed to use Nokogiri
require 'nokogiri'
require 'open-uri'
# Get page elements using Nokogiri
# (URI.open is used because a bare open no longer accepts URLs on Ruby 3.0+)
html = Nokogiri::HTML(URI.open('https://www.google.co.jp/'))
logo = html.css('#hplogo')
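With Selenium, you specify the element you want to get by locator type plus name.
# Load the gem needed to use Selenium
require 'selenium-webdriver'
# Start Selenium, open the page, and get a page element
driver = Selenium::WebDriver.for :chrome
driver.get('https://www.google.co.jp/')
driver.find_element(:id, 'hplogo')
With Selenium's :id locator, you don't need the "#". The "#" prefix belongs only in CSS selectors, for example find_element(:css, '#hplogo').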
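The error message above shows `"method":"id"` together with the selector `#entryBtn`, which is exactly this mix-up. As a small sketch, a helper can translate simple CSS-style selectors into the locator pair Selenium expects; `locator_for` is a hypothetical name, and it only recognizes a single id or a single class, falling back to `:css` for anything else.

```ruby
# Convert a simple CSS-style selector into a [how, what] pair for find_element.
def locator_for(selector)
  case selector
  when /\A#([\w-]+)\z/  then [:id, Regexp.last_match(1)]    # "#entryBtn"
  when /\A\.([\w-]+)\z/ then [:class, Regexp.last_match(1)] # ".eventItem"
  else [:css, selector]                                      # anything more complex
  end
end

p locator_for('#entryBtn')       # prints [:id, "entryBtn"]
p locator_for('.eventItem')      # prints [:class, "eventItem"]
p locator_for('div.eventItem a') # prints [:css, "div.eventItem a"]

# Usage with a driver (requires selenium-webdriver and a browser):
#   driver.find_element(*locator_for('#entryBtn'))
```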
Selenium::WebDriver::Error::StaleElementReferenceError (stale element reference: element is not attached to the page document)
This error occurred when, after going back with the browser's back navigation, I tried to operate on an element that had been obtained from the previous page.
# Load the gem needed to use Selenium
require 'selenium-webdriver'
# Start Selenium
driver = Selenium::WebDriver.for :chrome
# Get the event items on the event list page
events = driver.find_elements(:class, 'eventItem')
# Go to each event details page
events.size.times do |i|
  # Click the button to the event details page
  events[i].find_element(:class, 'entryBtn').click
  # → An error occurs in the second and subsequent iterations
  # Go back to the previous page
  driver.navigate.back
end
# Load the gem needed to use Selenium
require 'selenium-webdriver'
# Start Selenium
driver = Selenium::WebDriver.for :chrome
# Get the event items on the event list page
events = driver.find_elements(:class, 'eventItem')
# Go to each event details page
events.size.times do |i|
  # The elements found before navigating are stale from the second iteration on,
  # so find them again inside the loop.
  events_in_loop = driver.find_elements(:class, 'eventItem')
  # Click the button to the event details page
  events_in_loop[i].find_element(:class, 'entryBtn').click
  # Go back to the previous page
  driver.navigate.back
end
Once you navigate away and come back, the element references obtained earlier are no longer valid, so you need to find the elements again inside the loop.
Reference: get StaleElementReferenceException error while using driver.navigate().back() in a loop in selenium
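The re-find pattern can also be wrapped in a small method so the lookup is always repeated after each navigation. This is a sketch under assumptions: `click_through_each`, `item_class`, and `button_class` are hypothetical names, and the `driver` argument is assumed to behave like a Selenium driver (responding to `find_elements` and `navigate.back`).

```ruby
# Click the button inside each list item, one page visit per item,
# looking the items up again on every pass so no stale reference is used.
def click_through_each(driver, item_class, button_class)
  count = driver.find_elements(:class, item_class).size
  count.times do |i|
    # Fresh lookup: references from before the last navigation are stale.
    items = driver.find_elements(:class, item_class)
    items[i].find_element(:class, button_class).click
    yield driver if block_given? # scrape the details page here
    driver.navigate.back
  end
  count
end

# Usage with a driver (requires selenium-webdriver and a browser):
#   click_through_each(driver, 'eventItem', 'entryBtn') do |d|
#     puts d.title
#   end
```

The item count is taken once up front, which assumes the list page shows the same items each time you return to it.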
This one is an environment setup error, the kind that makes you feel like it's all over.
`require': incompatible library version - /Users/cathy/Desktop/work/vagrant/Test/vendor/bundle/ruby/2.5.0/gems/pg-0.19.0/lib/pg_ext.bundle (LoadError)
When I uninstalled the gem, I got another error and couldn't find a way out...
/Users/cathy/.rbenv/versions/2.5.1/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- rubygems/core_ext/kernel_warn (LoadError)
Perhaps the cause is that require 'selenium-webdriver' was still in the code even though the Selenium gem had been removed?
gem file cannot load such file
I tried various things but couldn't solve it, so in the end I got it working by upgrading Ruby.
I stumbled when I upgraded Ruby