Easy web scraping with Python and Ruby

Web scraping means collecting the HTML of a website and then extracting and formatting specific data from it.
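
To make the definition concrete, here is a minimal sketch of the extract-and-format step that uses only Python's standard library. The sample markup, the `LinkCollector` class, and the `extract_links` helper are hypothetical names made up for this illustration, and a real scraper would download the HTML first.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(f"{text}: {href}")
```

Writing this by hand gets tedious quickly, which is why a dedicated parsing library is usually the more practical choice.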

This time, I will introduce one library each for Python and Ruby.

Python: BeautifulSoup4

In Python, Beautiful Soup is a convenient library for parsing HTML.

Installation

pip install beautifulsoup4

How to use

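from urllib.request import urlopen  # Python 3; the urllib2 module exists only in Python 2
from bs4 import BeautifulSoup

html = urlopen("http://example.com")
# => You can also pass an open file or an HTML string instead.

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")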

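# Lots of useful methods!
soup.find_all('td')               # every <td> element, as a list
soup.find("head").find("title")   # the <title> inside <head>
soup.a.find_parents()             # all ancestors of the first <a>
soup.a.find_parent()              # its immediate parent
list(soup.descendants)            # descendants is a generator attribute, not a method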

# You can also rename tags, change attribute values, and add or delete them.
tag = soup.a
tag.string = "New link text."
tag
# => <a href="...">New link text.</a>  (the href is whatever the page had)

soup = BeautifulSoup("<a>Foo</a>", "html.parser")
soup.a.append("Bar")
soup.a
# => <a>FooBar</a>
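
The calls above depend on a live page, so here is a small self-contained sketch that applies the same methods to an inline HTML string. The sample markup and the variable names are assumptions made for illustration, and `html.parser` is just one of the parsers Beautiful Soup can use.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs without network access.
SAMPLE_HTML = """
<html>
  <head><title>Price list</title></head>
  <body>
    <table>
      <tr><td>Apple</td><td>100</td></tr>
      <tr><td>Orange</td><td>80</td></tr>
    </table>
    <a href="/next">Next page</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching: chain find() calls, or collect every match with find_all().
title = soup.find("head").find("title").string
cells = [td.get_text() for td in soup.find_all("td")]

# Formatting: group the cells by table row.
rows = [[td.get_text() for td in tr.find_all("td")] for tr in soup.find_all("tr")]

# Navigating: list the ancestors of the first <a> tag, nearest first.
link = soup.a
parent_names = [parent.name for parent in link.find_parents()]

# Modifying: replace the link text and change an attribute value.
link.string = "New link text."
link["href"] = "/changed"

print(title)
print(rows)
print(link)
```

Grouping the cells per `<tr>` is usually more useful than one flat list, because each row can then be written out as a line of CSV or turned into a dictionary.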

I had never used Python before, but it was a lot of fun to use.

Ruby: Nokogiri

Installation

gem install nokogiri

Or, with Bundler, add these lines to your Gemfile and then run bundle:

source 'https://rubygems.org'
gem 'nokogiri'

How to use

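require 'open-uri'
require 'nokogiri'

# Fetch the page and remember the charset reported by the server.
# Kernel#open no longer opens URLs in Ruby 3, so call URI.open instead.
charset = nil
html = URI.open("http://example.com") do |f|
  charset = f.charset
  f.read
end

doc = Nokogiri::HTML.parse(html, nil, charset)

doc.title
# Print the text of every <h2> and <h3> heading.
doc.xpath('//h2 | //h3').each do |heading|
  puts heading.content
end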
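
# A local file can be parsed the same way.
html = File.read('data.html', encoding: 'utf-8')
# The block form of parse yields parse options, not the document,
# so run the XPath query on the returned document instead.
doc = Nokogiri::HTML.parse(html)
doc.xpath('//td').each do |td|
  pp td.content
end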
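
For comparison with the Python section, a rough Beautiful Soup counterpart of the two Nokogiri snippets above might look like the sketch below. The sample markup and the `headings_and_cells` helper are hypothetical, and the file is written to a temporary directory only so that the example runs on its own.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for data.html.
SAMPLE_HTML = """
<html>
  <body>
    <h2>Fruit</h2>
    <table><tr><td>Apple</td></tr></table>
    <h3>Citrus</h3>
    <table><tr><td>Orange</td></tr></table>
  </body>
</html>
"""


def headings_and_cells(path):
    """Parse a local HTML file and return its h2/h3 texts and td texts."""
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    # A list of tag names plays the role of the XPath union '//h2 | //h3'.
    headings = [h.get_text() for h in soup.find_all(["h2", "h3"])]
    cells = [td.get_text() for td in soup.find_all("td")]
    return headings, cells


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_HTML)
    headings, cells = headings_and_cells(path)

print(headings)
print(cells)
```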

Personally, I still liked Ruby better in the end.

