Precautions when scraping

Do not rely on right-click > "View page source" to inspect the page.

Use the HTML displayed by the browser's developer tools instead.

Extract text

<dt>price<span class="tax">(tax included)</span></dt>

To extract the text of a dt tag together with the span tag embedded in it, as in the markup above:

from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")
soup.text  # 'price(tax included)'

Accessing .text returns the text of the tag and of every tag nested inside it.
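As a minimal sketch of the same idea, the snippet below compares the text of the whole dt tag with the text of the nested span alone. The variable names are only illustrative, and the separator argument to get_text() is shown as one option for keeping the two fragments apart, not as the author's method.

```python
from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")

# Text of the dt tag and everything nested inside it
whole_text = soup.dt.text

# Text of the nested span only
span_text = soup.dt.span.text

# get_text() takes an optional separator placed between text fragments
separated = soup.dt.get_text(" ")

print(whole_text)  # price(tax included)
print(span_text)   # (tax included)
print(separated)   # price (tax included)
```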

Remove whitespace

<dt>
price
    <span class="tax">(tax included)</span>
</dt>

When the tag contains whitespace, as in the markup above, remove it as follows:

def remove_whitespace(text):
    # split() with no arguments splits on any run of whitespace
    return ''.join(text.split())

source = '''<dt>
price
    <span class="tax">(tax included)</span>
</dt>'''
soup = BeautifulSoup(source, "html.parser")
remove_whitespace(soup.text)  # 'price(taxincluded)'

This returns the text with all whitespace removed.

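strip() only removes leading and trailing whitespace, so it cannot delete whitespace in the middle of the string. Instead, split() with no arguments splits the string on every run of whitespace, and ''.join() concatenates the pieces again. Note that this also removes the spaces between words, so "(tax included)" becomes "(taxincluded)".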
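To make the difference concrete, here is a small sketch that applies several approaches to the same markup. The two alternatives at the end (joining with a single space, and get_text(strip=True)) are not from the original note; they are shown as possible options when spaces between words should be kept.

```python
from bs4 import BeautifulSoup

source = """<dt>
price
    <span class="tax">(tax included)</span>
</dt>"""
soup = BeautifulSoup(source, "html.parser")
raw = soup.dt.text

# strip() trims both ends only; the inner newline and indent remain
stripped = raw.strip()

# Removes every whitespace character, including spaces between words
no_spaces = "".join(raw.split())

# Collapses each run of whitespace into a single space
single_spaced = " ".join(raw.split())

# Strips each text fragment, then concatenates the fragments
per_fragment = soup.dt.get_text(strip=True)

print(repr(stripped))
print(no_spaces)
print(single_spaced)
print(per_fragment)
```

Which variant fits depends on the text: removing everything works well for text without word spacing, while the last two keep the words readable.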

Searching with Beautiful Soup

If you want to find a particular class

To get the first match

soup.find(class_='hoge')

To get all matches

soup.find_all(class_='hoge')
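The following sketch shows what the two calls return. The markup and the class names "item", "sale", and "other" are made up for illustration. It assumes nothing beyond find() and find_all() themselves.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are invented for this example
html = """
<ul>
  <li class="item">apple</li>
  <li class="item sale">banana</li>
  <li class="other">cherry</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches
first = soup.find(class_="item")
missing = soup.find(class_="no-such-class")

# find_all() returns a list-like ResultSet of every matching Tag
every = soup.find_all(class_="item")

first_text = first.text
all_texts = [tag.text for tag in every]
print(first_text, all_texts, missing)
```

An element with several classes, such as "item sale", matches a search for any one of them.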

If you want to find a specific id

To get the first match

soup.find(id='hoge')

To get all matches

soup.find_all(id='hoge')
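A short sketch of an id lookup, using made-up ids "header" and "main". Because find() returns None when nothing matches, the example checks the result before reading .text; this guard is a suggestion rather than part of the original note.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the ids are invented for this example
html = '<div id="header">Title</div><div id="main"><p>Body</p></div>'
soup = BeautifulSoup(html, "html.parser")

main = soup.find(id="main")
# Guard against None so a missing element does not raise AttributeError
main_text = main.text if main is not None else ""

absent = soup.find(id="sidebar")
absent_text = absent.text if absent is not None else ""

# An id is normally unique in a page, so find_all() usually yields one item
matches = soup.find_all(id="main")

print(main_text, repr(absent_text), len(matches))
```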

If you want to find a specific tag

To get the first match

soup.find('hoge')

To get all matches

soup.find_all('hoge')
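As an illustration of searching by tag name, the sketch below pairs dt and dd tags from a hypothetical definition list. The markup and its values are invented; passing a list of names to find_all() is shown as one way to match several tags at once.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the terms and values are invented for this example
html = "<dl><dt>price</dt><dd>100</dd><dt>stock</dt><dd>5</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

first_dt = soup.find("dt")       # first dt tag only
all_dt = soup.find_all("dt")     # every dt tag
all_dd = soup.find_all("dd")     # every dd tag

# A list of names matches any of them, in document order
mixed = [tag.name for tag in soup.find_all(["dt", "dd"])]

# Pair each term with its value
pairs = dict(zip((t.text for t in all_dt), (d.text for d in all_dd)))

print(first_dt.text, mixed, pairs)
```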

Multiple conditions can also be combined in a single call

soup.find('hoge', class_='fuga')
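Here is a minimal sketch of combining conditions on hypothetical markup. The class names "fuga" and "piyo" and the id "target" are placeholders. The select() call at the end is an alternative not mentioned in the original note; it expresses the same tag-and-class filter as a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names and the id are placeholders
html = """
<div class="fuga">div with fuga</div>
<span class="fuga">span with fuga</span>
<span class="piyo">span with piyo</span>
<span class="fuga" id="target">span with fuga and id</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag name and class must both match
by_tag_and_class = [t.text for t in soup.find_all("span", class_="fuga")]

# Tag name, class, and id must all match
by_all_three = soup.find("span", class_="fuga", id="target")

# The same tag-and-class filter written as a CSS selector
by_css = [t.text for t in soup.select("span.fuga")]

print(by_tag_and_class)
print(by_all_three.text)
print(by_css)
```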
