[Personal note] Web page scraping with python3

Precautions when scraping

Rather than right-clicking and choosing View Page Source, look at the HTML shown in the browser's developer tools.

Extract text

<dt>price<span class="tax">(tax included)</span></dt>

To extract the text of a dt tag like the one above, including the span tag nested inside it:

from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")
print(soup.text)  # price(tax included)

Specifying .text returns all of the text inside the tag, including the text of nested tags.
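To make the difference between the whole tag and the nested tag concrete, here is a small sketch on the same markup. .text returns every string under a tag, while looking up the span first returns only its own text. .string returns None here because the dt tag has more than one child.

```python
from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")

# .text joins every string under the tag, including nested tags
all_text = soup.dt.text
# Look up the nested span first to get only its text
span_text = soup.dt.find("span", class_="tax").text
# .string only works when a tag has a single child; dt has two, so it is None
dt_string = soup.dt.string

print(all_text)   # price(tax included)
print(span_text)  # (tax included)
print(dt_string)  # None
```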

Remove whitespace

<dt>
price
    <span class="tax">(tax included)</span>
</dt>

When the tag contains whitespace (line breaks and indentation) like the markup above, the text can be extracted without it like this:

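from bs4 import BeautifulSoup

def remove_whitespace(text):
    # split() with no argument splits on any run of whitespace; join glues the pieces back together
    return ''.join(text.split())

source = '''<dt>
price
    <span class="tax">(tax included)</span>
</dt>'''
soup = BeautifulSoup(source, "html.parser")
print(remove_whitespace(soup.text))  # price(taxincluded)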

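strip() only removes whitespace at the start and end of a string, so it cannot remove the whitespace in the middle. Instead, split() with no argument splits the string on every run of whitespace, and ''.join() joins the pieces back together with nothing in between. Note that this removes every whitespace character, including the space inside "(tax included)".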
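The split/join approach deletes the space inside "(tax included)" as well. If you want to keep it, Beautiful Soup's get_text() can do the trimming per text fragment: strip=True trims whitespace from every string and drops fragments that become empty, and the optional first argument is the separator placed between fragments. A minimal sketch on the same markup, with a pure-string alternative for comparison:

```python
from bs4 import BeautifulSoup

source = '''<dt>
price
    <span class="tax">(tax included)</span>
</dt>'''
soup = BeautifulSoup(source, "html.parser")

# Trim each text fragment and join them with no separator
compact = soup.get_text(strip=True)
# Same, but put a single space between fragments
spaced = soup.get_text(" ", strip=True)
# Pure-string alternative: collapse every whitespace run into one space
collapsed = " ".join(soup.text.split())

print(compact)    # price(tax included)
print(spaced)     # price (tax included)
print(collapsed)  # price (tax included)
```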

Finding elements with Beautiful Soup

If you want to find elements with a particular class

To get only the first match:

soup.find(class_='hoge')

To get all matches:

soup.find_all(class_='hoge')
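The practical difference between the two calls is the return type: find() returns the first matching tag (or None when nothing matches), while find_all() returns a list-like ResultSet of every match, which is empty when nothing matches. A small sketch with made-up markup (the class names are placeholders, like hoge above). Because class is multi-valued in HTML, an element with class="hoge fuga" also matches class_='hoge'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two elements carry the class "hoge"
source = '''
<ul>
  <li class="hoge">first</li>
  <li class="hoge fuga">second</li>
  <li class="piyo">third</li>
</ul>
'''
soup = BeautifulSoup(source, "html.parser")

first = soup.find(class_='hoge')        # first matching Tag
matches = soup.find_all(class_='hoge')  # every matching Tag
missing = soup.find(class_='nothing')   # no match, so None

print(first.text)                   # first
print([li.text for li in matches])  # ['first', 'second']
print(missing)                      # None
```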

If you want to find an element with a specific id

To get only the first match:

soup.find(id='hoge')

To get all matches:

soup.find_all(id='hoge')
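Since an id is supposed to be unique within a page, find(id=...) is usually enough, and find_all(id=...) typically returns a single element. The same lookup can also be written as a CSS selector with select_one(), which relies on the soupsieve package that recent versions of beautifulsoup4 install automatically. A sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "hoge" is a placeholder id as in the note
source = '<div><p id="hoge">target</p><p>other</p></div>'
soup = BeautifulSoup(source, "html.parser")

by_id = soup.find(id='hoge')            # keyword argument filters on the id attribute
by_selector = soup.select_one('#hoge')  # the same lookup as a CSS selector
everything = soup.find_all(id='hoge')   # a ResultSet; one element here since the id is unique

print(by_id.text)        # target
print(by_selector.text)  # target
print(len(everything))   # 1
```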

If you want to find a specific tag

To get only the first match:

soup.find('hoge')

To get all matches:

soup.find_all('hoge')
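Searching by tag name pairs naturally with a loop that pulls out every value of one kind. find_all() also accepts a list of tag names and then matches any of them in document order, which is handy for definition lists like the dt example earlier. A sketch with a hypothetical definition list:

```python
from bs4 import BeautifulSoup

# Hypothetical definition list
source = '''
<dl>
  <dt>price</dt><dd>1000</dd>
  <dt>stock</dt><dd>3</dd>
</dl>
'''
soup = BeautifulSoup(source, "html.parser")

labels = [dt.text for dt in soup.find_all('dt')]
# Passing a list matches any of the given tag names, in document order
pairs = [tag.text for tag in soup.find_all(['dt', 'dd'])]

print(labels)  # ['price', 'stock']
print(pairs)   # ['price', '1000', 'stock', '3']
```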

Multiple conditions can also be combined:

soup.find('hoge', class_='fuga')
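Applied to the dt markup from the start of this note (with an extra, hypothetical plain span added for contrast), combining a tag name with a class narrows the match to span tags carrying class="tax". The same condition can be passed through the attrs dictionary, which is also how you match attributes such as data-* whose names cannot be written as keyword arguments, or written as the CSS selector span.tax. A sketch:

```python
from bs4 import BeautifulSoup

# The earlier dt markup plus a second span without a class
source = '<dt>price<span class="tax">(tax included)</span><span>note</span></dt>'
soup = BeautifulSoup(source, "html.parser")

# Tag name and class must both match
by_kwargs = soup.find('span', class_='tax')
# Same condition expressed through the attrs dictionary
by_attrs = soup.find('span', attrs={'class': 'tax'})
# Same condition as a CSS selector
by_css = soup.select_one('span.tax')

print(by_kwargs.text)  # (tax included)
print(by_attrs.text)   # (tax included)
print(by_css.text)     # (tax included)
```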
