I tried scraping with Python

This time I will use Beautiful Soup. Environment: Python 3.6.0, BeautifulSoup 4.6.0.

Documentation:

English: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Japanese: http://kondou.com/BS4/

Installation

$ pip install beautifulsoup4

Run Beautiful Soup for now

The following program fetches this page and displays its h1 tag: https://pythonscraping.com/pages/page1.html

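from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in HTML parser
html = urlopen("https://pythonscraping.com/pages/page1.html")
bsobj = BeautifulSoup(html.read(), "html.parser")

# Display the first h1 tag on the page
print(bsobj.h1)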

As written, the program fails if the web page cannot be found or if the data arrives in an unexpected format, so you should add exception handling.

Error handling

html = urlopen("https://pythonscraping.com/pages/page1.html")

This line raises an error if the page cannot be found, so rewrite it as follows.

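from urllib.error import HTTPError, URLError

try:
    html = urlopen("https://pythonscraping.com/pages/page1.html")
except (HTTPError, URLError):
    # HTTPError: the server answered with an error such as 404
    # URLError: the server could not be reached at all
    print("Page Not Found")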

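This line can also cause an error, for example when the earlier fetch failed and html does not hold a usable response:

bsobj = BeautifulSoup(html.read(), "html.parser")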

I rewrote it like this.

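try:
    bsobj = BeautifulSoup(html.read(), "html.parser")
    print(bsobj.h1)
except Exception as e:
    # Catch Exception instead of using a bare except, which would also swallow KeyboardInterrupt
    print("error:", e)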
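
Putting the two checks together, the fetch and the parse could be wrapped in a single function. This is only a sketch: the name get_h1 is something I made up for this article, not part of Beautiful Soup, and returning None on failure is just one possible design.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def get_h1(url):
    """Return the first h1 tag of the page at url, or None on failure.

    get_h1 is a hypothetical helper written for illustration.
    """
    try:
        html = urlopen(url)
    except (HTTPError, URLError):
        # The page was not found or the server could not be reached.
        return None
    with html:
        bsobj = BeautifulSoup(html.read(), "html.parser")
    # bsobj.h1 is None when the page has no h1 tag, so a missing tag
    # is reported the same way as a missing page.
    return bsobj.h1
```

The caller then only has to check for None, for instance by calling get_h1("https://pythonscraping.com/pages/page1.html") and printing the result when it is not None.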

Find the tag you want

You can find the tag you want by using find() and findAll(). The following code collects every `<span class="green"></span>` element on the page.

span_list = bsobj.findAll("span", {"class": "green"})

If you want to match not only class="green" but also class="red", rewrite it as follows.

span_list = bsobj.findAll("span", {"class": {"red", "green"}})
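
To see what the two calls return without touching the network, here is a small sketch that runs findAll() against an HTML string. The markup and its contents are made up for illustration. The sketch passes the class names as a list, which is the form shown in the Beautiful Soup documentation, rather than the set used above.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
sample_html = """
<p>
  <span class="green">Anna</span>
  <span class="red">Well, Prince</span>
  <span class="green">Vasili</span>
  <span class="blue">narrator</span>
</p>
"""

bsobj = BeautifulSoup(sample_html, "html.parser")

# Only the spans whose class is "green".
green_only = bsobj.findAll("span", {"class": "green"})

# Spans whose class is either "red" or "green", in document order.
red_or_green = bsobj.findAll("span", {"class": ["red", "green"]})

print(len(green_only))    # 2
print(len(red_or_green))  # 3
```

The blue span is skipped in both cases because its class matches none of the requested values.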

Remove the tag

span_list = bsobj.findAll("span", {"class": "green"})
for i in span_list:
    print(i)

This code displays the text of each `<span class="green"></span>` element, but the tags are displayed as well. If you want only the text inside, rewrite it as follows.

# Display the text together with its tags
print(i)

# Display only the text, without tags
print(i.get_text())
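
The difference is easy to check on a small HTML string. The markup below is made up for illustration, and the list names with_tags and text_only are my own. Note that get_text() is a method call: without the parentheses, print would show the method object rather than the text.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
sample_html = (
    '<p><span class="green">Hello</span> '
    '<span class="green">World</span></p>'
)
bsobj = BeautifulSoup(sample_html, "html.parser")
span_list = bsobj.findAll("span", {"class": "green"})

# str(i) is what print(i) shows: the text together with its tags.
with_tags = [str(i) for i in span_list]

# get_text() returns only the text inside the tag.
text_only = [i.get_text() for i in span_list]

print(with_tags)  # ['<span class="green">Hello</span>', '<span class="green">World</span>']
print(text_only)  # ['Hello', 'World']
```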
