[PYTHON] Delete / replace specific elements of HTML source [Beautiful Soup]

How to remove or replace elements that meet specific conditions when scraping HTML.

(For example, when you want to skip all links, skip charts, and so on.)

Use the .extract() and .replace_with() methods of Python's Beautiful Soup.

from bs4 import BeautifulSoup

txt = """<p>I have a dog.  His name is <span class="secret">Ken</span>.</p>"""
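# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(txt, "html.parser")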

# get_text() on its own keeps the "unwanted" information
soup.get_text()
#: 'I have a dog.  His name is Ken.'


# remove the first element that matches the tag name and class
soup.find("span", {"class": "secret"}).extract()
soup.get_text()
#: 'I have a dog.  His name is .'


# or replace it with something else (parse again first, because the element was removed above)
soup = BeautifulSoup(txt, "html.parser")
soup.find("span", {"class": "secret"}).replace_with("confidential")
soup.get_text()
#: 'I have a dog.  His name is confidential.'
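The snippet above only touches the first match, because find() returns a single element. To skip all links or all tables, as mentioned at the top, one possible approach is to loop over find_all(). The following is a minimal sketch under that assumption; the sample HTML, the example.com URL, and the second name are made up for illustration and are not from the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML, made up for this sketch
html = """<div>
<p>See <a href="https://example.com">this page</a> for details.</p>
<table><tr><td>1</td><td>2</td></tr></table>
<p>Contact: <span class="secret">Ken</span> or <span class="secret">Mari</span>.</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() accepts a list of tag names and returns a list of matches,
# so it is safe to remove elements from the tree while looping over it
for tag in soup.find_all(["a", "table"]):
    tag.extract()

# Replace every element whose class is "secret", not just the first one
for tag in soup.find_all("span", class_="secret"):
    tag.replace_with("confidential")

text = soup.get_text()
```

After this runs, text contains neither the link text nor the table cells, and both names are replaced with "confidential". Note that extract() returns the removed element, so it can be kept in a variable if you still need it later.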
