[PYTHON] Remove unwanted HTML tags with Beautiful Soup

This was my first time using Beautiful Soup.

I needed to scrape a website for work, so I tried it out in a hurry.


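import urllib.request

import bs4

# Placeholder URL from the original post; replace it with the real target site.
url = 'http://www.XXXXXX.jp'

# Fetch the page and parse it with the parser bundled with Python.
with urllib.request.urlopen(url) as response:
    soup = bs4.BeautifulSoup(response, 'html.parser')

# Titles via a CSS selector, prices via tag name plus class.
title = soup.select('.lxl-inCateList ul li a dl dt')
price = soup.find_all("dd", class_="l-price")

# .string gives the text inside each element, without the surrounding tag.
for i in title:
    a = i.string
    print(a)
for i in price:
    b = i.string
    print(b)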

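The code is not pretty, but the key line is

a = i.string

Reading .string returns only the text inside the element, so the unwanted HTML tags are stripped away. Note that .string is None when the element contains more than one child node; get_text() handles that case.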
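To make the difference between .string and get_text() concrete, here is a minimal sketch. The HTML snippet is made up for illustration and is not taken from the site scraped above; only the l-price class name is borrowed from this post.

```python
import bs4

# Hypothetical markup, invented for this example only.
html = """
<dl>
  <dt>Plain title</dt>
  <dd class="l-price"><span>1,000</span> yen</dd>
</dl>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

dt = soup.find("dt")
dd = soup.find("dd", class_="l-price")

# <dt> has a single text child, so .string is that text.
title_text = dt.string

# <dd> has two children (a <span> and a text node), so .string is None.
price_string = dd.string

# get_text() joins all nested text; here the pieces are stripped and
# joined with a single space.
price_text = dd.get_text(" ", strip=True)

print(title_text)
print(price_string)
print(price_text)
```

If the elements you scrape may contain nested tags, get_text() is probably the safer default; .string is fine when each element holds plain text only.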

soup.find_all("dd", class_="l-price")

Being able to pick out elements by class like this is really convenient. I wish I had known about it sooner. When a sudden request comes in to "collect this and that from the site into a document", the job gets much easier.
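As a rough sketch of that "collect it into a document" step, the example below pairs titles with prices and writes them out as CSV. The HTML is a hypothetical stand-in that only mimics the class names and nesting used by the selectors in this post, and the CSV column names are my own choice. It also assumes every item has both a title and a price, since zip() pairs the two lists purely by position.

```python
import csv
import io

import bs4

# Hypothetical markup modeled on the selectors used in this post.
html = """
<div class="lxl-inCateList"><ul>
  <li><a href="#"><dl><dt>Item A</dt><dd class="l-price">1,000 yen</dd></dl></a></li>
  <li><a href="#"><dl><dt>Item B</dt><dd class="l-price">2,500 yen</dd></dl></a></li>
</ul></div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Same two lookups as in the post: a CSS selector and a class filter.
titles = [dt.get_text(strip=True)
          for dt in soup.select(".lxl-inCateList ul li a dl dt")]
prices = [dd.get_text(strip=True)
          for dd in soup.find_all("dd", class_="l-price")]

# The class filter can also be written as a CSS selector.
prices_by_select = [dd.get_text(strip=True) for dd in soup.select("dd.l-price")]

# Pair each title with the price at the same position.
rows = list(zip(titles, prices))

# Write to an in-memory buffer; swap in open("items.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "price"])
writer.writerows(rows)

print(buffer.getvalue())
```

Positional pairing breaks if an item is missing its price. In that case it would be more robust to loop over each li element and look up the dt and dd inside it.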
