[Python] A memorandum of Beautiful Soup 4

Introduction

A memorandum of searching for HTML tags with beautifulsoup4.

Environment

Basic search

# All p tags
soup.find_all("p")

# Only the first p tag found
soup.find("p")

# <a> tags whose href starts with "hogehoge"
import re
soup.find_all("a", href=re.compile("^hogehoge"))
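Here is a small self-contained version of the three searches above. The sample HTML is hypothetical, and it uses the built-in "html.parser" instead of lxml only so that nothing extra has to be installed; the search calls are the same either way.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample HTML; "hogehoge" is the placeholder prefix used above
html = """
<html><body>
  <p>first</p>
  <p>second</p>
  <a href="hogehoge/page1">match</a>
  <a href="https://example.com/">no match</a>
</body></html>
"""

# The built-in "html.parser" avoids needing lxml installed
soup = BeautifulSoup(html, "html.parser")

all_p = soup.find_all("p")   # list of every <p> tag
first_p = soup.find("p")     # first <p> tag, or None if there is none
links = soup.find_all("a", href=re.compile("^hogehoge"))  # href matched by regex

print([p.get_text() for p in all_p])  # ['first', 'second']
print(first_p.get_text())             # first
print([a["href"] for a in links])     # ['hogehoge/page1']
```

Note that find() returns None when nothing matches, so check the result before calling methods on it.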

Search using CSS selectors

# Descendants: any p inside a div inside body, at any depth
soup.select('body div p')

# Direct children only: p must be a child of div, which must be a child of body
soup.select('body > div > p')

# By class name
soup.select('.myclass')

# By id
soup.select('#myid')

# AND condition: elements that have both classes
soup.select('.myclass1.myclass2')
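To see the difference between the descendant (space) and child (>) selectors, here is a sketch on hypothetical HTML where one p is a direct child of the div and another is nested one level deeper. The class and id names reuse the placeholders above; `texts` is just a hypothetical helper for printing.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one <p> is a direct child of the div, one is nested deeper
html = """
<body>
  <div>
    <p id="myid" class="myclass">direct child</p>
    <section><p class="myclass1 myclass2">grandchild</p></section>
  </div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")


def texts(tags):
    """Return the text of each matched tag, in document order."""
    return [t.get_text() for t in tags]


descendants = texts(soup.select("body div p"))   # any depth below the div
children = texts(soup.select("body > div > p"))  # direct children only
by_class = texts(soup.select(".myclass"))        # class token "myclass" only
by_id = texts(soup.select("#myid"))
both_classes = texts(soup.select(".myclass1.myclass2"))

print(descendants)   # ['direct child', 'grandchild']
print(children)      # ['direct child']
print(by_class)      # ['direct child']
print(by_id)         # ['direct child']
print(both_classes)  # ['grandchild']
```

`.myclass` matches whole class tokens, so it does not pick up the element whose classes are `myclass1 myclass2`.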

Selecting the nth tag

# Find the third <li> tag in the HTML below
# <html>
# <body>
#   <ul>
#     <li>Not specified</li>
#     <li>Not specified</li>
#     <li>It is specified</li>
#     <li>Not specified</li>
#   </ul>
# </body>
# </html>

soup.select('body > ul > li:nth-of-type(3)')
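A runnable version using the sample HTML from the comment above. It uses select_one(), which returns the first match (or None) instead of a list; "html.parser" is again an assumption made only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# The sample HTML from the comment above
html = """
<html>
<body>
  <ul>
    <li>Not specified</li>
    <li>Not specified</li>
    <li>It is specified</li>
    <li>Not specified</li>
  </ul>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(3) counts only <li> siblings, starting from 1
third = soup.select_one("body > ul > li:nth-of-type(3)")
print(third.get_text())  # It is specified
```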

What to do when nth-of-type() does not work

In my case it didn't work because the HTML of the site I was scraping had a start tag with no matching closing tag. The solution was to remove that start tag. (Chrome's developer tools show the DOM after the browser has repaired the markup, and the closing tag appeared there, so I didn't notice the problem until I looked at the page source.)
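One way to spot this without relying on the developer tools is to compare the number of start and end tags in the raw source (for example the response text from requests, not the DOM shown in the browser). This is only a rough heuristic sketch: `count_tags` and the sample string are hypothetical, and the regex does not skip comments or scripts.

```python
import re


def count_tags(raw_html, name):
    """Count start and end tags for `name` in raw (unparsed) HTML.

    Rough heuristic: comments, scripts and CDATA are not excluded,
    so treat the result as a hint, not proof.
    """
    starts = len(re.findall(rf"<{name}(?=[\s>/])", raw_html, re.IGNORECASE))
    ends = len(re.findall(rf"</{name}\s*>", raw_html, re.IGNORECASE))
    return starts, ends


# Hypothetical raw source in which the </dd> end tags were omitted
raw = "<dl><dt>Term</dt><dd>Definition<dt>Term 2</dt><dd>Definition 2</dl>"
print(count_tags(raw, "dd"))  # (2, 0)
```

A mismatch such as (2, 0) suggests the parser had to guess where each <dd> ends.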

import requests
from bs4 import BeautifulSoup

url = "http://hogehoge/"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# Remove the dd tags, because the dd tags have no closing tag
for tag in soup.find_all('dd'):
    tag.unwrap()

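This removes every <dd> tag. Don't use .decompose() here: it deletes a tag together with everything inside it, and because the unclosed <dd> made the parser nest the following elements inside it, those elements would disappear too. .unwrap() removes only the tag itself and keeps its contents.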
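The difference between the two methods can be checked on hypothetical markup that mimics what the parser built from an unclosed <dd>, where the element that followed it ended up nested inside the <dd>:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the element after <dd> was nested inside it
html = "<dl><dd>Definition<p>after dd</p></dd></dl>"

# unwrap(): removes only the <dd> tag; its children stay in place
soup_a = BeautifulSoup(html, "html.parser")
for tag in soup_a.find_all("dd"):
    tag.unwrap()
print(soup_a)  # <dl>Definition<p>after dd</p></dl>

# decompose(): destroys the <dd> tag and everything inside it
soup_b = BeautifulSoup(html, "html.parser")
for tag in soup_b.find_all("dd"):
    tag.decompose()
print(soup_b)  # <dl></dl>
```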
