Instead of right-clicking the page and choosing "View page source", use the HTML displayed in the browser's developer tools.

To extract the text of a dt tag that has a span tag embedded in it, such as
<dt>price<span class="tax">(tax included)</span></dt>
use .text:
from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")
soup.text  # 'price(tax included)'
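As a small sketch of what .text covers, the example below compares the text of the whole dt tag with the text of the nested span alone. The variable names (whole_text, span_text, own_text) are illustrative assumptions, not part of the original snippet.

```python
from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")

dt = soup.find("dt")

# .text concatenates the strings of the tag and all of its descendants
whole_text = dt.text

# Text of the nested span only
span_text = dt.find("span").text

# Only the strings that are direct children of dt (nested tags are skipped)
own_text = "".join(dt.find_all(string=True, recursive=False))

print(whole_text)  # price(tax included)
print(span_text)   # (tax included)
print(own_text)    # price
```

If only the label without the span is needed, the last form may be the more convenient one.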
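When the tag contains whitespace such as line breaks, for example
<dt>
price
<span class="tax">(tax included)</span>
</dt>
the text can be extracted without the whitespace like this:
def remove_whitespace(text):
    # split() with no arguments splits on any run of whitespace
    return ''.join(text.split())
source = '<dt>\nprice\n<span class="tax">(tax included)</span>\n</dt>'
soup = BeautifulSoup(source, "html.parser")
remove_whitespace(soup.text)  # 'price(taxincluded)'; spaces inside the text are removed too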
strip() only removes leading and trailing whitespace, so it cannot delete whitespace in the middle of the string. Instead, split() with no arguments splits the string on whitespace, and ''.join() joins the pieces back together.
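The difference between these approaches can be sketched as follows. The get_text(strip=True) call is not from the original snippet; it is shown here as a possible alternative when spaces inside each piece of text should be kept. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

source = '<dt>\nprice\n<span class="tax">(tax included)</span>\n</dt>'
soup = BeautifulSoup(source, "html.parser")

raw = soup.text  # '\nprice\n(tax included)\n'

# strip() only trims both ends, so the line break in the middle remains
stripped = raw.strip()

# split() + join() removes every whitespace character, including the
# space inside "(tax included)"
squeezed = "".join(raw.split())

# Alternative (an assumption, not the original method): strip each text
# node separately and concatenate them, keeping the inner space
joined = soup.get_text(strip=True)

print(repr(stripped))  # 'price\n(tax included)'
print(repr(squeezed))  # 'price(taxincluded)'
print(repr(joined))    # 'price(tax included)'
```

Which one fits depends on whether spaces inside the text carry meaning; for text without inner spaces, all but strip() give the same result.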
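Elements can be searched by class, id, or tag name. find returns the first match, and find_all returns a list of all matches:
soup.find(class_='hoge')      # first element with class "hoge"
soup.find_all(class_='hoge')  # all elements with class "hoge"
soup.find(id='hoge')          # first element with id "hoge"
soup.find_all(id='hoge')      # all elements with id "hoge"
soup.find('hoge')             # first <hoge> tag
soup.find_all('hoge')         # all <hoge> tags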
Multiple conditions can also be specified at the same time:
soup.find('hoge', class_='fuga')
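To make the search calls concrete, here is a minimal sketch run against a small made-up HTML fragment. The fragment, its class and id names, and the variable names are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this example
html = """
<ul id="menu">
  <li class="item">coffee</li>
  <li class="item sale">tea</li>
</ul>
<p class="item">note</p>
"""
soup = BeautifulSoup(html, "html.parser")

first_item = soup.find(class_="item")          # first element with class "item"
all_items = soup.find_all(class_="item")       # every element with class "item"
menu = soup.find(id="menu")                    # element with id "menu"
li_items = soup.find_all("li", class_="item")  # tag name and class combined
missing = soup.find("li", class_="nope")       # no match

print(first_item.text)  # coffee
print(len(all_items))   # 3
print(menu.name)        # ul
print(len(li_items))    # 2
print(missing)          # None
```

Note that find returns None when nothing matches, so it may be worth checking the result before accessing .text on it.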