Below are 10 ways to write CSS selectors. Each selector extracts the Eurasia element from
continents.html.
<ul id="continents">
<li id="au">Australia</li>
<li id="na">NorthAmerica</li>
<li id="sa">SouthAmerica</li>
<li id="ea">Eurasia</li>
<li id="af">Africa</li>
</ul>
from bs4 import BeautifulSoup

# Parse continents.html; the with block closes the file once parsing is done
with open("continents.html", encoding="utf-8") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Print the text of the first element that matches the CSS selector q
sel = lambda q: print(soup.select_one(q).string)
sel("#ea")  # (1)
sel("li#ea")  # (2)
sel("ul > li#ea")  # (3)
sel("#continents #ea")  # (4)
sel("#continents > #ea")  # (5)
sel("ul#continents > li#ea")  # (6)
sel("li[id='ea']")  # (7)
sel("li:nth-of-type(4)")  # (8)
print(soup.select("li")[3].string)  # (9)
print(soup.find_all("li")[3].string)  # (10)
(1) Extract the element whose id attribute is ea.
(2) Extract the <li> element whose id attribute is ea.
(3) Same as (2), but reached from its parent <ul> tag.
(4) Extract the element with id attribute ea anywhere in the hierarchy below the element with id attribute continents (descendant).
(5) Extract the element with id attribute ea directly below the element with id attribute continents (child).
(6) Extract the <li> element with id attribute ea directly below the <ul> element with id attribute continents.
(7) Extract the <li> element whose id attribute is ea, using an attribute selector.
(8) Extract the 4th <li> element.
(9) Use select() to extract all <li> elements and take index [3] (counting from 0, so the 4th).
(10) Use find_all() to extract all <li> elements and take index [3] (counting from 0, so the 4th).
Execution result
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
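In continents.html, selectors (4) and (5) return the same element because the target sits directly under the list. The sketch below uses hypothetical markup (not the file above) in which the target is nested one level deeper, to show where the descendant combinator (a space) and the child combinator (>) diverge.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the target <li> is nested
# inside a second <ul>, so it is no longer a direct child of #continents.
html = """
<ul id="continents">
  <li id="old">Old World
    <ul>
      <li id="ea">Eurasia</li>
    </ul>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): matches at any depth below #continents.
descendant = soup.select_one("#continents #ea")
# Child combinator (>): matches only direct children of #continents.
child = soup.select_one("#continents > #ea")

print(descendant.string)  # Eurasia
print(child)              # None
```

When the page structure may change, the descendant form tends to be more forgiving, while the child form is stricter about where the element is allowed to appear.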
Here is a summary of the methods used for scraping.
find() method, find_all() method
Extract elements by tag name or by an arbitrary attribute. The find() method returns the first matching element, and the find_all() method returns all matching elements at once.
Example of use
title = soup.find(id="title")  # Get the element whose id attribute is title
links = soup.find_all("a")  # Get all <a> elements
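As a small sketch of how the two methods differ in what they return, the snippet below runs them against an inline HTML string (an assumption made here so the example runs without a file). Note that find() gives None when nothing matches, while find_all() gives an empty result.

```python
from bs4 import BeautifulSoup

# Inline sample markup, assumed for this example
html = '<ul id="continents"><li id="ea">Eurasia</li><li id="af">Africa</li></ul>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")           # first matching element only
all_items = soup.find_all("li")   # list-like collection of every match
missing = soup.find(id="title")   # no such id in this markup -> None
none_found = soup.find_all("a")   # no <a> tags -> empty collection

print(first.string)                       # Eurasia
print([li.string for li in all_items])    # ['Eurasia', 'Africa']
print(missing)                            # None
print(len(none_found))                    # 0
```

Because find() can return None, it is worth checking the result before reading attributes such as .string from it.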
select_one() method, select() method
Pass a CSS selector as the argument to get elements. The select_one() method returns the first matching element, and the select() method returns all matching elements as a list. See sel-continents.py above for a usage example.
I understand how to scrape, but I often get stuck on Python grammar, so I want to keep learning the grammar with that underlying purpose in mind.
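For the selector-based methods, a minimal sketch on an inline HTML string (assumed here for illustration) could look like the following. The helper text_or_default is a hypothetical name introduced only for this example; it guards against the case where select_one() finds nothing and returns None.

```python
from bs4 import BeautifulSoup

# Inline sample markup, assumed for this example
html = '<ul id="continents"><li id="ea">Eurasia</li><li id="af">Africa</li></ul>'
soup = BeautifulSoup(html, "html.parser")

one = soup.select_one("#continents > li")  # first match only
many = soup.select("#continents > li")     # every match, as a list


def text_or_default(selector, default=""):
    """Hypothetical helper: text of the first match, or default if none."""
    tag = soup.select_one(selector)
    return tag.get_text() if tag is not None else default


print(one.string)                              # Eurasia
print([li.string for li in many])              # ['Eurasia', 'Africa']
print(text_or_default("li#xx", "not found"))   # not found
```

A guard like this avoids the AttributeError that calling .string on None would otherwise raise when a selector matches nothing.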