content of study

How to write CSS selectors
Scraping with Beautiful Soup

How to write CSS selectors

Below are 10 ways to extract the same element, eight of them using CSS selectors. Each one pulls the `Eurasia` element out of `continents.html`.

<ul id="continents">
    <li id="au">Australia</li>
    <li id="na">NorthAmerica</li>
    <li id="sa">SouthAmerica</li>
    <li id="ea">Eurasia</li>
    <li id="af">Africa</li>
</ul>

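from bs4 import BeautifulSoup

# Parse the sample HTML file (the with block closes the file afterwards)
with open("continents.html", encoding="utf-8") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Print the text of the first element that matches the CSS selector q
def sel(q):
    print(soup.select_one(q).string)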
sel("#ea")   # (1)
sel("li#ea")   # (2)
sel("ul > li#ea")   # (3)
sel("#continents #ea")   # (4)
sel("#continents > #ea")   # (5)
sel("ul#continents > li#ea")   # (6)
sel("li[id='ea']")   # (7)
sel("li:nth-of-type(4)")   # (8)

print(soup.select("li")[3].string)   # (9)
print(soup.find_all("li")[3].string)   # (10)

(1) Extract the element whose id attribute is `ea`.
(2) Extract the element with the <li> tag and the id attribute `ea`.
(3) Same as (2), but reached from its parent <ul> tag with the child combinator (>).
(4) Extract the element with id attribute `ea` anywhere in the hierarchy below the element with id attribute `continents` (a descendant).
(5) Extract the element with id attribute `ea` directly below the element with id attribute `continents` (a child).
(6) Extract the <li> tag whose id attribute is `ea` directly below the <ul> tag whose id attribute is `continents`.
(7) Extract the <li> tag whose id attribute is `ea`, written as an attribute selector.
(8) Extract the 4th <li> tag among its siblings.
(9) Use select() to extract all <li> tags and take element [3] (counting from 0, that is, the 4th).
(10) Use find_all() to extract all <li> tags and take element [3] (counting from 0, that is, the 4th).
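The difference between (4) and (5) does not show up in continents.html, because the <li> sits directly under the <ul> either way. The following is a minimal sketch of a case where the two differ. The wrapping <div id="world"> and the variable names are hypothetical and not part of the original file.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: #ea is a grandchild of #world, not a direct child
html = """
<div id="world">
  <ul id="continents">
    <li id="ea">Eurasia</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): matches at any depth below #world
descendant = soup.select_one("#world #ea")

# Child combinator (>): matches only direct children of #world
child = soup.select_one("#world > #ea")

print(descendant.string)
print(child)
```

Here the descendant form finds the element, while the child form finds nothing and select_one() returns None.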

Execution result

Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
Eurasia
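To try the ten variants without a separate HTML file, the markup can be embedded as a string. This is only a sketch, assuming the bs4 package is installed. The names html, selectors and results are my own and do not appear in the original script.

```python
from bs4 import BeautifulSoup

# Same markup as continents.html, embedded so the sketch is self-contained
html = """
<ul id="continents">
    <li id="au">Australia</li>
    <li id="na">NorthAmerica</li>
    <li id="sa">SouthAmerica</li>
    <li id="ea">Eurasia</li>
    <li id="af">Africa</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Variants (1) to (8): CSS selectors passed to select_one()
selectors = [
    "#ea",
    "li#ea",
    "ul > li#ea",
    "#continents #ea",
    "#continents > #ea",
    "ul#continents > li#ea",
    "li[id='ea']",
    "li:nth-of-type(4)",
]
results = [soup.select_one(q).string for q in selectors]

# Variants (9) and (10): fetch every <li> and index into the list
results.append(soup.select("li")[3].string)
results.append(soup.find_all("li")[3].string)

print(results)
```

Collecting the results in a list makes it easy to confirm that every variant picks the same element.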

Scraping with Beautiful Soup

Here is a summary of the functions used for scraping.

find() method, find_all() method

Extract elements by tag name or by any attribute. The find() method returns the first matching element, and the find_all() method returns all matching elements at once.

Example of use

 title = soup.find(id="title")  # Get the element whose id attribute is title
 links = soup.find_all("a")  # Get all <a> elements
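The two methods also differ in what they return, which matters when nothing matches. A small sketch under assumed input: the HTML string and the id "no-such-id" are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one titled heading and two links
html = '<h1 id="title">Continents</h1><a href="/a">A</a><a href="/b">B</a>'
soup = BeautifulSoup(html, "html.parser")

title = soup.find(id="title")          # a single element
links = soup.find_all("a")             # a list-like collection of elements
missing = soup.find(id="no-such-id")   # None when nothing matches

print(title.string)
print([a["href"] for a in links])
print(missing)
```

Because find() can return None, it is safer to check the result before reading .string or an attribute from it.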

select_one() method, select() method

Specify a CSS selector as the argument and get the matching elements. The select_one() method returns the first matching element, and the select() method returns all matching elements. The usage example is as in sel-continents.py above.
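In Beautiful Soup the selector method that returns a single element is select_one(), and the one that returns every match is select(). The sketch below shows both on the continents markup, embedded as a string so it runs on its own. The selector "#no-such-id" is a made-up example of a selector with no match.

```python
from bs4 import BeautifulSoup

# Same markup as continents.html, embedded for a self-contained example
html = """
<ul id="continents">
    <li id="au">Australia</li>
    <li id="na">NorthAmerica</li>
    <li id="sa">SouthAmerica</li>
    <li id="ea">Eurasia</li>
    <li id="af">Africa</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one("#continents > li")   # first match only
items = soup.select("#continents > li")       # every match, in document order

no_match_one = soup.select_one("#no-such-id")  # None
no_match_all = soup.select("#no-such-id")      # empty list

print(first.string)
print([li.string for li in items])
```

As with find(), select_one() gives None when nothing matches, while select() gives an empty list.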

Summary

I understand how scraping works, but I often get stuck on Python grammar itself, so I want to keep in mind that the underlying goal is to understand Python grammar.

[PYTHON] Learning record (3rd day) #CSS selector description method #BeautifulSoup scraping
