[PYTHON] Scraping multiple pages with Beautiful Soup

I was asked on short notice to save data spread across multiple pages into a database, so I threw this together in a hurry. CSS selectors are incredibly handy, aren't they?

The actual code

scl.py


import sqlite3

import bs4
import requests

url = 'https://www.〜'

# Open the database once, rather than once per link.
con = sqlite3.connect('url.db')
c = con.cursor()
c.execute('CREATE TABLE IF NOT EXISTS urldata(urls unique)')

# Fetch 55 list pages: the base URL first, then ?=1 through ?=54.
for i in range(1, 56):
    res = requests.get(url)
    res.raise_for_status()
    # The 'lxml' parser requires the lxml package to be installed.
    soup = bs4.BeautifulSoup(res.text, 'lxml')

    # Each matching element carries the link to one detail page.
    for u in soup.select('.plan-module > .plan-link.plan-image-container'):
        link = 'https://www.〜' + u.attrs['href']
        # OR IGNORE skips links that are already stored (the column is unique).
        c.execute('INSERT OR IGNORE INTO urldata VALUES (?)', [link])

    con.commit()
    url = 'https://www.〜?=' + str(i)

con.close()
print('success')
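
Because the urldata column is declared unique, a plain INSERT raises sqlite3.IntegrityError when the same link is stored twice, for example when the script is run again. One way around this is INSERT OR IGNORE. The following is only a minimal sketch: it uses an in-memory database, the example.com URLs are placeholders, and the helper names save_urls and count_urls are hypothetical, not part of the original script.

```python
import sqlite3


def save_urls(con, urls):
    """Insert URLs, skipping any that are already stored."""
    con.execute("CREATE TABLE IF NOT EXISTS urldata(urls TEXT UNIQUE)")
    # INSERT OR IGNORE drops rows that would violate the UNIQUE constraint
    # instead of raising sqlite3.IntegrityError.
    con.executemany(
        "INSERT OR IGNORE INTO urldata VALUES (?)",
        [(u,) for u in urls],
    )
    con.commit()


def count_urls(con):
    """Return the number of stored URLs."""
    return con.execute("SELECT COUNT(*) FROM urldata").fetchone()[0]


# In-memory database and placeholder URLs, for illustration only.
con = sqlite3.connect(":memory:")
save_urls(con, ["https://example.com/plan/1", "https://example.com/plan/2"])
save_urls(con, ["https://example.com/plan/2", "https://example.com/plan/3"])
print(count_urls(con))  # the duplicate plan/2 is stored only once
```

Passing a single connection into the helper also avoids reopening the database file for every link.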

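However, it turned out that the pagination on this site is a dynamic element rendered in the browser, so requesting the page URLs like this does not actually move between pages. It is useless without something like Selenium.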
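
A cheap way to notice this kind of problem early is to compare the links collected from each page before trusting the results. The sketch below assumes that a page parameter ignored by the server shows up as every page returning exactly the same links. That is a heuristic rather than a guarantee, and the function name pagination_looks_dynamic and the sample data are hypothetical.

```python
def pagination_looks_dynamic(links_per_page):
    """Return True if every fetched page produced the same set of links.

    links_per_page is a list with one list of link strings per page.
    Identical results on every page suggest that the page parameter had
    no effect, which is typical when pagination happens in JavaScript.
    """
    if len(links_per_page) < 2:
        return False  # not enough pages to compare
    first = set(links_per_page[0])
    return all(set(links) == first for links in links_per_page[1:])


# Hypothetical data: three requests that all returned the same two links.
same_every_time = [["/plan/1", "/plan/2"]] * 3
# Hypothetical data: pages that really differ.
really_paginated = [["/plan/1", "/plan/2"], ["/plan/3", "/plan/4"]]

print(pagination_looks_dynamic(same_every_time))
print(pagination_looks_dynamic(really_paginated))
```

Running this check on the first two or three pages would have flagged the issue before looping over all 55.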
