[PYTHON] Scraping multiple pages with Beautiful Soup

I was asked on short notice to save data that spans multiple pages into a database, so I wrote this script in a hurry. CSS selectors are incredibly handy, aren't they?

The actual code

`scl.py`


import sqlite3

import bs4
import requests

BASE_URL = 'https://www.〜'

# Open the database once, not once per link
con = sqlite3.connect('url.db')
c = con.cursor()
c.execute('CREATE TABLE IF NOT EXISTS urldata(urls unique)')

url = BASE_URL  # the first request goes to the unnumbered page

for i in range(1, 56):  # 55 requests in total
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')

    for u in soup.select('.plan-module > .plan-link.plan-image-container'):
        urls = BASE_URL + u.attrs['href']
        # OR IGNORE skips URLs already stored; a plain INSERT would raise
        # sqlite3.IntegrityError on the unique column
        c.execute('INSERT OR IGNORE INTO urldata VALUES (?)', [urls])
    con.commit()

    # URL of the next page to fetch
    url = BASE_URL + '?=' + str(i)

con.close()
print('success')
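
The storage step can be tried on its own, without touching the network. The following is a minimal sketch, not part of the original script: `save_urls` is a hypothetical helper, the example.com URLs are made up, and the database lives in memory. It uses `INSERT OR IGNORE`, SQLite's variant of `INSERT` that skips a row violating a uniqueness constraint, whereas a plain `INSERT` of a repeated URL into the `unique` column raises `sqlite3.IntegrityError`.

```python
import sqlite3


def save_urls(con, urls):
    """Insert URLs into urldata, silently skipping ones already stored.

    Hypothetical helper for illustration; the table layout matches the script.
    """
    con.execute('CREATE TABLE IF NOT EXISTS urldata(urls unique)')
    # executemany runs the statement once per parameter tuple
    con.executemany('INSERT OR IGNORE INTO urldata VALUES (?)',
                    [(u,) for u in urls])
    con.commit()


con = sqlite3.connect(':memory:')  # in-memory database, nothing written to disk

# Made-up URLs; the second batch repeats plan/2 on purpose
save_urls(con, ['https://example.com/plan/1', 'https://example.com/plan/2'])
save_urls(con, ['https://example.com/plan/2', 'https://example.com/plan/3'])

rows = [r[0] for r in con.execute('SELECT urls FROM urldata ORDER BY urls')]
print(rows)  # three distinct URLs, the repeat was ignored
```

Because duplicates are dropped by the database itself, the script can be rerun against the same `url.db` without failing partway through.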

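However, it turned out that the pagination on this site is a dynamic element. requests only fetches the HTML the server returns and does not run any scripts in the page, so this approach cannot move between pages, and it is useless without Selenium.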
