[Raspberry Pi] Scraping web pages that can't be fetched with Python requests + Beautiful Soup

TL;DR I wanted to do some web scraping with Python, so I used requests + BeautifulSoup as usual. For some reason, though, I could only get part of the page. After some investigation I found a library called "requests-html", so I will introduce it here.

environment

module

Install requests_html with pip.

Raspberry Pi specific error

On a Mac there was no problem, but running pip install requests_html on the Raspberry Pi produced the following error:

```
ERROR: Command errored out with exit status 1:
(omitted)
Error: Please make sure the libxml2 and libxslt development packages are installed.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output
```

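The error seems to come from lxml, which requests_html pulls in as a dependency. As the message says, building it on the Raspberry Pi requires the libxml2 and libxslt development packages. I solved it as follows: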
```
sudo apt-get install libxml2-dev libxslt-dev python3-dev
pip install lxml
```
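Before retrying pip install requests_html, you may want to confirm which of the relevant packages Python can already find. The following is a minimal sketch that uses only the standard library's importlib.util.find_spec. missing_modules is a hypothetical helper name. The check only tells you whether a module is installed, not whether it imports without errors.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that Python cannot find."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages involved in the Raspberry Pi install problem (assumed list)
    missing = missing_modules(["lxml", "requests_html"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("lxml and requests_html are installed")
```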

code

```python
from requests_html import HTMLSession

url = "https://stopcovid19.metro.tokyo.lg.jp/cards/positive-rate"

# Start a session and fetch the page
session = HTMLSession()
r = session.get(url)
# Execute the page's JavaScript in headless Chromium (downloaded on the first run)
r.html.render()

# Get elements
rows = r.html.find("span")
for row in rows:
    print(row.text)  # Print the text of every span element
```

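r.html.find("element name") gets all matching elements in the page. The argument is a CSS selector, so a plain tag name such as "span" works. In this example I scraped the Tokyo Metropolitan Government's novel coronavirus (COVID-19) site. With requests + Beautiful Soup I could only get part of the page, most likely because the page builds much of its content with JavaScript after loading. requests only downloads the HTML the server sends, while r.html.render() actually executes that JavaScript.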
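The following offline sketch illustrates why parsing the downloaded HTML alone misses content that JavaScript adds. It is a toy built on assumptions:

- PAGE is a made-up HTML string, not the real Tokyo site.
- It uses the standard library's html.parser instead of Beautiful Soup, so it runs without extra packages.
- SpanTextCollector and span_texts are hypothetical helper names.

The span that the script would insert is never seen, because nothing executes the script.

```python
from html.parser import HTMLParser

# Hypothetical page: one span is in the served HTML, the other would only
# be added by JavaScript running in a browser (made-up example).
PAGE = """
<html><body>
  <span>Static text from the server</span>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML = "<span>Rendered by JS</span>";
  </script>
</body></html>
"""


class SpanTextCollector(HTMLParser):
    """Collects the text of each <span> in the raw HTML (nested spans not handled specially)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a <span>
        self.texts = []  # one entry per <span> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Script contents arrive here as plain text, outside any span
        if self.depth > 0:
            self.texts[-1] += data


def span_texts(html):
    """Return the stripped text of every <span> found without running JavaScript."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


print(span_texts(PAGE))  # only the static span; the JS-inserted one is missing
```

This is the same situation as fetching with requests and parsing with Beautiful Soup. In requests-html, calling render() before find() is what makes the JavaScript-built elements available.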
