[PYTHON] How to scrape websites built as SPAs

I used to scrape with Python's requests module. That works for sites that return HTML generated on the server side, but requests only gives you the response as it is before any JavaScript runs. So it could not be used on a site built as an SPA, which executes JavaScript on the client side and generates the HTML there.
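
To make the problem concrete, here is a minimal sketch of how you might check whether a response fetched with requests is only an empty SPA shell. It uses only the standard library. The function name `looks_like_spa_shell` and the 50-character threshold are assumptions chosen for illustration, and the check is a rough heuristic, not a reliable detector.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects text outside <script>/<style> and counts <script> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.script_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def looks_like_spa_shell(html_text, min_visible_chars=50):
    """Heuristic (assumption): the page has scripts but almost no visible text."""
    parser = _VisibleTextParser()
    parser.feed(html_text)
    parser.close()
    visible = " ".join(parser.chunks)
    return parser.script_count > 0 and len(visible) < min_visible_chars
```

If this returns True for `requests.get(url).text`, the content you want is probably generated by JavaScript, and a plain requests call will not see it.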

requests-html module

To scrape a site built as an SPA, you can use requests-html, which can execute the page's JavaScript in a headless Chromium before you read the HTML.

Installation

pip install requests-html

How to use

main.py


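# -*- coding: utf-8 -*-
import requests
from requests_html import HTMLSession


def main_render_javascript_page():
    """Fetch the page, run its JavaScript, then print the rendered body text."""
    url = 'https://hogehoge'  # placeholder URL
    session = HTMLSession()
    r = session.get(url)
    # Execute the page's JavaScript so r.html reflects the generated DOM.
    r.html.render()
    body_text = r.html.find('body', first=True).text
    print(body_text)


def main_normal_page():
    """Fetch the page with plain requests: HTML as it is before JavaScript runs."""
    url = 'https://hogehoge'  # placeholder URL
    r = requests.get(url)
    print(r.text)


if __name__ == '__main__':
    main_normal_page()
    main_render_javascript_page()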
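
If the page needs a moment to finish building its content, render() accepts sleep and timeout arguments. The following is a minimal sketch of a reusable helper under that assumption. The name `fetch_rendered_text` and the default values are hypothetical, and the optional `session` parameter exists only so the function can be exercised without a network connection.

```python
def fetch_rendered_text(url, session=None, sleep=1, timeout=20.0):
    """Return the body text of `url` after its JavaScript has been executed."""
    own_session = session is None
    if own_session:
        # Imported lazily so the helper can be tested with a stand-in session.
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        r = session.get(url)
        r.raise_for_status()
        # sleep: seconds to wait after the initial render;
        # timeout: seconds before the render is abandoned.
        r.html.render(sleep=sleep, timeout=timeout)
        body = r.html.find('body', first=True)
        return body.text if body is not None else ''
    finally:
        if own_session:
            session.close()
```

Note that the first call to render() downloads Chromium, so the first run takes noticeably longer than later ones.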

Official documentation

https://requests.readthedocs.io/projects/requests-html/en/latest/

Reference site

https://dev.classmethod.jp/articles/python-asyncio/
https://blog.ikedaosushi.com/entry/2019/09/15/162445
