[PYTHON] gazpacho, a scraping module that is easier to use than Beautiful Soup

I'd like to introduce gazpacho, a Python module that I recently learned about.

What is gazpacho

Its PyPI page describes it as follows:

"gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies." (https://pypi.org/project/gazpacho/)

It still has only around 400 stars, so I think it is best to keep it to personal use for now.

Advantages

- You can fetch and parse HTML with this one library.
  - With BeautifulSoup and similar parsers, you first had to fetch the HTML with requests or a similar library.
- There are fewer methods to remember.
  - Parsing is done with a single find method.
- It does not depend on any other modules.

How to use

First, install the module.

pip install gazpacho

As an example, I will scrape and print the book titles from the following site, which is featured in the tutorial.

https://scrape.world/books

from gazpacho import get, Soup


# Download the HTML for the given URL
html = get('https://scrape.world/books')

# Create a Soup instance for parsing
soup = Soup(html)

# Find the elements you need.
# 1st argument: the HTML tag name
# 2nd argument: a dict of attributes to match, such as id or class
# partial=True: allow partial matches on attribute values,
#   so 'book-' matches classes such as 'book-early'
# mode='all': always return a list of Soup objects. In the default mode,
#   find returns a list only when several elements match, and a single
#   Soup object when exactly one matches, which would break the loop below
books = soup.find('div', {'class': 'book-'}, partial=True, mode='all')

for book in books:
    name_header = book.find('h4')
    # The text attribute holds the text content of the tag
    name = name_header.text
    print(name)

Summary

Personally, I choose between the tools as follows.

  1. Simple scraping -> use gazpacho
  2. Cases that are difficult with gazpacho (*) -> use selenium (chromedriver-library) together with BeautifulSoup

The gazpacho module itself is simple, so I plan to find time to read its source code.

I hope this article encourages more people to try gazpacho!
