A pharmaceutical company researcher's summary of web scraping using Python

Introduction

In this article, I explain web scraping using Python.

Beautiful Soup

Suppose you want to crawl and scrape a web page that serves the following HTML.

<ul class="list-group">
  <li class="list-group-item"><a href="">Element 1</a></li>
  <li class="list-group-item"><a href="">Element 2</a></li>
  <li class="list-group-item"><a href="">Element 3</a></li>
</ul>

The Python script looks like this:

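import requests
from bs4 import BeautifulSoup

# Placeholder: replace with the URL of the page to fetch.
url = 'URL to get HTML'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors such as 404 or 500
# Use the encoding guessed from the response body to avoid garbled text.
response.encoding = response.apparent_encoding

bs = BeautifulSoup(response.text, 'html.parser')

# select() returns a list of every element matching the CSS selector.
ul = bs.select('ul.list-group')

for li in ul[0].select('li.list-group-item'):
    a_tags = li.select('a')
    a_tag = a_tags[0]
    # Link text with surrounding whitespace removed, e.g. 'Element 1'.
    item_name = a_tag.text.strip()
    print(item_name)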
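
The same extraction can be tried without any network access by parsing the sample HTML from a string. The following is a minimal sketch, not the article's own code: the names SAMPLE_HTML and extract_item_names are hypothetical, and it uses select_one() so that a page without the list returns an empty result instead of failing on ul[0].

```python
from bs4 import BeautifulSoup

# The sample HTML from this section, kept as a string so no request is needed.
SAMPLE_HTML = """
<ul class="list-group">
  <li class="list-group-item"><a href="">Element 1</a></li>
  <li class="list-group-item"><a href="">Element 2</a></li>
  <li class="list-group-item"><a href="">Element 3</a></li>
</ul>
"""


def extract_item_names(html):
    """Return the link text of each list item, or [] if the list is missing."""
    soup = BeautifulSoup(html, 'html.parser')

    # select_one() returns the first match, or None when nothing matches.
    ul = soup.select_one('ul.list-group')
    if ul is None:
        return []

    names = []
    # '>' selects only <a> tags that are direct children of each list item.
    for a_tag in ul.select('li.list-group-item > a'):
        names.append(a_tag.get_text(strip=True))
    return names


if __name__ == '__main__':
    print(extract_item_names(SAMPLE_HTML))
```

Keeping the parsing in a function that takes an HTML string makes it possible to check the selectors against saved pages before pointing the script at a live site.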

Scrapy

Suppose you want to crawl and scrape a page with the same HTML as in the Beautiful Soup example above.

<ul class="list-group">
  <li class="list-group-item"><a href="">Element 1</a></li>
  <li class="list-group-item"><a href="">Element 2</a></li>
  <li class="list-group-item"><a href="">Element 3</a></li>
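</ul>

The Python script looks like this:

import scrapy


class SampleSpider(scrapy.Spider):
    name = 'sample'
    # Placeholders: replace with the real domain and target URL.
    allowed_domains = ['domain']
    start_urls = [
        'Target URL',
    ]

    # Scrapy passes the response for each start URL to parse() by default.
    def parse(self, response):
        ul = response.css('ul.list-group')[0]
        for li in ul.css('li.list-group-item'):
            item_url = li.css('a::attr(href)').extract_first()

            # response.follow() resolves relative links against the page URL.
            yield response.follow(item_url, callback=self.parse_detail)

    # The detail page is expected to contain <h1 class="item-name">.
    def parse_detail(self, response):
        item_name = response.css('h1.item-name::text').extract_first()
        # A callback must produce items (here a dict) or requests, not a bare string.
        yield {'item_name': item_name}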

Summary

Here, I explained how to scrape the web using Beautiful Soup and Scrapy.

