Web scraping with Python + JupyterLab

Introduction

JupyterLab is a browser-based execution environment where you can easily try out Python.

Environment

git clone https://github.com/takiguchi-yu/python-jupyterLab.git
cd python-jupyterLab

Start JupyterLab

docker-compose up -d

Access JupyterLab in your browser

http://localhost:8888

Initial screen

Stop JupyterLab

docker-compose down

Web scraping sample

Let's write a small web scraping script. This sample reads the URLs listed in an external file, requests each one in turn, and appends the result to another external file.

Web scraping implementation

import requests
from bs4 import BeautifulSoup

# Send a mobile browser User-Agent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1'
}

print('Start processing')
# Read the list of URLs from the external file, one URL per line
with open('./input_urls.txt', mode='r', encoding='utf-8') as url_file:
    for line in url_file:
        url = line.strip()  # Note: remove the trailing newline
        if not url:
            continue  # Skip blank lines
        result = requests.get(url, headers=headers, timeout=10)
        print(result.status_code)
        soup = BeautifulSoup(result.content, 'html.parser')
        elements = soup.find_all('HTML tag name here', {'class': 'Class name here'})
        # elements = soup.find_all('div', {'class': 'hoge-hoge'})  # Example
        if not elements:
            continue  # No matching element on this page
        text = elements[0].find(string=True)  # Get the first text node inside the tag
        # Append the scraping result to the external file (output.txt)
        with open('./output.txt', mode='a', encoding='utf-8') as out_file:
            print(text, file=out_file)
print('Processing completed')
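
The sample expects input_urls.txt to hold one URL per line. As a rough sketch, the line-cleaning step could be pulled out into a small helper like the one below. The name clean_urls and the rule that lines starting with # are treated as comments are assumptions made for illustration; they are not part of the original sample. The example.com URLs are placeholders.

```python
def clean_urls(lines):
    """Return the usable URLs from an iterable of raw text lines."""
    urls = []
    for line in lines:
        url = line.strip()  # drop the trailing newline and surrounding spaces
        # Assumption: blank lines and lines starting with '#' are ignored
        if not url or url.startswith('#'):
            continue
        urls.append(url)
    return urls


# Placeholder content standing in for the lines of input_urls.txt
sample_lines = [
    'https://example.com/page1\n',
    '\n',
    '# a comment line\n',
    '  https://example.com/page2  \n',
]
print(clean_urls(sample_lines))
```

Because the helper accepts any iterable of strings, an open file object can be passed to it directly, for example clean_urls(open('./input_urls.txt', encoding='utf-8')).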

You can also use the terminal

You are free to install your favorite libraries

Terminal 1, Terminal 2
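
After installing a library from the terminal, one way to confirm from a notebook cell that it is importable is sketched below. The helper name missing_modules is hypothetical, and the module names passed to it are only examples; the output depends on what is installed in your environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: check the two libraries used by the scraping sample
print(missing_modules(['bs4', 'requests']))
```

An empty list means every listed module can be imported in the current environment.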

For details on setting up the environment, refer to the following:

https://qiita.com/hgaiji/items/edf71435d0565257f980
