[PYTHON] Snippets (scraping) registered in Google Colaboratory

Beautifulsoup4

base

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "http://example.jp"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
}

# timeout keeps the request from hanging forever if the server does not respond
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()

soup = BeautifulSoup(r.content, "html.parser")


urljoin(url, "index.html")
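The `urljoin` call above turns a relative link into an absolute URL. As a rough sketch of how different kinds of `href` values resolve, here is a small example; the base URL and the paths are made up for illustration and do not come from a real site.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration
base = "http://example.jp/news/list.html"

# href values as they might appear in <a> tags on that page
hrefs = ["index.html", "/index.html", "../about.html", "https://example.com/x"]

# Resolve every href against the page URL
resolved = [urljoin(base, href) for href in hrefs]

for href, absolute in zip(hrefs, resolved):
    print(f"{href!r:28} -> {absolute}")
```

A relative path is resolved against the directory of the page, a path starting with `/` against the site root, and an already absolute URL is returned unchanged.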

session

with requests.Session() as s:

    r = s.get("http://example.jp", headers=headers, timeout=10)
    r.raise_for_status()

    soup = BeautifulSoup(r.content, "html.parser")
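The session snippet above passes `headers` on each call. One possible variation, sketched below, sets the headers once on the session and mounts an adapter that retries failed requests. `make_session`, the User-Agent string, and the retry settings are hypothetical choices for illustration, not part of the original snippet.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent, retries=3, backoff=0.5):
    """Hypothetical helper: Session with default headers and automatic retries."""
    s = requests.Session()
    # Headers set here are sent with every request made through this session
    s.headers.update({"User-Agent": user_agent})

    # Retry on common transient status codes, waiting longer between attempts
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    s.mount("http://", adapter)
    s.mount("https://", adapter)
    return s


# The User-Agent value is a placeholder
session = make_session("Mozilla/5.0 (compatible; example-scraper)")
```

The returned object is used like any other session, for example `session.get(url, timeout=10)`, and can still be wrapped in a `with` block so it is closed afterwards.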

Pandas

import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
dfs = pd.read_html("http://example.jp", header=0, index_col=0)
df = dfs[0]
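`pd.read_html` returns one DataFrame per `<table>` element, so a page with several tables needs a way to pick the right one. The sketch below uses made-up inline HTML instead of a real page and assumes an HTML parser such as `lxml` is installed, as it is on Colab by default.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables, standing in for a downloaded page
html = """
<table>
  <tr><th>id</th><th>name</th></tr>
  <tr><td>1</td><td>apple</td></tr>
  <tr><td>2</td><td>banana</td></tr>
</table>
<table>
  <tr><th>code</th><th>price</th></tr>
  <tr><td>A</td><td>100</td></tr>
</table>
"""

# One DataFrame per <table>; header=0 and index_col=0 as in the snippet above
tables = pd.read_html(StringIO(html), header=0, index_col=0)
print(len(tables))

# match= keeps only the tables whose text matches the given string or regex
price_tables = pd.read_html(StringIO(html), match="price", header=0, index_col=0)
price_df = price_tables[0]
print(price_df)
```

Selecting by `match` tends to be less fragile than a fixed list position when the page layout changes.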

Selenium

!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options

import time

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

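# chromedriver was copied to /usr/bin above, so it is found on PATH.
# Recent Selenium 4 releases no longer accept the driver path as a positional argument.
driver = webdriver.Chrome(options=options)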
driver.implicitly_wait(10)

# Remember the handle of the main window
parent_window = driver.current_window_handle

driver.get("http://example.jp")

# Show the current URL
print(driver.current_url)

time.sleep(3)

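# Click a link by its text
# find_element_by_link_text was removed in Selenium 4.3; use a By locator instead
from selenium.webdriver.common.by import By

driver.find_element(By.LINK_TEXT, "XXXXX").click()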

# Switch to the most recently opened window
driver.switch_to.window(driver.window_handles[-1])
