Every time you scrape
test.py
from bs4 import BeautifulSoup
Writing imports out one line at a time like this every time is a pain, so I keep a template that is guaranteed to cover everything I need.
test.py
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
!pip install requests-html
First, the library setup. I usually work in Google Colab, so I run this cell first to install Chromium, the matching chromedriver, and the Python packages.
test.py
import pandas as pd
import datetime
from tqdm.notebook import tqdm
import requests
from bs4 import BeautifulSoup
import time
import re
from urllib.request import urlopen
import urllib.request, urllib.error
from requests_html import HTMLSession
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Everything up to fetching the HTML
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', options=options)
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
url = "https://www.XXX.com"  # replace with the target site
driver.get(url)
html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, "html.parser")
Up to this point it is safe to paste in without thinking. After that,
test.py
soup
with this you can get from nothing to dumped HTML in a few seconds.
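From here, extraction is the usual BeautifulSoup calls on `soup`. A minimal sketch — the HTML string, tag names, and `item` class below are made up for illustration; in practice the HTML comes from `driver.page_source` as in the template above:

```python
from bs4 import BeautifulSoup

# Stand-in HTML; with the template, this would be driver.page_source
html = """
<html><body>
  <h1>Example Title</h1>
  <ul>
    <li class="item">apple</li>
    <li class="item">banana</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Grab a single element by tag, and a list of elements by CSS selector
title = soup.find("h1").get_text(strip=True)
items = [li.get_text(strip=True) for li in soup.select("li.item")]
print(title, items)  # Example Title ['apple', 'banana']
```

`find` returns the first match, while `select` takes a CSS selector and returns every match, which covers most day-to-day scraping.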
Strictly speaking, the set includes libraries I don't always use, such as tqdm, but I prefer to pack every import I could possibly need for scraping into one block. I just copy and paste this template every time.