HTTPS access through a proxy for Python web scraping was easy with requests

I am trying web scraping with urllib and BeautifulSoup in Python 3. Last time, I dealt with a communication error caused by a proxy, as described in "What to do if there is no response due to Proxy settings in Python web scraping".

Communication over http worked well with that method, but with https sites the connection could not be established and an error occurred. This is a problem because so many recent websites use https. Adding an "https" entry to proxies, as shown below, does not solve it.

proxies={"http":"http:proxy.-----.co.jp/proxy.pac", "https":"http:proxy.-----.co.jp/proxy.pac"}

While looking into this, I found a library called requests. I tried it in place of urllib, and the problem was solved surprisingly easily.

An example of how to use it is as follows.

requests_sample.py


import requests

# Route both http and https requests through the proxy
proxies = {
    "http": "http://proxy.-----.co.jp/proxy.pac",
    "https": "http://proxy.-----.co.jp/proxy.pac"
}
r = requests.get('https://github.com/timeline.json', proxies=proxies)
print(r.text)
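
As a slightly more defensive variant of the sample above, the following sketch wraps the same requests.get call in a function that sets a timeout and raises on HTTP error statuses. The helper names build_proxies and fetch_text and the address proxy.example.com:8080 are assumptions made for illustration, not part of the original setup.

```python
import requests


def build_proxies(proxy_url):
    """Use the same proxy for both http and https targets."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_text(url, proxies, timeout=10):
    """GET url through the given proxies and return the body text.

    Raises requests.exceptions.RequestException on connection problems,
    timeouts, or HTTP error statuses (via raise_for_status).
    """
    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text


# Hypothetical proxy address, not taken from the article.
proxies = build_proxies("http://proxy.example.com:8080")
print(proxies)
```

Calling fetch_text("https://github.com/timeline.json", proxies) inside a try/except for requests.exceptions.RequestException would then report proxy or connection failures instead of waiting indefinitely.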

When using BeautifulSoup, pass the content attribute of the response object returned by requests.get. Here is a simple sample.

requests_beautifulsoup_sample.py


import requests
from bs4 import BeautifulSoup

proxies = {
    'http': 'http://proxy.-----.co.jp/proxy.pac',
    'https': 'http://proxy.-----.co.jp/proxy.pac'
}

def getBS(url):
    html = requests.get(url, proxies=proxies)
    bsObj = BeautifulSoup(html.content, "html.parser")
    return bsObj

htmlSource = getBS("https://en.wikipedia.org/wiki/Kevin_Bacon")

# Show the links that exist on the page
for link in htmlSource.find_all("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])
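
Many of the href values printed by this loop are relative (for example /wiki/... paths or #fragment anchors). If absolute URLs are needed, one possible approach is to resolve them against the page URL with the standard library. This is only a sketch: the function name to_absolute_links and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def to_absolute_links(base_url, hrefs):
    """Resolve each href against base_url; keep unique http(s) links in order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        # Skip non-web schemes such as mailto: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


base = "https://en.wikipedia.org/wiki/Kevin_Bacon"
# Illustrative href values of the kinds a page may contain
sample_hrefs = [
    "/wiki/Main_Page",
    "#cite_note-1",
    "//commons.wikimedia.org/",
    "mailto:someone@example.com",
    "/wiki/Main_Page",
]
for url in to_absolute_links(base, sample_hrefs):
    print(url)
```

In the sample above, the hrefs collected from htmlSource could be passed to this function in place of sample_hrefs.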

The requests library was already included when I installed Python 3.5.2 with Anaconda. You can check the installed packages in Anaconda Navigator. If you used the graphical installer on Windows, you can find it under Windows -> All Programs -> Anaconda3 -> Anaconda Navigator.
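
If you prefer to check from Python itself rather than from a GUI, a minimal sketch like the following reports whether a module can be imported in the current environment. The helper name is_installed is an assumption for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# bs4 is the import name of the BeautifulSoup package
for name in ("requests", "bs4"):
    print(name, "installed" if is_installed(name) else "missing")
```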

See the Quickstart in the requests documentation for more details.
