I was running a program, written in Python 3, that crawls a web page, and one day I got an error like this.
Access to XMLHttpRequest at 'https://target' from origin 'https://xxxxxxxxx' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
The implementation at that time was as follows.
import requests
res = requests.get("https://target") #webapi URL
The target page loads and displays its data with Bootstrap and the like, so I was crawling the web API that the page calls while it renders.
Solution: crawl using selenium-wire. A plain Selenium WebDriver only gives you the rendered page, but selenium-wire also gives you access to the requests made during rendering and their responses. https://pypi.org/project/selenium-wire/
from seleniumwire import webdriver

driver = webdriver.Chrome()
driver.get("https://target")  # URL of the top page

for request in driver.requests:
    # Narrow down to the web API URL whose result you want;
    # request.response is None if no response was captured for it
    if "xxxxx" in request.url and request.response:
        response_text = request.response.body.decode()

driver.quit()
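The filtering loop above can be pulled out into a small function. This is a minimal sketch, assuming `driver.requests` behaves as the selenium-wire README describes: each captured request has a `.url` and a `.response` that is `None` when no response was captured, and `.response.body` is bytes. `collect_api_responses` and the `"/api/"` fragment are hypothetical names for illustration. Because the function only touches those attributes, it can be tried with stand-in objects without launching Chrome. Depending on the selenium-wire version, the body may still be compressed; the README describes a `seleniumwire.utils.decode` helper for that case.

```python
import json
from types import SimpleNamespace


def collect_api_responses(captured_requests, url_fragment):
    """Return the decoded bodies of captured requests whose URL contains url_fragment.

    captured_requests is assumed to look like selenium-wire's driver.requests
    (hypothetical helper, not part of selenium-wire itself).
    """
    bodies = []
    for request in captured_requests:
        if url_fragment not in request.url:
            continue  # not the web API we are interested in
        if request.response is None:
            continue  # no response was captured for this request
        bodies.append(request.response.body.decode("utf-8"))
    return bodies


if __name__ == "__main__":
    # Stand-in objects mimicking driver.requests, so no browser is needed
    fake_requests = [
        SimpleNamespace(url="https://target/api/items",
                        response=SimpleNamespace(body=b'{"items": [1, 2]}')),
        SimpleNamespace(url="https://target/static/app.js",
                        response=SimpleNamespace(body=b"console.log(1)")),
        SimpleNamespace(url="https://target/api/pending", response=None),
    ]
    for text in collect_api_responses(fake_requests, "/api/"):
        print(json.loads(text))  # the web API usually returns JSON
```

With a real driver, the call would be `collect_api_responses(driver.requests, "xxxxx")`, using the same URL condition as the loop.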
I learned about CORS from this article: https://qiita.com/att55/items/2154a8aad8bf1409db2b. I see, it is definitely necessary, because there are people who do things like what I was doing.
I haven't researched it much, but it doesn't seem possible to handle this easily, so I gave up. In CORS, it seems that a preflight request (an HTTP OPTIONS request) is sent first, and only then is the actual GET or POST made. https://developer.mozilla.org/ja/docs/Glossary/Preflight_request
According to this article, "preflight requests are issued automatically by the browser as needed; front-end developers usually do not need to make such requests themselves."
Since the browser does it automatically, I took that to mean the way it is sent is hidden from the developer, and gave up.
Whatever I searched for, the articles that came up used the Fetch API or XMLHttpRequest, so it seems this can only be done from JavaScript.
Speaking of JavaScript, it seems the Fetch API can also be used from Node.js: https://www.npmjs.com/package/node-fetch
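For reference, what the browser sends automatically is an ordinary HTTP OPTIONS request carrying an `Origin` header and an `Access-Control-Request-Method` header (plus `Access-Control-Request-Headers` when the real request uses non-simple headers), and it then checks `Access-Control-Allow-Origin` in the response. The sketch below only builds such a request with `requests` without sending it, and applies a simplified version of the browser's origin check that ignores credentials and the other `Access-Control-Allow-*` headers. `build_preflight`, `origin_allowed`, `https://target/api` and `https://example.com` are hypothetical names and values for illustration, not the browser's actual implementation.

```python
import requests


def build_preflight(url, origin, method="GET", request_headers=None):
    """Build (but do not send) a CORS-style preflight request (hypothetical helper)."""
    headers = {
        "Origin": origin,                        # page origin making the call
        "Access-Control-Request-Method": method,  # method of the real request
    }
    if request_headers:
        # Non-simple headers the real request intends to send
        headers["Access-Control-Request-Headers"] = ", ".join(request_headers)
    return requests.Request("OPTIONS", url, headers=headers).prepare()


def origin_allowed(response_headers, origin):
    """Simplified browser-side check of Access-Control-Allow-Origin.

    response_headers should be a case-insensitive mapping such as
    requests' Response.headers; credentials handling is ignored here.
    """
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed == "*" or allowed == origin


if __name__ == "__main__":
    prep = build_preflight("https://target/api", "https://example.com")
    print(prep.method, dict(prep.headers))
    # To actually send it: requests.Session().send(prep)
    print(origin_allowed({}, "https://example.com"))  # header missing -> blocked
```

A missing `Access-Control-Allow-Origin` header makes `origin_allowed` return `False`, which corresponds to the "No 'Access-Control-Allow-Origin' header is present" message in the error.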