This article creates a page that keeps outputting HTML endlessly and never finishes loading.
I wanted to scrape a page that behaves this way, so I built this one as a test target. The scraping method is described in a separate article.
Infinite-scroll pages are usually implemented in JavaScript, and their HTML source is finite (it ends), so you can fetch the source with curl or requests.get. The page built here, by contrast, never finishes sending its source, so an ordinary curl or requests.get call does not return on its own: it keeps waiting for the end of the response until it times out or is interrupted.
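To illustrate why an ordinary fetch stalls and what an incremental read looks like, here is a minimal sketch using only the standard library. It is not the scraping method from the other article; the helper names `take_paragraphs` and `fetch_first`, the URL, and the limit of 3 are assumptions made for this example. The idea is to read whatever bytes have arrived with `read1` and stop once enough `<p>` elements have been seen, instead of waiting for an end of response that never comes.

```python
import re
from typing import List
from urllib.request import urlopen

# Matches one complete <p>...</p> element in the raw bytes.
P_TAG = re.compile(rb"<p>(.*?)</p>", re.DOTALL)


def take_paragraphs(stream, limit: int, bufsize: int = 1024) -> List[str]:
    """Collect the text of the first `limit` <p> elements from a binary stream.

    `stream` is any object with a `read1` method, for example the response
    returned by `urlopen`. The function returns early, so it also terminates
    on a stream that never ends.
    """
    buffer = b""
    found: List[str] = []
    while len(found) < limit:
        # read1 returns as soon as some bytes are available.
        data = stream.read1(bufsize)
        if not data:  # the other side closed the connection
            break
        buffer += data
        # Extract complete elements and keep the unfinished tail for later.
        last_end = 0
        for match in P_TAG.finditer(buffer):
            found.append(match.group(1).decode("ascii", errors="replace"))
            last_end = match.end()
        buffer = buffer[last_end:]
    return found[:limit]


def fetch_first(url: str = "http://127.0.0.1:8000", limit: int = 3) -> List[str]:
    """Hypothetical usage: read the first few elements, then disconnect."""
    with urlopen(url, timeout=10) as resp:
        return take_paragraphs(resp, limit)
```

Calling `fetch_first()` while the server from this article is running should return after a few seconds with the first three paragraphs. Leaving the `with` block closes the connection, which is what lets the server side stop writing.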
inf_page.py
from http.server import BaseHTTPRequestHandler, HTTPServer
from time import sleep


class infiniteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.send_header('Transfer-Encoding', 'chunked')
        self.end_headers()
        inc = 0
        while True:
            body = f"<p>Hello World ! {inc}</p>".encode("ascii")
            try:
                # Chunk framing: size in hex, CRLF, data, CRLF.
                self.wfile.write(f"{len(body):X}\r\n".encode("ascii") + body + b"\r\n")
                self.wfile.flush()
            except OSError:
                # The client disconnected, so stop sending.
                self.close_connection = True
                break
            print("wrote")
            sleep(2)
            inc += 1


server_address = ('127.0.0.1', 8000)
# Chunked transfer encoding requires HTTP/1.1.
infiniteHandler.protocol_version = "HTTP/1.1"
httpd = HTTPServer(server_address, infiniteHandler)
sa = httpd.socket.getsockname()
print("Serving HTTP on", sa[0], "port", sa[1], "...")
httpd.serve_forever()
Open [http://localhost:8000](http://localhost:8000) in your browser.
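Chunked transfer encoding, which the handler's header refers to, is what allows a response to be sent without knowing its length in advance. In HTTP/1.1, each chunk on the wire is its size in hexadecimal, CRLF, the data, and another CRLF, and a zero-size chunk marks the end (which this page never sends). The following is a small sketch of that framing; `encode_chunk`, `LAST_CHUNK`, and `decode_chunks` are hypothetical helper names for illustration, not part of `http.server`.

```python
from typing import List

# A zero-length chunk followed by an empty line terminates a chunked body.
LAST_CHUNK = b"0\r\n\r\n"


def encode_chunk(data: bytes) -> bytes:
    """Frame `data` as one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"


def decode_chunks(raw: bytes) -> List[bytes]:
    """Split a complete chunked body (ending with LAST_CHUNK) into its chunks.

    Raises ValueError if the terminating zero-size chunk is missing.
    """
    chunks: List[bytes] = []
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size = int(raw[pos:eol].split(b";")[0], 16)
        if size == 0:
            break
        start = eol + 2
        chunks.append(raw[start:start + size])
        pos = start + size + 2  # skip the data and its trailing CRLF
    return chunks
```

For example, `encode_chunk(b"<p>Hello World ! 0</p>")` produces a 22-byte payload prefixed with `16` (22 in hexadecimal). A client that understands chunked encoding strips this framing, which is why the browser shows only the paragraphs.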
That's all.