Use HTTP cache in Python

When building a crawler, keep in mind that it will be run over and over. That makes it worth thinking about how to cache responses from the site being crawled.

If a page has not changed, fetching it again yields the same data, so a repeat run can reuse the cached response instead of downloading it again.

In this post, I summarize how to use an HTTP cache in Python.

HTTP cache

HTTP caching is defined in RFC 7234 (since obsoleted by RFC 9111). By adding cache-related headers to a response, an HTTP server tells the HTTP client how the content may be cached.

HTTP header     Description
Cache-Control   Detailed caching policy directives, such as whether the content may be cached
Expires         The date and time at which the content expires
ETag            An identifier for the content; its value changes whenever the content changes
Last-Modified   The date and time the content was last updated
Pragma          Similar to Cache-Control, but now kept only for backward compatibility
Vary            Lists the request headers whose values cause the server to return a different response
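To make the Cache-Control header more concrete, here is a minimal sketch of splitting its value into individual directives. The function name `parse_cache_control` is my own, not part of any library, and the parsing is deliberately simplified: it does not handle quoted values that contain commas.

```python
def parse_cache_control(header_value: str) -> dict:
    """Split a Cache-Control header value into a dict of directives.

    Directives without a value (e.g. "no-store") map to True.
    Simplified: quoted values containing commas are not supported.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        directives[name] = value.strip().strip('"') if sep else True
    return directives


policy = parse_cache_control("public, max-age=3600, must-revalidate")
print(policy)
```

A client could then look at `policy.get("max-age")` or check for `"no-store"` to decide whether and for how long to keep the response.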

Strong cache

Once the client has cached a response, it sends no request to the server until that response expires, and uses the cached copy instead.
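The idea can be sketched with a toy in-memory cache. The class `StrongCache` and its methods are hypothetical names used for illustration only, and the freshness rule is reduced to a single max-age value in seconds; real clients also take headers such as Expires and Age into account.

```python
import time


class StrongCache:
    """Toy cache that reuses a response until max_age seconds have passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example is deterministic
        self._entries = {}    # url -> (body, stored_at, max_age)

    def store(self, url, body, max_age):
        self._entries[url] = (body, self._clock(), max_age)

    def lookup(self, url):
        """Return the cached body if it is still fresh, otherwise None."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, stored_at, max_age = entry
        if self._clock() - stored_at < max_age:
            return body       # fresh: no request needs to be sent
        return None           # stale: the client has to ask the server again


# Use a fake clock so the passage of time can be simulated.
now = [0.0]
cache = StrongCache(clock=lambda: now[0])
cache.store("https://example.com/", "<html>page</html>", max_age=60)

now[0] = 30.0
print(cache.lookup("https://example.com/"))  # still fresh after 30 seconds

now[0] = 61.0
print(cache.lookup("https://example.com/"))  # expired after 61 seconds
```

While the entry is fresh, the server is never contacted, which is what makes this kind of caching attractive for a crawler that is run repeatedly.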

Weak cache

Once the client has cached a response, it sends a conditional request the next time. If the content has not been updated, the server returns status code 304 with an empty response body, and the client reuses its cached copy.
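The exchange can be sketched without any network access. Below, `fake_server` is a hypothetical stand-in for a server that validates ETags, and `fetch` is an illustrative client helper; neither is a real library API. The sketch uses only ETag with If-None-Match, although Last-Modified with If-Modified-Since works along the same lines.

```python
def fake_server(request_headers, current_etag='"v1"', body="hello"):
    """Hypothetical server: answers 304 when the client's ETag still matches."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, {"ETag": current_etag}, ""   # not modified: empty body
    return 200, {"ETag": current_etag}, body


def fetch(cache, url, server=fake_server, **server_kwargs):
    """Return (body, from_cache), revalidating any cached copy of url."""
    headers = {}
    cached = cache.get(url)
    if cached is not None:
        headers["If-None-Match"] = cached["etag"]  # make the request conditional
    status, response_headers, body = server(headers, **server_kwargs)
    if status == 304:
        return cached["body"], True                # reuse the cached body
    cache[url] = {"etag": response_headers["ETag"], "body": body}
    return body, False


cache = {}
print(fetch(cache, "https://example.com/"))  # first request: full download
print(fetch(cache, "https://example.com/"))  # second request: 304, cached body
```

Unlike the strong cache, a request is still sent every time, but when nothing has changed the body is not transferred again.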

Use HTTP cache in Python

import requests
from cachecontrol import CacheControl 
from cachecontrol.caches import FileCache

session = requests.Session()
# Wrap the session to create cached_session.
# Responses are cached as files in the .webcache directory.
cached_session = CacheControl(session, cache=FileCache('.webcache'))

response = cached_session.get('URL') 

# The response.from_cache attribute tells you whether the response came from the cache.
print(f'from_cache: {response.from_cache}') 
print(f'status_code: {response.status_code}')

From the second request onward, the cached content is returned, as long as the server's response headers allow the page to be cached.
