-
Notifications
You must be signed in to change notification settings - Fork 131
Open
Description
Hello,
Thanks for this package, it sounds really useful and fit my needs.
I'm trying to use it to improve the Mkdocs RSS plugin when it comes to retrieve a remote image length as expected by the enclosure tag. For now, a HEAD request is tried to read the value from response content-length header. If it fails,
See related code: https://github.com/Guts/mkdocs-rss-plugin/blob/68c62e5b579b408dbc9999b251bb7c13c562cee8/mkdocs_rss_plugin/util.py#L620-L668
Here comes my quick & dirty dev script to test it quickly:
import http.client
import logging
from pathlib import Path
import requests
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
req_log = logging.getLogger("requests.packages.urllib3")
req_log.setLevel(logging.DEBUG)
req_log.propagate = True
sess = CacheControl(
requests.Session(), cache=FileCache(".web_cache"), cacheable_methods=("HEAD", "GET")
)
# head requests
resp_img = sess.head(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
# try again, cache hit expected
resp_img = sess.head(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
print(list(Path(".web_cache").iterdir()))
But reading the log, I can see that the cache is not used, nor even stored:
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): cdn.geotribu.fr:443
send: b'HEAD /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/1.1\r\nHost: cdn.geotribu.fr\r\nUser-Agent: python-requests/2.32.3\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 25 Jun 2024 10:37:50 GMT
header: Server: Apache/2.4.59 (Ubuntu)
header: Strict-Transport-Security: max-age=15552001; includeSubDomains;
header: Vary: User-Agent,Accept-Encoding
header: Upgrade: h2,h2c
header: Connection: Upgrade, Keep-Alive
header: Last-Modified: Mon, 15 May 2023 11:16:04 GMT
header: ETag: "1784e-5fbb994a06b57"
header: Accept-Ranges: bytes
header: Content-Length: 96334
header: Cache-Control: max-age=31536000, public
header: Keep-Alive: timeout=5, max=100
header: Content-Type: image/png
DEBUG:urllib3.connectionpool:https://cdn.geotribu.fr:443 "HEAD /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/11" 200 0
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:cachecontrol.controller:No cache entry available
send: b'HEAD /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/1.1\r\nHost: cdn.geotribu.fr\r\nUser-Agent: python-requests/2.32.3\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 25 Jun 2024 10:37:50 GMT
header: Server: Apache/2.4.59 (Ubuntu)
header: Strict-Transport-Security: max-age=15552001; includeSubDomains;
header: Vary: User-Agent,Accept-Encoding
header: Last-Modified: Mon, 15 May 2023 11:16:04 GMT
header: ETag: "1784e-5fbb994a06b57"
header: Accept-Ranges: bytes
header: Content-Length: 96334
header: Cache-Control: max-age=31536000, public
header: Keep-Alive: timeout=5, max=99
header: Connection: Keep-Alive
header: Content-Type: image/png
DEBUG:urllib3.connectionpool:https://cdn.geotribu.fr:443 "HEAD /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/11" 200 0
Traceback (most recent call last):
File "/home/jmo/Git/Perso/mkdocs-rss-plugin/tests/dev/dev_cached_http.py", line 45, in <module>
print(list(Path(".web_cache").iterdir()))
File "/usr/lib/python3.10/pathlib.py", line 1017, in iterdir
for name in self._accessor.listdir(self):
FileNotFoundError: [Errno 2] No such file or directory: '.web_cache'
BUT if I make a GET request to the same resource before the HEAD, the cache is stored AND even read for the HEAD:
import http.client
import logging
from pathlib import Path
import requests
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
req_log = logging.getLogger("requests.packages.urllib3")
req_log.setLevel(logging.DEBUG)
req_log.propagate = True
sess = CacheControl(
requests.Session(), cache=FileCache(".web_cache"), cacheable_methods=("HEAD", "GET")
)
# get requests
resp = sess.get("https://geotribu.fr")
resp_img = sess.get(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
# try again, cache hit expected
resp = sess.get("https://geotribu.fr")
resp_img = sess.get(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
# head requests
resp_img = sess.head(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
# try again, cache hit expected
resp_img = sess.head(
"https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
)
print(list(Path(".web_cache").iterdir()))
Logs:
DEBUG:cachecontrol.controller:Looking up "https://geotribu.fr/" in the cache
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): geotribu.fr:443
send: b'GET / HTTP/1.1\r\nHost: geotribu.fr\r\nUser-Agent: python-requests/2.32.3\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Connection: keep-alive
header: Content-Length: 36931
header: Server: GitHub.com
header: Content-Type: text/html; charset=utf-8
header: Last-Modified: Mon, 24 Jun 2024 17:15:31 GMT
header: Access-Control-Allow-Origin: *
header: ETag: W/"6679a9b3-1c64a"
header: expires: Tue, 25 Jun 2024 05:36:20 GMT
header: Cache-Control: max-age=600
header: Content-Encoding: gzip
header: x-proxy-cache: MISS
header: X-GitHub-Request-Id: 9410:182824:28A2A85:2986177:667A54FB
header: Accept-Ranges: bytes
header: Age: 311
header: Date: Tue, 25 Jun 2024 10:42:09 GMT
header: Via: 1.1 varnish
header: X-Served-By: cache-par-lfpg1960074-PAR
header: X-Cache: HIT
header: X-Cache-Hits: 0
header: X-Timer: S1719312129.348531,VS0,VE1
header: Vary: Accept-Encoding
header: X-Fastly-Request-ID: 4197c8a81421f5a2aed12f7abecd529eb36a01bb
DEBUG:urllib3.connectionpool:https://geotribu.fr:443 "GET / HTTP/11" 200 36931
DEBUG:cachecontrol.controller:Updating cache with response from "https://geotribu.fr/"
DEBUG:cachecontrol.controller:etag object cached for 1209600 seconds
DEBUG:cachecontrol.controller:Caching due to etag
DEBUG:filelock:Attempting to acquire lock 135385395530768 on .web_cache/a/e/3/4/8/ae3480509cd61ad7727667c13727f5909a71b094423dffa3730b3ab8.lock
DEBUG:filelock:Lock 135385395530768 acquired on .web_cache/a/e/3/4/8/ae3480509cd61ad7727667c13727f5909a71b094423dffa3730b3ab8.lock
DEBUG:filelock:Attempting to release lock 135385395530768 on .web_cache/a/e/3/4/8/ae3480509cd61ad7727667c13727f5909a71b094423dffa3730b3ab8.lock
DEBUG:filelock:Lock 135385395530768 released on .web_cache/a/e/3/4/8/ae3480509cd61ad7727667c13727f5909a71b094423dffa3730b3ab8.lock
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:cachecontrol.controller:No cache entry available
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): cdn.geotribu.fr:443
send: b'GET /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/1.1\r\nHost: cdn.geotribu.fr\r\nUser-Agent: python-requests/2.32.3\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 25 Jun 2024 10:42:09 GMT
header: Server: Apache/2.4.59 (Ubuntu)
header: Strict-Transport-Security: max-age=15552001; includeSubDomains;
header: Vary: User-Agent,Accept-Encoding
header: Upgrade: h2,h2c
header: Connection: Upgrade, Keep-Alive
header: Last-Modified: Mon, 15 May 2023 11:16:04 GMT
header: ETag: "1784e-5fbb994a06b57"
header: Accept-Ranges: bytes
header: Content-Length: 96334
header: Cache-Control: max-age=31536000, public
header: Keep-Alive: timeout=5, max=100
header: Content-Type: image/png
DEBUG:urllib3.connectionpool:https://cdn.geotribu.fr:443 "GET /img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png HTTP/11" 200 96334
DEBUG:cachecontrol.controller:Updating cache with response from "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png"
DEBUG:cachecontrol.controller:etag object cached for 1209600 seconds
DEBUG:cachecontrol.controller:Caching due to etag
DEBUG:filelock:Attempting to acquire lock 135385395533696 on .web_cache/8/d/2/2/c/8d22c12b7f7b5166ec3853b07d4b8dc0cb5c13a8eba0e19e34e4fb70.lock
DEBUG:filelock:Lock 135385395533696 acquired on .web_cache/8/d/2/2/c/8d22c12b7f7b5166ec3853b07d4b8dc0cb5c13a8eba0e19e34e4fb70.lock
DEBUG:filelock:Attempting to release lock 135385395533696 on .web_cache/8/d/2/2/c/8d22c12b7f7b5166ec3853b07d4b8dc0cb5c13a8eba0e19e34e4fb70.lock
DEBUG:filelock:Lock 135385395533696 released on .web_cache/8/d/2/2/c/8d22c12b7f7b5166ec3853b07d4b8dc0cb5c13a8eba0e19e34e4fb70.lock
DEBUG:cachecontrol.controller:Looking up "https://geotribu.fr/" in the cache
DEBUG:cachecontrol.controller:Current age based on date: 0
DEBUG:cachecontrol.controller:Freshness lifetime from max-age: 600
DEBUG:cachecontrol.controller:The response is "fresh", returning cached response
DEBUG:cachecontrol.controller:600 > 0
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:Current age based on date: 0
DEBUG:cachecontrol.controller:Freshness lifetime from max-age: 31536000
DEBUG:cachecontrol.controller:The response is "fresh", returning cached response
DEBUG:cachecontrol.controller:31536000 > 0
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:Current age based on date: 0
DEBUG:cachecontrol.controller:Freshness lifetime from max-age: 31536000
DEBUG:cachecontrol.controller:The response is "fresh", returning cached response
DEBUG:cachecontrol.controller:31536000 > 0
DEBUG:cachecontrol.controller:Looking up "https://cdn.geotribu.fr/img/articles-blog-rdp/capture-ecran/kevish_Air-Traffic.png" in the cache
DEBUG:cachecontrol.controller:Current age based on date: 0
DEBUG:cachecontrol.controller:Freshness lifetime from max-age: 31536000
DEBUG:cachecontrol.controller:The response is "fresh", returning cached response
DEBUG:cachecontrol.controller:31536000 > 0
[PosixPath('.web_cache/8'), PosixPath('.web_cache/a')]
Possibly related issue: #216
Metadata
Metadata
Assignees
Labels
No labels