Commit 70beea9

Implement referencing.retrieval.to_cached_resource
This is a fairly simple caching decorator (which just delegates to lru_cache), but it's useful because:

* Most dynamic retrievers will probably want this
* It saves a small bit of referencing-specific boilerplate, letting users know to return some JSON and the rest is handled (as long as their schemas contain $schema as usual)

It also allows for continuing to support use cases that *don't* want caching (by of course not using this decorator) such as when you *do* want to dynamically re-retrieve a URI because it may have changed contents.

Some tweaks may still be necessary here, but it does work for the initial example.

Refs: python-jsonschema/jsonschema#1065
1 parent a1cdcab commit 70beea9

8 files changed: +241 -5 lines changed

docs/api.rst

+12 -1
@@ -18,7 +18,7 @@ API Reference
     :undoc-members:


-.. autoclass:: referencing._core.T
+.. autoclass:: referencing._core.AnchorOrResource


 .. autoclass:: referencing._core.Resolver
@@ -51,6 +51,17 @@ referencing.exceptions
     :undoc-members:


+referencing.retrieval
+^^^^^^^^^^^^^^^^^^^^^
+
+.. automodule:: referencing.retrieval
+    :members:
+    :undoc-members:
+
+
+.. autoclass:: referencing.retrieval._T
+
+
 referencing.typing
 ^^^^^^^^^^^^^^^^^^

docs/changes.rst

+5
@@ -2,6 +2,11 @@
 Changelog
 =========

+v0.29.0
+-------
+
+* Add ``referencing.retrieval.to_cached_resource``, a simple caching decorator useful when writing a retrieval function turning JSON text into resources without repeatedly hitting the network, filesystem, etc.
+
 v0.28.6
 -------

docs/intro.rst

+28
@@ -236,3 +236,31 @@ Here's an example of automatically retrieving external references by downloading
 See :kw:`schema-references` in particular.

 `referencing` will of course therefore not do any such thing automatically, and this section generally assumes that you have personally considered the security implications for your own use case.
+
+A common concern in these situations is also to *cache* the resulting resource such that repeated lookups of the same URI do not repeatedly make network calls, or hit the filesystem, etc.
+
+You are of course free to use whatever caching mechanism is convenient (e.g. one specific to ``httpx`` in the above example).
+
+Because of how common it is to retrieve a JSON string and construct a resource from it however, a decorator which specifically does so is also provided called `referencing.retrieval.to_cached_resource`.
+If you use it, note that your retrieval callable should return `str`, not a `Resource`, as the decorator will handle deserializing your response (this is mostly because otherwise, deserialized JSON is generally not hashable).
+
+The above example would be written:
+
+
+.. code:: python
+
+    from referencing import Registry, Resource
+    import httpx
+    import referencing.retrieval
+
+
+    @referencing.retrieval.to_cached_resource()
+    def cached_retrieve_via_httpx(uri):
+        return httpx.get(uri).text
+
+
+    registry = Registry(retrieve=cached_retrieve_via_httpx)
+    resolver = registry.resolver()
+    print(resolver.lookup("https://json-schema.org/draft/2020-12/schema"))
+
+and, apart from caching responses rather than repeatedly calling the retrieval function, it is otherwise functionally equivalent.
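
The hashability remark in the added documentation can be checked with the standard library alone. The following is only an illustrative sketch and not part of the commit; `payload` is a made-up value standing in for a retrieved response body.

```python
import json

# A made-up serialized document, standing in for a retrieved response body.
payload = '{"$schema": "https://json-schema.org/draft/2020-12/schema", "foo": "bar"}'

# Strings are hashable, so hashing the serialized form succeeds.
string_is_hashable = isinstance(hash(payload), int)

# The deserialized form is a dict, which Python refuses to hash.
document = json.loads(payload)
try:
    hash(document)
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

print(string_is_hashable, dict_is_hashable)  # True False
```

This shows only the general property the documentation alludes to; it does not exercise `referencing` itself.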

docs/spelling-wordlist.txt

+3
@@ -5,13 +5,16 @@ changelog
 deduplication
 dereferenced
 deserialized
+deserializing
 discoverability
 docstrings
 filesystem
+hashable
 implementers
 instantiable
 instantiation
 iterable
+lookups
 metaschemas
 referenceable
 resolvers

referencing/_core.py

+3 -3
@@ -505,16 +505,16 @@ def resolver_with_root(self, resource: Resource[D]) -> Resolver[D]:


 #: An anchor or resource.
-T = TypeVar("T", AnchorType[Any], Resource[Any])
+AnchorOrResource = TypeVar("AnchorOrResource", AnchorType[Any], Resource[Any])


 @frozen
-class Retrieved(Generic[D, T]):
+class Retrieved(Generic[D, AnchorOrResource]):
     """
     A value retrieved from a `Registry`.
     """

-    value: T
+    value: AnchorOrResource
     registry: Registry[D]

referencing/retrieval.py

+83
@@ -0,0 +1,83 @@
+"""
+Helpers related to (dynamic) resource retrieval.
+"""
+from __future__ import annotations
+
+from functools import lru_cache
+from typing import Callable, TypeVar
+import json
+
+from referencing import Resource
+from referencing.typing import URI, D, Retrieve
+
+#: A serialized document (e.g. a JSON string)
+_T = TypeVar("_T")
+
+
+def to_cached_resource(
+    cache: Callable[[Retrieve[D]], Retrieve[D]] | None = None,
+    loads: Callable[[_T], D] = json.loads,
+    from_contents: Callable[[D], Resource[D]] = Resource.from_contents,
+) -> Callable[[Callable[[URI], _T]], Retrieve[D]]:
+    """
+    Create a retriever which caches its return values from a simpler callable.
+
+    Takes a function which returns things like serialized JSON (strings) and
+    returns something suitable for passing to `Registry` as a retrieve
+    function.
+
+    This decorator both reduces a small bit of boilerplate for a common case
+    (deserializing JSON from strings and creating `Resource` objects from the
+    result) as well as makes the probable need for caching a bit easier.
+    Retrievers which otherwise do expensive operations (like hitting the
+    network) might otherwise be called repeatedly.
+
+    Examples
+    --------
+
+    .. testcode::
+
+        from referencing import Registry
+        import referencing.retrieval
+
+
+        @referencing.retrieval.to_cached_resource()
+        def retrieve(uri: str):
+            print(f"Retrieved {uri}")
+
+            # Normally, go get some expensive JSON from the network, a file ...
+            return '''
+                {
+                    "$schema": "https://json-schema.org/draft/2020-12/schema",
+                    "foo": "bar"
+                }
+            '''
+
+        one = Registry(retrieve=retrieve).get_or_retrieve("urn:example:foo")
+        print(one.value.contents["foo"])
+
+        # Retrieving the same URI again reuses the same value (and thus doesn't
+        # print another retrieval message here)
+        two = Registry(retrieve=retrieve).get_or_retrieve("urn:example:foo")
+        print(two.value.contents["foo"])
+
+    .. testoutput::
+
+        Retrieved urn:example:foo
+        bar
+        bar
+
+    """
+    if cache is None:
+        cache = lru_cache(maxsize=None)
+
+    def decorator(retrieve: Callable[[URI], _T]):
+        @cache
+        def cached_retrieve(uri: URI):
+            response = retrieve(uri)
+            contents = loads(response)
+            return from_contents(contents)
+
+        return cached_retrieve
+
+    return decorator
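
The overall shape of the decorator factory in this file (an optional cache, a deserializer, and a wrapper constructor) can be tried out without installing `referencing`. The sketch below is a standard-library-only imitation and not the library's code: `FakeResource` and `to_cached` are hypothetical stand-ins for `Resource` and `to_cached_resource`, and the `calls` list exists only to make cache hits visible.

```python
from dataclasses import dataclass
from functools import lru_cache
import json


@dataclass
class FakeResource:
    """Hypothetical stand-in for referencing.Resource, for illustration only."""

    contents: object


def to_cached(cache=None, loads=json.loads, from_contents=FakeResource):
    """Imitate the factory's shape: pick a cache, then wrap a retriever."""
    if cache is None:
        cache = lru_cache(maxsize=None)

    def decorator(retrieve):
        @cache
        def cached_retrieve(uri):
            # Runs only on a cache miss: fetch, deserialize, wrap.
            return from_contents(loads(retrieve(uri)))

        return cached_retrieve

    return decorator


calls = []


@to_cached()
def retrieve(uri):
    calls.append(uri)  # record every call that actually reaches this function
    return '{"foo": "bar"}'


first = retrieve("urn:example:foo")
second = retrieve("urn:example:foo")
print(first is second, calls)  # True ['urn:example:foo']
```

Because the cache is keyed on the URI argument, the second call returns the very same object and the wrapped function runs once.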

referencing/tests/test_core.py

+1 -1
@@ -357,7 +357,7 @@ def test_combine_with_common_retrieve(self):
         two = ID_AND_CHILDREN.create_resource({"foo": "bar"})
         three = ID_AND_CHILDREN.create_resource({"baz": "quux"})

-        def retrieve(uri): # pragma: no cover
+        def retrieve(uri): # pragma: no cover
             pass

         first = Registry(retrieve=retrieve).with_resource(

referencing/tests/test_retrieval.py

+106
@@ -0,0 +1,106 @@
+from functools import lru_cache
+import json
+
+import pytest
+
+from referencing import Registry, Resource, exceptions
+from referencing.jsonschema import DRAFT202012
+from referencing.retrieval import to_cached_resource
+
+
+class TestToCachedResource:
+    def test_it_caches_retrieved_resources(self):
+        contents = {"$schema": "https://json-schema.org/draft/2020-12/schema"}
+        stack = [json.dumps(contents)]
+
+        @to_cached_resource()
+        def retrieve(uri):
+            return stack.pop()
+
+        registry = Registry(retrieve=retrieve)
+
+        expected = Resource.from_contents(contents)
+
+        got = registry.get_or_retrieve("urn:example:schema")
+        assert got.value == expected
+
+        # And a second time we get the same value.
+        again = registry.get_or_retrieve("urn:example:schema")
+        assert again.value is got.value
+
+    def test_custom_loader(self):
+        contents = {"$schema": "https://json-schema.org/draft/2020-12/schema"}
+        stack = [json.dumps(contents)[::-1]]
+
+        @to_cached_resource(loads=lambda s: json.loads(s[::-1]))
+        def retrieve(uri):
+            return stack.pop()
+
+        registry = Registry(retrieve=retrieve)
+
+        expected = Resource.from_contents(contents)
+
+        got = registry.get_or_retrieve("urn:example:schema")
+        assert got.value == expected
+
+        # And a second time we get the same value.
+        again = registry.get_or_retrieve("urn:example:schema")
+        assert again.value is got.value
+
+    def test_custom_from_contents(self):
+        contents = {}
+        stack = [json.dumps(contents)]
+
+        @to_cached_resource(from_contents=DRAFT202012.create_resource)
+        def retrieve(uri):
+            return stack.pop()
+
+        registry = Registry(retrieve=retrieve)
+
+        expected = DRAFT202012.create_resource(contents)
+
+        got = registry.get_or_retrieve("urn:example:schema")
+        assert got.value == expected
+
+        # And a second time we get the same value.
+        again = registry.get_or_retrieve("urn:example:schema")
+        assert again.value is got.value
+
+    def test_custom_cache(self):
+        schema = {"$schema": "https://json-schema.org/draft/2020-12/schema"}
+        mapping = {
+            "urn:example:1": dict(schema, foo=1),
+            "urn:example:2": dict(schema, foo=2),
+            "urn:example:3": dict(schema, foo=3),
+        }
+
+        resources = {
+            uri: Resource.from_contents(contents)
+            for uri, contents in mapping.items()
+        }
+
+        @to_cached_resource(cache=lru_cache(maxsize=2))
+        def retrieve(uri):
+            return json.dumps(mapping.pop(uri))
+
+        registry = Registry(retrieve=retrieve)
+
+        got = registry.get_or_retrieve("urn:example:1")
+        assert got.value == resources["urn:example:1"]
+        assert registry.get_or_retrieve("urn:example:1").value is got.value
+        assert registry.get_or_retrieve("urn:example:1").value is got.value
+
+        got = registry.get_or_retrieve("urn:example:2")
+        assert got.value == resources["urn:example:2"]
+        assert registry.get_or_retrieve("urn:example:2").value is got.value
+        assert registry.get_or_retrieve("urn:example:2").value is got.value
+
+        # This still succeeds, but evicts the first URI
+        got = registry.get_or_retrieve("urn:example:3")
+        assert got.value == resources["urn:example:3"]
+        assert registry.get_or_retrieve("urn:example:3").value is got.value
+        assert registry.get_or_retrieve("urn:example:3").value is got.value
+
+        # And now this fails (as we popped the value out of `mapping`)
+        with pytest.raises(exceptions.Unretrievable):
+            registry.get_or_retrieve("urn:example:1")
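
The eviction that `test_custom_cache` relies on is ordinary `functools.lru_cache` behaviour and can be observed in isolation. The sketch below uses only the standard library; `fetch` and the `fetched` list are illustrative names, not anything from the commit.

```python
from functools import lru_cache

fetched = []


@lru_cache(maxsize=2)
def fetch(uri):
    fetched.append(uri)  # record each call that actually runs the body
    return f"contents of {uri}"


# The repeated lookup of urn:example:1 is served from the cache.
for uri in ["urn:example:1", "urn:example:1", "urn:example:2", "urn:example:3"]:
    fetch(uri)

# Only two entries fit, so urn:example:1 (least recently used) was evicted
# when urn:example:3 arrived; asking for it again re-runs the function.
fetch("urn:example:1")

print(fetched)
print(fetch.cache_info().currsize)  # 2
```

In the test, that re-run reaches a retrieval function whose backing `mapping` entry has already been popped, which is presumably why the final lookup is expected to raise.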
