HNT-1890 (4/4): wire RedisCorpusCache into providers + enable in staging #1441
Open
mmiermans wants to merge 1 commit into
This PR makes the shared cache live:
- merino/curated_recommendations/__init__.py: lifecycle wiring. When
cache="redis", builds a RedisAdapter via create_redis_clients and
wraps the ScheduledSurfaceBackend / SectionsBackend with their
RedisCached* counterparts. Adds shutdown() to close the adapter.
- merino/main.py: lifespan integration + a global FastAPI exception
handler for CorpusCacheUnavailable -> 503.
- corpus_backends/{scheduled_surface,sections}_backend.py: hook the
  L1 stale-while-revalidate (SWR) cache to consult the L2 cache during
  revalidation. L1 SWR TTLs are cut roughly in half (110-130s -> 50-70s),
  on the assumption that L2 absorbs the extra load the shorter TTLs
  would otherwise generate.
- merino/configs/stage.toml: cache = "redis", enabling the cache in
  staging. Production is enabled separately after a staging soak.
- tests/unit/curated_recommendations/test_init.py: covers the new
init paths (cache="none", cache="redis", missing Redis config).
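A minimal sketch of the read-through pattern the RedisCached* wrappers appear to follow, under stated assumptions. The wrapper checks Redis (L2) first. On a miss, a per-entry asyncio.Lock (listed under the PR's implementation decisions) lets only one task per process call upstream, while the others wait and then re-read L2. FakeRedis, RedisCachedBackend, the get/set interface, the TTL value, the key name, and the mapping of Redis errors to CorpusCacheUnavailable are all hypothetical and are not taken from the merino code.

```python
from __future__ import annotations

import asyncio
import json
from typing import Any, Awaitable, Callable


class CorpusCacheUnavailable(Exception):
    """Hypothetical stand-in for the PR's exception (mapped to HTTP 503 in main.py)."""


class FakeRedis:
    """In-memory stand-in for the Redis adapter; the get/set interface is assumed."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.healthy = True  # flip to False to simulate an outage

    async def get(self, key: str) -> str | None:
        if not self.healthy:
            raise ConnectionError("redis unavailable")
        return self.store.get(key)

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        if not self.healthy:
            raise ConnectionError("redis unavailable")
        self.store[key] = value  # this fake ignores the TTL


class RedisCachedBackend:
    """Hypothetical read-through wrapper: Redis (L2) first, then the wrapped fetch."""

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[Any]],
        redis: FakeRedis,
        ttl_seconds: int = 300,  # assumed value, not from the PR
    ) -> None:
        self._fetch = fetch
        self._redis = redis
        self._ttl = ttl_seconds
        # One asyncio.Lock per cache key; never evicted in this sketch.
        self._locks: dict[str, asyncio.Lock] = {}

    async def get(self, key: str) -> Any:
        cached = await self._read(key)
        if cached is not None:
            return cached
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: a task that held the lock earlier may have filled L2.
            cached = await self._read(key)
            if cached is not None:
                return cached
            value = await self._fetch(key)  # upstream call, e.g. the corpus API
            try:
                await self._redis.set(key, json.dumps(value), self._ttl)
            except ConnectionError as exc:
                # Assumption: a failed write is also treated as unavailability.
                raise CorpusCacheUnavailable(key) from exc
            return value

    async def _read(self, key: str) -> Any:
        try:
            raw = await self._redis.get(key)
        except ConnectionError as exc:
            # Assumption: Redis errors surface as CorpusCacheUnavailable.
            raise CorpusCacheUnavailable(key) from exc
        return None if raw is None else json.loads(raw)


async def demo() -> tuple[int, list[Any]]:
    """Ten concurrent requests for one (hypothetical) key on a cold cache."""
    calls = 0

    async def fetch_from_corpus_api(key: str) -> list[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated network latency
        return [f"{key}-item"]

    backend = RedisCachedBackend(fetch_from_corpus_api, FakeRedis())
    results = await asyncio.gather(*(backend.get("NEW_TAB_EN_US") for _ in range(10)))
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In demo(), the per-key lock means fetch_from_corpus_api runs once, and the other nine calls are served from the fake Redis. The lock only deduplicates within one process, so two pods that miss at the same moment could still both call upstream.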
Rollout (post-merge):
1. Stage: enabled by stage.toml in this PR. Watch for the corpus QPS drop.
2. Prod: separate config change after stage soak (cache = "redis"
in production.toml or MERINO_..._CACHE=redis env var).
3. Add a request-volume alert at the new (lower) baseline so we
detect Redis cache failures via Apollo metrics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
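The lifecycle wiring in merino/curated_recommendations/__init__.py and the lifespan hook in merino/main.py can be sketched with the standard library alone. This sketch mirrors the three init paths test_init.py covers: cache="none", cache="redis", and missing Redis config. Everything here is hypothetical. Providers, FakeRedisAdapter, CorpusBackend, RedisCachedCorpusBackend, RedisConfig, and the lifespan signature stand in for the real classes. Raising ValueError when the Redis config is missing is also an assumption; the PR only says that path is tested.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class RedisConfig:
    """Hypothetical Redis connection settings."""

    url: str


class FakeRedisAdapter:
    """Stand-in for the adapter built by create_redis_clients (interface assumed)."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class CorpusBackend:
    """Stand-in for ScheduledSurfaceBackend / SectionsBackend."""


@dataclass
class RedisCachedCorpusBackend:
    """Stand-in for a RedisCached* wrapper around an existing backend."""

    inner: CorpusBackend
    redis: FakeRedisAdapter


class Providers:
    """Hypothetical holder for the curated-recommendations backend and Redis adapter."""

    def __init__(self) -> None:
        self.backend: CorpusBackend | RedisCachedCorpusBackend | None = None
        self._adapter: FakeRedisAdapter | None = None

    def init(self, cache: str, redis_config: RedisConfig | None) -> None:
        backend = CorpusBackend()
        if cache == "none":
            self.backend = backend  # unchanged path: no Redis involved
        elif cache == "redis":
            if redis_config is None:
                # Assumption: fail fast; the real code may log and fall back instead.
                raise ValueError('cache = "redis" requires a Redis configuration')
            self._adapter = FakeRedisAdapter(redis_config.url)
            self.backend = RedisCachedCorpusBackend(backend, self._adapter)
        else:
            raise ValueError(f"unknown cache mode: {cache!r}")

    async def shutdown(self) -> None:
        """Close the Redis adapter if one was created; safe to call more than once."""
        if self._adapter is not None:
            await self._adapter.close()
            self._adapter = None


@asynccontextmanager
async def lifespan(
    providers: Providers, cache: str, redis_config: RedisConfig | None
) -> AsyncIterator[Providers]:
    """Init on startup, always shut down on exit (FastAPI's lifespan takes the app instead)."""
    providers.init(cache, redis_config)
    try:
        yield providers
    finally:
        await providers.shutdown()


async def demo() -> tuple[bool, bool]:
    providers = Providers()
    async with lifespan(providers, "redis", RedisConfig(url="redis://localhost:6379")) as p:
        wrapped = isinstance(p.backend, RedisCachedCorpusBackend)
        adapter = p.backend.redis
    return wrapped, adapter.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Putting shutdown() in the finally clause closes the adapter even if the application body raises. The cache="none" path never creates an adapter, so shutdown() is a no-op there.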
This was referenced Apr 27, 2026
PR 4/4 to implement shared caching of curated recommendations between pods. This PR makes the shared cache live in staging.
Stack: 1/4 infra → 2/4 impl → 3/4 integration tests → 4/4 (this).
References
JIRA: HNT-1890
Description
Wires RedisCorpusCache into the curated-recommendations subsystem and turns it on in staging. Once the cache is enabled, the ~300 production pods stop hitting the Pocket Corpus GraphQL API independently: one pod fetches and the rest read from Redis. This PR enables it in staging only; production is a separate config change.
What's wired
- merino/curated_recommendations/__init__.py: when cache = "redis", builds a RedisAdapter via create_redis_clients and wraps ScheduledSurfaceBackend / SectionsBackend with their RedisCached* counterparts. Adds shutdown() to close the adapter cleanly.
- merino/main.py: lifespan registration + a global FastAPI exception handler that converts CorpusCacheUnavailable into HTTP 503.
- corpus_backends/{scheduled_surface,sections}_backend.py: hooks the existing L1 SWR cache to consult the L2 cache during revalidation. L1 SWR TTLs are cut roughly in half (110–130s → 50–70s), because L2 now absorbs the load this would otherwise generate. Reviewers: the shorter L1 TTLs are the only behavior change for deployments running cache = "none", so please confirm that is acceptable.
- merino/configs/stage.toml: sets cache = "redis" in staging.
- tests/unit/curated_recommendations/test_init.py: covers the cache="none", cache="redis", and missing-Redis-config paths.
Implementation decisions
- asyncio.Lock per cache entry
- cache = "redis" in stage.toml
Rollout plan (post-merge)
1. Stage: the stage.toml change in this PR. Monitor the drop in corpus API QPS and the Redis hit rate.
2. Prod: set cache = "redis" in production.toml, or override via the MERINO_CURATED_RECOMMENDATIONS__CORPUS_CACHE__CACHE=redis env var.
PR Review Checklist
- [load test: (abort|skip|warn)] keywords applied (consider for the wire-up commit)
┆Issue is synchronized with this Jira Task