Proposal: collection-level data freshness gate for RAG retrieval pipelines #2591
SomeshZanwar started this conversation in Ideas
Summary
agent-rag-governance already covers important retrieval-time controls: collection access policy, rate limiting, content scanning, and audit logging.
One adjacent governance question is not currently represented clearly:
This proposal is about exploring a collection-level freshness / quality gate for RAG retrieval pipelines.
The goal is not to require a specific data quality provider or add mandatory behavior. The goal is to validate whether AGT should support an optional pattern for blocking retrieval when the source collection is stale, failing validation, or below an acceptable quality threshold.
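As a rough illustration of the three conditions named above (stale, failing validation, below a quality threshold), a minimal Python sketch might look like the following. Every name here (`CollectionHealth`, `FreshnessPolicy`, `evaluate_collection`) and both default thresholds are hypothetical and not part of agent-rag-governance; where the health signals come from is deliberately left open, since the proposal does not require a specific provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class CollectionHealth:
    """Hypothetical health snapshot from any external data quality source."""
    last_refreshed: datetime   # timezone-aware timestamp of the last refresh
    validation_passed: bool    # result of the most recent validation run
    quality_score: float       # assumed to be normalised to the range 0.0-1.0


@dataclass(frozen=True)
class FreshnessPolicy:
    """Hypothetical per-collection thresholds; the defaults are placeholders."""
    max_age: timedelta = timedelta(hours=24)
    min_quality_score: float = 0.8


def evaluate_collection(
    health: CollectionHealth,
    policy: FreshnessPolicy,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return a denial reason, or None if the collection is fit to retrieve from."""
    now = now or datetime.now(timezone.utc)
    if now - health.last_refreshed > policy.max_age:
        return "stale"
    if not health.validation_passed:
        return "validation_failed"
    if health.quality_score < policy.min_quality_score:
        return "below_quality_threshold"
    return None
```

Returning a reason string rather than a bare boolean keeps the sketch compatible with the idea of recording why a retrieval was denied.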
Problem
A RAG pipeline can pass the existing governance checks and still retrieve from a weak source collection.
For example:
But the collection itself may still be problematic:
In that case, the retrieval may be governance-compliant at the access/content layer, but still unreliable at the data-source layer.
Proposed direction
Add an optional collection-level data freshness / quality gate to the RAG governance pattern.
Conceptually:
A possible example shape could look like:
If the collection fails the freshness / quality gate, retrieval would be blocked before chunks are fetched or passed to the LLM.
The audit record could then explain that the denial happened at the collection freshness layer, not at the access-policy or content-scanning layer.
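To make the enforcement point concrete, here is a small Python sketch of a wrapper that runs the freshness check before any chunks are fetched and records which layer denied the request. It is only an illustration under stated assumptions: `FreshnessGatedRetriever`, `AuditRecord`, and `CollectionFreshnessDenied` are hypothetical names, the `retrieve` and `freshness_check` callables stand in for whatever the real pipeline uses, and none of this reflects the actual agent-rag-governance API. The `collection_freshness` layer label is borrowed from the open questions below.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class CollectionFreshnessDenied(Exception):
    """Hypothetical error raised when the freshness gate blocks retrieval."""


@dataclass
class AuditRecord:
    """Hypothetical audit entry; a real audit schema may differ."""
    collection: str
    allowed: bool
    layer: Optional[str] = None    # governance layer that denied, if any
    reason: Optional[str] = None   # e.g. "stale"


@dataclass
class FreshnessGatedRetriever:
    # (collection, query) -> chunks; stands in for the real retriever.
    retrieve: Callable[[str, str], List[str]]
    # collection -> denial reason, or None when the collection is healthy.
    freshness_check: Callable[[str], Optional[str]]
    audit_log: List[AuditRecord] = field(default_factory=list)

    def search(self, collection: str, query: str) -> List[str]:
        # Gate first: a failing collection is never queried.
        reason = self.freshness_check(collection)
        if reason is not None:
            self.audit_log.append(
                AuditRecord(collection, False, "collection_freshness", reason)
            )
            raise CollectionFreshnessDenied(f"{collection}: {reason}")
        chunks = self.retrieve(collection, query)
        self.audit_log.append(AuditRecord(collection, True))
        return chunks
```

Because the gate is an injected callable, the wrapper stays optional and provider-agnostic: a pipeline that does not configure a check simply does not use the wrapper.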
Why this fits the RAG governance model
RAG governance does not only depend on who can retrieve.
It also depends on whether the retrieved source material is fit to use.
This is especially relevant for analytics, compliance, finance, healthcare, and operational decision-support workflows where stale or failing source data can produce misleading downstream answers.
The proposed gate would be:
Relationship to existing AGT work
This is related to the data-quality-aware governance examples already merged into AGT.
At the agent-action layer:
At the RAG retrieval layer, the equivalent pattern would be:
Same principle, different enforcement point.
Non-goals
This proposal is not asking AGT to:
Open questions
agent-rag-governance as an optional package component?
collection_freshness decision reason?
Suggested first step
If maintainers think this direction is useful, I can start with a low-risk example or documentation PR first.
That would avoid changing core RAG behavior while showing the pattern end-to-end:
I am happy to follow whichever path maintainers prefer: docs/example first, or a small optional package component if that fits the current direction of agent-rag-governance.