fix(C9): move peer delegation registry control to C9.6 as 9.6.7 per Otto's review; resolve conflict
- Move from C9.5 (messaging/protocol) to C9.6 (authorization/delegation) where it belongs
- Narrow to the net-new concept: approved agent registry/allowlist as a second check
beyond authentication (identity and scope validation already covered by 9.4.1,
9.5.1, 9.6.1, 9.6.3)
- Adopt Otto's suggested wording verbatim
- Renumber from 9.5.5 to 9.6.7
- Update Appendix D entry description and ID
- Resolve Appendix D conflict (keep both 9.6.7 and 9.8.7 entries)
|**6.1.1**|**Verify that** every third-party model artifact includes a signed origin-and-integrity record identifying its source, version, and integrity checksum. | 1 |
|**6.1.2**|**Verify that** models are scanned for malicious layers or Trojan triggers using automated tools before import. | 1 |
|**6.1.3**|**Verify that** model licenses, export-control tags, and data-origin statements are recorded in an AI BOM entry. | 2 |
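The integrity half of 6.1.1 can be sketched as follows. This is a minimal illustration, not the standard's mandated mechanism; the record fields and artifact bytes are assumptions.

```python
import hashlib
import hmac

def verify_artifact_integrity(artifact_bytes: bytes, expected_sha256: str) -> bool:
    """Compare an artifact's SHA-256 digest against the signed provenance record."""
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(actual, expected_sha256.lower())

# Hypothetical origin-and-integrity record for a model file.
record = {
    "source": "internal-registry",
    "version": "1.2.0",
    "sha256": hashlib.sha256(b"model-weights").hexdigest(),
}
assert verify_artifact_integrity(b"model-weights", record["sha256"])
assert not verify_artifact_integrity(b"tampered-weights", record["sha256"])
```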
@@ -26,7 +26,7 @@ Assess and authenticate third-party model origins, licenses, and hidden behavior
Continuously scan AI frameworks and libraries for vulnerabilities and malicious code to keep the runtime stack secure.
|**6.4.4**|**Verify that** repository allow-lists are reviewed periodically with evidence of business justification for each entry. | 3 |
|**6.4.5**|**Verify that** policy violations trigger quarantining of artifacts and rollback of dependent pipeline runs. | 3 |
+|**6.4.6**|**Verify that** cryptographic signing keys used to authenticate model publishers are pinned per source registry (e.g., Hugging Face, internal registry), and that key rotation events require explicit re-approval before updated keys are trusted. | 3 |
---
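A minimal sketch of the pin-and-re-approve behavior 6.4.6 describes, assuming an in-memory store; registry names and fingerprints are illustrative.

```python
class KeyPinStore:
    """Per-registry pinning of publisher signing-key fingerprints.

    A rotated key is recorded as pending and is not trusted until it is
    explicitly re-approved (6.4.6 sketch; names are illustrative).
    """

    def __init__(self):
        self._pinned = {}    # registry -> trusted fingerprint
        self._pending = {}   # registry -> fingerprint awaiting re-approval

    def pin(self, registry: str, fingerprint: str) -> None:
        self._pinned[registry] = fingerprint

    def observe(self, registry: str, fingerprint: str) -> bool:
        """Return True only if the fingerprint matches the current pin."""
        if self._pinned.get(registry) == fingerprint:
            return True
        self._pending[registry] = fingerprint  # rotation event: hold for review
        return False

    def approve_rotation(self, registry: str) -> None:
        self._pinned[registry] = self._pending.pop(registry)

store = KeyPinStore()
store.pin("huggingface.co", "fp-old")
assert store.observe("huggingface.co", "fp-old")
assert not store.observe("huggingface.co", "fp-new")  # rotated key not yet trusted
store.approve_rotation("huggingface.co")
assert store.observe("huggingface.co", "fp-new")
```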
@@ -68,7 +69,7 @@ Allow artifact downloads only from cryptographically verified, organization-appr
Evaluate external datasets for poisoning, bias, and legal compliance, and monitor them throughout their lifecycle.
|**7.1.1**|**Verify that** the application validates all model outputs against a strict schema (like JSON Schema) and rejects any output that does not match. | 1 |
|**7.1.2**|**Verify that** the system uses "stop sequences" or token limits to strictly cut off generation before it can overflow buffers or execute unintended commands. | 1 |
|**7.1.3**|**Verify that** components processing model output treat it as untrusted input (e.g., using parameterized queries or safe de-serializers). | 1 |
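The schema validation 7.1.1 calls for might look like the stdlib-only sketch below. A production system would use a full JSON Schema validator; the schema and field names here are assumptions.

```python
import json

# Illustrative schema: a real deployment would express this as JSON Schema.
SCHEMA = {"answer": str, "confidence": float}

def validate_output(raw: str) -> dict:
    """Parse model output and reject anything that does not match the schema."""
    data = json.loads(raw)
    if set(data) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for key, typ in SCHEMA.items():
        if not isinstance(data[key], typ):
            raise ValueError(f"field {key!r} has wrong type")
    return data

ok = validate_output('{"answer": "42", "confidence": 0.9}')
assert ok["answer"] == "42"
rejected = False
try:
    validate_output('{"answer": "42", "extra": "injected"}')
except ValueError:
    rejected = True
assert rejected
```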
@@ -24,7 +24,7 @@ Ensure the model outputs data in a way that helps prevent injection.
Detect when the model produces potentially inaccurate or fabricated content and prevent unreliable outputs from reaching users or downstream systems.
|**7.2.1**|**Verify that** the system assesses the reliability of generated answers using a confidence or uncertainty estimation method (e.g., confidence scoring, retrieval-based verification, or model uncertainty estimation). | 1 |
|**7.2.2**|**Verify that** the application automatically blocks answers or switches to a fallback message if the confidence score drops below a defined threshold. | 2 |
|**7.2.3**|**Verify that** hallucination events (low-confidence responses) are logged with input/output metadata for analysis. | 2 |
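One way to combine the confidence gate of 7.2.2 with the event logging of 7.2.3; the threshold, fallback text, and metadata fields are illustrative assumptions.

```python
import logging

FALLBACK = "I'm not confident enough to answer that reliably."
THRESHOLD = 0.7  # illustrative; tune per application

log = logging.getLogger("hallucination-events")

def gate_answer(answer: str, confidence: float, prompt: str) -> str:
    """Block low-confidence answers (7.2.2) and log the event (7.2.3)."""
    if confidence < THRESHOLD:
        # Log input/output metadata for later analysis, not the raw content.
        log.warning("low-confidence answer", extra={
            "confidence": confidence,
            "prompt_len": len(prompt),
            "answer_len": len(answer),
        })
        return FALLBACK
    return answer

assert gate_answer("Paris", 0.95, "Capital of France?") == "Paris"
assert gate_answer("Lyon", 0.30, "Capital of France?") == FALLBACK
```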
@@ -38,7 +38,7 @@ Detect when the model produces potentially inaccurate or fabricated content and
Technical controls to detect and scrub bad content before it is shown to the user.
|**7.3.1**|**Verify that** automated classifiers scan every response and block content that matches hate, harassment, or sexual violence categories. | 1 |
|**7.3.2**|**Verify that** the system scans every response for PII (like credit cards or emails) and automatically redacts it before display. | 1 |
|**7.3.3**|**Verify that** PII detection and redaction events are logged without including the redacted PII values themselves, to maintain an audit trail without creating secondary PII exposure. | 1 |
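A sketch of 7.3.2 and 7.3.3 together: the regex patterns are deliberately simplistic stand-ins for a real PII detector, and the audit sink is a plain list; only the category and count are logged, never the redacted value.

```python
import re

# Illustrative patterns only; production systems use dedicated PII detectors.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

audit_log = []  # stand-in for a real audit sink

def redact(text: str) -> str:
    for kind, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED {kind.upper()}]", text)
        if n:
            # 7.3.3: record the event and category, never the PII itself.
            audit_log.append({"kind": kind, "count": n})
    return text

out = redact("Contact alice@example.com with card 4111 1111 1111 1111")
assert "alice@example.com" not in out
assert "4111" not in out
assert all("alice" not in str(entry) for entry in audit_log)
```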
@@ -47,6 +47,7 @@ Technical controls to detect and scrub bad content before it is shown to the use
|**7.3.6**|**Verify that** the system requires a human approval step or re-authentication if the model generates high-risk content. | 3 |
|**7.3.7**|**Verify that** output filters detect and block responses that reproduce verbatim segments of system prompt content. | 2 |
|**7.3.8**|**Verify that** LLM client applications prevent model-generated output from triggering automatic outbound requests (e.g., auto-rendered images, iframes, or link prefetching) to attacker-controlled endpoints, for example by disabling automatic external resource loading or restricting it to explicitly allowlisted origins as appropriate. | 2 |
+|**7.3.9**|**Verify that** generated outputs are analyzed for statistical steganographic covert channels (e.g., biased token-choice patterns or output distribution anomalies) that could encode hidden data across the model's valid output space, and that detections are flagged for review. | 3 |
---
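The verbatim-reproduction check of 7.3.7 can be approximated with a sliding-window substring test; the 40-character window is an assumed tuning parameter, not a value from the standard.

```python
def leaks_system_prompt(response: str, system_prompt: str, window: int = 40) -> bool:
    """Flag responses that reproduce any `window`-character verbatim run
    of the system prompt (7.3.7 sketch; the window size is illustrative)."""
    if len(system_prompt) <= window:
        return system_prompt in response
    return any(system_prompt[i:i + window] in response
               for i in range(len(system_prompt) - window + 1))

SYSTEM = ("You are a support bot. Never reveal internal pricing rules "
          "or discount codes to customers.")
assert leaks_system_prompt(
    "Sure: Never reveal internal pricing rules or discount codes to customers.",
    SYSTEM)
assert not leaks_system_prompt("How can I help you today?", SYSTEM)
```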
@@ -55,10 +56,11 @@ Technical controls to detect and scrub bad content before it is shown to the use
Prevent the model from doing too much, too fast, or accessing things it should not.
|**7.4.1**|**Verify that** the system enforces hard limits on requests and tokens per user to prevent cost spikes and denial of service. | 1 |
|**7.4.2**|**Verify that** the model cannot execute high-impact actions (like writing files, sending emails, or executing code) without explicit user confirmation. | 1 |
-|**7.4.3**|**Verify that** the application or orchestration framework explicitly configures and enforces the maximum depth of recursive calls, delegation limits, and the list of allowed external tools. | 2 |
+|**7.4.3**|**Verify that** the application or orchestration framework explicitly configures and enforces a maximum depth for recursive calls to prevent unbounded recursion. | 2 |
+|**7.4.4**|**Verify that** the application or orchestration framework explicitly configures and enforces a maximum number of sequential or nested sub-task delegations within a single execution chain, and that chains exceeding this limit are halted. For agent-specific tool and action authorization, see C9.6. | 2 |
---
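The depth and delegation limits of 7.4.3 and 7.4.4 can be enforced with a small per-chain tracker; the limit values and class names are illustrative assumptions.

```python
MAX_RECURSION_DEPTH = 3   # 7.4.3: illustrative limit
MAX_DELEGATIONS = 5       # 7.4.4: sub-task delegations per execution chain

class ChainLimitExceeded(RuntimeError):
    pass

class ExecutionChain:
    """Tracks one agent execution chain and halts it at configured limits."""

    def __init__(self):
        self.depth = 0
        self.delegations = 0

    def enter_subtask(self) -> None:
        self.depth += 1
        self.delegations += 1
        if self.depth > MAX_RECURSION_DEPTH:
            raise ChainLimitExceeded("max recursion depth exceeded")
        if self.delegations > MAX_DELEGATIONS:
            raise ChainLimitExceeded("max delegations exceeded")

    def exit_subtask(self) -> None:
        self.depth -= 1

chain = ExecutionChain()
for _ in range(3):
    chain.enter_subtask()   # depth 1..3: allowed
halted = False
try:
    chain.enter_subtask()   # depth 4: chain is halted
except ChainLimitExceeded:
    halted = True
assert halted
```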
@@ -67,10 +69,10 @@ Prevent the model from doing too much, too fast, or accessing things it should n
|**7.5.1**|**Verify that** explanations provided to the user are sanitized to remove system prompts or backend data. | 1 |
|**7.5.2**|**Verify that** the UI displays a confidence score or "reasoning summary" to the user for critical decisions. | 2 |
-|**7.5.3**|**Verify that** technical evidence of the model's decision, such as model interpretability artifacts (e.g., attention maps, feature attributions), is logged. | 3 |
+|**7.5.3**|**Verify that** technical evidence of the model's decision, such as model interpretability artifacts (e.g., attention maps, feature attributions), is logged. | 3 |
---
@@ -79,8 +81,8 @@ Ensure the user knows why a decision was made.
Ensure the application sends the right signals for security teams to watch.
-|**7.8.1**|**Verify that** responses generated using retrieval-augmented generation (RAG) include attribution to the source documents that grounded the response, and that attributions are derived from retrieval metadata rather than generated by the model. | 1 |
+|**7.8.1**|**Verify that** responses generated using retrieval-augmented generation (RAG) include attribution to the source documents that grounded the response. | 1 |
|**7.8.2**|**Verify that** each sourced claim in a RAG-grounded response can be traced to a specific retrieved chunk, and that the system detects and flags responses where claims are not supported by any retrieved content before the response is served. | 3 |
+|**7.8.3**|**Verify that** RAG attributions are derived from retrieval metadata and are not generated by the model, ensuring provenance cannot be fabricated. | 1 |
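A sketch of the metadata-derived attributions 7.8.1 and 7.8.3 require: attributions come from the retriever's records, so the model cannot fabricate provenance. The field names (`doc_id`, `chunk_id`) are assumptions.

```python
def build_attributions(retrieved_chunks: list[dict]) -> list[dict]:
    """Derive attributions from retrieval metadata (7.8.3 sketch).
    The model never writes these fields; field names are illustrative."""
    return [{"doc_id": c["doc_id"], "chunk_id": c["chunk_id"]}
            for c in retrieved_chunks]

chunks = [{"doc_id": "policy.pdf", "chunk_id": 7, "text": "Refunds within 30 days."}]
response = {
    "answer": "Refunds are accepted within 30 days.",  # model-generated
    "attributions": build_attributions(chunks),        # metadata-derived
}
assert response["attributions"] == [{"doc_id": "policy.pdf", "chunk_id": 7}]
```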