Skip to content

feat(reborn): add prompt write safety policy#3167

Merged
serrrfirat merged 10 commits into
nearai:reborn-integrationfrom
serrrfirat:issue-3019-prompt-write-safety
May 2, 2026
Merged

feat(reborn): add prompt write safety policy#3167
serrrfirat merged 10 commits into
nearai:reborn-integrationfrom
serrrfirat:issue-3019-prompt-write-safety

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat commented May 1, 2026

Summary

  • Add a host-composed prompt-write safety policy hook and sanitized reason/decision types for Reborn memory writes.
  • Add a versioned protected prompt-path registry seeded with the legacy prompt-injected file list, with policy-level extension support.
  • Enforce protected write/append checks before persistence/indexing in memory repository and filesystem adapter paths, including final-content append scanning and explicit one-shot empty-clear allowances.
  • Add redacted prompt-write safety event sink plumbing, with configured sink failures failing closed before protected-write persistence.
  • Add caller-level memory contract coverage for rejection, fail-closed behavior, custom registries, non-protected writes, binary writes, previous-content hashes, atomic append retry behavior, event redaction, and one-shot allowances.

Refs #3019.

Scope note

This PR implements the Reborn memory substrate for #3019: policy/registry/event types plus repository and filesystem write/append enforcement. It intentionally does not complete final product-service wiring for future/outer patch, import, seed, profile, admin system-prompt, or capability-specific mutation surfaces. Those callers must route their final resolved content through this hook (or present an explicit one-shot allowance) before #3019 is closed as a full cutover blocker.

Test Plan

  • cargo test -p ironclaw_memory --features libsql
  • cargo clippy -p ironclaw_memory --features libsql --all-targets -- -D warnings
  • cargo test -p ironclaw_architecture
  • cargo fmt --check
  • git diff --check

Notes

  • cargo test -p ironclaw_safety -p ironclaw_memory -p ironclaw_host_runtime currently fails on origin/reborn-integration in unrelated host-runtime test host_http_egress_consumes_staged_policy_when_request_validation_fails; I reproduced the same failure with this branch's changes stashed.

@github-actions github-actions Bot added scope: dependencies Dependency updates size: XL 500+ changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels May 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive prompt-write safety policy framework to protect sensitive files, such as system prompts and agent identities, from malicious injections. It implements a versioned registry for protected paths, a safety policy trait with a default sanitizer-based implementation, and integrates enforcement checks into the memory backend and filesystem adapters. Feedback identifies critical issues with the non-atomic nature of the new append_file implementations, which are susceptible to race conditions. Additionally, there are opportunities to optimize performance by avoiding redundant database reads for content hashing and to improve the architecture by centralizing policy enforcement to prevent duplicate checks and the loss of safety allowances in the filesystem adapter.

Comment on lines +1553 to +1601
async fn append_file(&self, path: &VirtualPath, bytes: &[u8]) -> Result<(), FilesystemError> {
self.ensure_file_documents(path, FilesystemOperation::AppendFile)?;
let document_path = self.parse_file_path(path, FilesystemOperation::AppendFile)?;
let context = MemoryContext::new(document_path.scope().clone());
let mut combined = self
.backend
.read_document(&context, &document_path)
.await?
.unwrap_or_default();
let is_protected = prompt_write_protected_classification(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
&document_path,
)
.is_some();
let previous_hash = if is_protected {
std::str::from_utf8(&combined).ok().map(content_sha256)
} else {
None
};
combined.extend_from_slice(bytes);
if is_protected {
let content = std::str::from_utf8(&combined).map_err(|_| {
memory_error(
path.clone(),
FilesystemOperation::AppendFile,
"memory document content must be UTF-8",
)
})?;
enforce_prompt_write_safety(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
PromptWriteSafetyCheck {
scope: context.scope(),
path: &document_path,
operation: PromptWriteOperation::Append,
source: PromptWriteSource::MemoryFilesystemAdapter,
content,
previous_content_hash: previous_hash.as_deref(),
allowance: context.prompt_write_safety_allowance(),
filesystem_operation: FilesystemOperation::AppendFile,
},
)
.await?;
}
self.backend
.write_document(&context, &document_path, &combined)
.await
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The append_file implementation performs a non-atomic read-modify-write operation, which can lead to race conditions and data loss. To prevent this, a write lock should be held across the entire operation. Consider extending the MemoryBackend trait with an append_document method to allow backends to handle this atomically (e.g., using BEGIN IMMEDIATE in LibSql to serialize concurrent writers).

References
  1. To prevent race conditions when updating a swappable resource, hold a write lock across the entire read-modify-write operation, including both the inner resource mutation and the snapshot refresh.
  2. When saving document versions in libSQL, wrap the SELECT and INSERT operations in a transaction with BEGIN IMMEDIATE to acquire a write lock upfront. This serializes concurrent writers and prevents race conditions where multiple requests might fetch the same next version number, leading to unique constraint violations.

Comment on lines +2570 to +2620
async fn append_file(&self, path: &VirtualPath, bytes: &[u8]) -> Result<(), FilesystemError> {
let document_path = self.parse_file_path(path, FilesystemOperation::AppendFile)?;
let mut combined = self
.repository
.read_document(&document_path)
.await?
.unwrap_or_default();
let is_protected = prompt_write_protected_classification(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
&document_path,
)
.is_some();
let previous_hash = if is_protected {
std::str::from_utf8(&combined).ok().map(content_sha256)
} else {
None
};
combined.extend_from_slice(bytes);
if is_protected {
let content = std::str::from_utf8(&combined).map_err(|_| {
memory_error(
path.clone(),
FilesystemOperation::AppendFile,
"memory document content must be UTF-8",
)
})?;
enforce_prompt_write_safety(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
PromptWriteSafetyCheck {
scope: document_path.scope(),
path: &document_path,
operation: PromptWriteOperation::Append,
source: PromptWriteSource::MemoryDocumentFilesystem,
content,
previous_content_hash: previous_hash.as_deref(),
allowance: None,
filesystem_operation: FilesystemOperation::AppendFile,
},
)
.await?;
}
self.repository
.write_document(&document_path, &combined)
.await?;
if let Some(indexer) = &self.indexer {
let _ = indexer.reindex_document(&document_path).await;
}
Ok(())
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the adapter implementation, this append_file implementation is not atomic. Concurrent appends will result in race conditions. To ensure reliability, the MemoryDocumentRepository trait should be extended with an atomic append_document method so that implementations can hold a write lock (e.g., using BEGIN IMMEDIATE in libSQL) across the entire read-modify-write cycle.

References
  1. To prevent race conditions when updating a swappable resource, hold a write lock across the entire read-modify-write operation, including both the inner resource mutation and the snapshot refresh.
  2. When saving document versions in libSQL, wrap the SELECT and INSERT operations in a transaction with BEGIN IMMEDIATE to acquire a write lock upfront. This serializes concurrent writers and prevents race conditions where multiple requests might fetch the same next version number, leading to unique constraint violations.

Comment on lines +1776 to +1789
let previous_hash = if prompt_write_protected_classification(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
path,
)
.is_some()
{
self.repository
.read_document(path)
.await?
.and_then(|bytes| std::str::from_utf8(&bytes).ok().map(content_sha256))
} else {
None
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This explicit read_document call to obtain a content hash for the safety policy is performed for every write to a protected path, even if the configured policy does not utilize the hash. This results in redundant database roundtrips. While performance optimizations should be gated on profiling per repository guidelines, this architectural redundancy should be addressed by making the read conditional based on policy requirements or deferring it to the repository layer.

References
  1. Performance optimizations, such as adding a caching layer, should be gated on profiling, especially when existing mechanisms (e.g., ETags, short-circuiting for specific paths) already reduce overhead for common cases.

Comment on lines +1513 to +1547
let is_protected = prompt_write_protected_classification(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
&document_path,
)
.is_some();
if is_protected {
let content = std::str::from_utf8(bytes).map_err(|_| {
memory_error(
path.clone(),
FilesystemOperation::WriteFile,
"memory document content must be UTF-8",
)
})?;
let previous_hash = self
.backend
.read_document(&context, &document_path)
.await?
.and_then(|bytes| std::str::from_utf8(&bytes).ok().map(content_sha256));
enforce_prompt_write_safety(
self.prompt_safety_policy.as_ref(),
&self.prompt_protected_path_registry,
PromptWriteSafetyCheck {
scope: context.scope(),
path: &document_path,
operation: PromptWriteOperation::Write,
source: PromptWriteSource::MemoryFilesystemAdapter,
content,
previous_content_hash: previous_hash.as_deref(),
allowance: context.prompt_write_safety_allowance(),
filesystem_operation: FilesystemOperation::WriteFile,
},
)
.await?;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Safety policy enforcement is implemented in both MemoryBackendFilesystemAdapter and RepositoryMemoryBackend. When these layers are stacked, the same safety checks are executed twice on the same content. Since RepositoryMemoryBackend acts as the primary persistence bridge, it may be more appropriate to centralize enforcement there and have the adapter rely on the backend's enforcement, especially since the backend will see the final combined content in the case of appends.

Comment thread crates/ironclaw_memory/src/lib.rs Outdated
@@ -898,11 +1510,96 @@ impl RootFilesystem for MemoryBackendFilesystemAdapter {
self.ensure_file_documents(path, FilesystemOperation::WriteFile)?;
let document_path = self.parse_file_path(path, FilesystemOperation::WriteFile)?;
let context = MemoryContext::new(document_path.scope().clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The adapter creates a fresh MemoryContext for every operation, which effectively strips any safety allowances (like empty_prompt_file_clear) that might have been intended by the caller. Since the RootFilesystem trait does not currently support passing context or allowances, this makes it impossible to perform policy-restricted operations (such as clearing a protected file) through the filesystem interface, even for privileged host-level callers.

@serrrfirat serrrfirat linked an issue May 2, 2026 that may be closed by this pull request
14 tasks
@serrrfirat serrrfirat merged commit 7311716 into nearai:reborn-integration May 2, 2026
13 checks passed
This was referenced May 7, 2026
@ironclaw-ci ironclaw-ci Bot mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules scope: dependencies Dependency updates size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reborn cutover blocker: add PromptWriteSafetyPolicy hook

1 participant