The malicious-skill problem is well documented: 341 malicious skills found on ClawHub (Koi Security, Jan 2026), 13.4% of scanned skills flagged with critical issues (Snyk), and VirusTotal explicitly unable to detect prompt injection, agent impersonation, or slow-burn trust accumulation.
Identity verification (#49971) addresses part of this — cryptographic proof of who published a skill. But there's a structural gap: a flagged actor creates a new identity with a clean record. The history doesn't travel. Sybil clusters (multiple fake identities controlled by one actor) are invisible to any identity-only system, regardless of how good the DID resolver is.
The missing layer is behavioral reputation — trust earned from real interaction history that can't be manufactured by creating a new identity.
## How it works
Every agent-skill interaction creates a bilateral cryptographic record signed by both parties. Trust scores emerge from the interaction graph, not ratings or credentials:
- A skill that delivers consistently builds a record that follows it everywhere
- A skill that fails or misbehaves carries that record permanently — abandoning the identity and re-registering means starting from zero, with no history
- Sybil clusters get zero trust by design: trust is computed as max flow from seed nodes, so a cluster of fake identities with no edges into the legitimate interaction graph scores zero, no matter how many new identities the attacker creates
- Slow-burn attacks become visible: every past interaction is co-signed by a real counterparty, so retroactive rewriting is impossible
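To make the max-flow claim concrete, here is a toy Python sketch (not the protocol's actual algorithm; node names and edge capacities are invented). A Sybil cluster can mint unlimited identities and vouch for itself with arbitrarily large weights, but with no co-signed edges from legitimate participants there is no path from the seed, so its trust flow is zero:

```python
from collections import deque

def max_flow(cap, source, sink):
    """Edmonds-Karp max flow: repeatedly find BFS augmenting paths in the residual graph."""
    residual = {u: dict(edges) for u, edges in cap.items()}
    # Make sure every edge has a reverse entry so flow can be pushed back.
    for u in list(residual):
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Bottleneck capacity along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow back along the path.
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# Edge capacities = counts of co-signed interactions (invented numbers).
graph = {
    "seed":  {"alice": 5, "bob": 3},
    "alice": {"skill_x": 4},
    "bob":   {"skill_x": 2},
    # Sybil cluster: fake identities only vouch for each other.
    "sybil1": {"sybil2": 99},
    "sybil2": {"sybil1": 99},
}

print(max_flow(graph, "seed", "skill_x"))  # positive trust flow for the legitimate skill
print(max_flow(graph, "seed", "sybil2"))   # zero: no path from the seed into the cluster
```

Adding more sybil nodes only grows the disconnected component; the score from the seed stays zero.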
## What this looks like in practice
Before installing a skill:
```
trustchain_check_trust(pubkey) → { score: 0.91, interactions: 847, status: "established" }
```
Before executing a payment to an agent:
```
trustchain_check_trust(pubkey) → { score: 0.03, interactions: 2, status: "bootstrap" }
// payment blocked — insufficient history
```
Discover trusted agents for a capability:
```
trustchain_discover_peers(capability: "code-review", min_trust: 0.7)
→ ranked list of agents with verified interaction history
```
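The warn-on-install / block-on-payment behavior above can be sketched as a small policy table (a sketch only; the thresholds and field names are illustrative, not values from the spec):

```python
# Per-action trust policy. Thresholds are illustrative defaults, not spec values.
POLICY = {
    "install": {"min_score": 0.5, "min_interactions": 10, "blocking": False},
    "payment": {"min_score": 0.7, "min_interactions": 50, "blocking": True},
}

def gate(action, trust):
    """Return 'allow', 'warn' (non-blocking), or 'block' for a proposed action."""
    rule = POLICY[action]
    ok = (trust["score"] >= rule["min_score"]
          and trust["interactions"] >= rule["min_interactions"])
    if ok:
        return "allow"
    return "block" if rule["blocking"] else "warn"

established = {"score": 0.91, "interactions": 847, "status": "established"}
bootstrap   = {"score": 0.03, "interactions": 2,   "status": "bootstrap"}

print(gate("install", established))  # allow
print(gate("install", bootstrap))    # warn: surfaced to the user, install proceeds
print(gate("payment", bootstrap))    # block: insufficient history
```

The key design point is that the same trust response gates different actions differently: low-stakes actions degrade to a warning, irreversible ones hard-fail.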
## Working implementation
OpenClaw plugin: `trustchain-js/packages/openclaw`
5 MCP tools: `trustchain_check_trust`, `trustchain_discover_peers`, `trustchain_record_interaction`, `trustchain_verify_chain`, `trustchain_get_identity`
Protocol spec: IETF draft-viftode-trustchain-trust-00 — no blockchain, offline-capable, open source, Ed25519 identity.
Full implementation: github.com/viftode4/trustchain — Rust core (304 tests), Python SDK (311 tests), TypeScript SDK (126 tests), adapters for 12 agent frameworks (205 tests).
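The bilateral records behind `trustchain_record_interaction` can be sketched as follows. This is a toy model: HMAC-SHA256 stands in for Ed25519 only so the example runs on the Python standard library, and all field names are invented, not taken from draft-viftode-trustchain-trust-00:

```python
import hashlib
import hmac
import json

# Toy stand-in for Ed25519 signing: HMAC-SHA256 over a canonical JSON payload.
def sign(secret: bytes, payload: bytes) -> str:
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def record_interaction(agent_key: bytes, skill_key: bytes, outcome: str) -> dict:
    """Build a bilateral record signed by BOTH parties (field names invented)."""
    body = {
        "agent": "agent-pub-id",
        "skill": "skill-pub-id",
        "outcome": outcome,       # e.g. "success" / "failure"
        "ts": 1700000000,         # fixed timestamp for a reproducible example
    }
    payload = json.dumps(body, sort_keys=True).encode()
    return {
        "body": body,
        "agent_sig": sign(agent_key, payload),  # agent's signature
        "skill_sig": sign(skill_key, payload),  # counterparty's co-signature
    }

def verify(record: dict, agent_key: bytes, skill_key: bytes) -> bool:
    payload = json.dumps(record["body"], sort_keys=True).encode()
    return (hmac.compare_digest(record["agent_sig"], sign(agent_key, payload))
            and hmac.compare_digest(record["skill_sig"], sign(skill_key, payload)))

rec = record_interaction(b"agent-secret", b"skill-secret", "success")
print(verify(rec, b"agent-secret", b"skill-secret"))  # True
rec["body"]["outcome"] = "failure"                    # retroactive rewrite attempt
print(verify(rec, b"agent-secret", b"skill-secret"))  # False: both signatures break
```

Because the counterparty's signature covers the same payload, neither side can unilaterally edit history after the fact, which is what makes slow-burn attacks auditable.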
## Relationship to identity verification
This is complementary to DID-based approaches (#49971), not competing. Identity verification answers "is this the same agent that published the skill I trust?" Behavioral reputation answers "has this agent built a track record worth trusting?" Both questions matter. Neither alone is sufficient.
The combination: DID confirms continuity of identity, behavioral reputation confirms continuity of good behavior.
## Addressing open questions from #49971
- Blocking vs non-blocking: the behavioral trust check should be non-blocking with a configurable threshold — warn on install, hard-block on payment
- Default trust provider: the protocol should be standardized (IETF), not the provider — multiple providers using the same wire format
- Delegation chain propagation: solved — trust attenuates with each hop, scope narrows, TTL enforced, max depth bounded
- Skill publisher DID requirement: strongest when combined with behavioral history — DID + clean interaction record, not DID alone
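The delegation semantics in the bullet above can be sketched as follows (the attenuation factor, scopes, and depth limit are invented for illustration, and TTL enforcement is omitted for brevity):

```python
# Sketch of delegation-chain propagation: trust attenuates at each hop, scope
# can only narrow, and chain depth is bounded. All constants are illustrative,
# not values from the TrustChain spec.
ATTENUATION = 0.8   # trust multiplier applied at every delegation hop
MAX_DEPTH = 4       # delegation chains longer than this propagate no trust

def delegated_trust(root_score, chain):
    """chain: list of hops, each {"scope": set_of_capabilities}."""
    if len(chain) > MAX_DEPTH:
        return None, set()  # chain too deep: no trust propagates
    score = root_score
    scope = {"code-review", "payments", "install"}  # root's full scope (illustrative)
    for hop in chain:
        score *= ATTENUATION       # attenuate at every hop
        scope &= hop["scope"]      # scope can only narrow, never widen
    return round(score, 3), scope

score, scope = delegated_trust(0.9, [
    {"scope": {"code-review", "payments"}},
    {"scope": {"code-review"}},
])
print(score, scope)  # 0.9 * 0.8 * 0.8 = 0.576, scope narrowed to {"code-review"}
```

The monotonic-narrowing rule is the important invariant: a delegate can never end up with more trust or broader scope than the chain it inherited from.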
Open to discussing how this fits into the OpenClaw plugin lifecycle — particularly whether `onAgentVerify` should surface both identity and behavioral signals, or whether they warrant separate hooks.