feat: stakeholder interview subagents (4-method post-simulation surveys) #643
Open
ChristianMoellmann wants to merge 26 commits into
Approved design for a four-subagent post-simulation interview system (Longitudinal, Diversity, Delphi, Scenario) over MiroFish-simulated German fisheries stakeholders, with cross-method synthesiser. Includes architecture, instrument design, data flow, API surface, error handling, validation, testing, and methodological caveats. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bite-sized TDD plan covering 21 tasks across 7 phases: setup → foundation (models, YAML loader, LLM stub, base interviewer) → 4 subagents (longitudinal, diversity Q-sort+PCA, Delphi 3-round, scenario) → storage + Zep writer → orchestrator + sim lifecycle hooks + synthesiser → Flask /api/interview blueprint → end-to-end integration test → Vue Step4b with d3 visualisations. Each task lists exact files, failing test code, implementation code, run commands, and commit message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anguage, stub mode) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… hash freezing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ting and schema retry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…A/k-means typology Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nvergence metrics Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olarity matrix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd latest pointer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… per-agent + aggregate episodes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-out, isolated failures Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Manager Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…limitations section Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vices to interview subsystem Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c + CSV export
Add /api/interview blueprint with POST pre/post/rerun, GET status/results/synthesis/export.csv endpoints.
Background tasks tracked by UUID in module-level dict.
Add register_blueprints() helper to api/__init__.py and wire app factory through it.
Add UPLOADS_DIR to Config with env-override default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
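The "background tasks tracked by UUID in module-level dict" pattern can be sketched with the standard library alone. This is a minimal illustration, not the blueprint's actual code: `_TASKS`, `start_task`, and `get_task` are hypothetical names, and the Flask routes that would call them are omitted.

```python
import threading
import uuid
from typing import Any, Callable, Dict, Optional

# Hypothetical module-level registry: task id -> status record.
_TASKS: Dict[str, Dict[str, Any]] = {}
_TASKS_LOCK = threading.Lock()


def start_task(fn: Callable[..., Any], *args: Any) -> str:
    """Run fn(*args) on a daemon thread and return a UUID for polling."""
    task_id = str(uuid.uuid4())
    with _TASKS_LOCK:
        _TASKS[task_id] = {"status": "running", "result": None, "error": None}

    def _run() -> None:
        try:
            update = {"status": "completed", "result": fn(*args)}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "error": str(exc)}
        with _TASKS_LOCK:
            _TASKS[task_id].update(update)

    threading.Thread(target=_run, daemon=True).start()
    return task_id


def get_task(task_id: str) -> Optional[Dict[str, Any]]:
    """Return a copy of the task record, or None for an unknown id."""
    with _TASKS_LOCK:
        task = _TASKS.get(task_id)
        return dict(task) if task is not None else None
```

A POST handler would return the id from `start_task`, and a status endpoint would serialise `get_task(task_id)`. Note that a module-level dict is per-process, so this only works as-is with a single worker process.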
…for all 4 subagents Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lient, i18n keys Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, Delphi, scenario polarity, synthesis Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ner→Manager on COMPLETED
- Add backend/app/services/interviews/lifecycle.py with install_hooks() that registers on_ready (pre-survey) and on_completed (post-survey + synthesis) daemon-thread callbacks on a SimulationManager.
- Add SimulationRunner.register_on_completed() / _fire_on_completed() so external callbacks can be notified when _monitor_simulation transitions to COMPLETED (both exit-code-0 path and simulation_end event path).
- Wire both in app/__init__.py: create singleton SimulationManager, install lifecycle hooks, and register its _notify_on_completed with SimulationRunner.
- Add test_lifecycle.py: verifies install_hooks registers one callable for each of ready and completed.
- All 40 unit tests + 2 integration tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on runs (C1-C5)
Five tightly-coupled fixes that were causing the interview subsystem to silently
degrade in production:
- C1+C2: `_build_orchestrator` now resolves `graph_id` from
`SimulationManager().get_simulation(sim_id).graph_id` (the real persisted
state) instead of a `graph_id.txt` that nothing in the codebase writes.
`ZepGraphMemoryUpdater(graph_id=...)` is now called with the correct
positional argument; the bare `try/except Exception` that was swallowing the
TypeError is replaced with a narrow fallback that logs explicitly.
- C3: `SimulationManager._on_ready_hooks` / `_on_completed_hooks` are now
class-level (mirroring `SimulationRunner._on_completed_callbacks`).
Hooks registered at app startup now survive across the per-request
`SimulationManager()` instances created by the Flask API, so the T0
longitudinal auto-survey actually fires.
- C4: `ZepGraphMemoryUpdater` gains an explicit `add_text_episode(graph_id, text)`
method for synchronous text writes. `InterviewZepWriter._emit` no longer
silently falls back to a dict-shaped `add_activity` call that the real
implementation rejects (its `add_activity` requires an `AgentActivity`
dataclass).
- C5: `FileSystemPersonaProvider.agent_to_entity()` builds an
`{agent_id: zep_entity_uuid}` map from the persisted profile files; the map
is now passed to `ZepMemoryProvider` so `get_entity_with_context` is called
with real Zep UUIDs instead of `str(agent_id)`. To make this work,
`OasisProfileGenerator._save_reddit_json` and `_save_twitter_csv` now persist
`source_entity_uuid` (Reddit JSON: optional field; Twitter CSV: appended
column).
Tests: 51 unit + 2 integration pass (was 40 + 2). New tests lock in each fix:
- `test_hooks_survive_across_instances` (C3)
- `test_build_orchestrator_reads_graph_id_from_state` (C1+C2+C5)
- `test_build_orchestrator_falls_back_when_state_missing` (C1+C2)
- `test_emit_uses_add_text_episode_with_graph_id`,
`test_emit_raises_when_updater_lacks_add_text_episode`,
`test_real_updater_exposes_add_text_episode` (C4)
- `test_agent_to_entity_from_reddit_json`,
`test_agent_to_entity_empty_when_no_field`,
`test_agent_to_entity_falls_back_to_twitter_csv`,
`test_agent_to_entity_reddit_takes_precedence` (C5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
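For reviewers, the C3 fix (class-level hook lists) can be illustrated roughly as below. This is a sketch under assumptions: `HookedManager` is a hypothetical stand-in for `SimulationManager`, the `notify_*` method names are invented for the example, and the real hooks may be registered through instance methods rather than classmethods.

```python
import logging
from typing import Callable, ClassVar, List

logger = logging.getLogger(__name__)
Hook = Callable[[str], None]


class HookedManager:
    """Hypothetical stand-in for SimulationManager; only hook plumbing is shown."""

    # Class attributes: one shared list per event for the whole process, so a
    # hook registered at app startup is visible to every later instance.
    _on_ready_hooks: ClassVar[List[Hook]] = []
    _on_completed_hooks: ClassVar[List[Hook]] = []

    @classmethod
    def register_on_ready(cls, hook: Hook) -> None:
        cls._on_ready_hooks.append(hook)

    @classmethod
    def register_on_completed(cls, hook: Hook) -> None:
        cls._on_completed_hooks.append(hook)

    def _fire(self, hooks: List[Hook], sim_id: str) -> None:
        # Iterate over a copy and isolate failures so one bad hook
        # cannot prevent the remaining hooks from running.
        for hook in list(hooks):
            try:
                hook(sim_id)
            except Exception:
                logger.exception("lifecycle hook failed for %s", sim_id)

    def notify_ready(self, sim_id: str) -> None:
        self._fire(self._on_ready_hooks, sim_id)

    def notify_completed(self, sim_id: str) -> None:
        self._fire(self._on_completed_hooks, sim_id)
```

Had the lists been created in `__init__`, each per-request instance would start with empty lists and the startup registration would be invisible to it, which matches the silent failure described in C3.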
Adds SchemaValidationFailure exception carrying both retry attempts' raw output, so audit.jsonl preserves what the model actually said when an agent's response can't be coerced into the instrument schema. Lets us diagnose persona-vs-format failures without re-running. Two new tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
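A rough sketch of how such an exception could carry both attempts' raw output follows. Only the name `SchemaValidationFailure` comes from this commit; the constructor arguments, `to_audit_record`, and the `ask_with_retry` helper are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, List


class SchemaValidationFailure(Exception):
    """Raised when no attempt could be coerced into the instrument schema."""

    def __init__(self, agent_id: str, raw_attempts: List[str], reason: str) -> None:
        super().__init__(f"agent {agent_id}: {reason}")
        self.agent_id = agent_id
        self.raw_attempts = list(raw_attempts)  # what the model actually said
        self.reason = reason

    def to_audit_record(self) -> Dict[str, Any]:
        # Hypothetical record shape for one line of audit.jsonl.
        return {
            "event": "schema_validation_failure",
            "agent_id": self.agent_id,
            "reason": self.reason,
            "raw_attempts": self.raw_attempts,
        }


def ask_with_retry(
    agent_id: str,
    ask: Callable[[], str],
    validate: Callable[[str], Any],
    max_attempts: int = 2,
) -> Any:
    """Call ask() up to max_attempts times, keeping every raw output."""
    raw_attempts: List[str] = []
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = ask()
        raw_attempts.append(raw)
        try:
            return validate(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            last_error = str(exc)
    raise SchemaValidationFailure(agent_id, raw_attempts, last_error)
```

A caller would catch the exception and append `json.dumps(exc.to_audit_record())` to the audit log, so persona-vs-format failures can be diagnosed without re-running.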
Real LLMs (observed with anthropic/claude-haiku-4-5 on a 23-agent run)
sometimes return Likert values as JSON strings ('3' not 3). The 4 subagent
validators rejected this with isinstance(v, int), losing ~30% of agents at
N=23. Added a shared coerce_int helper in base.py that accepts ints and
numeric strings, rejects bools/floats/garbage, and is now used by:
- Longitudinal: response values 1-5
- Diversity: Q-sort placements -3..+3 and 6 Likert axes 1-7
- Delphi: R2 and R3 importance/plausibility 1-5
- Scenario: 4 dimensions 1-7
Validators now coerce in place so downstream code sees ints regardless of
the wire format. Added 8 tests (4 unit on coerce_int + 4 per-subagent
contract tests showing stringified values are accepted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
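The coercion rule described above (accept ints and numeric strings, reject bools, floats, and garbage) might look roughly like this. It is a sketch, not the code in base.py: the `lo`/`hi` range arguments are an assumption added so a single helper can cover the 1-5, 1-7, and -3..+3 scales.

```python
from typing import Any


def coerce_int(value: Any, lo: int, hi: int) -> int:
    """Return value as an int within [lo, hi], or raise ValueError."""
    # bool is a subclass of int, so it must be rejected before the int check.
    if isinstance(value, bool):
        raise ValueError(f"bool is not a valid rating: {value!r}")
    if isinstance(value, int):
        result = value
    elif isinstance(value, str):
        text = value.strip()
        # Allow one optional sign followed by ASCII digits only,
        # so "3.0", "three", and "" are all rejected.
        digits = text[1:] if text[:1] in ("+", "-") else text
        if not (digits.isascii() and digits.isdigit()):
            raise ValueError(f"not an integer string: {value!r}")
        result = int(text)
    else:
        # Floats, None, lists, and so on are rejected outright.
        raise ValueError(f"unsupported type {type(value).__name__}: {value!r}")
    if not lo <= result <= hi:
        raise ValueError(f"{result} outside allowed range [{lo}, {hi}]")
    return result
```

A validator can then rewrite each field in place, e.g. `resp[key] = coerce_int(resp[key], 1, 5)`, so downstream code sees ints regardless of the wire format.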
Summary
New subsystem for interrogating simulated stakeholders after the OASIS simulation completes: four deterministic instrument runners + a cross-method synthesiser, exposed via a new Flask blueprint and a Vue Step4b with d3 visualisations.
The four interview subagents: Longitudinal, Diversity, Delphi, and Scenario.
`InterviewOrchestrator` fans out subagents in parallel after `COMPLETED`; `InterviewSynthesizer` aggregates into a Markdown report + tidy CSV with an auto-emitted Limitations section. Auto-trigger on `SimulationManager` lifecycle hooks (READY → T0; COMPLETED → T1 + others).
Design and plan documents
- `docs/superpowers/specs/2026-05-23-stakeholder-interview-subagents-design.md`
- `docs/superpowers/plans/2026-05-23-stakeholder-interview-subagents.md` (21 bite-sized TDD tasks)
Notable changes to existing files
- `ZepGraphMemoryUpdater` gains `add_text_episode(graph_id, text)` for direct text writes (bypasses the AgentActivity/batch path)
- `OasisProfileGenerator` now writes `source_entity_uuid` to both `reddit_profiles.json` and `twitter_profiles.csv` (additive column on Twitter CSV; non-breaking)
- `SimulationManager` lifecycle hooks (`register_on_ready`, `register_on_completed`) are class-level so they survive across instances
- `SimulationRunner` exposes `_on_completed_callbacks` for the runner→manager bridge
- `INTERVIEW_MAX_TOKENS_PER_RUN`, `INTERVIEW_MAX_WORKERS`, `INTERVIEW_DEFAULT_LANGUAGE`, `LLM_STUB_MODE`, `UPLOADS_DIR`
- `LLMClient` for deterministic CI runs covering all four subagents
Stats
- `npm run build` clean
Test plan
- `cd backend && uv run pytest -q` → 55/55 passing
- `cd frontend && npm run build` → clean
- `anthropic/claude-haiku-4-5` via OpenRouter: 3 personas (Thünen-style fisheries scientist, German Bundesrat, ICES) on a real simulation produced distinctly differentiated in-character German Likert responses + ~80-word open comments. Cost ~$0.02, wall time 10s.
Known follow-ups (none blocking)
- `INTERVIEW_MAX_TOKENS_PER_RUN` defined but enforcement not implemented (config-only)
- `/interview/:simulationId`
- `Step4bInterviews.vue` is unbounded
- `instruments_used.json` written at orchestrator-level rather than per-run-id directory
How to try locally
🤖 Generated with Claude Code