feat: stakeholder interview subagents (4-method post-simulation surveys) by ChristianMoellmann · Pull Request #643 · 666ghj/MiroFish

ChristianMoellmann · 2026-05-23T11:52:45Z

Summary

New subsystem for interrogating simulated stakeholders after the OASIS simulation completes: four deterministic instrument runners + a cross-method synthesiser, exposed via a new Flask blueprint and a Vue Step4b with d3 visualisations.

The four interview subagents:

Longitudinal — 12-item Likert administered pre-OASIS (T0) and post-OASIS (T1) to measure opinion drift induced by simulated peer interaction
Diversity — 24-statement Q-sort + 6 multi-dim Likert axes → PCA + k-means → stakeholder typology
Delphi — 3 rounds (open → rate → revise with anonymised group medians) → convergence metrics
Scenario — 4 future scenarios × 4 dimensions (desirability, plausibility, group-impact, fairness) → polarity matrix
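The T0/T1 comparison in the Longitudinal subagent boils down to a per-item difference between the two waves. A minimal sketch of that idea, assuming each wave is a dict mapping an agent ID to its list of Likert answers; `item_drift` and this data layout are hypothetical and not taken from the PR:

```python
from statistics import mean


def item_drift(t0, t1):
    """Mean per-item shift (T1 - T0) over agents that answered both waves."""
    shared = sorted(set(t0) & set(t1))
    if not shared:
        return []
    n_items = len(t0[shared[0]])
    return [
        mean(t1[agent][i] - t0[agent][i] for agent in shared)
        for i in range(n_items)
    ]


# Two agents, two items (the real instrument has 12 items).
t0 = {"a1": [3, 4], "a2": [2, 5]}
t1 = {"a1": [4, 4], "a2": [4, 3]}
drift = item_drift(t0, t1)  # item 0 moved up on average, item 1 moved down
```

Agents missing from either wave are dropped here rather than imputed; that is a design choice of the sketch, not a statement about how the subagent handles attrition.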

InterviewOrchestrator fans out subagents in parallel after COMPLETED; InterviewSynthesizer aggregates into a Markdown report + tidy CSV with an auto-emitted Limitations section. Auto-trigger on SimulationManager lifecycle hooks (READY → T0; COMPLETED → T1 + others).
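The parallel fan-out with isolated failures can be sketched with a thread pool. This is an illustration only, not the `InterviewOrchestrator` code: `fan_out` and the runner names are hypothetical, and mapping `max_workers` to `INTERVIEW_MAX_WORKERS` is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(runners, max_workers=4):
    """Run each runner concurrently; one failure does not cancel the others."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in runners.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[name] = repr(exc)
    return results, errors


def _failing_runner():
    raise RuntimeError("delphi failed")


results, errors = fan_out({
    "diversity": lambda: "ok",
    "delphi": _failing_runner,
    "scenario": lambda: "ok",
})
```

Collecting errors per runner lets a synthesiser report which methods are missing instead of aborting the whole run.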

Design and plan documents

Spec: docs/superpowers/specs/2026-05-23-stakeholder-interview-subagents-design.md
Plan: docs/superpowers/plans/2026-05-23-stakeholder-interview-subagents.md (21 bite-sized TDD tasks)

Notable changes to existing files

ZepGraphMemoryUpdater gains add_text_episode(graph_id, text) for direct text writes (bypasses the AgentActivity/batch path)
OasisProfileGenerator now writes source_entity_uuid to both reddit_profiles.json and twitter_profiles.csv (additive column on Twitter CSV; non-breaking)
SimulationManager lifecycle hooks (register_on_ready, register_on_completed) are class-level so they survive across instances
SimulationRunner exposes _on_completed_callbacks for the runner→manager bridge
New deps: PyYAML, scikit-learn, scipy, numpy, pandas
New config keys: INTERVIEW_MAX_TOKENS_PER_RUN, INTERVIEW_MAX_WORKERS, INTERVIEW_DEFAULT_LANGUAGE, LLM_STUB_MODE, UPLOADS_DIR
LLM stub mode in LLMClient for deterministic CI runs covering all four subagents
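Why class-level hooks survive across instances: the hook lists live on the class, so an instance created later sees whatever was registered at startup. A minimal sketch with a hypothetical `HookRegistry` standing in for `SimulationManager`; the method bodies are assumptions, only the hook names come from this PR:

```python
class HookRegistry:
    # Class attributes are shared by every instance, so hooks registered
    # once at app startup are visible to instances created per request.
    _on_ready_hooks = []
    _on_completed_hooks = []

    @classmethod
    def register_on_ready(cls, fn):
        cls._on_ready_hooks.append(fn)

    @classmethod
    def register_on_completed(cls, fn):
        cls._on_completed_hooks.append(fn)

    def notify_ready(self, sim_id):
        # Iterate over a copy so a hook may register further hooks safely.
        for fn in list(self._on_ready_hooks):
            fn(sim_id)


seen = []
HookRegistry.register_on_ready(seen.append)  # registered once
HookRegistry().notify_ready("sim-1")         # fresh instance
HookRegistry().notify_ready("sim-2")         # another fresh instance
```

With instance-level lists initialised in `__init__`, both `notify_ready` calls above would have found no hooks.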

Stats

23 commits, 64 files, ~3,800 LoC
55 backend tests (53 unit + 2 integration), all passing
Frontend npm run build clean
All instruments bilingual DE/EN (German default, since the seed corpus is German fisheries discourse)

Test plan

cd backend && uv run pytest -q → 55/55 passing
cd frontend && npm run build → clean
Real-LLM smoke run against anthropic/claude-haiku-4-5 via OpenRouter: 3 personas (Thünen-style fisheries scientist, German Bundesrat, ICES) on a real simulation produced distinctly differentiated in-character German Likert responses + ~80-word open comments. Cost ~$0.02, wall time 10s.
Scale-up to all 23 agents on a real simulation (estimated ~$1–2, ~15 min)
Review by domain expert of the German Likert items and the 4 scenarios (currently drafted, not derived from a validated instrument)

Known follow-ups (none blocking)

INTERVIEW_MAX_TOKENS_PER_RUN is defined but not yet enforced (config-only)
§8 instrument-health plausibility flags from the spec not implemented in synthesiser
LLM transport-layer retries (e.g. network 502s) are not implemented; schema-retry exists, transport-retry does not
In-app nav link to Step4b — currently reachable only by direct URL /interview/:simulationId
Polling loop in Step4bInterviews.vue is unbounded
Wilcoxon signed-rank promised in spec §5.1 but not yet implemented (scipy imported, unused)
instruments_used.json written at orchestrator-level rather than per-run-id directory
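For the transport-retry follow-up, one possible shape is exponential backoff around the LLM call. This is a sketch of an unimplemented item, not code from the PR; `call_with_retries`, the exception types, and the delays are all assumptions:

```python
import time


def call_with_retries(send, attempts=3, base_delay=0.5, sleep=time.sleep,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call send(); on a transient error wait base_delay * 2**n, then retry."""
    for n in range(attempts):
        try:
            return send()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** n))


calls = []


def flaky():
    """Hypothetical transport call that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("502 Bad Gateway")
    return "ok"


delays = []
outcome = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Injecting `sleep` keeps the helper testable without real waiting, which fits the deterministic CI setup described above.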

How to try locally

git checkout feat/interview-subagents
cd backend && uv sync --python 3.12 && uv run pytest -q
# Integration run with the stub LLM (free, deterministic):
LLM_STUB_MODE=true uv run pytest -m integration

🤖 Generated with Claude Code

Approved design for a four-subagent post-simulation interview system (Longitudinal, Diversity, Delphi, Scenario) over MiroFish-simulated German fisheries stakeholders, with cross-method synthesiser. Includes architecture, instrument design, data flow, API surface, error handling, validation, testing, and methodological caveats. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bite-sized TDD plan covering 21 tasks across 7 phases: setup → foundation (models, YAML loader, LLM stub, base interviewer) → 4 subagents (longitudinal, diversity Q-sort+PCA, Delphi 3-round, scenario) → storage + Zep writer → orchestrator + sim lifecycle hooks + synthesiser → Flask /api/interview blueprint → end-to-end integration test → Vue Step4b with d3 visualisations. Each task lists exact files, failing test code, implementation code, run commands, and commit message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


…c + CSV export Add /api/interview blueprint with POST pre/post/rerun, GET status/results/synthesis/export.csv endpoints. Background tasks tracked by UUID in module-level dict. Add register_blueprints() helper to api/__init__.py and wire app factory through it. Add UPLOADS_DIR to Config with env-override default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
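Tracking background tasks by UUID in a module-level dict can be sketched without Flask. The names `_TASKS`, `start_task`, and `get_status` and the status record shape are hypothetical; the blueprint's actual bookkeeping is not shown in this PR description.

```python
import threading
import uuid

_TASKS = {}                # task_id -> status record (module-level, per process)
_LOCK = threading.Lock()   # guards _TASKS across request and worker threads


def start_task(fn, *args):
    """Run fn(*args) in a daemon thread and return (task_id, thread)."""
    task_id = str(uuid.uuid4())
    with _LOCK:
        _TASKS[task_id] = {"status": "running", "result": None, "error": None}

    def _run():
        try:
            update = {"status": "done", "result": fn(*args)}
        except Exception as exc:
            update = {"status": "failed", "error": repr(exc)}
        with _LOCK:
            _TASKS[task_id].update(update)

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    return task_id, worker


def get_status(task_id):
    """Return a copy of the status record, or None for an unknown ID."""
    with _LOCK:
        task = _TASKS.get(task_id)
        return dict(task) if task else None


task_id, worker = start_task(sum, [1, 2, 3])
worker.join()
status = get_status(task_id)
```

A module-level dict is lost on restart and is not shared between worker processes, which is worth keeping in mind when deploying behind a multi-process server.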


…ner→Manager on COMPLETED

- Add backend/app/services/interviews/lifecycle.py with install_hooks() that registers on_ready (pre-survey) and on_completed (post-survey + synthesis) daemon-thread callbacks on a SimulationManager.
- Add SimulationRunner.register_on_completed() / _fire_on_completed() so external callbacks can be notified when _monitor_simulation transitions to COMPLETED (both exit-code-0 path and simulation_end event path).
- Wire both in app/__init__.py: create singleton SimulationManager, install lifecycle hooks, and register its _notify_on_completed with SimulationRunner.
- Add test_lifecycle.py: verifies install_hooks registers one callable for each of ready and completed.
- All 40 unit tests + 2 integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on runs (C1-C5)

Five tightly coupled fixes for bugs that caused the interview subsystem to degrade silently in production:

- C1+C2: `_build_orchestrator` now resolves `graph_id` from `SimulationManager().get_simulation(sim_id).graph_id` (the real persisted state) instead of a `graph_id.txt` that nothing in the codebase writes. `ZepGraphMemoryUpdater(graph_id=...)` is now called with the correct positional argument; the bare `try/except Exception` that was swallowing the TypeError is replaced with a narrow fallback that logs explicitly.
- C3: `SimulationManager._on_ready_hooks` / `_on_completed_hooks` are now class-level (mirroring `SimulationRunner._on_completed_callbacks`). Hooks registered at app startup now survive across the per-request `SimulationManager()` instances created by the Flask API, so the T0 longitudinal auto-survey actually fires.
- C4: `ZepGraphMemoryUpdater` gains an explicit `add_text_episode(graph_id, text)` method for synchronous text writes. `InterviewZepWriter._emit` no longer silently falls back to a dict-shaped `add_activity` call that the real implementation rejects (its `add_activity` requires an `AgentActivity` dataclass).
- C5: `FileSystemPersonaProvider.agent_to_entity()` builds an `{agent_id: zep_entity_uuid}` map from the persisted profile files; the map is now passed to `ZepMemoryProvider` so `get_entity_with_context` is called with real Zep UUIDs instead of `str(agent_id)`. To make this work, `OasisProfileGenerator._save_reddit_json` and `_save_twitter_csv` now persist `source_entity_uuid` (Reddit JSON: optional field; Twitter CSV: appended column).

Tests: 51 unit + 2 integration pass (was 40 + 2).

New tests lock in each fix:

- `test_hooks_survive_across_instances` (C3)
- `test_build_orchestrator_reads_graph_id_from_state` (C1+C2+C5)
- `test_build_orchestrator_falls_back_when_state_missing` (C1+C2)
- `test_emit_uses_add_text_episode_with_graph_id`, `test_emit_raises_when_updater_lacks_add_text_episode`, `test_real_updater_exposes_add_text_episode` (C4)
- `test_agent_to_entity_from_reddit_json`, `test_agent_to_entity_empty_when_no_field`, `test_agent_to_entity_falls_back_to_twitter_csv`, `test_agent_to_entity_reddit_takes_precedence` (C5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
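The C5 mapping can be pictured as follows. This is a sketch under stated assumptions, not `FileSystemPersonaProvider` itself: the agent ID column name `user_id`, the free function `agent_to_entity`, and the rule "use the Twitter CSV only when the Reddit JSON yields nothing" are guesses that merely match the test names above.

```python
import csv
import io
import json


def agent_to_entity(reddit_json_text=None, twitter_csv_text=None, id_key="user_id"):
    """Build {agent_id: zep_entity_uuid}; Reddit JSON wins, Twitter CSV is the fallback."""
    mapping = {}
    if reddit_json_text:
        for row in json.loads(reddit_json_text):
            entity_uuid = row.get("source_entity_uuid")
            if entity_uuid:  # the field is optional in the Reddit JSON
                mapping[int(row[id_key])] = entity_uuid
    if not mapping and twitter_csv_text:
        for row in csv.DictReader(io.StringIO(twitter_csv_text)):
            entity_uuid = row.get("source_entity_uuid")
            if entity_uuid:
                mapping[int(row[id_key])] = entity_uuid
    return mapping


reddit = json.dumps([{"user_id": 0, "source_entity_uuid": "uuid-a"}, {"user_id": 1}])
reddit_without_uuid = json.dumps([{"user_id": 0}])
twitter = "user_id,name,source_entity_uuid\n0,x,uuid-t\n"

from_reddit = agent_to_entity(reddit, twitter)
from_twitter = agent_to_entity(reddit_without_uuid, twitter)
empty = agent_to_entity(reddit_without_uuid, None)
```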

Adds SchemaValidationFailure exception carrying both retry attempts' raw output, so audit.jsonl preserves what the model actually said when an agent's response can't be coerced into the instrument schema. Lets us diagnose persona-vs-format failures without re-running. Two new tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
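An exception that carries the raw output of every attempt might look like the sketch below. Only the class name comes from the commit message; the constructor signature, the helper `ask_with_schema_retry`, and the audit record fields are assumptions made for illustration.

```python
import json


class SchemaValidationFailure(Exception):
    """Raised when every attempt fails validation; keeps the raw outputs."""

    def __init__(self, agent_id, raw_attempts):
        super().__init__(
            f"agent {agent_id}: {len(raw_attempts)} attempts failed schema validation"
        )
        self.agent_id = agent_id
        self.raw_attempts = list(raw_attempts)


def ask_with_schema_retry(agent_id, ask, validate, attempts=2):
    """Ask up to `attempts` times; return the first answer that validates."""
    raw_attempts = []
    for _ in range(attempts):
        raw = ask()
        raw_attempts.append(raw)
        try:
            return validate(raw)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
    raise SchemaValidationFailure(agent_id, raw_attempts)


def audit_line(exc):
    """Serialise the failure as one JSON line (hypothetical record shape)."""
    return json.dumps(
        {"agent_id": exc.agent_id, "error": "schema_validation",
         "raw_attempts": exc.raw_attempts},
        ensure_ascii=False,
    )


replies = iter(["Ich stimme zu.", "{not json"])
try:
    ask_with_schema_retry(7, lambda: next(replies), json.loads)
    line = None
except SchemaValidationFailure as exc:
    line = audit_line(exc)
```

Keeping both raw strings makes it possible to tell a persona that answered in prose from one that emitted malformed JSON.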

Real LLMs (observed with anthropic/claude-haiku-4-5 on a 23-agent run) sometimes return Likert values as JSON strings ('3' not 3). The 4 subagent validators rejected this with isinstance(v, int), losing ~30% of agents at N=23. Added a shared coerce_int helper in base.py that accepts ints and numeric strings, rejects bools/floats/garbage, and is now used by:

- Longitudinal: response values 1-5
- Diversity: Q-sort placements -3..+3 and 6 Likert axes 1-7
- Delphi: R2 and R3 importance/plausibility 1-5
- Scenario: 4 dimensions 1-7

Validators now coerce in place so downstream code sees ints regardless of the wire format. Added 8 tests (4 unit on coerce_int + 4 per-subagent contract tests showing stringified values are accepted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
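A helper with the behaviour described above (ints and numeric strings accepted; bools, floats, and garbage rejected) could be written like this. It is a sketch, not the `coerce_int` in base.py: the `lo`/`hi` range parameters and the whitespace handling are assumptions.

```python
def coerce_int(value, lo, hi):
    """Return value as an int within [lo, hi], or raise ValueError."""
    if isinstance(value, bool):  # bool subclasses int, so reject it first
        raise ValueError(f"bool is not a valid rating: {value!r}")
    if isinstance(value, int):
        result = value
    elif isinstance(value, str):
        text = value.strip()
        digits = text[1:] if text[:1] in "+-" else text
        if not (digits.isascii() and digits.isdigit()):
            raise ValueError(f"not an integer string: {value!r}")
        result = int(text)
    else:  # floats, None, lists, ...
        raise ValueError(f"unsupported type {type(value).__name__}: {value!r}")
    if not lo <= result <= hi:
        raise ValueError(f"{result} outside [{lo}, {hi}]")
    return result


def is_rejected(value, lo, hi):
    """True if coerce_int refuses the value."""
    try:
        coerce_int(value, lo, hi)
    except ValueError:
        return True
    return False


likert = [coerce_int(v, 1, 5) for v in (3, "3", " 4 ")]
q_sort = coerce_int("-3", -3, 3)
```

Rejecting floats outright, rather than truncating `3.7` to `3`, avoids silently changing what the model answered.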

ChristianMoellmann and others added 26 commits May 23, 2026 10:53

chore(interviews): add deps and pytest scaffold for interview subsystem

f63bc55

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): add interview config keys (token budget, workers, language, stub mode)

071f8b5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): add pydantic models for instruments and responses

f1898b4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): YAML instrument loader with pydantic validation and hash freezing

29be754

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): LLM stub mode for deterministic CI tests

eb3c362

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): StakeholderInterviewer base with in-character prompting and schema retry

289a0cf

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): longitudinal subagent + 12-item Likert instrument

0fcb815

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): diversity subagent with Q-sort + 6 Likert axes + PCA/k-means typology

75762cc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): Delphi subagent (3 rounds: open, rate, revise) + convergence metrics

5d7111b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): scenario subagent with 4 futures × 4 dimensions + polarity matrix

ae4941d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): JSONL/JSON storage layout with run_id directories and latest pointer

998cf1a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): Zep writer adapts add_activity/add_text_episode for per-agent + aggregate episodes

cca6736

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): orchestrator with two-phase lifecycle, parallel fan-out, isolated failures

b3e2039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): on_ready / on_completed hook registry on SimulationManager

3322bcb

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): synthesiser emits cross-method report + tidy CSV + limitations section

d79c81d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): persona + Zep memory adapters bridging existing services to interview subsystem

bc07170

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test(interviews): end-to-end pipeline test + content-aware LLM stubs for all 4 subagents

61f13a8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): Step4b Vue scaffold with five-tab navigation, API client, i18n keys

fede66c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(interviews): d3 visualisations for longitudinal Δ, diversity PCA, Delphi, scenario polarity, synthesis

acaa061

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ChristianMoellmann wants to merge 26 commits into 666ghj:main from ChristianMoellmann:feat/interview-subagents
