| title | status | authors | based_on | category | source | tags | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Iterative Prompt & Skill Refinement |
proposed |
|
|
Feedback Loops |
|
Agent usage reveals gaps in prompts, skills, and tools—but how do you systematically improve them? When a workflow fails or behaves sub-optimally, you need multiple mechanisms to capture feedback and iterate. Single approaches aren't enough; you need a multi-pronged refinement strategy.
Implement multiple complementary refinement mechanisms that work together. No single mechanism catches all issues—you need layered approaches.
Four key mechanisms:
1. Responsive Feedback (Primary)
- Monitor internal
#aichannel for issues - Skim workflow interactions daily
- This is the most valuable ongoing source of improvement
2. Owner-Led Refinement (Secondary)
- Store prompts in editable documents (Notion, Google Docs)
- Most prompts editable by anyone at the company
- Include prompt links in workflow outputs (Slack messages, Jira comments)
- Prompts must be discoverable + editable
3. Claude-Enhanced Refinement (Specialized)
- Use Datadog MCP to pull logs into skill repository
- Skills are a "platform" used by many workflows
- Often maintained by central AI team, not individual owners
4. Dashboard Tracking (Quantitative)
- Track workflow run frequency and errors
- Track tool usage (how often each skill loads)
- Data-driven prioritization of improvements
graph TD
A[Workflow Runs] --> B[Feedback Channel: #ai]
A --> C[Owner Edits Prompts]
A --> D[Datadog Logs → Claude]
A --> E[Dashboards: Metrics]
B --> F[Identify Issues]
C --> F
D --> F
E --> F
F --> G[Update Prompts/Skills]
G --> A
style B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style E fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
Implementation checklist:
- Feedback channel: Internal Slack/Discord for agent issues
- Editable prompts: Store in Notion/docs, not code
- Prompt links: Include in every workflow output
- Log access: Datadog/observability with MCP integration
- Dashboards: Track workflow runs, errors, tool usage
Refinement workflow:
# After each workflow run, include link
workflow_result = {
"output": "...",
"prompt_link": "https://notion.so/prompt-abc123"
}Discovery strategy:
- Daily: Skim feedback channel, review workflow interactions
- Weekly: Review dashboard metrics for error spikes
- Ad-hoc: Pull logs when specific issues reported
- Quarterly: Comprehensive prompt/skill audit
Post-run evals (next step):
Include subjective eval after each run:
- Was this workflow effective?
- What would have made it better?
- Human-in-the-loop to nudge evolution
Pros:
- Multi-layered: Catches issues different mechanisms miss
- Continuous: Always improving, not episodic
- Accessible: Anyone can contribute to improvement
- Data-driven: Dashboards prioritize what matters
- Skill-sharing: Central team can maintain platform-level skills
Cons:
- No silver bullet: Can't eliminate any mechanism
- Maintenance overhead: Multiple systems to manage
- Permission complexity: Need balanced edit access
- Alert fatigue: Too many signals can overwhelm
Workflow archetypes:
| Type | Refinement Strategy |
|---|---|
| Chatbots | Post-run evals + human-in-the-loop |
| Well-understood workflows | Code-driven (deterministic) |
| Not-yet-understood workflows | The open question |
Open challenge: How to scalably identify and iterate on "not-yet-well-understood" workflows without product engineers implementing each individually?
- Iterative prompt and skill refinement - Will Larson (Imprint, 2026)
- Related: Dogfooding with Rapid Iteration, Compounding Engineering, Memory Synthesis from Execution Logs