
Commit 781e0a1

feat: v2.15.0 — 9 new prompt techniques, /blueprint skill, MiniMax M2.5, planner→TDD bridge
- Add 9 research-backed prompt techniques (reflexion, react, rubber_duck, test_driven, least_to_most, pre_mortem, scot, pre_post, bdd_spec) — total now 31
- Embed techniques in tool system prompts: minimax_code (SCoT, reflexion, rubber_duck), minimax_agent (ReAct, least_to_most)
- Add /blueprint skill — multi-model council → bite-sized TDD implementation plans
- Rewrite planner_maker judge_final to output writing-plans format (exact files, test-first, commit points)
- Add pre_mortem to /breakdown, /judge critique steps
- Add pre/post contracts to /decompose deep-dives
- Upgrade MiniMax M2.1 → M2.5 (SWE-Bench 80.2%), per-task temperatures
- Update /prompt skill with auto-recommend flow, 30-intent matching guide, 13 categories
- Update all model constants and defaults
1 parent 56918e2 commit 781e0a1

33 files changed: +829 / −202 lines

CLAUDE.md

Lines changed: 2 additions & 1 deletion

@@ -165,13 +165,14 @@ Implementation: `src/prompt-engineer-lite.ts`
 
 ## Claude Code Skills (Bundled)
 
-TachiBot ships 8 skills in the `skills/` directory. These are deployed to `~/.claude/skills/` on install.
+TachiBot ships 9 skills in the `skills/` directory. These are deployed to `~/.claude/skills/` on install.
 
 | Skill | Description | Key Tools Used |
 |-------|------------|----------------|
 | `/judge` | Multi-model council with fallback awareness | grok_search, perplexity_ask, grok_reason, kimi_thinking, openai_reason, gemini_analyze_text |
 | `/think` | Sequential reasoning chains | nextThought |
 | `/focus` | Mode-based multi-model reasoning | focus |
+| `/blueprint` | Multi-model planning → bite-sized TDD output | planner_maker (grok_search, qwen_coder, kimi_thinking, openai_reason, gemini_analyze_text) |
 | `/breakdown` | Strategic decomposition pipeline (breadth-first) | execute_prompt_technique (first_principles, least_to_most, patterns, pre_mortem) |
 | `/decompose` | Split into sub-problems, deep-dive each (depth-first) | kimi_decompose, nextThought chains |
 | `/prompt` | Recommends the right thinking technique for your problem | preview_prompt_technique, execute_prompt_technique |

README.md

Lines changed: 52 additions & 50 deletions

@@ -4,18 +4,18 @@
 
 ### Multi-Model AI Orchestration Platform
 
-[![Version](https://img.shields.io/badge/version-2.14.7-blue.svg)](https://www.npmjs.com/package/tachibot-mcp)
-[![Tools](https://img.shields.io/badge/tools-48_active-brightgreen.svg)](#-tool-ecosystem-48-tools)
+[![Version](https://img.shields.io/badge/version-2.15.0-blue.svg)](https://www.npmjs.com/package/tachibot-mcp)
+[![Tools](https://img.shields.io/badge/tools-51_active-brightgreen.svg)](#-tool-ecosystem-51-tools)
 [![License](https://img.shields.io/badge/license-AGPL--3.0-green.svg)](LICENSE)
 [![Node](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org)
 [![MCP](https://img.shields.io/badge/MCP-Compatible-purple.svg)](https://modelcontextprotocol.io)
 
-**48 AI tools. 7 providers. One protocol.**
+**51 AI tools. 7 providers. One protocol.**
 
-Orchestrate Perplexity, Grok, GPT-5, Gemini, Qwen, Kimi K2.5, and MiniMax M2.1
+Orchestrate Perplexity, Grok, GPT-5, Gemini, Qwen, Kimi K2.5, and MiniMax M2.5
 from Claude Code, Claude Desktop, Cursor, or any MCP client.
 
-[Get Started](#-quick-start) · [View Tools](#-tool-ecosystem-48-tools) · [Documentation](https://tachibot.com/docs)
+[Get Started](#-quick-start) · [View Tools](#-tool-ecosystem-51-tools) · [Documentation](https://tachibot.com/docs)
 
 <br>
 
@@ -28,57 +28,59 @@ from Claude Code, Claude Desktop, Cursor, or any MCP client.
 
 ---
 
-## What's New in v2.14.7
+## What's New in v2.15.0
 
-### Gemini Judge & Jury System
-- **`gemini_judge`** — Science-backed LLM-as-a-Judge (arXiv:2411.15594). 4 modes: synthesize, evaluate, rank, resolve
-- **`jury`** — Multi-model jury panel. Configurable jurors (grok, openai, qwen, kimi, perplexity, minimax) run in parallel, Gemini synthesizes the verdict. Based on "Replacing Judges with Juries" (Cohere, arXiv:2404.18796)
-
-### Perplexity Model Fixes
-- Fixed `sonar-pro` model ID (was accidentally using lightweight `sonar`)
-- `perplexity_research` now uses **`sonar-deep-research`** — exhaustive multi-source reports in a single call
+### `/blueprint` Skill — Multi-Model Implementation Planning
+New skill that creates bite-sized TDD implementation plans using a 7-step multi-model council:
+```
+/blueprint add OAuth with refresh tokens
+```
+Pipeline: Grok search → Qwen+Kimi analysis → Kimi decompose → GPT pre-mortem critique → Gemini final judgment → **bite-sized TDD output** (exact files, test-first steps, commit points).
 
-### Qwen3-Coder-Next
-`qwen_coder` now runs on **Qwen3-Coder-Next** (Feb 2026) — purpose-built for agentic coding:
+Bridges `planner_maker`'s multi-model intelligence with the `writing-plans` execution format.
 
-| | Before (qwen3-coder) | After (qwen3-coder-next) |
-|---|---|---|
-| **Params** | 480B / ~35B active | 80B / 3B active |
-| **Context** | 131K | 262K |
-| **SWE-Bench** | 69.6% | >70% |
-| **Pricing** | $0.22/$0.88 per M | $0.07/$0.30 per M |
+### 31 Prompt Engineering Techniques (was 22)
+Added 9 research-backed techniques for coding and decision-making:
 
-3x cheaper, 2x context, better benchmarks. Falls back to legacy 480B on provider failure.
+| Technique | Source | Category |
+|-----------|--------|----------|
+| `reflexion` | Shinn et al. 2023 | Engineering |
+| `react` (ReAct) | Yao et al. 2022 | Engineering |
+| `rubber_duck` | Hunt & Thomas 2008 | Engineering |
+| `test_driven` | Beck 2003 | Engineering |
+| `scot` (Structured CoT) | Li et al. 2025 (+13.79% HumanEval) | Structured Coding |
+| `pre_post` (Contracts) | Empirical SE 2025 | Structured Coding |
+| `bdd_spec` (Given/When/Then) | BDD 2025 | Structured Coding |
+| `least_to_most` | Zhou et al. 2022 | Research |
+| `pre_mortem` | Klein 2007 | Decision |
 
-### Kimi K2.5 Suite (4 tools)
-| Tool | Capability | Highlight |
-|------|-----------|-----------|
-| `kimi_thinking` | Step-by-step reasoning | Agent Swarm architecture |
-| `kimi_code` | Code generation & fixing | SWE-Bench 76.8% |
-| `kimi_decompose` | Task decomposition | Dependency graphs, parallel subtasks |
-| `kimi_long_context` | Document analysis | 256K context window |
+Techniques are embedded directly in tool system prompts for automatic application.
 
-### MiniMax M2.1 (2 tools)
-- `minimax_code` — SWE tasks at very low cost (72.5% SWE-Bench)
-- `minimax_agent` — Agentic workflows (77.2% τ²-Bench)
+### MiniMax M2.5 Upgrade
+- `minimax_code` — SWE-Bench **80.2%**, per-task TECHNIQUE tags (SCoT, reflexion, rubber_duck), per-task temperatures
+- `minimax_agent` — ReAct + least-to-most decomposition protocol, HALT criteria
 
-### Qwen Reasoning
-- `qwen_reason` — Heavy reasoning with Qwen3-Max-Thinking (>1T params, 98% HMMT math)
+### Enhanced Skills
+- `/breakdown` — now uses `least_to_most` ordering + `pre_mortem` failure analysis
+- `/judge` — first judge now runs pre-mortem ("assume this FAILED")
+- `/decompose` — deep-dives include pre/post contracts per sub-problem
+- `/prompt` — auto-recommend flow with 30-intent matching guide, 13 categories
 
 ---
 
 ## Skills (Claude Code)
 
-TachiBot ships with 8 slash commands for Claude Code. These orchestrate the tools into powerful workflows:
+TachiBot ships with 9 slash commands for Claude Code. These orchestrate the tools into powerful workflows:
 
 | Skill | What it does | Example |
 |-------|-------------|---------|
+| `/blueprint` | Multi-model planning → bite-sized TDD steps | `/blueprint add OAuth with refresh tokens` |
 | `/judge` | Multi-model council - parallel analysis with synthesis | `/judge how to implement rate limiting` |
 | `/think` | Sequential reasoning chain with any model | `/think grok,gemini design a cache layer` |
 | `/focus` | Mode-based reasoning (debate, research, analyze) | `/focus architecture-debate Redis vs Pg` |
-| `/breakdown` | Strategic decomposition with feasibility check | `/breakdown add OAuth with refresh tokens` |
+| `/breakdown` | Strategic decomposition with pre-mortem | `/breakdown refactor payment module` |
 | `/decompose` | Split into sub-problems, deep-dive each one | `/decompose implement collaborative editor` |
-| `/prompt` | Pick the right thinking technique for your problem | `/prompt why do users churn` |
+| `/prompt` | Recommend the right thinking technique (31 available) | `/prompt why do users churn` |
 | `/algo` | Algorithm analysis with 3 specialized models | `/algo optimize LRU cache O(1)` |
 | `/tachi` | Help - see available skills, tools, key status | `/tachi` |
 
@@ -91,26 +93,26 @@ Skills automatically adapt to your configured API keys. Even with just 1-2 provi
 ## Key Features
 
 ### Multi-Model Intelligence
-- **48 AI Tools** across 7 providers — Perplexity, Grok, GPT-5, Gemini, Qwen, Kimi, MiniMax
-- **Multi-Model Council** — planner_maker synthesizes plans from 5+ models
+- **51 AI Tools** across 7 providers — Perplexity, Grok, GPT-5, Gemini, Qwen, Kimi, MiniMax
+- **Multi-Model Council** — planner_maker synthesizes plans from 5+ models into bite-sized TDD steps
 - **Smart Routing** — Automatic model selection for optimal results
 - **OpenRouter Gateway** — Optional single API key for all providers
 
 ### Advanced Workflows
 - **YAML-Based Workflows** — Multi-step AI processes with dependency graphs
-- **Prompt Engineering** — 14 research-backed techniques built-in
+- **Prompt Engineering** — 31 research-backed techniques (including SCoT, ReAct, Reflexion)
 - **Verification Checkpoints** — 50% / 80% / 100% with automated quality scoring
 - **Parallel Execution** — Run multiple models simultaneously
 
 ### Tool Profiles
 | Profile | Tools | Best For |
 |---------|-------|----------|
 | **Minimal** | 12 | Quick tasks, low token budget |
-| **Research Power** | 30 | Deep investigation, multi-source |
-| **Code Focus** | 28 | Software development, SWE tasks |
-| **Balanced** | 38 | General-purpose, mixed workflows |
-| **Heavy Coding** (default) | 44 | Max code tools + agentic workflows |
-| **Full** | 50 | Everything enabled |
+| **Research Power** | 31 | Deep investigation, multi-source |
+| **Code Focus** | 29 | Software development, SWE tasks |
+| **Balanced** | 39 | General-purpose, mixed workflows |
+| **Heavy Coding** (default) | 45 | Max code tools + agentic workflows |
+| **Full** | 51 | Everything enabled |
 
 ### Developer Experience
 - **Claude Code** — First-class support
@@ -172,19 +174,19 @@ See [Installation Guide](docs/INSTALLATION_BOTH.md) for detailed instructions.
 
 ---
 
-## Tool Ecosystem (48 Tools)
+## Tool Ecosystem (51 Tools)
 
 ### Research & Search (6)
 `perplexity_ask` · `perplexity_research` · `perplexity_reason` · `grok_search` · `openai_search` · `gemini_search`
 
-### Reasoning & Planning (8)
-`grok_reason` · `openai_reason` · `qwen_reason` · `kimi_thinking` · `kimi_decompose` · `planner_maker` · `planner_runner` · `list_plans`
+### Reasoning & Planning (9)
+`grok_reason` · `openai_reason` · `qwen_reason` · `qwq_reason` · `kimi_thinking` · `kimi_decompose` · `planner_maker` · `planner_runner` · `list_plans`
 
 ### Code Intelligence (8)
 `kimi_code` · `grok_code` · `grok_debug` · `qwen_coder` · `qwen_algo` · `qwen_competitive` · `minimax_code` · `minimax_agent`
 
-### Analysis & Brainstorming (9)
-`gemini_analyze_text` · `gemini_analyze_code` · `gemini_brainstorm` · `openai_brainstorm` · `openai_code_review` · `openai_explain` · `grok_brainstorm` · `grok_architect` · `kimi_long_context`
+### Analysis & Judgment (11)
+`gemini_analyze_text` · `gemini_analyze_code` · `gemini_judge` · `jury` · `gemini_brainstorm` · `openai_brainstorm` · `openai_code_review` · `openai_explain` · `grok_brainstorm` · `grok_architect` · `kimi_long_context`
 
 ### Meta & Orchestration (5)
 `think` · `nextThought` · `focus` · `tachi` · `usage_stats`
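The arithmetic behind the headline counts is easy to sanity-check: 22 prior techniques plus the 9 added in this commit gives the advertised 31. A minimal sketch, assuming a plain map as the registry shape (the real registry lives in `src/prompt-engineer-lite.ts` and may differ):

```typescript
// Category labels copied from the README table; the registry object
// itself is illustrative, not TachiBot's actual data structure.
const newTechniques: Record<string, string> = {
  reflexion: "Engineering",
  react: "Engineering",
  rubber_duck: "Engineering",
  test_driven: "Engineering",
  scot: "Structured Coding",
  pre_post: "Structured Coding",
  bdd_spec: "Structured Coding",
  least_to_most: "Research",
  pre_mortem: "Decision",
};

const previousTotal = 22;            // techniques shipped before v2.15.0
const added = Object.keys(newTechniques).length;
const total = previousTotal + added; // matches the "31 techniques" claim
```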

docs/TOOLS_REFERENCE.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # TachiBot MCP - Complete Tools Reference
 
-**Complete parameter schemas, advanced usage, and examples for all 31 tools (32 with competitive mode)**
+**Complete parameter schemas, advanced usage, and examples for all 51 tools**
 
 ---
 

package.json

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
   "name": "tachibot-mcp",
   "mcpName": "io.github.byPawel/tachibot-mcp",
   "displayName": "TachiBot MCP - Universal AI Orchestrator",
-  "version": "2.14.7",
+  "version": "2.15.0",
   "type": "module",
   "main": "dist/src/server.js",
   "bin": {

profiles/balanced.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,5 @@
 {
-  "description": "Balanced set for general use (~Xk tokens, 38 tools)",
+  "description": "Balanced set for general use (~Xk tokens, 39 tools)",
   "tools": {
     "think": true,
     "focus": true,
@@ -28,6 +28,7 @@
     "jury": true,
     "qwen_coder": true,
     "qwen_algo": true,
+    "qwq_reason": true,
     "qwen_reason": true,
     "kimi_thinking": true,
     "kimi_code": true,

profiles/code_focus.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,5 @@
 {
-  "description": "Code-heavy work with debugging and analysis (~Xk tokens, 28 tools)",
+  "description": "Code-heavy work with debugging and analysis (~Xk tokens, 29 tools)",
   "tools": {
     "think": true,
     "focus": true,
@@ -28,6 +28,7 @@
     "jury": false,
     "qwen_coder": true,
     "qwen_algo": true,
+    "qwq_reason": true,
     "qwen_reason": false,
     "kimi_thinking": true,
     "kimi_code": true,

profiles/full.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,5 @@
 {
-  "description": "All tools enabled for maximum capability (50 tools)",
+  "description": "All tools enabled for maximum capability (51 tools)",
   "tools": {
     "think": true,
     "focus": true,
@@ -28,6 +28,7 @@
     "jury": true,
     "qwen_coder": true,
     "qwen_algo": true,
+    "qwq_reason": true,
     "qwen_reason": true,
     "kimi_thinking": true,
     "kimi_code": true,

profiles/heavy_coding.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,5 @@
 {
-  "description": "Default profile — heavy coding with all reasoning & code tools (44 tools)",
+  "description": "Default profile — heavy coding with all reasoning & code tools (45 tools)",
   "tools": {
     "think": true,
     "focus": true,
@@ -28,6 +28,7 @@
     "jury": true,
     "qwen_coder": true,
     "qwen_algo": true,
+    "qwq_reason": true,
     "qwen_reason": true,
     "kimi_thinking": true,
     "kimi_code": true,

profiles/minimal.json

Lines changed: 1 addition & 0 deletions

@@ -28,6 +28,7 @@
     "jury": false,
     "qwen_coder": true,
     "qwen_algo": false,
+    "qwq_reason": false,
     "qwen_reason": false,
     "kimi_thinking": false,
     "kimi_code": false,

profiles/research_power.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,5 @@
 {
-  "description": "Research-focused with Grok search + all Perplexity + brainstorming (~Xk tokens, 30 tools)",
+  "description": "Research-focused with Grok search + all Perplexity + brainstorming (~Xk tokens, 31 tools)",
   "tools": {
     "think": true,
     "focus": true,
@@ -28,6 +28,7 @@
     "jury": true,
     "qwen_coder": true,
     "qwen_algo": false,
+    "qwq_reason": true,
     "qwen_reason": true,
     "kimi_thinking": true,
     "kimi_code": true,
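Each profile diff above follows the same pattern: a one-word count bump in `description` plus a new `qwq_reason` flag. A validation sketch, assuming a helper like the one below (the helper and the truncated tool map are hypothetical; real profiles enumerate the full tool set):

```typescript
// Hypothetical check: the enabled-tool count in a profile's `tools` map
// should match the count claimed in its description. The fragment below
// is truncated for illustration — real profiles list all tools.
interface Profile {
  description: string;
  tools: Record<string, boolean>;
}

function enabledCount(p: Profile): number {
  return Object.values(p.tools).filter(Boolean).length;
}

const fragment: Profile = {
  description: "Balanced set for general use (~Xk tokens, 39 tools)",
  tools: {
    qwen_coder: true,
    qwen_algo: true,
    qwq_reason: true, // the flag added in this commit
    qwen_reason: true,
    kimi_thinking: true,
    kimi_code: false,
  },
};

// Flipping qwq_reason off shows the commit changes the count by exactly one.
const without = { ...fragment, tools: { ...fragment.tools, qwq_reason: false } };
const delta = enabledCount(fragment) - enabledCount(without);
```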
