#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$SCRIPT_DIR"
MODEL="${COPILOT_MODEL:-gpt-5.4}"
REASONING_EFFORT="${COPILOT_REASONING_EFFORT:-xhigh}"
usage() {
  cat <<'EOF'
Usage: ./gaia_capability_iterator.sh [--capability <name>]
Runs the inspector, tester, planner, and implementer prompts for a single capability.
EOF
}
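# Example invocations (the capability name below is illustrative, taken from the
# sample analysis output further down; substitute your own):
#   ./gaia_capability_iterator.sh --capability "User Login"
#   ./gaia_capability_iterator.sh --capability="User Login"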
die() {
  printf '%s\n' "$*" >&2
  exit 1
}
trim_whitespace() {
  local value="$1"
  value="${value#"${value%%[![:space:]]*}"}"
  value="${value%"${value##*[![:space:]]}"}"
  printf '%s' "$value"
}
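# How the two expansions above work: ${value%%[![:space:]]*} deletes the longest
# suffix starting at the first non-space character, leaving only the leading
# whitespace run, which ${value#...} then strips; the mirrored ##/% pair strips
# the trailing run the same way. Pure-bash, no subshell or sed needed.
# Illustrative example: trim_whitespace "  multi word  " prints "multi word".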
require_command() {
  command -v "$1" >/dev/null 2>&1 || die "Required command not found: $1"
}
require_nonempty_file() {
  local relative_path="$1"
  local absolute_path="$REPO_ROOT/$relative_path"
  if [[ ! -f "$absolute_path" ]]; then
    die "Required file not found: $relative_path"
  fi
  if [[ ! -s "$absolute_path" ]]; then
    die "Required file is empty: $relative_path"
  fi
}
run_stage() {
  local stage_label="$1"
  local prompt="$2"
  printf '\n===\nRunning %s for capability: %s\n===\n\n' "$stage_label" "$capability"
  copilot --yolo --model "$MODEL" --reasoning-effort "$REASONING_EFFORT" -p "$prompt"
}
capability=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    --capability)
      [[ -n "${2:-}" ]] || die "Missing value for --capability."
      capability="$2"
      shift 2
      ;;
    --capability=*)
      capability="${1#*=}"
      shift
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      die "Unknown argument: $1"
      ;;
  esac
done
if [[ -z "$capability" ]]; then
  IFS= read -r -p "What capability would you like to focus on? " capability
fi
capability="$(trim_whitespace "$capability")"
[[ -n "$capability" ]] || die "Capability is required."
[[ "$capability" != *$'\n'* ]] || die "Capability must be a single line."
require_command copilot
cd "$REPO_ROOT"
read -r -d '' inspector_prompt <<'EOF' || true
You are a specialist repository inspector and product manager for the project "LifeOS", by FrostAura, from docs/specs/system_spec.md.
Your task is to deep dive into 1) the system spec and 2) the current state of the repository. First, commit any changes from a previous run via Git with the commit message "chore(spec iteration): spec alignment <overall completion %> est" and push them to origin. Then analyze the system spec and the repository to identify:
- What has been completed so far in the project.
- You will focus specifically on the capability of __CAPABILITY__. Deeply assess how completely the spec requirements for this capability are met: backend, DB, migrations, API, frontend, user flows, design systems, and any other relevant aspects of the system spec.
- Inspect the T3/integration tests to assess how well they cover each test case in docs/specs/test_spec.md, and identify any gaps in test coverage.
Your output must:
- Start with a prominent overall completion summary at the very beginning of the report. This must include a single overall completion percentage for the whole project, a confidence-qualified verdict on whether that number is truly defensible, and a short explanation of what is still preventing a truthful 100% claim.
- Be a comprehensive report whose primary purpose is to define the exact path to a defensible 100% completion state, not just to describe the current state.
- Keep the completion breakdown concise and use it only as supporting evidence for the main outcome: a concrete, prioritized closure plan.
- Explicitly answer: what exact work remains to get __CAPABILITY__ to 100%, what exact work remains elsewhere to avoid cross-capability blockers, what order the work should happen in, and what proof would be required to truthfully claim 100% completion.
- Present the report in tables as far as possible, ordered by priority, criticality, dependency, and unblock value.
- Include a "Path to 100%" section that is more detailed than the status breakdown. This section must contain: remaining gaps, root cause, required implementation work, required test work, required docs/spec work, dependency blockers, recommended owner, and clear exit criteria for each item.
- Include a final "Absolute 100% Readiness Verdict" that states whether 100% is currently defensible, what prevents that claim today, and the minimum set of actions required to make that claim true.
- Provide recommended next steps focused on closing the project to 100% completion as quickly and safely as possible, rather than generic observations.
Save the response to CAPABILITY_ANALYSIS.md. IGNORE gaia_capability_iterator.sh.
You should start with the following format:
# Capability Analysis Report
Overall Completion % (est): <estimated_completion_percentage>
## Capabilities
- Capability (<capability_level_completion_estimate>)
- Subcapability 1 (<subcapability_1_completion_estimate>)
- Subcapability 1.1 (<nested_subcapability_completion_estimate>)
- ...
- Subcapability 1.2
- ...
- Subcapability 2
- ...
Then your breakdown of the completion status for each capability and subcapability, followed by the "Path to 100%" section, the "Absolute 100% Readiness Verdict" section, and the recommended next steps.
The idea with the markdown above is that you must assess the deepest user flows and work from there. An example of such output may be:
# Capability Analysis Report
Overall Completion % (est): 65%
## Capabilities
- User Registration (80%)
- Email/Password Registration (100%)
- OAuth Registration (60%)
- Google OAuth (100%)
- API Integration (100%)
- Token exchange (100%)
- User data retrieval (100%)
- Error handling (100%)
- Return redirection (100%)
- Frontend Integration (100%)
- UI/UX (100%)
- DB Integration (100%)
- GitHub OAuth (0%)
- Facebook OAuth (0%)
- Phone Number Registration (0%)
- User Login (70%)
- Email/Password Login (100%)
- OAuth Login (50%)
- Google OAuth (100%)
- GitHub OAuth (0%)
- Facebook OAuth (0%)
- Phone Number Login (0%)
- Phone number component
- input field (0%)
- verification code sending (0%)
- verification code input (0%)
- verification code validation (0%)
- design language integration (token based system) (0%)
EOF
read -r -d '' tester_prompt <<'EOF' || true
You are a specialist integration tester, visual tester, and functional tester for the project "LifeOS", by FrostAura, from docs/specs/system_spec.md and docs/specs/test_spec.md.
Use your Playwright MCP tools when doing manual regression, integration, functional, flow, and visual assessments in headed mode so the user can follow along. You never trust the DOM: just because something is present or has certain attributes does not mean it is correct visually or functionally. Always iterate visually and functionally on the system to find all the bugs, issues, and edge cases. You are detail oriented and merciless: find every missing feature and every piece of broken functionality, and explore and test all user flows, edge cases, and potential bugs.
When testing, always use the docker stack. This will give you consistent ports. If the ports are not available, then you should kill any processes using those ports, and then continue testing. Do not skip testing just because of a port conflict. Always resolve the port conflict and continue testing. The docker stack is the standard environment for testing, and it will give you the most accurate and consistent results.
You also manually do functional and visual testing of all capabilities and subcapabilities in the system spec, with a specific focus on __CAPABILITY__.
Your job is not to develop or fix any of the bugs or issues that you find, but rather to identify and document all of the bugs, broken functionality, and missing features in the system, with a specific focus on __CAPABILITY__. You will also identify and document all edge cases that you have tested, and any potential edge cases that you have identified that have not been tested yet.
You respond with a comprehensive report that documents all of these findings.
Your report will end with an assessment of what remains for the project to be 100% complete: a detailed list of all the remaining work, with a specific focus on __CAPABILITY__, and suggested next steps to address the missing aspects and move the project toward 100% completion.
You do not test using spec files. You test using your Playwright MCP tools to control the browser, check network logs, inspect console errors, and provide context for failures where possible for the later developer.
You may exclude hard limits like passkey or other security measures if they are blocking your testing, but you must document these exclusions clearly in your report, and you must not skip testing any functionality just because of a security measure. Always find a way to test the functionality, even if it means temporarily excluding a security measure. Your goal is to find all the bugs and issues in the system, and to provide as much information as possible about those bugs and issues so that they can be fixed by the developers.
Save the response to TEST_ANALYSIS.md. IGNORE gaia_capability_iterator.sh.
Now, once you have saved the test analysis report, you will read the capability analysis.
EOF
read -r -d '' planner_prompt <<'EOF' || true
You are the implementation planner for the project "LifeOS", by FrostAura, from docs/specs/system_spec.md and docs/specs/test_spec.md.
You must read the following documents completely:
- docs/specs/system_spec.md | The full system specification and the source of truth for capability scope and completion requirements.
- docs/specs/test_spec.md | The full test specification and the source of truth for test coverage and regression expectations.
- CAPABILITY_ANALYSIS.md | The comprehensive analysis of the current completion status of the project, with a specific focus on __CAPABILITY__, and a detailed breakdown of what work remains to be done to reach 100% completion.
- TEST_ANALYSIS.md | The comprehensive analysis of the current test coverage and regression status of the project, with a specific focus on __CAPABILITY__, and a detailed breakdown of what testing work remains to be done to reach 100% completion.
Produce a concrete, prioritized implementation plan that closes the remaining gaps safely and efficiently. The plan must identify dependency ordering, code work, test work, docs work, and verification needed to make a truthful 100% completion claim for __CAPABILITY__.
Save the response to IMPLEMENTATION_PLAN.md. IGNORE gaia_capability_iterator.sh.
EOF
read -r -d '' implementation_prompt <<'EOF' || true
You are a specialist software architect, engineer, and developer for the project "LifeOS", by FrostAura, from docs/specs/system_spec.md.
You must read the following documents completely:
- docs/specs/system_spec.md | The system specification for the project, which details all the capabilities and subcapabilities that need to be implemented in the system, as well as the technical requirements and specifications for each capability and subcapability. This is the main document that you will use to understand the project and its requirements.
- docs/specs/test_spec.md | The test specification for the project, which details all the test cases that need to be implemented in the system, as well as the technical requirements and specifications for each test case. This is the main document that you will use to understand the testing requirements for the project.
- CAPABILITY_ANALYSIS.md | The comprehensive analysis of the current completion status of the project, with a specific focus on __CAPABILITY__, and a detailed breakdown of what work remains to be done to reach 100% completion.
- TEST_ANALYSIS.md | The comprehensive analysis of the current test coverage and regression status of the project, with a specific focus on __CAPABILITY__, and a detailed breakdown of what testing work remains to be done to reach 100% completion.
- IMPLEMENTATION_PLAN.md | The concrete, prioritized implementation plan that should guide the remaining work.
Your task is to implement all the remaining work needed to complete the project 100%, with a specific focus on __CAPABILITY__. Use the system specification, the test specification, the capability analysis, the test analysis, and the implementation plan to guide your work, prioritizing the most critical and highest-priority items first.
You must use a structured closure program to drive completion of every missing component. Do not handle remaining work as ad hoc implementation. Discover all gaps, sequence dependencies, satisfy verification gates, and close every missing component required for a truthful 100% claim.
The standard is absolute completion, not "good enough", not "industry standard", and not "close enough". 100% is the explicit goal. You must operate with strict perfectionism: every missing capability, sub-capability, dependency, doc gap, test gap, regression gap, CI gap, workflow gap, and quality gate gap that blocks a truthful 100% claim must be identified and driven to closure.
Your output will be a fully implemented project that is 100% complete, with all capabilities and subcapabilities implemented, and all test cases implemented and passing. You will also provide a detailed report that documents the implementation work that you have done, the challenges that you have faced, and the solutions that you have implemented to overcome those challenges. Your output will also be used by the product manager and tester to test the project and to provide feedback on the implementation work that you have done. Never lie or make up information. If you don't know something, say that you don't know. Always be honest and transparent about the implementation work that you have done, and the challenges that you have faced. Your honesty and transparency will help to build trust and credibility with the product manager and tester, and it will also help to ensure that the project is completed successfully.
You will always run your final dev tests locally and in the docker compose stack to ensure that everything is working correctly, and that all test cases are passing. You will also do manual testing of all capabilities and subcapabilities in the system spec, with a specific focus on __CAPABILITY__, to ensure that everything is working correctly, and that there are no bugs or issues in the system.
In table format, indicate all the capabilities you looked at, their truthful before and after completion status, and the completion status of the test cases. Be very detailed and comprehensive in your reporting, and provide as much information as possible about the implementation work you have done, the challenges you have faced, and the solutions you have implemented to overcome them. If the results are less than 100%, you must continue orchestrating and iterating on any issues or challenges until you have achieved 100% completion for all capabilities and test cases, or until you can prove there is an external blocker that makes completion impossible in the current run. If such a blocker exists, you must identify it explicitly, explain why it blocks 100%, and specify the exact remaining actions required once the blocker is removed.
Your report must also include a dedicated execution ledger that shows, for every missing component you found, what implementation path was used to drive it, what dependencies were cleared, what gates were satisfied, and what proof was collected. Do not omit any missing component from this ledger. IGNORE gaia_capability_iterator.sh.
Finally:
- Remove all testing and reporting artefacts like pictures, logs, and temporary files that you generated during your implementation work. Do not remove any of the final reporting markdown files or the capability analysis markdown file; those are important artefacts that must be preserved.
- Ensure all tests, linting, coverage and builds are succeeding. Also save any blockers, if needed, to a BLOCKERS.md file with a detailed description of the blocker, why it blocks 100% completion, and what the exact next steps are once the blocker is removed. If there are no blockers, you can omit this step. IGNORE gaia_capability_iterator.sh.
The desired flow of development should be:
- Implement missing capability X via code to rapidly iterate. (No tests or spec files yet, just code to implement the missing capability.)
- dotnet run
- npm start
- Test using Playwright MCP tools to find all the bugs, issues, and edge cases in the system related to capability X. (No tests or spec files yet, just manual testing to find all the issues. Use your Playwright MCP tools when doing manual regression/integration, functional, flow, and visual assessments in headed mode so the user can follow along. You never trust the DOM.)
- Implement spec files
- The automation suite should, at a minimum, cover all the user flows exercised during manual testing.
EOF
inspector_prompt=${inspector_prompt//__CAPABILITY__/${capability}}
tester_prompt=${tester_prompt//__CAPABILITY__/${capability}}
planner_prompt=${planner_prompt//__CAPABILITY__/${capability}}
implementation_prompt=${implementation_prompt//__CAPABILITY__/${capability}}
run_stage "project inspector" "$inspector_prompt"
require_nonempty_file "CAPABILITY_ANALYSIS.md"
run_stage "project tester" "$tester_prompt"
require_nonempty_file "TEST_ANALYSIS.md"
printf -v planner_runtime_prompt '%s
Read these files from the repository instead of expecting them to be embedded in the prompt:
- docs/specs/system_spec.md
- docs/specs/test_spec.md
- CAPABILITY_ANALYSIS.md
- TEST_ANALYSIS.md
' "$planner_prompt"
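# Note on the idiom above: printf -v writes the formatted result directly into
# the named variable, appending the file-reading instructions to the stored
# prompt without a command substitution (which would strip trailing newlines).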
run_stage "project planner" "$planner_runtime_prompt"
require_nonempty_file "IMPLEMENTATION_PLAN.md"
printf -v implementation_runtime_prompt '%s
Read these files from the repository instead of expecting them to be embedded in the prompt:
- docs/specs/system_spec.md
- docs/specs/test_spec.md
- CAPABILITY_ANALYSIS.md
- TEST_ANALYSIS.md
- IMPLEMENTATION_PLAN.md
' "$implementation_prompt"
run_stage "project implementer" "$implementation_runtime_prompt"