@@ -41,6 +41,10 @@ The command can be any runnable benchmark or training loop. It just needs to:
4141- exit nonzero on failure
4242- log the configured metric to ` run_dir/metrics.jsonl `
4343
44+ Useful command placeholders are `{run_dir}` for the attempt artifact directory,
45+ `{attempt_name}` for numeric attempt ids like `001`, and `{run_root}` for the
46+ assembled `LOG_ROOT/RUN_NAME` directory.
47+
4448``` python
4549ml_logger.log_metrics({"accuracy": 0.73}, step=1)
4650```
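
As a rough illustration of the `{run_dir}`, `{attempt_name}`, and `{run_root}` placeholders, the sketch below fills them into a command template with `str.format`. The function name `expand_command`, the `researcher_id` parameter, and the assumption that `run_dir` sits at `researchers/<id>/attempts/<attempt>` under the run root are hypothetical; the harness may assemble these paths differently.

```python
from pathlib import PurePosixPath


def expand_command(template: str, log_root: str, run_name: str,
                   researcher_id: str, attempt: int) -> str:
    """Fill the three command placeholders (illustrative, not the harness code)."""
    run_root = PurePosixPath(log_root) / run_name   # assembled LOG_ROOT/RUN_NAME
    attempt_name = f"{attempt:03d}"                 # numeric ids like 001
    # Assumed layout: the attempt directory location is a guess for this sketch.
    run_dir = run_root / "researchers" / researcher_id / "attempts" / attempt_name
    return template.format(
        run_dir=run_dir, attempt_name=attempt_name, run_root=run_root
    )


cmd = expand_command(
    "python train.py --out {run_dir} --tag {attempt_name}",
    log_root="/tmp/logs", run_name="text-sql", researcher_id="alpha", attempt=1,
)
print(cmd)
```

Note that `str.format` treats every brace pair as a placeholder, so a template containing literal braces would need them doubled (`{{` and `}}`).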
@@ -67,27 +71,40 @@ recipe-specific settings.
6771The easiest Kubernetes path is the small CLI. From ` examples/autoresearch ` :
6872
6973``` bash
70- uv run --project .. python -m harness.cli run recipes/text_sql name=text-sql-v1
74+ uv run python -m harness.cli recipes/text_sql run_name=alpha
7175```
7276
73- That command creates an ignored generated overlay under ` .runs/text-sql-v1 ` ,
77+ That command creates an ignored generated overlay under ` .runs/text-sql-alpha ` ,
7478copies the flat recipe directory into a ConfigMap, mounts it into the stable
75- researcher image, sets ` RECIPE ` , ` LOG_ROOT ` , and ` SPEC_HASH ` , and runs
76- ` kubectl apply -k ` .
79+ researcher image, sets ` RECIPE ` , ` LOG_ROOT ` , ` RUN_NAME ` , and ` SPEC_HASH ` , and
80+ runs ` kubectl apply -k ` . ` run_name ` becomes the researcher id; the shared
81+ session name defaults to the recipe directory.
82+
83+ Launch a second researcher into the same UI session by changing only
84+ ` run_name ` :
85+
86+ ``` bash
87+ uv run python -m harness.cli recipes/text_sql run_name=beta
88+ ```
89+
90+ The CLI keeps the shared session name as the recipe name (` text-sql ` here) and
91+ uses ` run_name ` for the researcher id. Both researchers write under the same
92+ ` LOG_ROOT/text-sql ` directory, so the UI shows both in the left rail.
7793
7894Preview without applying:
7995
8096``` bash
81- uv run --project .. python -m harness.cli run recipes/text_sql \
82- name=text-sql-v1 \
97+ uv run python -m harness.cli recipes/text_sql \
98+ run_name=alpha \
8399 apply=False
84100```
85101
86102Pass common recipe env directly:
87103
88104``` bash
89- uv run --project .. python -m harness.cli run recipes/my_recipe \
90- name=my-recipe-v1 \
105+ uv run python -m harness.cli recipes/my_recipe \
106+ run_name=alpha \
107+ session_name=my-recipe-v1 \
91108 tinker_base_url=http://open-rl-gateway-service:8000 \
92109 base_model=google/gemma-4-e2b
93110```
@@ -107,6 +124,14 @@ and calls the shared OpenRL/Tinker services.
107124
108125## Cluster Run
109126
127+ These manifests require the official Agent Sandbox CRD. The researcher resource
128+ kind is ` agents.x-k8s.io/v1alpha1/Sandbox ` ; there is no plain Kubernetes ` Job `
129+ fallback in this demo. Verify the CRD before applying a recipe:
130+
131+ ``` bash
132+ kubectl api-resources | grep -i sandbox
133+ ```
134+
110135Create the API secret for agent-backed researcher pods:
111136
112137``` bash
@@ -147,16 +172,15 @@ Use the normal [GKE setup guide](../../docs/setup/gke-setup.md) for cluster,
147172GPU, storage, and the OpenRL backend. These overlays add researcher sandboxes and
148173the UI on top of that shared backend.
149174
150- Researcher pods use an init container to wait for comma-separated ` READY_URLS ` ,
151- so early pod startup does not race vLLM, the trainer worker, or the gateway. The
152- agent starts only after those endpoints are reachable.
175+ Researcher pods use a shared init container to wait for vLLM, the trainer
176+ worker, and the gateway before the agent starts.
153177
154178## Shared Pieces
155179
156180``` text
157181harness/cli.py # creates/applies a generated overlay for a recipe dir
158182harness/agent.py # prepares git, records baseline, launches Gemini
159- harness/attempt.py # runs one measured attempt and writes attempt.json
183+ harness/attempt.py # runs one measured attempt and writes metadata.json
160184harness/serve.py # read-only UI server over researcher/attempt manifests
161185harness/utils.py # shared JSON, git, hashing, process helpers
162186k8s/base/ # reusable Sandbox/UI resources
@@ -171,13 +195,14 @@ workspace at `RECIPE`'s parent and committed as the run baseline. That lets the
171195image stay stable while recipe files come from shared storage.
172196
173197` harness.attempt ` runs recipe code and writes artifacts. The UI reads
174- ` LOG_ROOT/researchers/*/researcher.json ` ,
175- ` LOG_ROOT/researchers/*/attempts/*/attempt.json ` , and fixed artifact filenames
176- next to those manifests. Clearing ` LOG_ROOT ` resets the visible run.
177-
178- The launcher records the unmodified default config as ` 000-baseline ` , then
179- passes the recipe-adjacent ` program.md ` to Gemini as the prompt. That program
180- tells the agent to edit only the declared target, commit the attempt, run
198+ ` LOG_ROOT/RUN_NAME/researchers/*/metadata.json ` ,
199+ ` LOG_ROOT/RUN_NAME/researchers/*/attempts/*/metadata.json ` , and fixed artifact
200+ filenames next to those manifests. Clearing ` LOG_ROOT/RUN_NAME ` resets the
201+ visible run.
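
The manifest layout described above can be explored with a short script. This is a minimal sketch that only assumes the two glob patterns stated in the text and that each `metadata.json` holds valid JSON; the function name `list_manifests` is hypothetical and the actual UI server in `harness/serve.py` may read the files differently.

```python
import json
from pathlib import Path


def list_manifests(log_root: str, run_name: str) -> dict:
    """Collect researcher and attempt manifests under LOG_ROOT/RUN_NAME."""
    run_root = Path(log_root) / run_name
    # "*" matches exactly one path component, so the two patterns do not overlap.
    researchers = sorted(run_root.glob("researchers/*/metadata.json"))
    attempts = sorted(run_root.glob("researchers/*/attempts/*/metadata.json"))
    return {
        "researchers": [json.loads(p.read_text()) for p in researchers],
        "attempts": [json.loads(p.read_text()) for p in attempts],
    }
```

Calling `list_manifests("/mnt/shared/open-rl/autoresearch", "text-sql")` would return two lists of parsed manifests; an empty or missing run directory simply yields two empty lists.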
202+
203+ The launcher records the unmodified default config as attempt ` 000 ` , then passes
204+ the recipe-adjacent ` program.md ` to Gemini as the prompt. That program tells the
205+ agent to edit only the declared target, commit the attempt, run
181206` eval "${RUN_ATTEMPT_COMMAND}" ` , record the metric, and reset if the metric did
182207not improve.
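
The keep-or-reset rule in that paragraph can be sketched as a small pure function. Everything here is illustrative: the names `improved` and `run_loop` are hypothetical, the `higher_is_better` flag is an assumption about how a metric direction might be configured, and the real agent performs the commit and reset through git rather than in Python.

```python
from typing import List, Optional, Tuple


def improved(new: float, best: Optional[float], higher_is_better: bool = True) -> bool:
    """Return True when the new metric beats the best one seen so far."""
    if best is None:  # the baseline attempt always counts as the first best
        return True
    return new > best if higher_is_better else new < best


def run_loop(metrics: List[float],
             higher_is_better: bool = True) -> Tuple[Optional[float], List[int]]:
    """Walk attempt metrics in order and report which attempts would be kept."""
    best: Optional[float] = None
    kept: List[int] = []
    for attempt_id, value in enumerate(metrics):
        if improved(value, best, higher_is_better):
            best = value
            kept.append(attempt_id)
        # Otherwise the agent would reset its commit and try a different edit.
    return best, kept


print(run_loop([0.70, 0.68, 0.73]))
```

Here index 0 stands in for the baseline attempt `000`; the second attempt scores lower and would be reset, while the third becomes the new best.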
183208
@@ -189,14 +214,12 @@ Copy one existing recipe directory and update:
189214- ` autoresearch.toml `
190215- the command target, if you keep one
191216- the editable target
192- - ` kustomization.yaml ` settings: ` RECIPE ` , ` LOG_ROOT ` , and
217+ - ` kustomization.yaml ` settings: ` RECIPE ` , ` LOG_ROOT ` , ` RUN_NAME ` , and
193218 ` ATTEMPT_TIMEOUT_MINUTES `
194219- optionally ` RECIPE_DIR ` , if Kubernetes should use a recipe uploaded to shared
195220 storage instead of the recipe already in the image
196221- optionally ` AGENT_TIMEOUT_MINUTES ` , if the researcher pod should stop before
197222 Kubernetes cleanup does
198- - optionally ` READY_URLS ` , if Kubernetes should gate researcher startup on
199- external service health
200223
201224The shared wrapper handles logs, diffs, metrics, status, and UI manifests.
202225Recipe code should focus on running the benchmark or training loop and emitting
@@ -222,7 +245,8 @@ To also clear shared run data:
222245
223246``` bash
224247DELETE_ARTIFACTS=1 \
225- LOG_ROOT=/mnt/shared/open-rl/autoresearch/text_sql \
248+ LOG_ROOT=/mnt/shared/open-rl/autoresearch \
249+ RUN_NAME=text-sql \
226250OVERLAY=examples/autoresearch/recipes/text_sql \
227251 examples/autoresearch/cleanup_research_session.sh
228252```