Commit 7abe0e6

Simplify autoresearch run artifacts
1 parent c9c7812 commit 7abe0e6

27 files changed

Lines changed: 2452 additions & 2509 deletions

examples/autoresearch/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 git \
 nodejs \
 npm \
+procps \
 && rm -rf /var/lib/apt/lists/*

 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uv/bin/uv

examples/autoresearch/README.md

Lines changed: 47 additions & 23 deletions
@@ -41,6 +41,10 @@ The command can be any runnable benchmark or training loop. It just needs to:
 - exit nonzero on failure
 - log the configured metric to `run_dir/metrics.jsonl`

+Useful command placeholders are `{run_dir}` for the attempt artifact directory,
+`{attempt_name}` for numeric attempt ids like `001`, and `{run_root}` for the
+assembled `LOG_ROOT/RUN_NAME` directory.
+
 ```python
 ml_logger.log_metrics({"accuracy": 0.73}, step=1)
 ```
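The placeholders are easiest to follow with concrete values. The sketch below is a minimal illustration, not the harness's code. It assumes Python `str.format`-style substitution and assumes `{run_dir}` resolves to `LOG_ROOT/RUN_NAME/researchers/<researcher>/attempts/<attempt_name>`, the manifest layout this commit describes for the UI. The function name `render_attempt_command` and the `train.py` command are hypothetical.

```python
from pathlib import Path


def render_attempt_command(
    template: str,
    log_root: str,
    run_name: str,
    researcher_id: str,
    attempt_number: int,
) -> str:
    """Fill the {run_dir}, {attempt_name}, and {run_root} placeholders.

    Hypothetical helper: the directory layout is inferred from the README's
    manifest paths, not taken from harness/attempt.py.
    """
    run_root = Path(log_root) / run_name  # LOG_ROOT/RUN_NAME
    attempt_name = f"{attempt_number:03d}"  # 0 -> "000" (baseline), 1 -> "001"
    run_dir = run_root / "researchers" / researcher_id / "attempts" / attempt_name
    # str.format renders the Path objects via str(); literal braces in the
    # template would have to be written as {{ and }}.
    return template.format(
        run_dir=run_dir, attempt_name=attempt_name, run_root=run_root
    )


if __name__ == "__main__":
    # Hypothetical recipe command used only to show the substitution.
    print(render_attempt_command(
        "python train.py --out {run_dir} --tag {attempt_name}",
        log_root="/mnt/shared/open-rl/autoresearch",
        run_name="text-sql",
        researcher_id="alpha",
        attempt_number=1,
    ))
```

If the real harness does use `str.format`, any literal braces in a recipe command (inline JSON, for example) would need doubling.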
@@ -67,27 +71,40 @@ recipe-specific settings.
 The easiest Kubernetes path is the small CLI. From `examples/autoresearch`:

 ```bash
-uv run --project .. python -m harness.cli run recipes/text_sql name=text-sql-v1
+uv run python -m harness.cli recipes/text_sql run_name=alpha
 ```

-That command creates an ignored generated overlay under `.runs/text-sql-v1`,
+That command creates an ignored generated overlay under `.runs/text-sql-alpha`,
 copies the flat recipe directory into a ConfigMap, mounts it into the stable
-researcher image, sets `RECIPE`, `LOG_ROOT`, and `SPEC_HASH`, and runs
-`kubectl apply -k`.
+researcher image, sets `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and `SPEC_HASH`, and
+runs `kubectl apply -k`. `run_name` becomes the researcher id; the shared
+session name defaults to the recipe directory.
+
+Launch a second researcher into the same UI session by changing only
+`run_name`:
+
+```bash
+uv run python -m harness.cli recipes/text_sql run_name=beta
+```
+
+The CLI keeps the shared session name as the recipe name (`text-sql` here) and
+uses `run_name` for the researcher id. Both researchers write under the same
+`LOG_ROOT/text-sql` directory, so the UI shows both in the left rail.

 Preview without applying:

 ```bash
-uv run --project .. python -m harness.cli run recipes/text_sql \
-name=text-sql-v1 \
+uv run python -m harness.cli recipes/text_sql \
+run_name=alpha \
 apply=False
 ```

 Pass common recipe env directly:

 ```bash
-uv run --project .. python -m harness.cli run recipes/my_recipe \
-name=my-recipe-v1 \
+uv run python -m harness.cli recipes/my_recipe \
+run_name=alpha \
+session_name=my-recipe-v1 \
 tinker_base_url=http://open-rl-gateway-service:8000 \
 base_model=google/gemma-4-e2b
 ```
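The naming rules introduced in this hunk can be summarized as a small function. This is a hypothetical sketch: `derive_names` is not part of `harness.cli`. The `_` to `-` conversion is inferred only from `recipes/text_sql` mapping to `text-sql`, and the overlay path used when `session_name` is overridden is a guess.

```python
from pathlib import Path
from typing import Dict, Optional


def derive_names(
    recipe_dir: str,
    run_name: str,
    log_root: str,
    session_name: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical summary of how the CLI arguments map to names and paths."""
    # ASSUMPTION: the default session name is the recipe directory name with
    # underscores turned into hyphens (recipes/text_sql -> text-sql).
    session = session_name or Path(recipe_dir).name.replace("_", "-")
    return {
        "session_name": session,  # shared UI session, one per recipe
        "researcher_id": run_name,  # alpha, beta, ... one per launch
        # ASSUMPTION: the overlay directory is "<session>-<run_name>" under .runs/
        "overlay_dir": f".runs/{session}-{run_name}",
        # Every researcher in the session writes below LOG_ROOT/<session>.
        "log_dir": str(Path(log_root) / session),
    }


if __name__ == "__main__":
    for rid in ("alpha", "beta"):
        print(derive_names("recipes/text_sql", rid, "/mnt/shared/open-rl/autoresearch"))
```

Running it for `alpha` and `beta` gives two different researcher ids and overlays but the same `log_dir`, which is why both appear in one UI session.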
@@ -107,6 +124,14 @@ and calls the shared OpenRL/Tinker services.

 ## Cluster Run

+These manifests require the official Agent Sandbox CRD. The researcher resource
+kind is `agents.x-k8s.io/v1alpha1/Sandbox`; there is no plain Kubernetes `Job`
+fallback in this demo. Verify the CRD before applying a recipe:
+
+```bash
+kubectl api-resources | grep -i sandbox
+```
+
 Create the API secret for agent-backed researcher pods:

 ```bash
@@ -147,16 +172,15 @@ Use the normal [GKE setup guide](../../docs/setup/gke-setup.md) for cluster,
 GPU, storage, and the OpenRL backend. These overlays add researcher sandboxes and
 the UI on top of that shared backend.

-Researcher pods use an init container to wait for comma-separated `READY_URLS`,
-so early pod startup does not race vLLM, the trainer worker, or the gateway. The
-agent starts only after those endpoints are reachable.
+Researcher pods use a shared init container to wait for vLLM, the trainer
+worker, and the gateway before the agent starts.

 ## Shared Pieces

 ```text
 harness/cli.py # creates/applies a generated overlay for a recipe dir
 harness/agent.py # prepares git, records baseline, launches Gemini
-harness/attempt.py # runs one measured attempt and writes attempt.json
+harness/attempt.py # runs one measured attempt and writes metadata.json
 harness/serve.py # read-only UI server over researcher/attempt manifests
 harness/utils.py # shared JSON, git, hashing, process helpers
 k8s/base/ # reusable Sandbox/UI resources
@@ -171,13 +195,14 @@ workspace at `RECIPE`'s parent and committed as the run baseline. That lets the
 image stay stable while recipe files come from shared storage.

 `harness.attempt` runs recipe code and writes artifacts. The UI reads
-`LOG_ROOT/researchers/*/researcher.json`,
-`LOG_ROOT/researchers/*/attempts/*/attempt.json`, and fixed artifact filenames
-next to those manifests. Clearing `LOG_ROOT` resets the visible run.
-
-The launcher records the unmodified default config as `000-baseline`, then
-passes the recipe-adjacent `program.md` to Gemini as the prompt. That program
-tells the agent to edit only the declared target, commit the attempt, run
+`LOG_ROOT/RUN_NAME/researchers/*/metadata.json`,
+`LOG_ROOT/RUN_NAME/researchers/*/attempts/*/metadata.json`, and fixed artifact
+filenames next to those manifests. Clearing `LOG_ROOT/RUN_NAME` resets the
+visible run.
+
+The launcher records the unmodified default config as attempt `000`, then passes
+the recipe-adjacent `program.md` to Gemini as the prompt. That program tells the
+agent to edit only the declared target, commit the attempt, run
 `eval "${RUN_ATTEMPT_COMMAND}"`, record the metric, and reset if the metric did
 not improve.

@@ -189,14 +214,12 @@ Copy one existing recipe directory and update:
 - `autoresearch.toml`
 - the command target, if you keep one
 - the editable target
-- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, and
+- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and
 `ATTEMPT_TIMEOUT_MINUTES`
 - optionally `RECIPE_DIR`, if Kubernetes should use a recipe uploaded to shared
 storage instead of the recipe already in the image
 - optionally `AGENT_TIMEOUT_MINUTES`, if the researcher pod should stop before
 Kubernetes cleanup does
-- optionally `READY_URLS`, if Kubernetes should gate researcher startup on
-external service health

 The shared wrapper handles logs, diffs, metrics, status, and UI manifests.
 Recipe code should focus on running the benchmark or training loop and emitting
@@ -222,7 +245,8 @@ To also clear shared run data:

 ```bash
 DELETE_ARTIFACTS=1 \
-LOG_ROOT=/mnt/shared/open-rl/autoresearch/text_sql \
+LOG_ROOT=/mnt/shared/open-rl/autoresearch \
+RUN_NAME=text-sql \
 OVERLAY=examples/autoresearch/recipes/text_sql \
 examples/autoresearch/cleanup_research_session.sh
 ```
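After this commit, the UI discovers researchers and attempts from `metadata.json` files under `LOG_ROOT/RUN_NAME`. A minimal reader for that layout might look like the sketch below. It is illustrative, not `harness/serve.py`, and it makes no assumption about the fields inside each `metadata.json`. `load_run` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_run(log_root: str, run_name: str) -> Dict[str, Dict[str, Any]]:
    """Collect researcher and attempt manifests under LOG_ROOT/RUN_NAME.

    Illustrative only: reads the paths the README lists and returns
    {researcher_id: {"metadata": ..., "attempts": {attempt_id: ...}}}.
    """
    run_root = Path(log_root) / run_name
    researchers: Dict[str, Dict[str, Any]] = {}
    # researchers/*/metadata.json -> one entry per researcher directory
    for meta_path in sorted(run_root.glob("researchers/*/metadata.json")):
        researcher_dir = meta_path.parent
        # attempts/*/metadata.json; zero-padded ids sort "000" (baseline) first
        attempts = {
            attempt_path.parent.name: json.loads(attempt_path.read_text())
            for attempt_path in sorted(researcher_dir.glob("attempts/*/metadata.json"))
        }
        researchers[researcher_dir.name] = {
            "metadata": json.loads(meta_path.read_text()),
            "attempts": attempts,
        }
    # A missing run_root yields no glob matches, so an empty dict comes back.
    return researchers
```

Because attempt ids are zero-padded, a plain lexical sort keeps the baseline attempt `000` first. Deleting `LOG_ROOT/RUN_NAME` makes this reader return an empty dict, which is consistent with the README's note that clearing that directory resets the visible run.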

examples/autoresearch/cleanup_research_session.sh

Lines changed: 4 additions & 3 deletions
@@ -5,13 +5,14 @@ OVERLAY="${OVERLAY:-examples/autoresearch/recipes/text_sql}"
 NAMESPACE="${NAMESPACE:-default}"
 DELETE_ARTIFACTS="${DELETE_ARTIFACTS:-0}"
 LOG_ROOT="${LOG_ROOT:-}"
+RUN_NAME="${RUN_NAME:-}"

 kubectl -n "${NAMESPACE}" delete -k "${OVERLAY}" --ignore-not-found=true

 if [ "${DELETE_ARTIFACTS}" = "1" ]; then
-if [ -z "${LOG_ROOT}" ]; then
-echo "DELETE_ARTIFACTS=1 requires LOG_ROOT" >&2
+if [ -z "${LOG_ROOT}" ] || [ -z "${RUN_NAME}" ]; then
+echo "DELETE_ARTIFACTS=1 requires LOG_ROOT and RUN_NAME" >&2
 exit 2
 fi
-rm -rf "${LOG_ROOT}"
+rm -rf "${LOG_ROOT%/}/${RUN_NAME}"
 fi
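The new guard scopes deletion to a single run instead of all of `LOG_ROOT`. The Python sketch below mirrors that logic for anyone reproducing it outside the shell script; the function names are hypothetical. `str.removesuffix("/")` (Python 3.9+) drops at most one trailing slash, matching the `${LOG_ROOT%/}` parameter expansion.

```python
import shutil
from pathlib import Path


def run_artifact_dir(log_root: str, run_name: str) -> str:
    """Python mirror of "${LOG_ROOT%/}/${RUN_NAME}" with the script's guard."""
    if not log_root or not run_name:
        # Same precondition the script enforces before exiting with status 2.
        raise ValueError("DELETE_ARTIFACTS=1 requires LOG_ROOT and RUN_NAME")
    # removesuffix drops at most one trailing "/", like ${LOG_ROOT%/}.
    return f"{log_root.removesuffix('/')}/{run_name}"


def delete_run_artifacts(log_root: str, run_name: str) -> None:
    """Remove one run's artifacts, leaving sibling runs under LOG_ROOT alone."""
    target = Path(run_artifact_dir(log_root, run_name))
    # Tolerate an already-missing directory, as `rm -rf` does.
    if target.is_dir():
        shutil.rmtree(target)
```

Other runs that share the same `LOG_ROOT` survive the call. That is the practical difference from the previous `rm -rf "${LOG_ROOT}"`.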
