[doc] feat: add Claude Code skills for add-dataset, add-reward, add-trainer#5844
[doc] feat: add Claude Code skills for add-dataset, add-reward, add-trainer#5844khazic wants to merge 2 commits intoverl-project:mainfrom
Conversation
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces three new skill guides for the veRL framework: adding datasets, adding reward functions, and adding trainers. These guides provide step-by-step instructions, schema requirements, and reference implementations for developers. The review feedback identifies several critical technical inaccuracies in the documentation that would lead to runtime errors or incorrect implementations if followed. Specifically, the feedback corrects a class name in the dataset guide, an incomplete function signature in the reward function template, and misleading instructions regarding how to override advantage computation and execute custom trainers.
.agents/skills/add-dataset/SKILL.md
Outdated
|
|
||
| 1. **Preprocessing script** (`examples/data_preprocess/<name>.py`) — run once offline to | ||
| convert raw data into parquet files with a fixed schema | ||
| 2. **`RLDataset`** (`verl/utils/dataset/rl_dataset.py`) — runtime dataset class that |
There was a problem hiding this comment.
The class name in verl/utils/dataset/rl_dataset.py is RLHFDataset, not RLDataset. Referring to it as RLDataset in the skill instructions will likely cause the AI to generate incorrect import statements or class references when implementing new datasets or preprocessing scripts.
| 2. **`RLDataset`** (`verl/utils/dataset/rl_dataset.py`) — runtime dataset class that | |
| 2. **`RLHFDataset`** (`verl/utils/dataset/rl_dataset.py`) — runtime dataset class that |
.agents/skills/add-reward/SKILL.md
Outdated
| ```python | ||
| from verl.utils.reward_score.<name> import compute_score as <name>_compute_score | ||
|
|
||
| def default_compute_score(data_source, solution_str, ground_truth, extra_info=None): |
There was a problem hiding this comment.
The signature for default_compute_score provided in the template is incomplete. The actual implementation in verl/utils/reward_score/__init__.py includes several additional parameters (like sandbox_fusion_url, concurrent_semaphore, etc.) and a **kwargs catch-all. If the AI follows this template to replace the function header, it will cause a TypeError at runtime when the function is called with the full set of arguments by the RewardManager.
| def default_compute_score(data_source, solution_str, ground_truth, extra_info=None): | |
| def default_compute_score(data_source, solution_str, ground_truth, extra_info=None, **kwargs): |
.agents/skills/add-trainer/SKILL.md
Outdated
| def _compute_advantage(self, data: DataProto) -> DataProto: | ||
| """Override advantage computation for your algorithm.""" | ||
| rewards = data.batch["token_level_scores"] # shape: [bs, seqlen] | ||
| # ... your advantage computation | ||
| data.batch["advantages"] = advantages | ||
| data.batch["returns"] = returns | ||
| return data |
There was a problem hiding this comment.
The RayPPOTrainer class in verl/trainer/ppo/ray_trainer.py does not have a _compute_advantage method. Advantage computation is handled by a standalone compute_advantage function called within the fit method. Overriding _compute_advantage in a subclass as suggested here will have no effect on the training loop. To customize advantage estimation, the recommended approach is to register a new estimator function as described in Step 6 of this guide.
.agents/skills/add-trainer/SKILL.md
Outdated
|
|
||
| set -x | ||
|
|
||
| python3 -m verl.trainer.main_ppo \ |
There was a problem hiding this comment.
The run script example uses python3 -m verl.trainer.main_ppo, which defaults to using the standard RayPPOTrainer implementation. If a user implements a custom trainer class like MyTrainer (as suggested in Step 3), this command will not execute their custom logic. The guide should instead show how to create a custom entry point that instantiates and runs the new trainer class, or how to configure the system to use the custom class.
…ner extension points - add-dataset: RLDataset → RLHFDataset (actual class name in rl_dataset.py) - add-reward: add **kwargs to default_compute_score template signature - add-trainer: replace nonexistent _compute_advantage override with register_adv_est pattern - add-trainer: fix run script entry point — import custom module before calling main_ppo Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
SummaryThis PR adds three Claude Code skills that guide AI assistants through the most common veRL contribution patterns. What each skill does:
Improvements in the latest commit (based on Gemini review):
|
What does this PR do?
Adds three Claude Code skills for the most common veRL contribution patterns. Split out from #5843 per review from @tongyx361.
Each file under
.agents/skills/is read by Claude Code when the user invokes the corresponding slash command. No runtime behavior is affected.add-dataset/add-datasetadd-reward/add-rewardcompute_scorereward functionadd-trainer/add-trainerUsage examples
/add-dataset— adding AQuA-RATPrompt:
/add-datasetI want to add theopenai/aqua_ratmultiple-choice math dataset.Claude generates
examples/data_preprocess/aqua_rat.py:The skill correctly applied the required schema (
data_source,prompt,reward_model.ground_truth) and matched the dataset field names./add-reward— multiple-choice answer extractionPrompt:
/add-rewardforopenai/aqua_rat— extract the chosen option letter (A–E).Claude generates
verl/utils/reward_score/aqua_rat.py:And registers it in
verl/utils/reward_score/__init__.py:The skill correctly followed the no-exceptions, return-float contract and matched the
data_sourcekey from the preprocessing step./add-trainer— GRPO with clipped advantagesPrompt:
/add-trainerI want a GRPO variant that clips advantages to[-clip, clip]before the policy update.Claude generates
examples/grpo_clip_trainer/grpo_clip_trainer.py:The skill correctly identified
register_adv_estas the extension point and showed how to wire the config key through.Checklist
.agents/skills/files onlyRelated
add-unit-tests,review-pr,create-pr,commit-conventions,debug-distributed, upgrade skills