Conversation
Summary of Changes
Hello @lynnliu030, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the American Invitational Mathematics Examination (AIME) as a new environment within the skyrl-gym framework. This integration enables the evaluation and training of language models on complex mathematical reasoning tasks, providing a dedicated setup for benchmarking performance in this domain.
Highlights
- New AIME Environment: A new AIMEEnv class is added, inheriting from BaseTextEnv, designed for math execution tasks.
- AIME Utility Functions: A utils.py file is introduced with functions to extract, normalize, and verify mathematical answers, including handling LaTeX boxed expressions (a rough parsing sketch follows after this list).
- Environment Registration: The new AIMEEnv is registered in skyrl_gym/envs/__init__.py to make it discoverable.
- DAPO Data Preparation Script: A new shell script prepare_dapo_data.sh is added to download AIME and DAPO-Math-17k datasets.
- DAPO Training Script for AIME: A new shell script run_dapo_aime.sh is added, configuring and running DAPO training and generation specifically for the AIME environment using a Qwen2.5-1.5B-Instruct model.
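The extraction and verification helpers described above typically operate on \boxed{...} answers. Below is a minimal sketch of what that parsing could look like, assuming the usual LaTeX boxed-answer convention; the helper names and regex are illustrative assumptions, not the actual utils.py API.

```python
import re
from typing import Optional

def extract_boxed_answer(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} expression, if any (illustrative helper)."""
    # Handles simple, non-nested boxed expressions only.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None

def normalize_answer(answer: str) -> str:
    """Strip whitespace and surrounding dollar signs before comparison (illustrative helper)."""
    return answer.strip().strip("$").replace(" ", "")

# Example: extract_boxed_answer("The answer is \\boxed{204}.") -> "204"
```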
Code Review
This pull request introduces a new AIME environment for mathematical reasoning tasks, including utility functions for scoring and example scripts for training. The core logic is sound, but I've identified a critical issue in the reward calculation that would cause a runtime error. Additionally, there are several inconsistencies in type hints and minor improvements for the provided shell scripts. My review includes specific suggestions to address these points and improve the overall quality and robustness of the code.
skyrl-gym/skyrl_gym/envs/aime/env.py
Outdated
```python
def _get_reward(self, action: str) -> float:
    return utils.compute_score(action, self.ground_truth)

def step(self, action: str) -> BaseTextEnvStepOutput:
    done = True  # always done after one step
    reward = self._get_reward(action)

    # No observation in gsm8k, and no tool call
    return BaseTextEnvStepOutput(observations=[], reward=reward, done=done, metadata={})
```
The _get_reward method is incorrectly implemented. It calls utils.compute_score, which returns a dictionary, but the method is type-hinted to return a float. This causes a TypeError in the step method when creating BaseTextEnvStepOutput, as the reward parameter expects a float.
Additionally, the comment on line 26 contains a copy-paste error, mentioning gsm8k instead of AIME.
To resolve this, I suggest inlining the logic into the step method, correctly extracting the reward and other metrics, and removing the now-redundant _get_reward method.
Suggested change:

```diff
-def _get_reward(self, action: str) -> float:
-    return utils.compute_score(action, self.ground_truth)
-
 def step(self, action: str) -> BaseTextEnvStepOutput:
     done = True  # always done after one step
-    reward = self._get_reward(action)
-
-    # No observation in gsm8k, and no tool call
-    return BaseTextEnvStepOutput(observations=[], reward=reward, done=done, metadata={})
+    score_info = utils.compute_score(action, self.ground_truth)
+    reward = score_info["score"]
+    metadata = {"acc": score_info["acc"], "pred": score_info["pred"]}
+
+    # No observation in AIME, and no tool call
+    return BaseTextEnvStepOutput(observations=[], reward=reward, done=done, metadata=metadata)
```
```python
) -> float:
    """Compute the reward score for a solution.

    Args:
        solution_str: The solution string
        ground_truth: The ground truth answer
        strict_box_verify: Whether to use strict box verification
        pause_tokens_index: Indices of pause tokens

    Returns:
        Reward score (1.0 for correct, -1.0 for incorrect)
    """
```
The return type hint for this function is -> float and the docstring states it returns a single score, but the function actually returns a dictionary. This inconsistency is the root cause of a critical bug in AIMEEnv.
Please update the type hint and the docstring to accurately reflect that a dictionary is returned.
Suggested change:

```diff
-) -> float:
+) -> dict:
     """Compute the reward score for a solution.
 
     Args:
         solution_str: The solution string
         ground_truth: The ground truth answer
         strict_box_verify: Whether to use strict box verification
         pause_tokens_index: Indices of pause tokens
 
     Returns:
-        Reward score (1.0 for correct, -1.0 for incorrect)
+        A dictionary with keys "score", "acc", and "pred".
     """
```
```python
def verify(
    solution_str: str, answer: str, strict_box_verify: bool = False, pause_tokens_index: Optional[list[int]] = None
) -> bool:
```
run_dapo_aime.sh

```diff
@@ -0,0 +1,86 @@
+set -x
```
For consistency with other scripts in the project (like prepare_dapo_data.sh) and for better usability, it's recommended to add a shebang line (#!/usr/bin/env bash) at the beginning of this script. This allows it to be executed directly (e.g., ./run_dapo_aime.sh) after making it executable (chmod +x run_dapo_aime.sh).
Suggested change:

```diff
+#!/usr/bin/env bash
 set -x
```
SumanthRH left a comment
Looks good overall!
For a sanity check on the reward: could we run the evaluation loop once before training for a base model on AIME and make sure the numbers look reasonable? Maybe Qwen3-8B or DeepSeek-R1-0528-Qwen3-8B?
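For a quick check of the reward parsing itself (independent of a full eval run), something like the sketch below could be used; the import path and the exact values are assumptions based on the compute_score interface discussed above.

```python
# Rough sanity check of the AIME reward parsing, assuming compute_score
# returns a dict with "score", "acc", and "pred" as discussed in this review.
from skyrl_gym.envs.aime import utils

sample_solution = "After simplification, the answer is \\boxed{204}."
ground_truth = "204"

score_info = utils.compute_score(sample_solution, ground_truth)
print(score_info)  # expected roughly: {"score": 1.0, "acc": True, "pred": "204"}
```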
SumanthRH left a comment
Thanks! It would be good to remove the hardcoded eos token, and also to incorporate the eos token in the tests.
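As one way to exercise the eos token in the tests, a sketch like the following could work; it assumes the parsing truncates the completion at the `<` that starts the eos token, and the token string `<|im_end|>` is only an example.

```python
# Illustrative test sketch; the eos token string and the returned dict keys
# are assumptions based on the discussion in this thread.
from skyrl_gym.envs.aime import utils

def test_compute_score_ignores_trailing_eos_token():
    solution = "Therefore the answer is \\boxed{73}.<|im_end|>"
    result = utils.compute_score(solution, "73")
    assert result["score"] == 1.0
    assert result["pred"] == "73"
```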
skyrl-gym/skyrl_gym/envs/aime/env.py

```python
class AIMEEnv(BaseTextEnv):
    """
    Environment for Math execution tasks.
```
Please add a disclaimer about the parsing here as well, since we now use `<` to detect the eos token.
SumanthRH left a comment
Thanks! Before merging, it would be great to add the eval accuracy we get with the AIME env for an 8B or smaller model and compare it with reported numbers.
Add AIME evaluation into the SkyGym environment