Releases: NovaSky-AI/SkyRL

SkyRL-Train: v0.4.0

13 Feb 17:48
332c7cb


Highlights

Tinker API Integration: SkyRL now fully implements the Tinker API, a simple training and sampling API introduced by Thinking Machines Lab. Any training script written against the Tinker API can run locally on your own GPUs using SkyRL's backends with zero code changes. See the Tinker API docs to get started.

Supported Tinker features include:

  • Supervised fine-tuning (cross_entropy loss) and RL training (importance_sampling loss)
  • LoRA and full-parameter fine-tuning
  • Sampling with logprobs via colocated vLLM inference engines
  • FSDP2 and Megatron training backends
  • Lazy inference engine initialization for SFT-only workloads
  • Ephemeral and persistent weight sync modes
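
The two Tinker loss types map to standard objectives: cross_entropy is a negative log-likelihood over target tokens (SFT), while importance_sampling reweights per-token advantages by the ratio between the trainer policy and the sampling policy (RL). A minimal sketch in plain Python; the per-token formulas follow the usual definitions, and the mean reduction here is illustrative, not necessarily SkyRL's exact implementation:

```python
import math

def cross_entropy_loss(target_logprobs):
    # SFT objective: negative log-likelihood of the target tokens,
    # averaged over the tokens in the example.
    return -sum(target_logprobs) / len(target_logprobs)

def importance_sampling_loss(train_logprobs, sampler_logprobs, advantages):
    # RL objective: per-token importance ratio exp(lp_train - lp_sampler)
    # between the trainer policy and the policy that produced the samples,
    # weighted by the advantage (a REINFORCE-style surrogate).
    losses = []
    for lp_new, lp_old, adv in zip(train_logprobs, sampler_logprobs, advantages):
        ratio = math.exp(lp_new - lp_old)
        losses.append(-ratio * adv)
    return sum(losses) / len(losses)
```

On-policy (trainer and sampler logprobs equal) the ratio is 1 and the loss reduces to the plain advantage-weighted objective.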

Repo Reorganization: The skyrl-tx and skyrl-train packages are being unified into a single skyrl/ folder. The existing packages remain fully functional and will be migrated to new paths shortly.

Megatron Backend for Tinker: The Megatron strategy is now fully supported for Tinker workloads, including RL training with loss_fn_outputs passthrough.

HTTP Inference Integration: A new HTTP-based inference server integration (feature-flagged) enables decoupled inference engine deployments.

Pythonic Configs: Introduced configuration dataclasses as an alternative to YAML-only configuration, with migration of tests to the new system.
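
For illustration, a configuration expressed as nested dataclasses might look like the following; the class and field names here are hypothetical, not SkyRL's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical config dataclasses illustrating the style; SkyRL's real
# field names and defaults live in its own configuration modules.
@dataclass
class TrainerConfig:
    strategy: str = "fsdp2"
    learning_rate: float = 1e-6
    micro_batch_size: int = 4

@dataclass
class ExperimentConfig:
    model: str = "Qwen/Qwen3-8B"
    trainer: TrainerConfig = field(default_factory=TrainerConfig)

# Overrides become ordinary constructor arguments, type-checked by the IDE,
# rather than dotted-key strings in a YAML file.
cfg = ExperimentConfig(trainer=TrainerConfig(strategy="megatron"))
```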

Off-Policy Correction Refactor: Refactored truncated importance sampling (TIS) into a more comprehensive off-policy correction config with support for token-level and sequence-level ratio types.
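
The difference between the two ratio types is where the importance weight is formed: per token, or once for the whole sequence. A minimal sketch of the truncated variants in plain Python; the clipping cap and the broadcast of the sequence-level ratio are illustrative choices, not SkyRL defaults:

```python
import math

def tis_token_level(train_lps, sampler_lps, cap=2.0):
    # One truncated importance ratio per token:
    # exp(lp_train - lp_sampler), clipped at `cap`.
    return [min(math.exp(a - b), cap) for a, b in zip(train_lps, sampler_lps)]

def tis_sequence_level(train_lps, sampler_lps, cap=2.0):
    # One ratio for the whole sequence: exp of the summed logprob gap,
    # clipped at `cap`, then broadcast to every token position.
    ratio = min(math.exp(sum(train_lps) - sum(sampler_lps)), cap)
    return [ratio] * len(train_lps)
```

Token-level ratios bound the correction per position; sequence-level ratios preserve the joint likelihood ratio but clip more aggressively when policies drift apart.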

Harbor Integration: Upstream Harbor integration for evaluation, with Modal support and configurable rate limiting.

Documentation: Migrated documentation to fumadocs, with comprehensive Tinker API docs including quickstart, architecture, cookbook scripts, and configuration pages.

New Model Support (TX):

  • DeepSeekV3 implementation with expert parallelism
  • GLM-4.7 Flash support
  • Qwen3 stacked weights optimization

What's Changed

  • [tx] Add experimental SkyRL-train backend that supports SFT by @pcmoritz in #871
  • Add sampling support for Tinker SkyRL backend by @pcmoritz in #999
  • Add checkpointing support for Tinker SkyRL backend by @pcmoritz in #992
  • Unify Megatron and FSDP training interfaces with forward_backward + optim_step by @pcmoritz in #901
  • Implement forward-only pass and populate metrics by @tyler-griggs in #1046
  • Emit loss_fn_outputs with logprobs for RL losses in forward_backward by @tyler-griggs in #1047
  • [tx] Lazy inference engine initialization by @tyler-griggs in #1069
  • Support colocate_all=False in Tinker backend by @tyler-griggs in #1097
  • [skyrl-train] Return loss_fn_outputs for megatron backend to support tinker RL by @erictang000 in #1102
  • [tx][megatron] making megatron skyrl-train worker usable as TX backend by @erictang000 in #1067
  • [tx][train][merge] make the skyrl folder standalone by @erictang000 in #1084
  • [WIP][skyrl] Create new skyrl folder combining tx + train by @erictang000 in #1068
  • [skyrl-train] Add SFT support via forward_backward(loss_fn="cross_entropy") by @pcmoritz in #961
  • Add set_lr() for dynamic learning rate updates from Tinker by @pcmoritz in #978
  • Fix placement group creation in SkyRL-Train backend by @pcmoritz in #1010
  • [skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N by @CharlieFRuan in #931
  • [skyrl-train][inference] Inference Server Refactor (1/N) by @CharlieFRuan in #899
  • [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by @CharlieFRuan in #904
  • [train] Pythonic Configs 1/N - Introduce configuration dataclasses by @CharlieFRuan in #1001
  • [skyrl-train] Refactor TIS to use more comprehensive off policy correction config by @erictang000 in #849
  • [train][Harbor][1/N] Upstream Harbor integration by @CharlieFRuan in #923
  • [Harbor] Add Modal support and bump Harbor version by @CharlieFRuan in #1022
  • [Harbor] Add rate limit for trials/sec and max concurrency by @CharlieFRuan in #1074
  • [tx] DeepseekV3 implementation by @pcmoritz in #889
  • [tx] Add support for GLM-4.7 Flash by @pcmoritz in #1023
  • [tx] Stack weights — Qwen3 by @pcmoritz in #1079
  • [tx] Add EP axis to deepseek by @pcmoritz in #993
  • [tx] chunked logprobs computation for memory efficiency by @pcmoritz in #902
  • [skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes by @erictang000 in #1000
  • [train] Enable RayPrometheusStatLogger for async vLLM engine by @CharlieFRuan in #900
  • [train][OpenAI] Add generator.served_model_name for /chat/completions by @CharlieFRuan in #970
  • [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages by @CharlieFRuan in #981
  • [train][vllm] Add enable_log_requests and max_log_len support by @tyler-griggs in #1071
  • [tx] Use WAL mode for sqlite by @pcmoritz in #1054
  • Increase busy timeout for sqlite to avoid database is locked error by @pcmoritz in #1105
  • [tx] Gracefully handle stale save_weights_for_sampler requests on engine restart by @pcmoritz in #1073
  • Migrate documentation to fumadocs by @tyler-griggs in #941
  • Add Tinker integration documentation by @tyler-griggs in #1050
  • [agent] Add YouCom search engine by @caoshiyi in #803

Full Changelog: skyrl_train-v0.3.0...skyrl_train-v0.4.0

SkyRL-Train: v0.3.0

03 Dec 17:07


Highlights

Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html

Dependency Upgrades:

  • Upgraded vLLM to 0.11.0 and Ray to 2.51.1
  • Megatron: Migrated from mbridge to the newer Megatron-Bridge library. The latter is expected to have more active development and support from NVIDIA.

The updated installation instructions can be found here.

Recipes: We've consolidated a list of end-to-end recipes with SkyRL here for reference runs on math, Text2SQL and search tasks.

SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, Runpod and SkyPilot can be found here.

Miscellaneous: Support for GPT-OSS, integration with PyTorch's OpenEnv, support for IPv6 clusters, and more!

What's Changed


SkyRL-Train: v0.2.0

13 Oct 18:12
1ed499c


Highlights

This release contains 163 commits from 22 contributors, including 11 new contributors!

Megatron Backend: SkyRL now has full support for the Megatron training backend with 5D parallelism and strong support for large-scale MoE training. Learn more in our Megatron guide and examples.

LoRA Support: SkyRL now supports LoRA training with the FSDP backend and vLLM inference engine. Learn more in our LoRA guide and examples. We will continue aggressively improving LoRA support and performance, tracked in #449.

OpenAI API Compatibility: SkyRL has standardized on the OpenAI API for inference, so agents and agent scaffolds can call into the inference engine over the OpenAI API. SkyRL manages the inference engines and provides a base_url for an OpenAI API-compatible endpoint.
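
As a hedged sketch of what calling a SkyRL-managed engine looks like over this endpoint (the base_url is supplied by SkyRL at runtime; the model name and helper functions below are hypothetical, and only the standard /chat/completions request shape is assumed):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI /chat/completions request body; the model name should
    # match whatever the engine is serving (placeholder here).
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(base_url: str, model: str, prompt: str) -> str:
    # POST to the OpenAI API-compatible endpoint behind the SkyRL-provided
    # base_url and return the assistant's reply text.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint speaks the standard protocol, any OpenAI SDK or agent scaffold pointed at the same base_url works the same way.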

Integrations: Building on our standardization on the OpenAI API, we integrated several popular environment and agentic projects.

What's Changed


SkyRL-Train: v0.1.0

20 Aug 01:20
6c50026


What's Changed
