[fix] Fix the multi-engine launch script by tyler-griggs · Pull Request #157 · NovaSky-AI/SkyRL

tyler-griggs · 2025-08-18T18:06:49Z

What does this PR do?

Fixes the broken multi-inference-engine launch script (the bug was a missing config param).
Refactors initialize_ray to separate out environment preparation (e.g., determining env vars).
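
As a rough illustration of the refactor described above, the sketch below splits environment preparation out of Ray initialization. It is not the code from this PR: `prepare_runtime_env`, `initialize_ray_sketch`, the `init_fn` stand-in for `ray.init`, and the `EXAMPLE_LOG_LEVEL` variable are all hypothetical names chosen for the example. The only Ray detail assumed is that `ray.init` accepts a `runtime_env` dict with an `env_vars` key.

```python
import os
from typing import Callable, Mapping, Optional, Sequence


def prepare_runtime_env(
    base_env: Optional[Mapping[str, str]] = None,
    passthrough: Sequence[str] = ("CUDA_VISIBLE_DEVICES",),
) -> dict:
    """Hypothetical helper: decide which env vars the workers should see.

    Kept free of Ray calls so it can be unit-tested on its own.
    """
    source = os.environ if base_env is None else base_env
    # Forward only the variables that are explicitly listed and actually set.
    env_vars = {key: source[key] for key in passthrough if key in source}
    # Illustrative default only; not a variable the real script is known to set.
    env_vars.setdefault("EXAMPLE_LOG_LEVEL", "info")
    return {"env_vars": env_vars}


def initialize_ray_sketch(init_fn: Callable, base_env: Optional[Mapping[str, str]] = None):
    """Hypothetical wrapper: init_fn stands in for ray.init."""
    runtime_env = prepare_runtime_env(base_env)
    return init_fn(runtime_env=runtime_env)
```

With Ray installed, one would presumably pass `ray.init` as `init_fn`; separating the two steps means the env-var logic can be checked without starting a cluster.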

Testing

Ran launch_multiple_remote_servers.py with 4 inference engines.

gemini-code-assist

Code Review

This pull request successfully fixes the multi-engine launch script by adding a missing configuration parameter and refactoring the Ray initialization logic. The refactoring separates environment preparation into its own function, which improves code clarity and robustness, especially in how it determines the maximum number of GPUs per node. The changes look good, but I have one suggestion to improve consistency in how timeouts are handled.

gemini-code-assist · 2025-08-18T18:07:32Z

skyrl-train/scripts/launch_multiple_remote_servers.py

    placement_groups = [placement_group(bundle) for bundle in bundles]
    for pg in placement_groups:
+        print(f"Waiting for Ray placement group to be ready...")
        get_ray_pg_ready_with_timeout(pg, timeout=180)


The timeout for waiting for the Ray placement group is hardcoded to 180. It would be more consistent and flexible to use the value from the --timeout command-line argument, which is stored in args.timeout. This allows users to control all related timeouts with a single parameter.

Suggested change

-        get_ray_pg_ready_with_timeout(pg, timeout=180)
+        get_ray_pg_ready_with_timeout(pg, timeout=args.timeout)
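
The suggestion above amounts to threading one command-line value through every wait. A minimal sketch of that pattern follows, under stated assumptions: the `--timeout` default of 180 simply mirrors the previously hardcoded value, and `wait_for_placement_groups` and its `wait_fn` parameter are hypothetical stand-ins (in the real script, `wait_fn` would be `get_ray_pg_ready_with_timeout`, whose implementation is not shown here).

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Assumption: the default mirrors the previously hardcoded 180 seconds.
    parser.add_argument(
        "--timeout",
        type=int,
        default=180,
        help="Seconds to wait for each Ray placement group to become ready.",
    )
    return parser.parse_args(argv)


def wait_for_placement_groups(placement_groups, wait_fn, timeout):
    """Hypothetical helper: apply one timeout value to every placement group."""
    for pg in placement_groups:
        print("Waiting for Ray placement group to be ready...")
        wait_fn(pg, timeout=timeout)
```

Calling `wait_for_placement_groups(placement_groups, get_ray_pg_ready_with_timeout, args.timeout)` would then let a single `--timeout` flag control the wait, rather than a literal buried in the loop.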

refac ray init, fix server launch

dd15a40

tyler-griggs changed the title ~~[fix] Multi-engine launch script~~ [fix] Fix the multi-engine launch script Aug 18, 2025

tyler-griggs marked this pull request as ready for review August 18, 2025 18:07

gemini-code-assist bot reviewed Aug 18, 2025

View reviewed changes

tyler-griggs merged commit dd15a40 into NovaSky-AI:main Aug 18, 2025
2 of 3 checks passed

tyler-griggs mentioned this pull request Aug 18, 2025

[fix] Formatting fixes to silence linter #158

Merged

tyler-griggs merged 1 commit into NovaSky-AI:main from tyler-griggs:main
