Skip to content

[fix] Fix the multi-engine launch script#157

Merged
tyler-griggs merged 1 commit intoNovaSky-AI:mainfrom
tyler-griggs:main
Aug 18, 2025
Merged

[fix] Fix the multi-engine launch script#157
tyler-griggs merged 1 commit intoNovaSky-AI:mainfrom
tyler-griggs:main

Conversation

@tyler-griggs
Copy link
Member

What does this PR do?

  • Fixes broken multi-inference-engine launch script (there was a simple bug of missing a config param)
  • Refactor initialize_ray to separate out env preparation (e.g., determining envvars).

Testing

  • Ran launch_multiple_remote_servers.py with 4 inference engines.

@tyler-griggs tyler-griggs changed the title [fix] Multi-engine launch script [fix] Fix the multi-engine launch script Aug 18, 2025
@tyler-griggs tyler-griggs marked this pull request as ready for review August 18, 2025 18:07
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully fixes the multi-engine launch script by adding a missing configuration parameter and refactoring the Ray initialization logic. The refactoring separates environment preparation into its own function, which improves code clarity and robustness, especially in how it determines the maximum number of GPUs per node. The changes look good, but I have one suggestion to improve consistency in how timeouts are handled.

placement_groups = [placement_group(bundle) for bundle in bundles]
for pg in placement_groups:
print(f"Waiting for Ray placement group to be ready...")
get_ray_pg_ready_with_timeout(pg, timeout=180)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The timeout for waiting for the Ray placement group is hardcoded to 180. It would be more consistent and flexible to use the value from the --timeout command-line argument, which is stored in args.timeout. This allows users to control all related timeouts with a single parameter.

Suggested change
get_ray_pg_ready_with_timeout(pg, timeout=180)
get_ray_pg_ready_with_timeout(pg, timeout=args.timeout)

@tyler-griggs tyler-griggs merged commit dd15a40 into NovaSky-AI:main Aug 18, 2025
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant