Changelog

Latest

[0.3.0]

Released

  • 2026-04-23

Added

  • Added a GitHub Actions CI workflow for Rust and Python tests and lints.
  • P2P bind failures now surface as a typed RuntimeError instead of a panic.

Fixed

  • Prevented GPU stage starvation at drain-tail by clamping autoscaler scale-downs to keep enough workers for in-flight and queued tasks per active stage.
  • Caught the autoscaler TOCTOU race in WorkerAllocator.add_worker() as a typed AllocationError instead of crashing the pipeline.
  • Tolerated None gpustat fields on DGX Spark GB10 so NodeResourceMonitor's metrics loop keeps running on unified-memory GPUs.
  • Made resource-shortage errors actionable: per-stage and worker-count remediation is listed first, the BATCH hint is scoped to STREAMING mode, the mode name appears in the message prefix, and the requires/available values carry CPU / GPU units.
  • Used math.floor for CPU-count truncation and clamped the result to >= 0 to guard against misconfigured cpu_allocation_percentage.
  • Moved the Ray cluster startup log to after initialization completes.
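
The CPU-count fix in this list can be pictured with a minimal sketch. The function name `usable_cpu_count` is hypothetical, and treating `cpu_allocation_percentage` as a fraction (for example 0.95) is an assumption; Xenna's actual code may differ.

```python
import math


def usable_cpu_count(total_cpus: int, cpu_allocation_percentage: float) -> int:
    """Return the number of CPUs to hand out, never below zero.

    Hypothetical helper: assumes cpu_allocation_percentage is a fraction
    such as 0.95, which the changelog does not specify.
    """
    # math.floor rounds toward negative infinity, so a negative
    # (misconfigured) value produces a negative count that max() clamps to 0.
    return max(0, math.floor(total_cpus * cpu_allocation_percentage))
```

With these assumptions, 16 CPUs at 0.95 give 15 usable CPUs, and a negative setting gives 0 rather than a negative count.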

Changed

  • Registered L1 and CPU pytest markers and silenced the TestS3Object collection warning under external CI runners.

[0.2.3]

Released

  • 2026-04-14

Fixed

  • Fixed the Xenna autoscaler to raise a Python exception on allocation failure instead of calling panic!().

[0.2.2]

Released

  • 2026-04-13

Fixed

  • Implemented zero-copy Bytes streaming, removing some allocations and several buffer copies.
  • Lowered the verbosity of Ray orphan reap messages.
  • Fixed lint and cargo-audit warnings and removed unused crates.

[0.2.1]

Released

  • 2026-03-11

Fixed

  • Fixed leaking child processes from stage actors by snapshotting the PID tree before ray.kill() and reaping survivors via a pinned follow-up task (opt-in via XENNA_KILL_ACTOR_SURVIVORS=1).
  • Fixed O(n²) process tree construction in ProcessTree.make and hardened it against missing/invalid psutil fields.
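
One common way to avoid quadratic process-tree construction is to index children by parent in a single pass and then walk the tree. The sketch below is illustrative only: `descendants` is a hypothetical helper that takes a plain `{pid: parent_pid}` mapping instead of psutil objects, and it is not the code in `ProcessTree.make`.

```python
from collections import defaultdict
from typing import Optional


def descendants(pid_to_ppid: dict[int, Optional[int]], root: int) -> list[int]:
    """Return every descendant of root in breadth-first order.

    Hypothetical helper: works on a {pid: parent_pid} mapping so it runs
    without external dependencies.
    """
    # Single pass to group children under their parent. Rescanning the
    # whole process list for each node is what makes a naive build O(n^2).
    children = defaultdict(list)
    for pid, ppid in pid_to_ppid.items():
        # Skip entries with a missing parent or a self-reference.
        if ppid is None or ppid == pid:
            continue
        children[ppid].append(pid)

    found = []
    seen = {root}
    queue = [root]
    index = 0
    while index < len(queue):
        for child in children.get(queue[index], []):
            if child not in seen:  # guards against cycles in bad data
                seen.add(child)
                found.append(child)
                queue.append(child)
        index += 1
    return found
```

Each process is visited once, so the work grows linearly with the number of processes.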

[0.2.0]

Released

  • 2026-03-03

Added

  • Added support for OpenTelemetry distributed tracing via the optional XENNA_RAY_TRACING_HOOK during Ray initialization.
  • Added configurable S3/object-store retry settings in ObjectStoreConfig.make_for_s3 (max_retries, retry_timeout, init_backoff, max_backoff).
  • Added autoscaling smoke tests for fragmentation and large-model allocation scenarios.
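
To show how retry settings like these typically interact, here is a rough sketch of a capped exponential backoff schedule. The parameter names come from the entry above, but the function `backoff_schedule`, the doubling factor, and the absence of jitter are assumptions; the real object-store client may behave differently.

```python
def backoff_schedule(
    max_retries: int,
    init_backoff: float,
    max_backoff: float,
    retry_timeout: float,
) -> list[float]:
    """Return the sleep durations (seconds) between retry attempts.

    Hypothetical helper: doubles the delay after each retry, caps it at
    max_backoff, and stops once the total wait would exceed retry_timeout.
    """
    delays = []
    elapsed = 0.0
    delay = init_backoff
    for _ in range(max_retries):
        if elapsed + delay > retry_timeout:
            break  # the next wait would overrun the overall time budget
        delays.append(delay)
        elapsed += delay
        delay = min(delay * 2, max_backoff)
    return delays
```

Under these assumptions, `backoff_schedule(5, 0.1, 1.0, 10.0)` yields waits of 0.1, 0.2, 0.4, 0.8, and 1.0 seconds. The sketch counts only sleep time toward `retry_timeout`, not the time spent on the requests themselves.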

Fixed

  • Updated Xenna to use RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 for compatibility with upstream Ray behavior.
  • Fixed GPU allocation source-of-truth by capping detected GPUs to Ray-reported GPU count per node.
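
The capping described in the last item can be sketched as follows. `cap_gpus` is a hypothetical name, and keeping the first N detected devices is an assumption about which GPUs survive the cap.

```python
def cap_gpus(detected_gpu_indices: list[int], ray_reported_gpus: float) -> list[int]:
    """Keep at most as many detected GPUs as Ray reports for the node.

    Hypothetical helper: Ray's count is treated as the source of truth,
    and the first N detected indices are kept (an assumption).
    """
    # Resource counts may arrive as floats (for example 4.0), so convert
    # to int, and never let the limit go negative.
    limit = max(0, int(ray_reported_gpus))
    return detected_gpu_indices[:limit]
```

If four GPUs are detected locally but Ray reports two, only two are considered allocatable; if Ray reports more than were detected, the detected list is returned unchanged.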

[0.1.8]

Released

  • 2025-12-01

Added

  • Added a SERVING mode that extends STREAMING mode for online-serving use cases.

Fixed

  • Updated Xenna to use Ray-reported CPU resources.

[0.1.7]

Released

  • 2025-10-30

Added

  • Implemented an SPMD mode, which allows users to run multi-GPU and multi-node inference similarly to torchrun.
  • Implemented P2P artifact downloads, which let users download artifacts efficiently before the job starts.
  • Improved task status polling for better performance at large scale.

Fixed

  • Reduced the number of threads Xenna uses.

[0.1.6]

Released

  • 2025-09-25

Fixed

  • Fixed a bug in the autoscaler when dynamic split is used.

[0.1.5]

Released

  • 2025-09-15

Fixed

  • Fixed a bug where the autoscaler tried to allocate workers for finished stages.

[0.1.4]

Released

  • 2025-09-05

Added

  • Refactored the autoscaling code to reduce clones for better performance.

[0.1.3]

Released

  • 2025-08-27

Added

  • Implemented the autoscaling algorithm in Rust for better performance and scalability.
  • Added metrics for the main loop of the streaming executor.

[0.1.2]

Released

  • 2025-08-19

Added

  • Added a workflow to publish packages to PyPI.

Fixed

  • Fixed a bug in queue-size stats when back-pressure kicks in.
  • Fixed a possible hang in fan-in stages with a large stage_batch_size.

[0.1.1]

Released

  • 2025-08-14

Added

  • Added over_provision_factor to StageSpec to influence how the autoscaler allocates stage workers.
  • Allowed StageSpec.num_workers_per_node to be a float for greater flexibility.
  • Added support for respecting CUDA_VISIBLE_DEVICES when the environment variable XENNA_RESPECT_CUDA_VISIBLE_DEVICES is set.
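
A rough sketch of the opt-in behavior in the last item: the helper below reads both variables from a mapping such as `os.environ`. The name `visible_devices` is hypothetical, and treating "is set" as "present with any value" is an assumption, as is returning `None` when the opt-in is absent.

```python
from typing import Mapping, Optional


def visible_devices(env: Mapping[str, str]) -> Optional[list[str]]:
    """Return the device list from CUDA_VISIBLE_DEVICES when opted in.

    Hypothetical helper: returns None when the opt-in variable or
    CUDA_VISIBLE_DEVICES itself is absent, meaning "no restriction".
    """
    # Assumption: any value, even an empty one, counts as "set".
    if "XENNA_RESPECT_CUDA_VISIBLE_DEVICES" not in env:
        return None
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are kept as strings because the variable can hold
    # device indices or UUIDs.
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Passing `os.environ` gives the live environment; passing a plain dict keeps the function easy to test.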

[0.1.0]

Released

  • 2025-06-11

Added

  • Initial version