- 2026-04-23
- Added a GitHub Actions CI workflow for Rust and Python tests and lints; P2P bind failures now surface as a typed `RuntimeError` instead of a panic.
- Prevented GPU stage starvation at drain-tail by clamping autoscaler scale-downs to keep enough workers for in-flight and queued tasks per active stage.
- Caught the autoscaler TOCTOU race in `WorkerAllocator.add_worker()` as a typed `AllocationError` instead of crashing the pipeline.
- Tolerated `None` gpustat fields on DGX Spark GB10 so `NodeResourceMonitor`'s metrics loop keeps running on unified-memory GPUs.
- Made resource-shortage errors actionable (per-stage / worker-count remediation first, BATCH hint scoped to STREAMING, mode name in prefix, CPU / GPU units on requires/available).
- Used `math.floor` for CPU-count truncation and clamped the result to `>= 0` to guard against a misconfigured `cpu_allocation_percentage`.
- Moved the Ray cluster startup log to after initialization completes.
- Registered `L1` and `CPU` pytest markers and silenced the `TestS3Object` collection warning under external CI runners.
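The CPU-count clamp above can be sketched as follows. This is a minimal illustration, not Xenna's actual code; `usable_cpus` is a hypothetical helper name.

```python
import math


def usable_cpus(total_cpus: float, cpu_allocation_percentage: float) -> int:
    """Truncate the CPU budget with math.floor and clamp it to >= 0 so a
    misconfigured (e.g. negative) percentage cannot produce a negative
    worker budget. Hypothetical helper illustrating the fix."""
    return max(0, math.floor(total_cpus * cpu_allocation_percentage))
```

Clamping after flooring matters: `math.floor` on a negative product rounds away from zero, so the clamp is what guarantees a sane lower bound.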
- 2026-04-14
- Fixed the Xenna autoscaler to raise a Python exception on allocation failure instead of calling `panic!()`.
- 2026-04-13
- Implemented zero-copy `Bytes` streaming, removing several allocations and buffer copies.
- Lowered verbosity of Ray orphan-reap messages.
- Fixed lint and cargo-audit warnings and removed unused crates.
- 2026-03-11
- Fixed leaking child processes from stage actors by snapshotting the PID tree before `ray.kill()` and reaping survivors via a pinned follow-up task (opt-in via `XENNA_KILL_ACTOR_SURVIVORS=1`).
- Fixed O(n²) process-tree construction in `ProcessTree.make` and hardened it against missing/invalid psutil fields.
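The O(n²) fix amounts to indexing processes by parent PID in one pass and then walking that index, rather than rescanning every process for each tree node. A minimal stdlib-only sketch with hypothetical helper names, not `ProcessTree.make` itself:

```python
def build_children_map(procs):
    """procs: iterable of (pid, ppid) pairs. One O(n) pass builds a
    parent -> [children] index, replacing the O(n^2) pattern of scanning
    all processes once per node."""
    children = {}
    for pid, ppid in procs:
        children.setdefault(ppid, []).append(pid)
    return children


def descendants(root_pid, children):
    """Iteratively collect every PID below root_pid using the index."""
    out, stack = [], [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            out.append(child)
            stack.append(child)
    return out
```

In the real fix the `(pid, ppid)` pairs would come from psutil, with missing or invalid fields skipped defensively.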
- 2026-03-03
- Added support for OpenTelemetry distributed tracing via an optional `XENNA_RAY_TRACING_HOOK` during Ray initialization.
- Added configurable S3/object-store retry settings in `ObjectStoreConfig.make_for_s3` (`max_retries`, `retry_timeout`, `init_backoff`, `max_backoff`).
- Added autoscaling smoke tests for fragmentation and large-model allocation scenarios.
- Updated Xenna to use `RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1` for compatibility with upstream Ray behavior.
- Fixed the GPU allocation source of truth by capping detected GPUs to the Ray-reported GPU count per node.
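The retry settings compose the usual capped exponential backoff. A sketch of how such a schedule could behave, assuming doubling delays capped at `max_backoff`; the actual `ObjectStoreConfig` semantics may differ:

```python
def backoff_schedule(max_retries: int, init_backoff: float, max_backoff: float):
    """Yield one delay per retry attempt: start at init_backoff, double
    each time, and never exceed max_backoff. Illustrative only."""
    delay = init_backoff
    for _ in range(max_retries):
        yield min(delay, max_backoff)
        delay *= 2
```

A caller would sleep for each yielded delay between S3 attempts, giving up after `max_retries` failures (or once `retry_timeout` elapses, in the real settings).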
- 2025-12-01
- Added a SERVING mode that extends STREAMING mode for online-serving use cases.
- Updated Xenna to use Ray-reported CPU resources.
- 2025-10-30
- Implemented an SPMD mode, which lets users run multi-GPU and multi-node inference similarly to torchrun.
- Implemented P2P artifact downloads, allowing users to efficiently download artifacts before a job starts.
- Improved task status polling for better performance at large scale.
- Made Xenna much less thread-hungry than before.
- 2025-09-25
- Fixed an autoscaler bug triggered by dynamic splits.
- 2025-09-15
- Fixed a bug where the autoscaler tried to allocate workers for finished stages.
- 2025-09-05
- Refactored the autoscaling code to reduce clones for better performance.
- 2025-08-27
- Implemented autoscaling algorithm in Rust for better performance and scalability.
- Added metrics for the streaming executor's main loop.
- 2025-08-19
- Added a workflow to publish packages to PyPI.
- Fixed a bug in queue-size stats when back-pressure kicks in.
- Fixed a possible hang with a fan-in stage and a large `stage_batch_size`.
- 2025-08-14
- Added `over_provision_factor` to `StageSpec` to influence stage worker allocation by the autoscaler.
- Allowed `StageSpec.num_workers_per_node` to be a `float` for greater flexibility.
- Added support for respecting `CUDA_VISIBLE_DEVICES` when the `XENNA_RESPECT_CUDA_VISIBLE_DEVICES` environment variable is set.
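The `CUDA_VISIBLE_DEVICES` gating could be sketched like this. Only the two environment-variable names come from the entry above; `visible_gpu_ids` is a hypothetical helper, not Xenna's API:

```python
import os


def visible_gpu_ids(detected):
    """Filter detected GPU indices through CUDA_VISIBLE_DEVICES, but only
    when XENNA_RESPECT_CUDA_VISIBLE_DEVICES is set. Hypothetical helper
    for illustration."""
    if "XENNA_RESPECT_CUDA_VISIBLE_DEVICES" not in os.environ:
        return list(detected)  # default: ignore CUDA_VISIBLE_DEVICES
    raw = os.environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return list(detected)  # opted in, but no mask to apply
    allowed = {int(tok) for tok in raw.split(",") if tok.strip()}
    return [gpu for gpu in detected if gpu in allowed]
```

Keeping the filter opt-in preserves the previous behavior for existing deployments that already export `CUDA_VISIBLE_DEVICES` for other tools.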
- 2025-06-11
- Initial version