ray-ascend

Ray Ascend Plugin

| About Ascend | Documentation |

Overview

ray-ascend is a community-maintained hardware plugin that supports advanced Ray features on Ascend NPU accelerators.

Ray natively supports Ascend NPU as a predefined resource type for binding actors and tasks (see Ray Accelerator Support). ray-ascend builds on this with Ascend-native features for Ray, such as collective communication through the Huawei Collective Communication Library (HCCL) and Ray Direct Transport (RDT).

For performance benchmarks, see the Performance Benchmark Report.

Prerequisites

  • Architecture: aarch64, x86
  • OS Kernel: Linux
  • Software Dependencies:
    • python >= 3.10, <= 3.11
    • CANN >= 8.2.rc1
    • torch >= 2.7.1; torch-npu >= 2.7.1.post2 (the two versions must be compatible with each other)
    • ray >= 2.55.0
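
The Python-side requirements above can be checked before installation with a short script. The following is a minimal sketch, not part of ray-ascend: the helper names (`release_tuple`, `meets_minimum`, `python_supported`, `check_environment`) are hypothetical, only numeric version segments are compared (so suffixes such as `.post2` or `rc1` are ignored), and CANN is not checked because it is not a Python package.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the prerequisites list; torch-npu's ".post2"
# suffix is dropped because this sketch compares numeric segments only.
MINIMUMS = {"ray": "2.55.0", "torch": "2.7.1", "torch-npu": "2.7.1"}


def release_tuple(version):
    """Return the leading numeric segments, e.g. '2.7.1.post2' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def python_supported(version_info=sys.version_info):
    """True for Python 3.10 and 3.11, the range listed above."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 11)


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not python_supported():
        problems.append(
            f"unsupported Python {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when all checks pass. It does not verify that torch and torch-npu are compatible with each other.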

Quick Start

Installation

pip install "ray-ascend[yr]"

HCCL Collective Communication Among Ray Actors

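import ray
import torch
from ray.util import collective
from ray_ascend import register_hccl_collective_backend

# Register the HCCL backend in the driver process.
register_hccl_collective_backend()

@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        # Register again inside each actor process.
        register_hccl_collective_backend()
        self.tensor = torch.zeros(1024, device="npu")

    def broadcast(self):
        # Each actor calls broadcast in SPMD manner; rank 0 is the source.
        collective.broadcast(self.tensor, src_rank=0, group_name="my_group")

# Illustrative setup: two actors, each bound to one NPU.
actors = [RayActor.remote() for _ in range(2)]

collective.create_collective_group(
    actors,
    len(actors),
    list(range(len(actors))),
    backend="HCCL",
    group_name="my_group",
)

ray.get([actor.broadcast.remote() for actor in actors])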
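
To make the broadcast semantics concrete without NPU hardware, the following plain-Python sketch mimics what the collective call does to each rank's buffer. It uses neither Ray nor HCCL, and `group_ranks` and `simulate_broadcast` are hypothetical helpers written only for illustration.

```python
from typing import List


def group_ranks(num_actors: int) -> List[int]:
    """Ranks as passed to create_collective_group: one per actor, 0..n-1."""
    if num_actors < 1:
        raise ValueError("a collective group needs at least one actor")
    return list(range(num_actors))


def simulate_broadcast(buffers: List[List[float]], src_rank: int) -> List[List[float]]:
    """Return the per-rank buffers after a broadcast from src_rank.

    Every rank ends up holding a copy of the source rank's data.
    """
    if not 0 <= src_rank < len(buffers):
        raise ValueError(f"src_rank {src_rank} is outside the group")
    source = buffers[src_rank]
    return [list(source) for _ in buffers]


# Three ranks: rank 0 holds data, the others hold zero-filled buffers.
buffers = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
after = simulate_broadcast(buffers, src_rank=0)
```

In the real collective, every rank issues the same call (the SPMD pattern) and the receive buffers are overwritten in place on the device; the sketch returns new lists only to keep the example side-effect free.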

Transport Ascend NPU Tensors via HCCS

import ray
import torch
from ray.util.collective import create_collective_group
from ray_ascend import register_hccl_tensor_transport

register_hccl_tensor_transport()

@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        register_hccl_tensor_transport()

    @ray.method(tensor_transport="HCCL")
    def random_tensor(self):
        return torch.zeros(1024, device="npu")

    def sum(self, tensor: torch.Tensor):
        return torch.sum(tensor)


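sender, receiver = RayActor.remote(), RayActor.remote()
# Both actors must belong to one HCCL group before tensors can move between them.
group = create_collective_group([sender, receiver], backend="HCCL")

# The returned reference points at a tensor that stays on the sender's NPU;
# passing it to the receiver triggers the device-to-device transfer.
tensor = sender.random_tensor.remote()
result = receiver.sum.remote(tensor)
ray.get(result)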

Transport Ascend NPU Tensors via HCCS and CPU Tensors via RDMA

OpenYuanrong DataSystem (YR) transports NPU tensors over HCCS and CPU tensors over RDMA (when RDMA is available) as Ray objects.

import ray
import torch
from ray_ascend import register_yr_tensor_transport

register_yr_tensor_transport(["npu", "cpu"])

@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        register_yr_tensor_transport(["npu", "cpu"])

    @ray.method(tensor_transport="YR")
    def transfer_npu_tensor_via_hccs(self):
        return torch.zeros(1024, device="npu")

    @ray.method(tensor_transport="YR")
    def transfer_cpu_tensor_via_rdma(self):
        return torch.zeros(1024)

sender = RayActor.remote()
npu_tensor = ray.get(sender.transfer_npu_tensor_via_hccs.remote())
cpu_tensor = ray.get(sender.transfer_cpu_tensor_via_rdma.remote())

Ray Version Compatibility

| Ray Version | YR Transport | HCCL Collective | HCCL Tensor Transport (RDT) |
| >=2.55, <2.56 | | | |
| >=2.56 | | | |
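
To look up which row of the table applies to an installed Ray, the version ranges can be encoded directly. This is a minimal sketch: `ray_release` and `compatibility_row` are hypothetical helpers, and the function only identifies the matching row; it makes no claim about which features that row supports.

```python
import re


def ray_release(version: str) -> tuple:
    """Extract (major, minor) from a Ray version string such as '2.55.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Ray version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def compatibility_row(ray_version: str) -> str:
    """Return the label of the compatibility-table row for a Ray version."""
    release = ray_release(ray_version)
    if release < (2, 55):
        # The prerequisites list ray >= 2.55.0 as the minimum.
        raise ValueError("ray-ascend requires ray >= 2.55.0")
    if release < (2, 56):
        return ">=2.55, <2.56"
    return ">=2.56"
```

For example, passing the string reported by `ray.__version__` to `compatibility_row` returns the row label to read off the table.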

Contributing

See CONTRIBUTING and the developer guide, which walks through setting up your development environment, building, and testing. To report a bug or request a feature, please file an issue.

License

Apache License 2.0. See the LICENSE file.