Commit bb49172

Update tensor docs for horus_types refactor
- tensor-messages.mdx: Update to Device::cpu()/cuda(N) API, 232-byte descriptor, add TensorDtype helpers, auto-managed pools section, tensor domain types (TensorImage, TensorPointCloud, TensorDepthImage)
- tensor-pool.mdx: Add auto-managed pools as recommended approach, update HorusTensor struct with embedded Device, update Device API
- message-types.mdx: Add Tensor Domain Types section with TensorImage, TensorPointCloud, TensorDepthImage docs and comparison table
- architecture.mdx: Update TensorPool example to auto-managed pool API
- gpu-tensor-sharing.mdx: Add tensor-messages to See Also links
1 parent f5657b2 commit bb49172

5 files changed

Lines changed: 297 additions & 37 deletions

File tree

content/docs/advanced/gpu-tensor-sharing.mdx

Lines changed: 2 additions & 1 deletion
@@ -602,6 +602,7 @@ cuda_ffi::host_unregister(buffer.as_ptr() as *mut _)?;
602602

603603
## See Also
604604

605-
- [TensorPool API](/rust/api/tensor-pool) - CPU tensor management
605+
- [TensorPool API](/rust/api/tensor-pool) - Pool management, auto-managed pools, and configuration
606+
- [Tensor Messages](/rust/api/tensor-messages) - HorusTensor, Device, TensorDtype, and domain types
606607
- [Performance Benchmarks](/performance/benchmarks) - Latency measurements
607608
- [Python Bindings](/python/api/python-bindings) - Python bindings

content/docs/concepts/architecture.mdx

Lines changed: 6 additions & 6 deletions
@@ -241,16 +241,16 @@ The image data is written **once** to shared memory. Each subscriber reads direc
241241
TensorPool manages shared memory allocation:
242242

243243
```rust
244-
// Allocate space for a 1080p RGB image
245-
let pool = TensorPool::new(1, TensorPoolConfig::default())?;
246-
let tensor = pool.alloc(&[1080, 1920, 3], TensorDtype::U8, TensorDevice::Cpu)?;
244+
// Auto-managed pool via Topic<HorusTensor>
245+
let topic: Topic<HorusTensor> = Topic::new("camera/rgb")?;
246+
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;
247247

248248
// Write data (only done once)
249-
let data = pool.data_slice_mut(&tensor);
249+
let data = handle.data_slice_mut();
250250
camera.capture_into(data);
251251

252-
// Send through Topic - only the descriptor is copied, not the image
253-
image_pub.send(tensor);
252+
// Send through Topic - only the 232-byte descriptor is copied, not the image
253+
topic.send_handle(&handle);
254254
```
255255

256256
**TensorPool characteristics:**

content/docs/concepts/message-types.mdx

Lines changed: 62 additions & 0 deletions
@@ -603,6 +603,66 @@ Methods: `depth_from_disparity()`, `disparity_from_depth()`
603603

604604
---
605605

606+
## Tensor Domain Types (Zero-Copy)
607+
608+
For high-throughput pipelines (1080p @ 30fps, ML inference), HORUS provides tensor-backed message types that use zero-copy shared memory. These are Pod newtypes around `HorusTensor` — only the 232-byte descriptor flows through the ring buffer while the actual data stays in a shared-memory `TensorPool`.
609+
610+
All tensor types live in the `horus_types` crate.
611+
612+
### TensorImage
613+
614+
Zero-copy camera image with shape `[height, width, channels]`:
615+
616+
```rust
617+
use horus::prelude::*;
618+
619+
let topic: Topic<HorusTensor> = Topic::new("camera/rgb")?;
620+
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;
621+
// ... fill pixels via handle.data_slice_mut() ...
622+
topic.send_handle(&handle);
623+
624+
// Receiver wraps in TensorImage for domain-specific accessors
625+
if let Some(handle) = topic.recv_handle() {
626+
let img = TensorImage::from_tensor(*handle.tensor());
627+
println!("{}x{}, ch={}", img.width(), img.height(), img.channels());
628+
println!("Encoding: {:?}", img.inferred_encoding()); // Rgb8, Mono8, etc.
629+
}
630+
```
631+
632+
### TensorPointCloud
633+
634+
Zero-copy point cloud with shape `[N, K]` (K = fields per point):
635+
636+
```rust
637+
let cloud = TensorPointCloud::from_tensor(tensor);
638+
println!("{} points", cloud.point_count());
639+
if cloud.is_xyz() { /* 3 fields: XYZ */ }
640+
if cloud.has_intensity() { /* 4+ fields: XYZI */ }
641+
if cloud.has_color() { /* 6+ fields: XYZRGB */ }
642+
```
643+
644+
### TensorDepthImage
645+
646+
Zero-copy depth image with shape `[height, width]`:
647+
648+
```rust
649+
let depth = TensorDepthImage::from_tensor(tensor);
650+
if depth.is_meters() { /* F32 dtype, depth in meters */ }
651+
if depth.is_millimeters() { /* U16 dtype, depth in mm */ }
652+
```
653+
654+
### When to Use Tensor vs Standard Types
655+
656+
| | Tensor Types | Standard Types |
657+
|---|---|---|
658+
| **Types** | `TensorImage`, `TensorPointCloud`, `TensorDepthImage` | `Image`, `PointCloud`, `DepthImage` |
659+
| **IPC** | ~50ns (zero-copy Pod) | ~167ns (serde serialization) |
660+
| **Data location** | Shared memory pool | Inline in message |
661+
| **Best for** | High-throughput pipelines, ML inference | General use, rich API |
662+
| **API** | Shape-based accessors | Field-level pixel/point access |
663+
664+
---
665+
606666
## Detection Messages
607667

608668
Object detection results. All are POD types.
@@ -1037,6 +1097,8 @@ impl Node for ObstacleDetector {
10371097
## See Also
10381098

10391099
- **[POD Types](/concepts/core-concepts-podtopic)** — Zero-serialization for maximum performance
1100+
- **[Tensor Messages](/rust/api/tensor-messages)** — HorusTensor, Device, TensorDtype, and tensor domain types
1101+
- **[TensorPool API](/rust/api/tensor-pool)** — Tensor memory management and auto-managed pools
10401102
- **[Topic](/concepts/core-concepts-topic)** — The unified communication API
10411103
- **[Basic Examples](/rust/examples/basic-examples)** — Working examples with messages
10421104
- **[Architecture](/concepts/architecture)** — How messages fit into HORUS
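
The zero-copy claim in the Tensor Domain Types section added above can be checked with simple arithmetic. The following is a minimal sketch, not HORUS code: `tensor_nbytes` and `DESCRIPTOR_BYTES` are hypothetical names introduced here, and the only figures taken from the diff are the `[1080, 1920, 3]` U8 shape, the 30 fps rate, and the 232-byte descriptor size.

```rust
/// Bytes occupied by a dense tensor: product of all dimensions times the
/// size of one element. (Hypothetical helper, not part of horus_types.)
fn tensor_nbytes(shape: &[u64], element_size: u64) -> u64 {
    shape.iter().product::<u64>() * element_size
}

/// Descriptor size quoted in the updated docs.
const DESCRIPTOR_BYTES: u64 = 232;

fn main() {
    // 1080p RGB frame with U8 components (1 byte per element).
    let frame_bytes = tensor_nbytes(&[1080, 1920, 3], 1);
    let fps: u64 = 30;

    println!("payload per frame:     {} bytes", frame_bytes);
    println!("payload per second:    {} bytes", frame_bytes * fps);
    println!("descriptor per second: {} bytes", DESCRIPTOR_BYTES * fps);
}
```

Under these assumptions a frame is 6,220,800 bytes, so copying frames through the ring buffer at 30 fps would move about 187 MB/s, while the descriptor-only path moves 6,960 bytes/s.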

content/docs/rust/api/tensor-messages.mdx

Lines changed: 163 additions & 12 deletions
@@ -10,54 +10,200 @@ Zero-copy tensor sharing between nodes for ML/AI workloads.
1010

1111
## HorusTensor
1212

13-
A ~200 byte descriptor pointing to data in shared memory:
13+
A 232-byte Pod descriptor pointing to data in shared memory:
1414

1515
```rust
16-
use horus::prelude::*; // Provides tensor::{HorusTensor, TensorDtype, TensorDevice};
16+
use horus::prelude::*; // Provides HorusTensor, TensorDtype, Device from horus_types
1717

1818
// Send tensor descriptor through Topic
1919
let topic: Topic<HorusTensor> = Topic::new("camera.frames")?;
2020

2121
if let Some(tensor) = topic.recv() {
2222
println!("Shape: {:?}", tensor.shape());
2323
println!("Dtype: {:?}", tensor.dtype);
24-
println!("Device: {}", tensor.device);
24+
println!("Device: {}", tensor.device());
2525
}
2626
```
2727

28+
All tensor types live in the `horus_types` crate — a leaf crate with zero HORUS dependencies. This is the single source of truth for `HorusTensor`, `TensorDtype`, and `Device`.
29+
2830
## TensorDtype
2931

3032
| Dtype | Size | Use Case |
3133
|-------|------|----------|
3234
| F32 | 4 | ML training/inference |
35+
| F64 | 8 | High-precision computation |
3336
| F16 | 2 | Memory-efficient inference |
3437
| BF16 | 2 | Training on modern GPUs |
3538
| U8 | 1 | Images |
39+
| U16 | 2 | Depth sensors (mm) |
40+
| U32 | 4 | Large indices |
41+
| U64 | 8 | Counters, timestamps |
3642
| I8 | 1 | Quantized inference |
43+
| I16 | 2 | Audio, sensor data |
44+
| I32 | 4 | General integer |
45+
| I64 | 8 | Large signed values |
46+
| Bool | 1 | Masks |
47+
48+
Helper methods:
49+
50+
```rust
51+
let dtype = TensorDtype::F32;
52+
assert_eq!(dtype.element_size(), 4);
53+
assert!(dtype.is_float());
54+
assert!(!dtype.is_signed_int());
55+
println!("{}", dtype); // "f32"
56+
57+
// DLPack interop
58+
let dl = dtype.to_dlpack();
59+
let back = TensorDtype::from_dlpack(dl.0, dl.1).unwrap();
60+
61+
// Parse from string
62+
let parsed = TensorDtype::parse("float32").unwrap();
63+
```
64+
65+
## Device
66+
67+
The `Device` struct replaces the old `TensorDevice` enum. It's a Pod-safe `repr(C)` struct supporting **unlimited GPU indices**:
68+
69+
```rust
70+
Device::cpu() // CPU / shared memory
71+
Device::cuda(0) // GPU 0
72+
Device::cuda(1) // GPU 1
73+
Device::cuda(7) // GPU 7 — no limit!
74+
75+
// Parse from string
76+
let dev = Device::parse("cuda:2").unwrap();
77+
let cpu = Device::parse("cpu").unwrap();
78+
79+
// Check device type
80+
assert!(Device::cpu().is_cpu());
81+
assert!(Device::cuda(0).is_cuda());
82+
println!("{}", Device::cuda(1)); // "cuda:1"
83+
```
84+
85+
## Auto-Managed Tensor Pools
86+
87+
`Topic<HorusTensor>` automatically manages a shared-memory `TensorPool` per topic. Users call `alloc_tensor()`, `send_handle()`, and `recv_handle()` instead of managing pools manually:
3788

38-
## TensorDevice
89+
```rust
90+
use horus::prelude::*;
91+
92+
let topic: Topic<HorusTensor> = Topic::new("camera/rgb")?;
93+
94+
// Allocate a 1080p RGB image from the topic's auto-managed pool
95+
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;
96+
97+
// Write pixel data
98+
let pixels = handle.data_slice_mut();
99+
// ... fill pixels ...
100+
101+
// Send — only the 232-byte descriptor flows through the ring buffer.
102+
// The actual tensor data stays in shared memory — true zero-copy.
103+
topic.send_handle(&handle);
104+
```
105+
106+
On the receiver side:
39107

40108
```rust
41-
TensorDevice::Cpu // Shared memory
42-
TensorDevice::Cuda0 // GPU 0
43-
TensorDevice::Cuda1 // GPU 1
109+
let topic: Topic<HorusTensor> = Topic::new("camera/rgb")?;
110+
111+
if let Some(recv_handle) = topic.recv_handle() {
112+
let data = recv_handle.data_slice(); // Zero-copy access to shared memory
113+
println!("Shape: {:?}", recv_handle.shape());
114+
println!("Dtype: {:?}", recv_handle.dtype());
115+
}
116+
// TensorHandle is RAII — refcount decremented automatically on drop
44117
```
45118

46-
## With TensorPool
119+
The pool is created lazily on first use and shared across all `Topic<HorusTensor>` instances with the same name — even across processes. Pool IDs are derived deterministically from the topic name.
120+
121+
## With Manual TensorPool
122+
123+
For advanced use cases, you can manage pools directly:
47124

48125
```rust
49-
use horus::prelude::*; // Provides {TensorPool, TensorPoolConfig, TensorDtype, TensorDevice}
126+
use horus::prelude::*;
50127

51128
let pool = TensorPool::new(1, TensorPoolConfig::default())?;
52-
let tensor = pool.alloc(&[1080, 1920, 3], TensorDtype::U8, TensorDevice::Cpu)?;
129+
let handle = TensorHandle::alloc(
130+
Arc::new(pool),
131+
&[1080, 1920, 3],
132+
TensorDtype::U8,
133+
Device::cpu(),
134+
)?;
53135

54136
// Write data
55-
pool.data_slice_mut(&tensor)[0] = 255;
137+
handle.data_slice_mut()[0] = 255;
56138

57139
// Share via Topic
58-
topic.send(tensor);
140+
topic.send(*handle.tensor());
59141
```
60142

143+
## Tensor Domain Types
144+
145+
For common robotics data, HORUS provides zero-overhead Pod wrappers around `HorusTensor` with domain-specific accessors. These use the same zero-copy shared memory path as `HorusTensor` — only the 232-byte descriptor is sent.
146+
147+
### TensorImage
148+
149+
Camera images with shape `[height, width, channels]`:
150+
151+
```rust
152+
use horus::prelude::*;
153+
154+
let topic: Topic<HorusTensor> = Topic::new("camera/rgb")?;
155+
156+
if let Some(handle) = topic.recv_handle() {
157+
let img = TensorImage::from_tensor(*handle.tensor());
158+
println!("{}x{}, {} channels", img.width(), img.height(), img.channels());
159+
println!("Encoding: {:?}", img.inferred_encoding()); // Rgb8, Mono8, etc.
160+
println!("Pixels: {}", img.pixel_count());
161+
}
162+
```
163+
164+
| Method | Description |
165+
|--------|-------------|
166+
| `height()` | Image height (shape dim 0) |
167+
| `width()` | Image width (shape dim 1) |
168+
| `channels()` | Channel count (shape dim 2, default 1) |
169+
| `dtype()` | Data type of pixel components |
170+
| `inferred_encoding()` | Infers ImageEncoding from dtype + channels |
171+
| `pixel_count()` | Total pixels (height * width) |
172+
| `nbytes()` | Total bytes of image data |
173+
| `is_cpu()` / `is_cuda()` | Device location |
174+
175+
### TensorPointCloud
176+
177+
Point clouds with shape `[N, K]` where K = fields per point:
178+
179+
```rust
180+
let cloud = TensorPointCloud::from_tensor(tensor);
181+
println!("{} points, {} fields", cloud.point_count(), cloud.fields_per_point());
182+
println!("XYZ: {}, Has color: {}", cloud.is_xyz(), cloud.has_color());
183+
```
184+
185+
| Fields per Point | Format |
186+
|-----------------|--------|
187+
| 3 | XYZ |
188+
| 4 | XYZI (XYZ + intensity) |
189+
| 6 | XYZRGB (XYZ + RGB) |
190+
191+
### TensorDepthImage
192+
193+
Depth images with shape `[height, width]`:
194+
195+
```rust
196+
let depth = TensorDepthImage::from_tensor(tensor);
197+
println!("{}x{}", depth.width(), depth.height());
198+
if depth.is_meters() { println!("F32, depth in meters"); }
199+
if depth.is_millimeters() { println!("U16, depth in millimeters"); }
200+
```
201+
202+
| Dtype | Unit |
203+
|-------|------|
204+
| F32 | Depth in meters (ML pipelines) |
205+
| U16 | Depth in millimeters (RealSense, etc.) |
206+
61207
## Python Interop
62208

63209
```python
@@ -66,3 +212,8 @@ tensor = topic.recv_tensor(node)
66212
arr = np.asarray(tensor) # Zero-copy via __array_interface__
67213
```
68214

215+
## See Also
216+
217+
- [TensorPool API](/rust/api/tensor-pool) — Pool management and configuration
218+
- [GPU Tensor Sharing](/advanced/gpu-tensor-sharing) — CUDA IPC guide
219+
- [Message Types](/concepts/message-types) — All HORUS message types
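
The Device section in this file describes a `repr(C)` struct with an open-ended GPU index in place of a fixed enum. The sketch below shows one way such a type could behave, assuming a two-field layout. `SketchDevice` and its fields are hypothetical and are not the `horus_types` implementation; only the `cpu()`/`cuda(N)` constructors, the `"cpu"` and `"cuda:N"` string forms, and the `is_cpu()`/`is_cuda()` checks come from the diff.

```rust
use std::fmt;

/// Illustrative stand-in for a Pod-style device descriptor.
/// The field layout is an assumption, not the real horus_types::Device.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SketchDevice {
    kind: u32,  // 0 = CPU, 1 = CUDA (assumed encoding)
    index: u32, // GPU index; unused for CPU
}

impl SketchDevice {
    fn cpu() -> Self {
        SketchDevice { kind: 0, index: 0 }
    }

    fn cuda(index: u32) -> Self {
        SketchDevice { kind: 1, index }
    }

    fn is_cpu(&self) -> bool {
        self.kind == 0
    }

    fn is_cuda(&self) -> bool {
        self.kind == 1
    }

    /// Accepts "cpu" or "cuda:N"; anything else yields None in this sketch.
    fn parse(s: &str) -> Option<Self> {
        if s == "cpu" {
            return Some(Self::cpu());
        }
        let index = s.strip_prefix("cuda:")?.parse::<u32>().ok()?;
        Some(Self::cuda(index))
    }
}

impl fmt::Display for SketchDevice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.is_cpu() {
            write!(f, "cpu")
        } else {
            write!(f, "cuda:{}", self.index)
        }
    }
}

fn main() {
    let dev = SketchDevice::parse("cuda:2").expect("valid device string");
    println!("{} (is_cuda = {})", dev, dev.is_cuda());
    println!("{}", SketchDevice::cpu());
}
```

Storing the index as a plain integer is what removes the per-GPU enum variants (`Cuda0`, `Cuda1`) that the old `TensorDevice` needed, while the struct stays fixed-size and copyable.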
