
OxTorch (v8.2 — "Iron Age: 2D Stride-Aware Tiling")

Run modern AI inference on hardware that PyTorch left behind.

OxTorch is a Rust tensor engine built for machines that are too slow, too old, or have too little RAM for mainstream frameworks. It streams model weights from SSD tile-by-tile (never loading the full model into RAM), pushes compute to whatever GPU the machine has via raw Vulkan, and falls back to hand-tuned SIMD for everything else.

  • No CUDA. Works on any Vulkan-capable GPU (AMD GCN+, Intel HD 500+, NVIDIA 900+).
  • No RAM limit. Weights stream from SSD via a hardware-tuned ring buffer (adaptive tiles) and io_uring.
  • No code changes. import oxtorch as torch — existing PyTorch inference scripts run unchanged.
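The tile-by-tile streaming model can be sketched in a few lines of plain Python. This is a conceptual illustration only: the `stream_tiles` helper below is hypothetical, and OxTorch's real pipeline uses io_uring, a triple-buffered ring, and hardware-adaptive tile sizes.

```python
import io

def stream_tiles(f, tile_size):
    """Yield fixed-size tiles from a weight file without loading it whole.

    Conceptual stand-in for OxTorch's ring-buffered SSD streaming:
    only one tile's worth of data needs to be resident at a time.
    """
    while True:
        tile = f.read(tile_size)
        if not tile:
            break
        yield tile

# Example: a fake 1 KiB "weight file" streamed in 256-byte tiles.
weights = io.BytesIO(bytes(1024))
n_tiles = sum(1 for _ in stream_tiles(weights, 256))
print(n_tiles)  # 4
```

Because the generator yields one tile at a time, peak memory is bounded by the tile size rather than the file size, which is the property that lets 100GB+ models run on 8GB machines.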

Important

V8.2 "Iron Age" Status: The Vulkan backend is now 2D Stride-Aware. Numerical divergence in MatMul (parity drift ~244) has been resolved via native SPIR-V stride indexing. OxTorch now supports transposed and sliced tensors directly on GPU without CPU-side copies.
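The stride-aware addressing that makes copy-free transposes and slices possible can be illustrated in a few lines of Python. This is a conceptual sketch of the general technique; in OxTorch the equivalent indexing happens inside the SPIR-V kernels.

```python
def flat_index(row, col, strides):
    """Map a logical (row, col) coordinate to a flat buffer offset."""
    return row * strides[0] + col * strides[1]

# A 2x3 row-major tensor stored as a flat buffer.
buf = [10, 11, 12, 20, 21, 22]
strides = (3, 1)      # row stride 3, column stride 1

# A transpose is just a stride swap -- no data movement at all.
t_strides = (1, 3)    # the 3x2 view over the very same buffer

assert buf[flat_index(1, 2, strides)] == 22    # original[1][2]
assert buf[flat_index(2, 1, t_strides)] == 22  # transposed[2][1]
```

Since a kernel that indexes via strides works on any such view, transposed and sliced tensors never need a CPU-side copy into contiguous layout before being uploaded to the GPU.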


⚡ One-Import Drop-In

OxTorch ships a Python package called oxtorch that replaces PyTorch at the import level. Ops that OxTorch has implemented natively run in Rust (faster). Ops it hasn't implemented yet fall back silently to real PyTorch — you never hit a NotImplementedError.

```python
import oxtorch as torch

# Everything below works exactly as before.
a = torch.randn(2048, 2048, dtype=torch.bfloat16)
b = torch.randn(2048, 2048, dtype=torch.bfloat16)
result = torch.matmul(a, b)  # 400x faster than PyTorch on non-AVX512 CPUs
```
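The silent-fallback behaviour can be approximated with a small attribute proxy. This is a hedged sketch of the general dispatch technique, not OxTorch's actual implementation; `TorchProxy` and the toy op table are invented for illustration.

```python
class TorchProxy:
    """Serve native ops when available, otherwise delegate to a fallback.

    Conceptual model of oxtorch's dispatch: ops implemented in Rust run
    natively, everything else transparently forwards to real PyTorch,
    so callers never see a NotImplementedError.
    """
    def __init__(self, native_ops, fallback):
        self._native = native_ops
        self._fallback = fallback

    def __getattr__(self, name):
        if name in self._native:
            return self._native[name]        # fast native path
        return getattr(self._fallback, name) # silent fallback path

import math  # stands in for "real PyTorch" in this toy example
torch = TorchProxy({"matmul": lambda a, b: "native matmul"}, math)

print(torch.matmul(None, None))  # native path
print(torch.sqrt(9.0))           # falls back to math.sqrt -> 3.0
```

A real package would apply the same idea at module level (PEP 562 module `__getattr__`), which is what lets `import oxtorch as torch` cover the whole PyTorch surface.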

🚀 Performance (v3.8.1-rc, Ivy Bridge i5-3450)

OxTorch specializes in Large-Vector SIMD and Asynchronous I/O.

| Operation | Acceleration | Why? |
| --- | --- | --- |
| MatMul F16/BF16 | 400x – 780x 🚀 | Native F16C/SSE2 vs PyTorch scalar emulation (no AVX-512). |
| Linear BF16 | 26x 🚀 | Optimized SIMD Core + Rayon parallelism. |
| GELU/ReLU | 2x – 4x | AVX1/NEON kernels + MSTS Tiling. |
| SSD Streaming | 💎 | Processes 100GB+ tensors on 8GB RAM via MSTS v2. |

🛠️ Technical Overview: Iron Age (v8.2)

Version 8.2 introduces 2D Stride-Aware Tiling. The backend now handles memory layout metadata natively.

  1. CrookScheduler: A triple-buffered ring of 8MB tiles.
  2. Bitmask Barrier: A multi-stream handshake (A_ready | B_ready) that allows sources to load in parallel.
  3. Global Capacitor: A large RAM reservoir (up to 50% of system RAM) that proactively prefetches SSD data via io_uring.
  4. SIMD Auto-Dispatch: Runtime detection of AVX2, AVX1, SSE2, and NEON.
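The Bitmask Barrier in step 2 can be sketched as a two-bit handshake. This is a simplified single-process illustration under stated assumptions: the flag names `A_READY`/`B_READY` mirror the `A_ready | B_ready` mask above, but the real barrier coordinates tile loads across I/O streams, not Python threads.

```python
import threading

A_READY, B_READY = 0b01, 0b10
ALL_READY = A_READY | B_READY

class BitmaskBarrier:
    """Let two sources load in parallel; release waiters once both bits are set."""
    def __init__(self):
        self._mask = 0
        self._cond = threading.Condition()

    def signal(self, bit):
        with self._cond:
            self._mask |= bit             # OR this source's readiness bit in
            if self._mask == ALL_READY:
                self._cond.notify_all()   # both tiles present: wake consumers

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._mask == ALL_READY)

barrier = BitmaskBarrier()
loaders = [threading.Thread(target=barrier.signal, args=(bit,))
           for bit in (A_READY, B_READY)]
for t in loaders:
    t.start()
barrier.wait()  # returns only after both source tiles have signalled
for t in loaders:
    t.join()
print("both tiles ready")
```

The single OR-and-compare per signal is what lets the A and B sources load concurrently with no ordering constraint between them.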

📚 Documentation Index

For full developer guides and architecture specs, see the Documentation Index.


License

MIT License. Inspired by the MERA-400 — a Polish 16-bit minicomputer (1976).
