NeLux is a high-performance Python library for video processing, built on FFmpeg with NVIDIA hardware acceleration (NVDEC/NVENC). It is designed for very fast decoding and delivers frames directly as ML-ready PyTorch tensors.
Originally created by Trentonom0r3.
```bash
pip install nelux
```

```python
from nelux import VideoReader

# Open video with hardware acceleration
reader = VideoReader("input.mp4", decode_accelerator="nvdec")

# Read frames - automatically BCHW format!
for frame in reader:
    print(frame.shape)  # [1, 3, 1080, 1920] - BCHW
    print(frame.dtype)  # torch.float16 for 8-bit videos

    # Ready for ML inference immediately
    output = model(frame)
```

```python
from nelux import VideoReader

vr = VideoReader("video.mp4")

# Get specific frames
batch = vr.get_batch([0, 10, 20])        # [3, 3, H, W]
batch = vr.get_batch(range(0, 100, 10))  # [10, 3, H, W]

# Pythonic slice notation
batch = vr[0:100:10]  # [10, 3, H, W]
single = vr[42]       # Single frame

# Negative indexing
batch = vr[[-3, -2, -1]]  # Last 3 frames

# Properties
print(len(vr))   # Total frame count
print(vr.shape)  # (frames, 3, H, W)
```

```python
from nelux import VideoReader
import torch

reader = VideoReader("input.mp4")

with reader.create_encoder("output.mp4") as enc:
    # Re-encode video frames
    for frame in reader:
        enc.encode_frame(frame)

    # Encode audio if present
    if reader.has_audio:
        pcm = reader.audio.tensor().to(torch.int16)
        enc.encode_audio_frame(pcm)

print("Done!")
```

- Hardware Acceleration: NVDEC (decode) and NVENC (encode) support
- ML-Ready Output: BCHW format with automatic dtype selection
  - FP16 for 8-bit videos (optimal for ML)
  - FP32 for 10/12/16-bit videos (higher precision)
- Zero-Copy: Direct GPU tensor output, no CPU round-trip
- Batch Decoding: Efficient multi-frame decoding with smart optimization
- Audio Support: Extract and encode audio streams
- Fused Operations: Color conversion + format change + normalization in single CUDA kernel
- Smart Seeking: Minimizes seeks in batch operations (only seeks on backward jumps or large gaps)
- Deduplication: Duplicate frame requests decoded once and shared
- Asynchronous Decode: Non-blocking GPU operations with event-based synchronization
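
The deduplication and smart-seeking ideas listed above can be illustrated with a small pure-Python planner. This is only a sketch of the general technique, not NeLux's actual implementation: the function name `plan_batch`, the `max_forward_gap` threshold, and its default of 30 frames are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def plan_batch(
    indices: Sequence[int], max_forward_gap: int = 30
) -> Tuple[List[int], List[int], List[int]]:
    """Plan a batch decode (hypothetical helper, not part of NeLux).

    Returns (unique, seeks, gather):
      unique: frame indices to decode, each at most once, in first-seen order
      seeks:  the subset of `unique` that would require an explicit seek
      gather: for each requested index, its position in `unique`
    """
    # Deduplicate while preserving request order.
    unique = list(dict.fromkeys(indices))
    slot_of = {idx: slot for slot, idx in enumerate(unique)}

    seeks = []
    next_frame = 0  # frame the decoder would produce next without seeking
    for idx in unique:
        backward_jump = idx < next_frame
        large_gap = idx - next_frame > max_forward_gap  # assumed threshold
        if backward_jump or large_gap:
            seeks.append(idx)
        next_frame = idx + 1

    # Duplicate requests share the single decoded frame via `gather`.
    gather = [slot_of[idx] for idx in indices]
    return unique, seeks, gather


unique, seeks, gather = plan_batch([0, 1, 1, 5, 200, 3])
print(unique)  # frames decoded once each
print(seeks)   # frames that trigger a seek
print(gather)  # how the output batch is assembled
```

In this sketch, small forward gaps are covered by decoding and discarding intermediate frames, which is typically cheaper than seeking to a keyframe and decoding forward again.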
| Feature | Support |
|---|---|
| Video Codecs | H.264, H.265/HEVC, VP9, AV1 (with NVDEC) |
| Pixel Formats | NV12, P010, P016, YUV444 (8/10/12/16-bit) |
| Audio | AAC, MP3, FLAC, PCM (extraction & encoding) |
| Containers | MP4, MKV, AVI, MOV, WebM |
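
The automatic dtype selection described earlier (FP16 for 8-bit sources, FP32 for 10/12/16-bit sources) amounts to a simple rule on bit depth. The sketch below is illustrative only: `select_output_dtype` is a hypothetical helper, it returns dtype names as strings so it runs without PyTorch, and the assumption that `force_8bit=True` makes any source follow the 8-bit path is inferred from the parameter name rather than confirmed by the library.

```python
# Bit depths of the NVDEC surface formats named in the table above.
# YUV444 is omitted because its bit depth varies by stream.
PIXEL_FORMAT_BITS = {"NV12": 8, "P010": 10, "P016": 16}


def select_output_dtype(bit_depth: int, force_8bit: bool = False) -> str:
    """Return the tensor dtype name for a source bit depth (hypothetical helper)."""
    if bit_depth not in (8, 10, 12, 16):
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    if force_8bit:
        bit_depth = 8  # assumption: forcing 8-bit selects the FP16 path
    return "float16" if bit_depth == 8 else "float32"


for name, bits in PIXEL_FORMAT_BITS.items():
    print(name, bits, select_output_dtype(bits))
```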
```python
VideoReader(
    file_path: str,
    num_threads: int = 4,
    force_8bit: bool = False,
    decode_accelerator: str = "cpu",  # "cpu" or "nvdec"
    cuda_device_index: int = 0
)
```

Properties:
- `shape`: Tuple of `(frames, 3, height, width)`
- `frame_count`: Total number of frames
- `fps`: Frame rate
- `duration`: Video duration in seconds
- `has_audio`: Whether the video has an audio stream

Methods:
- `get_batch(indices)`: Decode multiple frames efficiently
- `get_batch_range(start, end, step)`: Decode a frame range
- `create_encoder(output_path)`: Create a video encoder
- `__getitem__(index)`: Frame access via `reader[42]` or `reader[0:100:10]`
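
To make the indexing forms concrete, here is a minimal sketch of how an integer, a slice, or a list with negative entries could be resolved to absolute frame indices. It uses only standard Python semantics (`slice.indices`, negative-index wrapping); `resolve_indices` is a hypothetical name, and NeLux's internal handling may differ.

```python
from typing import List, Sequence, Union


def resolve_indices(
    key: Union[int, slice, Sequence[int]], frame_count: int
) -> List[int]:
    """Map an index expression to absolute frame indices (hypothetical helper)."""
    if isinstance(key, slice):
        # slice.indices clamps start/stop to the valid range, like list slicing.
        return list(range(*key.indices(frame_count)))

    requested = [key] if isinstance(key, int) else list(key)
    resolved = []
    for idx in requested:
        absolute = idx + frame_count if idx < 0 else idx
        if not 0 <= absolute < frame_count:
            raise IndexError(f"frame index {idx} out of range for {frame_count} frames")
        resolved.append(absolute)
    return resolved


# For a 300-frame video:
print(resolve_indices(slice(0, 100, 10), 300))  # same frames as reader[0:100:10]
print(resolve_indices([-3, -2, -1], 300))       # last three frames
```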
- Full Usage Guide - Complete API reference
- Changelog - Version history
- Benchmarks - Performance comparisons
- Python: 3.8+
- PyTorch: 2.0+ (with CUDA support for GPU acceleration)
- CUDA: 11.8+ (for NVDEC/NVENC)
- OS: Windows 10/11, Linux (Ubuntu 20.04+)
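
A quick way to compare a machine against the requirements above is a short check script. This is a sketch, not a tool shipped with NeLux: `check_environment` is a hypothetical helper, and it reports the CUDA version PyTorch was built against (`torch.version.cuda`), which may differ from the locally installed CUDA toolkit.

```python
import importlib.util
import sys
from typing import Any, Dict


def check_environment() -> Dict[str, Any]:
    """Report Python, PyTorch, and CUDA status (hypothetical helper)."""
    report: Dict[str, Any] = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_version": None,
        "torch_ok": False,
        "cuda_runtime": None,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except ImportError:
            return report
        report["torch_version"] = str(torch.__version__)
        report["torch_ok"] = int(report["torch_version"].split(".")[0]) >= 2
        report["cuda_runtime"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```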
```bash
git clone https://github.com/NevermindNilas/NeLux.git
cd NeLux

# Install dependencies
pip install -r requirements.txt

# Build (requires CMake, CUDA toolkit, FFmpeg)
python setup.py build_ext --inplace
```

See BUILD.md for detailed build instructions.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.