    Repositories list

    • JAX-Toolbox

      Public
      JAX-Toolbox
      Python
      663648047 Updated Dec 5, 2025
    • C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
      C++
      30686640491 Updated Dec 5, 2025
    • NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments
      Go
      24964010 Updated Dec 5, 2025
    • Ongoing research training transformer models at scale
      Python
      3.3k14k328241 Updated Dec 5, 2025
    • gpu-operator

      Public
      NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
      Go
      4202.4k9568 Updated Dec 5, 2025
    • cccl

      Public
      CUDA Core Compute Libraries
      C++
      2962.1k1.1k198 Updated Dec 5, 2025
    • tilus

      Public
      Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
      Python
      1040881 Updated Dec 5, 2025
    • NeMo-Agent-Toolkit

      Public
      The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
      Python
      4401.6k5326 Updated Dec 5, 2025
    • Build and run containers leveraging NVIDIA GPUs
      Go
      4443.9k11722 Updated Dec 5, 2025
      NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions.
      Python
      37239114 Updated Dec 5, 2025
    • TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
      Python
      1.9k12k611449 Updated Dec 5, 2025
    • A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
      Python
      2051.6k6838 Updated Dec 5, 2025
    • spark-rapids-benchmarks

      Public
      Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
      Python
      3543304 Updated Dec 5, 2025
    • CUDA Python: Performance meets Productivity
      Python
      2263.1k19518 Updated Dec 5, 2025
    • skyhook

      Public
      A Kubernetes Operator to manage Node OS customizations.
      Go
      33301 Updated Dec 5, 2025
    • proxyfs

      Public
      Go
      24671413 Updated Dec 5, 2025
    • nv-ingest

      Public
      NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
      Python
      2772.8k10137 Updated Dec 5, 2025
    • spark-rapids

      Public
      Spark RAPIDS plugin - accelerate Apache Spark with GPUs
      Scala
      2649501.8k31 Updated Dec 5, 2025
    • Fuser

      Public
      A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
      C++
      69363208212 Updated Dec 5, 2025
    • spark-rapids-jni

      Public
      RAPIDS Accelerator JNI For Apache Spark
      Cuda
      7452826 Updated Dec 5, 2025
    • A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
      Python
      5703k263100 Updated Dec 5, 2025
    • cudaqx

      Public
      Accelerated libraries for quantum-classical computing built on CUDA-Q.
      C++
      37702513 Updated Dec 5, 2025
    • Community examples utilizing NVIDIA NeMo Agent Toolkit.
      Python
      51001 Updated Dec 5, 2025
    • Kubernetes Device Plugin to help cold plug vfio/iommufd GPUs in Kata VMs for Confidential Containers
      Go
      1109 Updated Dec 5, 2025
    • aistore

      Public
      AIStore: scalable storage for AI applications
      Go
      2271.7k00 Updated Dec 5, 2025
    • OSMO

      Public
      The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute—unifying training GPUs, simulation clusters, and edge devices in a simple YAML
      Python
      15257 Updated Dec 5, 2025
    • BioNeMo Framework: For building and adapting AI models in drug discovery at scale
      Jupyter Notebook
      10459660103 Updated Dec 5, 2025
    • TorchFort

      Public
      An Online Deep Learning Interface for HPC programs on NVIDIA GPUs
      C++
      2817611 Updated Dec 5, 2025
    • numbast

      Public
      Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
      Python
      16532811 Updated Dec 5, 2025
    • cuopt

      Public
      GPU-accelerated decision optimization
      Cuda
      975937325 Updated Dec 5, 2025