    Repositories list

    • diffusers

      Public
      🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
      Python
      6.7k000 Updated Jan 21, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      4.1k000 Updated Jan 20, 2026
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      308000 Updated Jan 20, 2026
    • ffpa-attn

      Public
      🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
      Cuda
      1324610 Updated Jan 20, 2026
    • cache-dit

      Public
      A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
      Python
      53400 Updated Jan 19, 2026
    • 📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
      Python
      2550000 Updated Jan 18, 2026
    • 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
      Python
      3364.9k00 Updated Jan 18, 2026
    • 🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
      C++
      7704.4k10 Updated Jan 18, 2026
    • LeetCUDA

      Public
      📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
      Cuda
      9289.4k10 Updated Jan 18, 2026
    • flux-fast

      Public
      A forked version of flux-fast that makes flux-fast even faster with cache-dit.
      Python
      16400 Updated Jan 5, 2026
    • Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
      Python
      410100 Updated Jan 1, 2026
    • Z-Image

      Public
      Python
      565100 Updated Dec 25, 2025
    • Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      320000 Updated Dec 3, 2025
    • .github

      Public
      0100 Updated Nov 25, 2025
    • [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
      Python
      83000 Updated Oct 30, 2025
    • 🔥LongCat-Video 1.7x🎉 speedup: cache acceleration and 4/8-bits weight only.
      Python
      0700 Updated Oct 28, 2025
    • Python
      270000 Updated Oct 28, 2025
    • ComfyUI

      Public
      The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
      Python
      11k000 Updated Oct 27, 2025
    • ⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
      Python
      01740 Updated Oct 24, 2025
    • Kandinsky 5.0: A family of diffusion models for Video & Image generation
      Python
      49000 Updated Oct 22, 2025
    • Wan2.1

      Public
      Wan: Open and Advanced Large-Scale Video Generative Models
      Python
      2.3k100 Updated Oct 17, 2025
    • Wan2.2

      Public
      Wan: Open and Advanced Large-Scale Video Generative Models
      Python
      1.6k000 Updated Oct 17, 2025
    • nunchaku

      Public
      [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
      Python
      216200 Updated Oct 15, 2025
    • Enjoy the magic of Diffusion models!
      Python
      1.1k000 Updated Oct 13, 2025
    • HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
      Python
      128100 Updated Oct 4, 2025
    • cache-dit for comfyui
      Python
      43130 Updated Sep 27, 2025
    • HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
      Python
      53100 Updated Sep 10, 2025
    • Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
      Python
      43000 Updated Sep 9, 2025
    • Model Compression Toolbox for Large Language Models and Diffusion Models
      Python
      86000 Updated Aug 14, 2025
    • SpargeAttention: A training-free sparse attention that can accelerate any model inference.
      Cuda
      82600 Updated Aug 7, 2025