
Releases: oobabooga/text-generation-webui

v3.23

08 Jan 20:54
910456b


Changes

  • Improve the style of tables and horizontal separators in chat messages

Bug fixes

  • Fix loading models that have their EOS token disabled (#7363). Thanks, @jin-eld.
  • Fix a symbolic link issue in llama-cpp-binaries when updating non-portable installs

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install; all of your settings and models carry over.
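A minimal sketch of step 2 using Python's standard library; the folder names are placeholders for wherever you unzipped each version:

```python
import shutil
from pathlib import Path

old = Path("text-generation-webui-old/user_data")  # your existing install
new = Path("text-generation-webui-new/user_data")  # the freshly unzipped version

shutil.rmtree(new)               # discard the empty default folder
shutil.move(str(old), str(new))  # settings and models carry over
```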

v3.22

20 Dec 05:19
a0b5599


Backend updates



v3.21

15 Dec 01:59
34804f9


Changes

  • Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks from the archives (previously dereferenced into full copies due to Python wheel limitations) and recreating them on first launch.
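The recreate-on-first-launch step boils down to something like the sketch below; the file names are illustrative, not the actual link layout shipped in llama-cpp-binaries:

```python
import os

target = "libllama.so"  # real file kept in the archive (illustrative name)
link = "libllama.so.1"  # symlink stripped at packaging time (illustrative name)

# Recreate the link instead of shipping a dereferenced full copy.
if not os.path.lexists(link):
    os.symlink(target, link)
```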

Backend updates



v3.20

07 Dec 20:58
652d13c


Image generation support!

[Screenshot: the new Image AI tab]

Changes

  • Image generation support: Generate images with diffusers models like Z-Image-Turbo in a new "Image AI" tab. Features include:
    • 4-bit and 8-bit quantization
    • torch.compile support
    • LLM-generated prompt variations
    • PNG metadata for generation settings
    • Gallery for past generations
    • Progress bar
    • OpenAI-compatible API endpoint for image generation (example request below)

For a step-by-step tutorial, see the Image Generation Tutorial.
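For the OpenAI-compatible endpoint, a request along the following lines should work once the API is enabled on the default port; every field besides prompt is an assumption carried over from OpenAI's /v1/images/generations schema, so treat this as a sketch:

```python
import base64
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/images/generations",
    json={
        "prompt": "a lighthouse at dusk, oil painting",
        "n": 1,                         # assumed, per the OpenAI schema
        "size": "1024x1024",            # assumed, per the OpenAI schema
        "response_format": "b64_json",  # assumed, per the OpenAI schema
    },
)
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))
```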

  • Pass bos_token and eos_token to Jinja2 chat templates, making it possible to use the template for Seed-OSS-36B-Instruct and other models (toy example below)
  • Use flash_attention_2 by default for Transformers models
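To see why the template change matters, here is a toy render of a chat template that references those variables; the template string is a simplified stand-in, not the actual Seed-OSS template:

```python
from jinja2 import Template

tmpl = Template(
    "{{ bos_token }}"
    "{% for m in messages %}[{{ m.role }}] {{ m.content }}{{ eos_token }}{% endfor %}"
)

print(tmpl.render(
    bos_token="<bos>",
    eos_token="<eos>",
    messages=[{"role": "user", "content": "Hello"}],
))
# Without bos_token/eos_token in the render context, Jinja2 silently
# renders them as empty strings and the prompt loses its special tokens.
```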

Bug fixes

  • Fix API responses always returning the same created timestamp

Backend updates



v3.19

29 Nov 02:00
bd9f2de


Qwen3-Next llama.cpp support!

Changes

  • Add a slider for --ubatch-size to the llama.cpp loader and change the defaults for better MoE performance (#7316). Thanks, @GodEmperor785.
    • This significantly improves prompt-processing speeds for MoE models in both full-GPU and GPU+CPU configurations.
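As a command-line equivalent of the new slider, a launch might look like the sketch below; the model name and value are placeholders, and higher values trade VRAM for prompt-processing speed:

```python
import subprocess

subprocess.run([
    "python", "server.py",
    "--model", "my-moe-model.gguf",  # placeholder
    "--ubatch-size", "2048",         # example value
])
```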

Bug fixes

  • Upgrade coqui-tts to >= 0.27.0 for compatibility with transformers 4.55 (#7329). Thanks, @aidevtime.

Backend updates



v3.18

19 Nov 14:04
1afe082


Changes

  • Add a --cpu-moe flag for llama.cpp that moves MoE expert weights to the CPU, reducing VRAM usage (launch sketch below).
  • Add ROCm portable builds for AMD GPUs on Linux, made possible by PR oobabooga/llama-cpp-binaries#7. Thanks, @ShortTimeNoSee.
  • Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).
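A launch sketch using the new flag; the model name is a placeholder:

```python
import subprocess

subprocess.run([
    "python", "server.py",
    "--model", "my-moe-model.gguf",  # placeholder
    "--cpu-moe",                     # keep MoE expert weights in system RAM
])
```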

Backend updates



v3.17

06 Nov 03:39
9ad9afa


Changes

  • Add weights_only=True to torch.load in Training_PRO for better security.
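For context, this is the standard hardening pattern; the checkpoint path is a placeholder:

```python
import torch

# weights_only=True restricts unpickling to tensors and primitive types,
# so a tampered checkpoint cannot execute arbitrary code when loaded.
state_dict = torch.load("checkpoint.pt", weights_only=True, map_location="cpu")
```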

Bug fixes

  • Pin huggingface-hub to 0.36.0 to fix manual venv installs.
  • Rename evaluation_strategy to eval_strategy in training to match the renamed transformers argument (example below). Thanks, @inyourface34456.
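A minimal example with the new argument name; the other values are placeholders:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lora-out",  # placeholder
    eval_strategy="steps",  # formerly evaluation_strategy
    eval_steps=500,
)
```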

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install; all of your settings and models carry over.

v3.16

23 Oct 15:50
fc67e5e


Changes

  • Make it possible to run a portable Web UI build via a symlink (#7277). Thanks, @reksar.
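The usual fix for symlinked launchers is to resolve the real script location before building relative paths; this is a sketch of the idea, not necessarily the exact change in #7277:

```python
import os

# Locate user_data next to the real script rather than next to the
# symlink the user invoked.
script_dir = os.path.dirname(os.path.realpath(__file__))
user_data = os.path.join(script_dir, "user_data")
```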

Bug fixes

  • Fix Python requirements for Apple devices running macOS Tahoe (#7273). Thanks, @drieschel.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install; all of your settings and models carry over.

v3.15

15 Oct 20:15
7711305


Changes

  • Log an error when a llama-server request exceeds the context size (#7263; sketch below). Thanks, @mamei16.
  • Make --trust-remote-code immutable from the UI/API for better security.
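The context check amounts to something like the sketch below; the function shape and message are illustrative, not the code from #7263:

```python
import logging

logger = logging.getLogger(__name__)

def check_context(prompt_tokens: int, ctx_size: int) -> None:
    if prompt_tokens > ctx_size:
        logger.error(
            "Request is %d tokens but the context size is only %d; "
            "the prompt will be truncated.",
            prompt_tokens, ctx_size,
        )
```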

Bug fixes

  • Fix metadata leaking into branched chats.
  • Fix "continue" missing an initial space in chat-instruct/chat modes.
  • Fix resuming incomplete downloads after Hugging Face moved to Xet.
  • Revert exllamav3_hf changes in v3.14 that made it output gibberish.

Backend updates



v3.14

10 Oct 13:47
7833650


Changes

  • Better handle multi-GPU setups when using Transformers with bitsandbytes (load-in-8bit and load-in-4bit).
  • Implement the /v1/internal/logits endpoint for the exllamav3 and exllamav3_hf loaders (example request below).
  • Make profile picture uploading safer.
  • Add fla to the requirements for ExLlamaV3 to support Qwen3-Next models.
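A request against the endpoint might look like the following; the payload fields are assumptions about how /v1/internal/logits is called, so verify them against the project's API documentation:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/internal/logits",
    json={
        "prompt": "The capital of France is",
        "top_logits": 10,  # assumed field: how many top tokens to return
    },
)
print(resp.json())  # candidate tokens and their scores
```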

Bug fixes

  • Fix an issue with loading certain chat histories in Instruct mode. Thanks, @Remowylliams.
  • Fix portable builds for macOS x86 missing llama.cpp binaries (#7238). Thanks, @IonoclastBrigham.

Backend updates

