Releases: oobabooga/text-generation-webui
v3.23
Changes
- Improve the style of tables and horizontal separators in chat messages
Bug fixes
- Fix loading models that have their EOS token disabled (#7363). Thanks, @jin-eld.
- Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/55abc393552f3f2097f168cb6db4dc495a514d56
- Update bitsandbytes to 0.49
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
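If you prefer to script those two steps, here is a minimal Python sketch; both folder names are hypothetical and stand in for wherever you unzipped the old and new builds:

```python
# Minimal sketch of the update step above. The two folder names are
# assumptions for illustration; use your actual install locations.
import shutil
from pathlib import Path

old = Path("text-generation-webui-old")   # existing install
new = Path("text-generation-webui-new")   # freshly unzipped build

# Drop the empty user_data folder that ships with the new build...
shutil.rmtree(new / "user_data", ignore_errors=True)
# ...and carry over the one holding your settings and models.
shutil.copytree(old / "user_data", new / "user_data")
```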
v3.22
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.21
Changes
- Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks (dereferenced due to Python whl limitations) and recreating them on first launch.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/5c8a717128cc98aa9e5b1c44652f5cf458fd426e
- Update ExLlamaV3 to 0.0.18
- Update safetensors to 0.7
- Update triton-windows to 3.5.1.post22
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.20
Image generation support!
Changes
- Image generation support: Generate images with `diffusers` models like Z-Image-Turbo in a new "Image AI" tab. Features include:
  - 4-bit/8-bit quantization
  - `torch.compile` support
  - LLM-generated prompt variations
  - PNG metadata for generation settings
  - Gallery for past generations
  - Progress bar
  - OpenAI-compatible API endpoint for image generation (see the first sketch after this list)
  For a step-by-step tutorial, consult: Image Generation Tutorial
- Pass `bos_token` and `eos_token` to jinja2 templates, making it possible to use the template for `Seed-OSS-36B-Instruct` and other models (see the second sketch after this list)
- Use `flash_attention_2` by default for Transformers models
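The new image endpoint can be exercised with any OpenAI-style client. The sketch below is a rough illustration rather than the webui's documented contract: the route and response fields follow OpenAI's images API convention, and the port assumes the default API setting, so treat all of them as assumptions.

```python
# Hedged sketch: call an OpenAI-compatible image generation endpoint.
# Route, fields, and port follow OpenAI's images API convention and
# the webui's default API port; the actual parameters may differ.
import base64
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/images/generations",  # assumed route/port
    json={"prompt": "a lighthouse at dusk, oil painting", "n": 1},
)
resp.raise_for_status()
# OpenAI-style responses carry base64-encoded image data.
b64_png = resp.json()["data"][0]["b64_json"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_png))
```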
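To illustrate why the `bos_token`/`eos_token` pass-through matters, here is a self-contained jinja2 sketch. The template string and token values are invented for the example; real models like Seed-OSS ship their own chat templates that reference these variables.

```python
# Minimal illustration of a chat template that references bos_token and
# eos_token. Without those variables in the render context, templates
# like this one fail to render.
from jinja2 import Template

template = Template(
    "{{ bos_token }}{% for m in messages %}"
    "{{ m['role'] }}: {{ m['content'] }}{{ eos_token }}{% endfor %}"
)
print(template.render(
    messages=[{"role": "user", "content": "Hello!"}],
    bos_token="<bos>",  # values a tokenizer would normally supply
    eos_token="<eos>",
))
```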
Bug fixes
- Fix API requests always returning the same `created` time
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/0a540f9abd98915edb99fed47d80078ed8d2f343
- Update ExLlamaV3 to 0.0.17
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.19
Qwen3-Next llama.cpp support!
Changes
- Add a slider for `--ubatch-size` to the llama.cpp loader and change its defaults for better MoE performance (#7316). Thanks, @GodEmperor785.
  - This significantly improves prompt processing speeds for MoE models in both full-GPU and GPU+CPU configurations; see the sketch after this list.
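If you drive the webui through its API rather than the UI, loader options like this one can be supplied at load time via the `/v1/internal/model/load` endpoint. In the sketch below, the `ubatch_size` key (mirroring the new flag) and the model filename are assumptions:

```python
# Hedged sketch: load a GGUF model with a custom micro-batch size via
# the webui's internal model-load endpoint. "ubatch_size" mirrors the
# new --ubatch-size flag and is an assumption, as is the model name.
import requests

requests.post(
    "http://127.0.0.1:5000/v1/internal/model/load",
    json={
        "model_name": "my-moe-model.gguf",  # hypothetical filename
        "args": {"ubatch_size": 512},
    },
).raise_for_status()
```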
Bug fixes
- fix(deps): upgrade coqui-tts to >=0.27.0 for transformers 4.55 compatibility (#7329). Thanks, @aidevtime.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/ff55414c42522adbeaa1bd9c52c0e9db16942484, adding Qwen3-Next support
- Update ExLlamaV3 to 0.0.16
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.18
Changes
- Add a `--cpu-moe` flag for llama.cpp to move MoE model experts to the CPU, reducing VRAM usage (see the sketch after this list).
- Add ROCm portable builds for AMD GPUs on Linux. This was made possible by PR oobabooga/llama-cpp-binaries#7. Thanks, @ShortTimeNoSee.
- Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).
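As a rough sketch of using the new flag over the API, the same `/v1/internal/model/load` endpoint shown in the v3.19 notes above should accept it as a load argument; the `cpu_moe` key and the model filename are assumptions:

```python
# Hedged sketch: the internal model-load endpoint with the new flag
# expressed as a load argument. "cpu_moe" mirrors --cpu-moe and is an
# assumption, as is the model name.
import requests

requests.post(
    "http://127.0.0.1:5000/v1/internal/model/load",
    json={
        "model_name": "my-moe-model.gguf",  # hypothetical filename
        "args": {"cpu_moe": True},          # experts on CPU, rest on GPU
    },
).raise_for_status()
```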
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/10e9780154365b191fb43ca4830659ef12def80f
- Update ExLlamaV3 to 0.0.15
- Update peft to 0.18.*
- Update triton-windows to 3.5.1.post21
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.17
Changes
- Add `weights_only=True` to `torch.load` in Training_PRO for better security (see the sketch below).
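For context on that change, `torch.load` with `weights_only=True` refuses to unpickle arbitrary Python objects, so a tampered checkpoint cannot execute code at load time. A minimal self-contained sketch:

```python
# weights_only=True restricts torch.load to tensors and basic
# containers instead of arbitrary pickled objects, so a malicious
# checkpoint can't run code on load.
import torch

state = {"w": torch.zeros(2, 2)}
torch.save(state, "checkpoint.pt")

# Safe: only tensors and plain containers are deserialized.
safe_state = torch.load("checkpoint.pt", weights_only=True)
print(safe_state["w"].shape)
```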
Bug fixes
- Pin huggingface-hub to 0.36.0 to fix manual venv installs.
- fix: Rename 'evaluation_strategy' to 'eval_strategy' in training. Thanks, @inyourface34456.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/230d1169e5bfe04a013b2e20f4662ee56c2454b0 (adds Qwen3-VL support)
- Update exllamav3 to 0.0.12
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.16
Bug fixes
- Fix Python requirements for Apple devices with macOS Tahoe (#7273). Thanks, @drieschel.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/d0660f237a5c31771a3d6d1030ebe3e0c409ba92 (adds Ling-mini-2.0, Ring-mini-2.0 support)
- Update exllamav3 to 0.0.11
- Update triton-windows to 3.5.0.post21
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.15
Changes
- Log an error when a llama-server request exceeds the context size (#7263). Thanks, @mamei16.
- Make `--trust-remote-code` immutable from the UI/API for better security.
Bug fixes
- Fix metadata leaking into branched chats.
- Fix "continue" missing an initial space in chat-instruct/chat modes.
- Fix resuming incomplete downloads after HF moved to Xet.
- Revert `exllamav3_hf` changes in v3.14 that made it output gibberish.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/f9fb33f2630b4b4ba9081ce9c0c921f8cd8ba4eb.
- Update exllamav3 to 0.0.10.
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.14
Changes
- Better handle multi-GPU setups when using Transformers with bitsandbytes (`load-in-8bit` and `load-in-4bit`); see the first sketch after this list.
- Implement the `/v1/internal/logits` endpoint for the `exllamav3` and `exllamav3_hf` loaders; see the second sketch after this list.
- Make profile picture uploading safer.
- Add `fla` to the requirements for ExLlamaV3 to support `qwen3-next` models.
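For reference, the usual Transformers pattern for multi-GPU quantized loading looks like the sketch below; `device_map="auto"` is what spreads layers across the available GPUs, and the model id is a placeholder:

```python
# Standard Transformers + bitsandbytes pattern for a multi-GPU 4-bit
# load; device_map="auto" distributes layers across available GPUs.
# The model id is a placeholder.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",  # hypothetical model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```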
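And a hedged sketch of calling the logits endpoint: the port assumes the default API setting, and the request fields are assumptions modeled on the webui's internal-API style.

```python
# Hedged sketch: query the logits endpoint for the next-token
# distribution of a prompt. Field names ("prompt", "top_logits") are
# assumptions; check the webui API docs for the exact schema.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/internal/logits",
    json={"prompt": "The capital of France is", "top_logits": 10},
)
resp.raise_for_status()
print(resp.json())  # expected: top candidate tokens with their scores
```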
Bug fixes
- Fix an issue with loading certain chat histories in Instruct mode. Thanks, @Remowylliams.
- Fix portable builds for macOS x86 missing llama.cpp binaries (#7238). Thanks, @IonoclastBrigham.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/d00cbea63c671cd85a57adaa50abf60b3b87d86f.
- Update transformers to 4.57.
- Update exllamav3 to 0.0.7.
- Update bitsandbytes to 0.48.
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.