Releases: oobabooga/text-generation-webui
v3.23
Changes
- Improve the style of tables and horizontal separators in chat messages
Bug fixes
- Fix loading models that have their EOS token disabled (#7363). Thanks, @jin-eld.
- Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/55abc393552f3f2097f168cb6db4dc495a514d56
- Update bitsandbytes to 0.49
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
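If you prefer to script those two steps, here is a minimal Python sketch; both folder names are hypothetical and stand in for wherever you unzipped the old and new builds:

```python
# Minimal sketch of the update step above. The two folder names are
# assumptions for illustration; use your actual install locations.
import shutil
from pathlib import Path

old = Path("text-generation-webui-old")   # existing install
new = Path("text-generation-webui-new")   # freshly unzipped build

# Drop the empty user_data folder that ships with the new build...
shutil.rmtree(new / "user_data", ignore_errors=True)
# ...and carry over the one holding your settings and models.
shutil.copytree(old / "user_data", new / "user_data")
```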
v3.22
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.21
Changes
- Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks (dereferenced due to Python whl limitations) and recreating them on first launch.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/5c8a717128cc98aa9e5b1c44652f5cf458fd426e
- Update ExLlamaV3 to 0.0.18
- Update safetensors to 0.7
- Update triton-windows to 3.5.1.post22
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.20
Image generation support!
Changes
- Image generation support: Generate images with `diffusers` models like Z-Image-Turbo in a new "Image AI" tab. Features include:
  - 4-bit/8-bit quantization
  - `torch.compile` support
  - LLM-generated prompt variations
  - PNG metadata for generation settings
  - Gallery for past generations
  - Progress bar
  - OpenAI-compatible API endpoint for image generation (see the first sketch after this list)
  For a step-by-step tutorial, consult: Image Generation Tutorial
- Pass `bos_token` and `eos_token` to jinja2 templates, making it possible to use the template for `Seed-OSS-36B-Instruct` and other models (see the second sketch after this list)
- Use `flash_attention_2` by default for Transformers models
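The new image endpoint can be exercised with any OpenAI-style client. The sketch below is a rough illustration rather than the webui's documented contract: the route and response fields follow OpenAI's images API convention, and the port assumes the default API setting, so treat all of them as assumptions.

```python
# Hedged sketch: call an OpenAI-compatible image generation endpoint.
# Route, fields, and port follow OpenAI's images API convention and
# the webui's default API port; the actual parameters may differ.
import base64
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/images/generations",  # assumed route/port
    json={"prompt": "a lighthouse at dusk, oil painting", "n": 1},
)
resp.raise_for_status()
# OpenAI-style responses carry base64-encoded image data.
b64_png = resp.json()["data"][0]["b64_json"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_png))
```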
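To illustrate why the `bos_token`/`eos_token` pass-through matters, here is a self-contained jinja2 sketch. The template string and token values are invented for the example; real models like Seed-OSS ship their own chat templates that reference these variables.

```python
# Minimal illustration of a chat template that references bos_token and
# eos_token. Without those variables in the render context, templates
# like this one fail to render.
from jinja2 import Template

template = Template(
    "{{ bos_token }}{% for m in messages %}"
    "{{ m['role'] }}: {{ m['content'] }}{{ eos_token }}{% endfor %}"
)
print(template.render(
    messages=[{"role": "user", "content": "Hello!"}],
    bos_token="<bos>",  # values a tokenizer would normally supply
    eos_token="<eos>",
))
```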
Bug fixes
- Fix API requests always returning the same `created` time
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/0a540f9abd98915edb99fed47d80078ed8d2f343
- Update ExLlamaV3 to 0.0.17
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.19
Qwen3-Next llama.cpp support!
Changes
- Add a slider for `--ubatch-size` to the llama.cpp loader and change its defaults for better MoE performance (#7316). Thanks, @GodEmperor785.
  - This significantly improves prompt processing speeds for MoE models in both full-GPU and GPU+CPU configurations; see the sketch after this list.
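If you drive the webui through its API rather than the UI, loader options like this one can be supplied at load time via the `/v1/internal/model/load` endpoint. In the sketch below, the `ubatch_size` key (mirroring the new flag) and the model filename are assumptions:

```python
# Hedged sketch: load a GGUF model with a custom micro-batch size via
# the webui's internal model-load endpoint. "ubatch_size" mirrors the
# new --ubatch-size flag and is an assumption, as is the model name.
import requests

requests.post(
    "http://127.0.0.1:5000/v1/internal/model/load",
    json={
        "model_name": "my-moe-model.gguf",  # hypothetical filename
        "args": {"ubatch_size": 512},
    },
).raise_for_status()
```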
Bug fixes
- fix(deps): upgrade coqui-tts to >=0.27.0 for transformers 4.55 compatibility (#7329). Thanks, @aidevtime.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/ff55414c42522adbeaa1bd9c52c0e9db16942484, adding Qwen3-Next support
- Update ExLlamaV3 to 0.0.16
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.18
Changes
- Add a `--cpu-moe` flag for llama.cpp to move MoE model experts to the CPU, reducing VRAM usage (see the sketch after this list).
- Add ROCm portable builds for AMD GPUs on Linux. This was made possible by PR oobabooga/llama-cpp-binaries#7. Thanks, @ShortTimeNoSee.
- Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).
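As a rough sketch of using the new flag over the API, the same `/v1/internal/model/load` endpoint shown in the v3.19 notes above should accept it as a load argument; the `cpu_moe` key and the model filename are assumptions:

```python
# Hedged sketch: the internal model-load endpoint with the new flag
# expressed as a load argument. "cpu_moe" mirrors --cpu-moe and is an
# assumption, as is the model name.
import requests

requests.post(
    "http://127.0.0.1:5000/v1/internal/model/load",
    json={
        "model_name": "my-moe-model.gguf",  # hypothetical filename
        "args": {"cpu_moe": True},          # experts on CPU, rest on GPU
    },
).raise_for_status()
```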
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/10e9780154365b191fb43ca4830659ef12def80f
- Update ExLlamaV3 to 0.0.15
- Update peft to 0.18.*
- Update triton-windows to 3.5.1.post21
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.17
Changes
- Add `weights_only=True` to `torch.load` in Training_PRO for better security (see the sketch below).
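For context on that change, `torch.load` with `weights_only=True` refuses to unpickle arbitrary Python objects, so a tampered checkpoint cannot execute code at load time. A minimal self-contained sketch:

```python
# weights_only=True restricts torch.load to tensors and basic
# containers instead of arbitrary pickled objects, so a malicious
# checkpoint can't run code on load.
import torch

state = {"w": torch.zeros(2, 2)}
torch.save(state, "checkpoint.pt")

# Safe: only tensors and plain containers are deserialized.
safe_state = torch.load("checkpoint.pt", weights_only=True)
print(safe_state["w"].shape)
```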
Bug fixes
- Pin huggingface-hub to 0.36.0 to fix manual venv installs.
- fix: Rename 'evaluation_strategy' to 'eval_strategy' in training. Thanks, @inyourface34456.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/230d1169e5bfe04a013b2e20f4662ee56c2454b0 (adds Qwen3-VL support)
- Update exllamav3 to 0.0.12
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.16
Bug fixes
- Fix Python requirements for Apple devices with macOS Tahoe (#7273). Thanks, @drieschel.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/d0660f237a5c31771a3d6d1030ebe3e0c409ba92 (adds Ling-mini-2.0, Ring-mini-2.0 support)
- Update exllamav3 to 0.0.11
- Update triton-windows to 3.5.0.post21
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.15
Changes
- Log an error when a llama-server request exceeds the context size (#7263). Thanks, @mamei16.
- Make `--trust-remote-code` immutable from the UI/API for better security.
Bug fixes
- Fix metadata leaking into branched chats.
- Fix "continue" missing an initial space in chat-instruct/chat modes.
- Fix resuming incomplete downloads after HF moved to Xet.
- Revert `exllamav3_hf` changes in v3.14 that made it output gibberish.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/f9fb33f2630b4b4ba9081ce9c0c921f8cd8ba4eb.
- Update exllamav3 to 0.0.10.
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.14
Changes
- Better handle multi-GPU setups when using Transformers with bitsandbytes (`load-in-8bit` and `load-in-4bit`); see the first sketch after this list.
- Implement the `/v1/internal/logits` endpoint for the `exllamav3` and `exllamav3_hf` loaders; see the second sketch after this list.
- Make profile picture uploading safer.
- Add `fla` to the requirements for ExLlamaV3 to support `qwen3-next` models.
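For reference, the usual Transformers pattern for multi-GPU quantized loading looks like the sketch below; `device_map="auto"` is what spreads layers across the available GPUs, and the model id is a placeholder:

```python
# Standard Transformers + bitsandbytes pattern for a multi-GPU 4-bit
# load; device_map="auto" distributes layers across available GPUs.
# The model id is a placeholder.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",  # hypothetical model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```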
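And a hedged sketch of calling the logits endpoint: the port assumes the default API setting, and the request fields are assumptions modeled on the webui's internal-API style.

```python
# Hedged sketch: query the logits endpoint for the next-token
# distribution of a prompt. Field names ("prompt", "top_logits") are
# assumptions; check the webui API docs for the exact schema.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/internal/logits",
    json={"prompt": "The capital of France is", "top_logits": 10},
)
resp.raise_for_status()
print(resp.json())  # expected: top candidate tokens with their scores
```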
Bug fixes
- Fix an issue with loading certain chat histories in Instruct mode. Thanks, @Remowylliams.
- Fix portable builds for macOS x86 missing llama.cpp binaries (#7238). Thanks, @IonoclastBrigham.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/d00cbea63c671cd85a57adaa50abf60b3b87d86f.
- Update transformers to 4.57.
- Update exllamav3 to 0.0.7.
- Update bitsandbytes to 0.48.
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.