Skip to content

LuxTTS#37

Closed
joshuasundance-swca wants to merge 9 commits into
FranckyB:devfrom
joshuasundance-swca:luxtts
Closed

LuxTTS#37
joshuasundance-swca wants to merge 9 commits into
FranckyB:devfrom
joshuasundance-swca:luxtts

Conversation

@joshuasundance-swca

Copy link
Copy Markdown

LuxTTS

A different take on LuxTTS implementation, for consideration and reference

Summary

  • Adds LuxTTS as a first-class engine alongside Qwen3 and VibeVoice, including prompt caching, 48 kHz output, and advanced tuning controls.
  • Extends tool UI, settings, and shared state to support engine toggles, LuxTTS defaults, and audio-only prompt encoding for LuxTTS.
  • Updates Docker/runtime dependencies and docs to support LuxTTS and caching behavior.

Key Changes

Core LuxTTS Integration

Tool UI And Workflow Updates

Docs And Help Text

Docker/Runtime And Dependencies

Notable Behavior Details

  • LuxTTS prompt cache is stored as <sample>_luxtts.pt with parameters embedded in the cache metadata; cache validity is checked against audio hash, rms, and ref_duration.
  • LuxTTS prompt encoding uses audio-only encode_prompt (no transcript required), while Qwen still requires transcripts for prompt caching.
  • LuxTTS audio returns at 48 kHz, while other engines return 24 kHz.

Behavior Change Note (Upstream Alignment)

  • LuxTTS no longer requires transcripts for prompt encoding (audio-only), which differs from prior transcript-enforced behavior and aligns with upstream LuxTTS usage.

Issues/Concerns To Evaluate

  • Mixed sample rates (LuxTTS 48 kHz vs 24 kHz elsewhere) could affect downstream workflows that assume SAMPLE_RATE consistency.
  • LuxTTS install path differs between local and Docker paths; if LuxTTS is not installed locally, generation fails at runtime.

@FranckyB

FranckyB commented Feb 8, 2026

Copy link
Copy Markdown
Owner

I'm perhaps missing it, but does this add anything?
It seems to be mostly my code, with formatting tweaks?

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds LuxTTS as an additional TTS/voice-clone engine integrated into the existing modular Gradio tool architecture (alongside Qwen3 and VibeVoice), including prompt caching, new UI controls, settings toggles, and container/runtime dependency updates.

Changes:

  • Integrates LuxTTS into the TTS manager with audio-only prompt encoding and on-disk/in-memory prompt caching.
  • Updates Voice Clone / Conversation / Prep Audio / Settings tooling to expose LuxTTS engine options, parameters, cache visibility, and persisted preferences.
  • Updates docs and Docker/runtime configuration to support LuxTTS dependencies and cache directories.

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 29 comments.

Show a summary per file
File Description
voice_clone_studio.py Passes LuxTTS defaults into shared state and minor formatting/cleanup.
requirements.txt Adds LuxTTS-related Python dependencies (plus some commented guidance).
modules/core_components/ui_components/init.py Adds LuxTTS advanced-parameter UI components (device/threads, etc.).
modules/core_components/tools/voice_clone.py Adds LuxTTS engine selection, LuxTTS advanced controls, and transcript requirement only for Qwen.
modules/core_components/tools/conversation.py Adds LuxTTS as a conversation engine path with LuxTTS param UI and sequential generation.
modules/core_components/tools/prep_audio.py Extends cache cleanup logic to include LuxTTS cache files and in-memory cache clearing.
modules/core_components/tools/settings.py Adds LuxTTS runtime settings and engine toggles; updates model download options.
modules/core_components/tools/init.py Extends shared-state/default config, adds LuxTTS cache helpers, and engine/constants wiring.
modules/core_components/help_page.py Updates help text to mention LuxTTS and caching behavior.
modules/core_components/constants.py Adds LuxTTS engine metadata and defaults/constants.
modules/core_components/ai_models/tts_manager.py Implements LuxTTS loading, prompt caching, and LuxTTS voice-clone generation method.
modules/core_components/init.py Re-exports LuxTTS defaults.
docs/updates.md Notes LuxTTS addition in version history.
docker-compose.yaml Adds .env support, TORCH_HOME, cache mounts, and changes build target to app.
README.md Updates branding/feature list to include LuxTTS.
Dockerfile Adds LuxTTS dependencies and introduces app stage used by compose.
.gitignore Ignores .cache/ and .env.
.env-example Adds example env vars for HF/Torch cache paths.
Comments suppressed due to low confidence (1)

modules/core_components/constants.py:258

  • LUXTTS_GENERATION_DEFAULTS duplicates LUXTTS_DEFAULTS (defined just above) and also uses different key names (e.g., cpu_threads vs threads). Consolidate to a single LuxTTS defaults dict to avoid configuration drift and confusion over which one is authoritative.
# LuxTTS Generation Defaults
LUXTTS_GENERATION_DEFAULTS = {
    "num_steps": 4,
    "t_shift": 0.9,
    "speed": 1.0,

lux_keys = [
k
for k in self._luxtts_prompt_cache.keys()
if k.startswith(f"{sample_name}_")

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear_prompt_cache_for_sample() won’t clear LuxTTS entries because _luxtts_prompt_cache is keyed by sample_name, but the code filters keys with startswith(f"{sample_name}_"). Update the predicate to match the actual key format (e.g., exact match) so LuxTTS in-memory prompts are cleared when samples are edited/deleted.

Suggested change
if k.startswith(f"{sample_name}_")
if k == sample_name

Copilot uses AI. Check for mistakes.
Comment on lines +136 to 140
"--- LuxTTS ---",
"LuxTTS",
"--- VibeVoice ASR ---",
"VibeVoice-ASR",
"--- LuxTTS ---",

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model download dropdown includes LuxTTS twice (a LuxTTS section at lines 136–137 and then another LuxTTS section at line 140+). This creates duplicate UI entries; keep a single LuxTTS header + option.

Copilot uses AI. Check for mistakes.
Comment on lines 158 to 162
"VibeVoice-Large": "FranckyB/VibeVoice-Large",
"LuxTTS": "YatharthS/LuxTTS",
"VibeVoice-ASR": "microsoft/VibeVoice-ASR",
"LuxTTS": "YatharthS/LuxTTS",
}

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MODEL_ID_MAP repeats the "LuxTTS" key. In Python dicts the latter entry wins, so the earlier one is redundant and makes the mapping harder to maintain. Remove the duplicate key entry.

Copilot uses AI. Check for mistakes.
Comment on lines 103 to +110
tool_class = None
for attr_name in dir(tool_module):
attr = getattr(tool_module, attr_name)
if isinstance(attr, type) and issubclass(attr, Tool) and attr is not Tool:
if (
isinstance(attr, type)
and issubclass(attr, Tool) # F821 Undefined name `Tool`
and attr is not Tool # F821 Undefined name `Tool`
):

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback branch references Tool (and even includes an inline "F821 Undefined name" note), but Tool isn’t imported. Even if this path is currently unused, it’s dead/unsafe code; either import Tool or delete the fallback implementation.

Copilot uses AI. Check for mistakes.
Comment thread .env-example
Comment on lines +1 to +6
HF_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\huggingface
HF_CACHE_CONTAINER_PATH=/home/user/app/.cache/huggingface
HF_TOKEN=hf_...

TORCH_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\torch
TORCH_CACHE_CONTAINER_PATH=/home/user/app/.cache/torch

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.env-example hardcodes a specific Windows username/path ("C:\users\joshua.bailey\...") which is not portable. Use a generic placeholder path (e.g., "C:\Users\\.cache\huggingface") so the example applies to all users.

Suggested change
HF_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\huggingface
HF_CACHE_CONTAINER_PATH=/home/user/app/.cache/huggingface
HF_TOKEN=hf_...
TORCH_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\torch
TORCH_CACHE_CONTAINER_PATH=/home/user/app/.cache/torch
HF_CACHE_HOST_PATH=C:\Users\<your-user>\.cache\huggingface
HF_CACHE_CONTAINER_PATH=/home/<your-user>/app/.cache/huggingface
HF_TOKEN=hf_...
TORCH_CACHE_HOST_PATH=C:\Users\<your-user>\.cache\torch
TORCH_CACHE_CONTAINER_PATH=/home/<your-user>/app/.cache/torch

Copilot uses AI. Check for mistakes.
refresh_samples = shared_state["refresh_samples"]
confirm_trigger = shared_state["confirm_trigger"]
input_trigger = shared_state["input_trigger"]

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable input_trigger is not used.

Suggested change
components["confirm_trigger"] = confirm_trigger
components["input_trigger"] = input_trigger

Copilot uses AI. Check for mistakes.
get_sample_choices = shared_state["get_sample_choices"]
get_available_samples = shared_state["get_available_samples"]
load_sample_details = shared_state["load_sample_details"]
get_prompt_cache_path = shared_state["get_prompt_cache_path"]

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable get_prompt_cache_path is not used.

Suggested change
get_prompt_cache_path = shared_state["get_prompt_cache_path"]

Copilot uses AI. Check for mistakes.
load_sample_details = shared_state["load_sample_details"]
get_prompt_cache_path = shared_state["get_prompt_cache_path"]
get_or_create_voice_prompt = shared_state["get_or_create_voice_prompt"]
refresh_samples = shared_state["refresh_samples"]

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable refresh_samples is not used.

Suggested change
refresh_samples = shared_state["refresh_samples"]

Copilot uses AI. Check for mistakes.
"get_configured_dir",
"load_config",
"save_config",
"save_preference",

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name 'save_preference' is exported by all but is not defined.

Copilot uses AI. Check for mistakes.
processed_lines.append(line)
else:
# Add default [1]: label
processed_lines.append(f"[1]: {line}")
return '\n'.join(processed_lines)
return "\n".join(processed_lines)

def extract_style_instructions(text):
"""Extract style instructions from parentheses."""
import re

Copilot AI Feb 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import of module re is redundant, as it was previously imported on line 20.

Copilot uses AI. Check for mistakes.
@joshuasundance-swca

Copy link
Copy Markdown
Author

I'm perhaps missing it, but does this add anything?
It seems to be mostly my code, with formatting tweaks?

Sorry about the formatting changes, I think vscode was running black on modified files but I didn't do it on purpose.

The work in this PR is an agent trying to reconcile my original luxtts branch with your dev branch, and later, doing merge conflict resolution because you had implemented luxtts while the agent was working 😅

I was trying to catch my diverged luxtts branch back up to dev.

Since I wasn't sure these changes would still be useful, this [draft] pr was more of a reference. I think it adds [and hopefully improves] a few things but it looks like copilot caught some weird merge artifacts. I tested the changes only in docker, but it worked well.

@FranckyB

FranckyB commented Feb 8, 2026

Copy link
Copy Markdown
Owner

I'll close this PR for now.
If there are things we should address we should limit it to that.
This changes too many things for no reason

@FranckyB FranckyB closed this Feb 8, 2026
@joshuasundance-swca

Copy link
Copy Markdown
Author

I'll close this PR for now.
If there are things we should address we should limit it to that.
This changes too many things for no reason

Agreed. 😀 Sorry for the confusion; the intent was more to share the code, not so much to merge it at this time. I will extract specific differences and bring them to your attention as appropriate.

@joshuasundance-swca

Copy link
Copy Markdown
Author

Here is the short version of what is different about this PR versus the LuxTTS already in upstream-dev.

Upstream-dev already ships LuxTTS (manager + tools + 48 kHz + transcript-based prompt caching). This PR does not re-add that. The actual deltas are:

So: the PR is mainly about LuxTTS prompt handling (audio-only), runtime controls, cache safety, and container readiness.


The response above was written by the same agent that did the merge conflict resolution. Putting it here just for consideration and reference, but again, this draft PR was not meant to be merged as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants