LuxTTS#37
I may be missing something, but does this add anything?
Pull request overview
Adds LuxTTS as an additional TTS/voice-clone engine integrated into the existing modular Gradio tool architecture (alongside Qwen3 and VibeVoice), including prompt caching, new UI controls, settings toggles, and container/runtime dependency updates.
Changes:
- Integrates LuxTTS into the TTS manager with audio-only prompt encoding and on-disk/in-memory prompt caching.
- Updates Voice Clone / Conversation / Prep Audio / Settings tooling to expose LuxTTS engine options, parameters, cache visibility, and persisted preferences.
- Updates docs and Docker/runtime configuration to support LuxTTS dependencies and cache directories.
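The PR description says LuxTTS prompts are cached on disk as `<sample>_luxtts.pt`, with the generation parameters embedded in the cache metadata, and that a cached prompt is reused only while the audio hash, `rms`, and `ref_duration` still match. A minimal sketch of that validity check follows. The helper names, the choice of SHA-256, and the plain-dict metadata layout are assumptions for illustration; the real cache stores a `.pt` payload.

```python
import hashlib
import math


def audio_fingerprint(audio_bytes):
    """Hypothetical: fingerprint the raw reference-audio bytes with SHA-256."""
    return hashlib.sha256(audio_bytes).hexdigest()


def build_cache_metadata(audio_bytes, rms, ref_duration, params):
    """Hypothetical metadata stored alongside a cached LuxTTS prompt."""
    return {
        "audio_hash": audio_fingerprint(audio_bytes),
        "rms": rms,
        "ref_duration": ref_duration,
        "params": dict(params),  # generation parameters embedded in the cache
    }


def is_cache_valid(metadata, audio_bytes, rms, ref_duration, tol=1e-6):
    """Reuse the cached prompt only if audio hash, rms, and ref_duration all match."""
    if metadata.get("audio_hash") != audio_fingerprint(audio_bytes):
        return False
    cached_rms = metadata.get("rms")
    cached_duration = metadata.get("ref_duration")
    if cached_rms is None or cached_duration is None:
        return False
    # Compare floats with a tolerance so rounding noise does not force a re-encode.
    return math.isclose(cached_rms, rms, abs_tol=tol) and math.isclose(
        cached_duration, ref_duration, abs_tol=tol
    )
```

Whether the actual implementation compares `rms` and `ref_duration` exactly or with a tolerance is not stated in the PR.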
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 29 comments.
| File | Description |
|---|---|
| voice_clone_studio.py | Passes LuxTTS defaults into shared state and minor formatting/cleanup. |
| requirements.txt | Adds LuxTTS-related Python dependencies (plus some commented guidance). |
| modules/core_components/ui_components/__init__.py | Adds LuxTTS advanced-parameter UI components (device/threads, etc.). |
| modules/core_components/tools/voice_clone.py | Adds LuxTTS engine selection, LuxTTS advanced controls, and transcript requirement only for Qwen. |
| modules/core_components/tools/conversation.py | Adds LuxTTS as a conversation engine path with LuxTTS param UI and sequential generation. |
| modules/core_components/tools/prep_audio.py | Extends cache cleanup logic to include LuxTTS cache files and in-memory cache clearing. |
| modules/core_components/tools/settings.py | Adds LuxTTS runtime settings and engine toggles; updates model download options. |
| modules/core_components/tools/__init__.py | Extends shared-state/default config, adds LuxTTS cache helpers, and engine/constants wiring. |
| modules/core_components/help_page.py | Updates help text to mention LuxTTS and caching behavior. |
| modules/core_components/constants.py | Adds LuxTTS engine metadata and defaults/constants. |
| modules/core_components/ai_models/tts_manager.py | Implements LuxTTS loading, prompt caching, and LuxTTS voice-clone generation method. |
| modules/core_components/__init__.py | Re-exports LuxTTS defaults. |
| docs/updates.md | Notes LuxTTS addition in version history. |
| docker-compose.yaml | Adds .env support, TORCH_HOME, cache mounts, and changes build target to app. |
| README.md | Updates branding/feature list to include LuxTTS. |
| Dockerfile | Adds LuxTTS dependencies and introduces app stage used by compose. |
| .gitignore | Ignores .cache/ and .env. |
| .env-example | Adds example env vars for HF/Torch cache paths. |
Comments suppressed due to low confidence (1)
modules/core_components/constants.py:258
`LUXTTS_GENERATION_DEFAULTS` duplicates `LUXTTS_DEFAULTS` (defined just above) and also uses different key names (e.g., `cpu_threads` vs `threads`). Consolidate to a single LuxTTS defaults dict to avoid configuration drift and confusion over which one is authoritative.
```python
# LuxTTS Generation Defaults
LUXTTS_GENERATION_DEFAULTS = {
    "num_steps": 4,
    "t_shift": 0.9,
    "speed": 1.0,
    # ... (remaining entries not shown in the review excerpt)
}
```
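One way to consolidate the two dicts, sketched under assumptions: keep a single defaults dict and translate legacy key names as overrides come in. Only the three keys visible in the excerpt are used. Treating `threads` as the canonical name and `cpu_threads` as the legacy alias, and the helper `normalize_luxtts_params`, are hypothetical choices rather than what the PR does.

```python
# Single source of truth for LuxTTS generation defaults (keys from the excerpt above).
LUXTTS_DEFAULTS = {
    "num_steps": 4,
    "t_shift": 0.9,
    "speed": 1.0,
}

# Hypothetical alias table: legacy key name -> canonical key name.
_LUXTTS_KEY_ALIASES = {"cpu_threads": "threads"}


def normalize_luxtts_params(overrides):
    """Merge user overrides onto the defaults, renaming legacy keys."""
    merged = dict(LUXTTS_DEFAULTS)  # copy so the shared defaults are never mutated
    for key, value in overrides.items():
        merged[_LUXTTS_KEY_ALIASES.get(key, key)] = value
    return merged
```

With one dict plus an alias table, saved preferences written under either key name end up in the same slot.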
```python
lux_keys = [
    k
    for k in self._luxtts_prompt_cache.keys()
    if k.startswith(f"{sample_name}_")
]
```
clear_prompt_cache_for_sample() won’t clear LuxTTS entries because _luxtts_prompt_cache is keyed by sample_name, but the code filters keys with startswith(f"{sample_name}_"). Update the predicate to match the actual key format (e.g., exact match) so LuxTTS in-memory prompts are cleared when samples are edited/deleted.
```diff
-    if k.startswith(f"{sample_name}_")
+    if k == sample_name
```
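Why exact matching is the right predicate: when the in-memory LuxTTS cache is keyed by the bare sample name, a `startswith(f"{sample_name}_")` filter never matches the sample's own entry, yet it can match a different sample whose name happens to share the prefix (e.g. `alice_2` when clearing `alice`). The class below is a hypothetical stand-in for the TTS manager, not its real structure.

```python
class LuxPromptCacheSketch:
    """Hypothetical stand-in for the manager's in-memory LuxTTS prompt cache."""

    def __init__(self):
        # Keyed by bare sample name, as described in the review comment.
        self._luxtts_prompt_cache = {}

    def clear_prompt_cache_for_sample(self, sample_name):
        """Drop the LuxTTS prompt for exactly this sample; return True if one existed."""
        if sample_name in self._luxtts_prompt_cache:
            del self._luxtts_prompt_cache[sample_name]
            return True
        return False
```

Clearing `alice` removes only the `alice` entry and leaves `alice_2` untouched, which is the reverse of what the prefix predicate would do.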
```python
    "--- LuxTTS ---",
    "LuxTTS",
    "--- VibeVoice ASR ---",
    "VibeVoice-ASR",
    "--- LuxTTS ---",
```
The model download dropdown includes LuxTTS twice (a LuxTTS section at lines 136–137 and then another LuxTTS section at line 140+). This creates duplicate UI entries; keep a single LuxTTS header + option.
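Besides deleting the second LuxTTS section by hand, a cheap guard is an order-preserving de-duplication of the choices before they reach the dropdown. A sketch using the entries from the excerpt; `dedupe_choices` is a hypothetical helper.

```python
def dedupe_choices(choices):
    """Remove repeated dropdown entries, keeping the first occurrence of each."""
    # dict preserves insertion order (Python 3.7+), so this is an ordered dedupe.
    return list(dict.fromkeys(choices))


MODEL_CHOICES = dedupe_choices([
    "--- LuxTTS ---",
    "LuxTTS",
    "--- VibeVoice ASR ---",
    "VibeVoice-ASR",
    "--- LuxTTS ---",
])
```

This would also collapse any separator string that is intentionally repeated, so fixing the source list remains the cleaner fix; the guard only prevents a regression from showing up in the UI.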
```python
    "VibeVoice-Large": "FranckyB/VibeVoice-Large",
    "LuxTTS": "YatharthS/LuxTTS",
    "VibeVoice-ASR": "microsoft/VibeVoice-ASR",
    "LuxTTS": "YatharthS/LuxTTS",
}
```
MODEL_ID_MAP repeats the "LuxTTS" key. In Python dicts the latter entry wins, so the earlier one is redundant and makes the mapping harder to maintain. Remove the duplicate key entry.
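The "latter entry wins" behavior is silent: Python raises no error for a repeated key in a dict literal. If the map were built from a list of pairs instead, a hypothetical builder such as `build_model_id_map` could fail loudly on a duplicate rather than overwrite.

```python
# A dict literal with a repeated key: no error, the later value silently wins.
literal = {"LuxTTS": "first", "LuxTTS": "second"}
assert literal == {"LuxTTS": "second"}


def build_model_id_map(pairs):
    """Build a name -> repo-id map, rejecting duplicate names instead of overwriting."""
    mapping = {}
    for name, repo_id in pairs:
        if name in mapping:
            raise ValueError(f"duplicate model key: {name!r}")
        mapping[name] = repo_id
    return mapping
```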
```diff
 tool_class = None
 for attr_name in dir(tool_module):
     attr = getattr(tool_module, attr_name)
-    if isinstance(attr, type) and issubclass(attr, Tool) and attr is not Tool:
+    if (
+        isinstance(attr, type)
+        and issubclass(attr, Tool)  # F821 Undefined name `Tool`
+        and attr is not Tool  # F821 Undefined name `Tool`
+    ):
```
This fallback branch references Tool (and even includes an inline "F821 Undefined name" note), but Tool isn’t imported. Even if this path is currently unused, it’s dead/unsafe code; either import Tool or delete the fallback implementation.
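If the fallback is kept rather than deleted, the base class has to be in scope. A minimal sketch that takes the base class as a parameter (so there is no undefined global) and uses `inspect.getmembers` follows; `find_tool_class` is hypothetical, and in the real module the base would be whatever `Tool` class the tools package defines.

```python
import inspect


def find_tool_class(tool_module, base_class):
    """Return the first class in tool_module that subclasses base_class, or None.

    inspect.getmembers returns members sorted by name, matching the order of dir().
    """
    for _name, attr in inspect.getmembers(tool_module, inspect.isclass):
        if issubclass(attr, base_class) and attr is not base_class:
            return attr
    return None
```

Passing the base class in also makes the lookup easy to unit-test with a throwaway module object.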
```
HF_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\huggingface
HF_CACHE_CONTAINER_PATH=/home/user/app/.cache/huggingface
HF_TOKEN=hf_...

TORCH_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\torch
TORCH_CACHE_CONTAINER_PATH=/home/user/app/.cache/torch
```
.env-example hardcodes a specific Windows username/path ("C:\users\joshua.bailey\..."), which is not portable. Use a generic placeholder path (e.g., "C:\Users\<your-user>\.cache\huggingface") so the example applies to all users.
```diff
-HF_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\huggingface
-HF_CACHE_CONTAINER_PATH=/home/user/app/.cache/huggingface
-HF_TOKEN=hf_...
-TORCH_CACHE_HOST_PATH=C:\users\joshua.bailey\.cache\torch
-TORCH_CACHE_CONTAINER_PATH=/home/user/app/.cache/torch
+HF_CACHE_HOST_PATH=C:\Users\<your-user>\.cache\huggingface
+HF_CACHE_CONTAINER_PATH=/home/<your-user>/app/.cache/huggingface
+HF_TOKEN=hf_...
+TORCH_CACHE_HOST_PATH=C:\Users\<your-user>\.cache\torch
+TORCH_CACHE_CONTAINER_PATH=/home/<your-user>/app/.cache/torch
```
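Rather than hand-editing placeholders, a small script could emit the host-side cache entries for whoever runs it. A sketch, assuming only the two `*_HOST_PATH` variables need to be per-user and the container-side paths do not depend on the host user; `default_env_lines` is hypothetical and not part of the PR.

```python
from pathlib import Path


def default_env_lines(home=None):
    """Return per-user host cache entries for a .env file (hypothetical helper)."""
    # Fall back to the home directory of whoever runs the script.
    base = Path(home) if home is not None else Path.home()
    cache_root = base / ".cache"
    return [
        f"HF_CACHE_HOST_PATH={cache_root / 'huggingface'}",
        f"TORCH_CACHE_HOST_PATH={cache_root / 'torch'}",
    ]


if __name__ == "__main__":
    print("\n".join(default_env_lines()))
```

`Path` renders with the platform's separator, so the same script yields `C:\Users\...` entries on Windows and `/home/...` entries on Linux.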
```python
refresh_samples = shared_state["refresh_samples"]
confirm_trigger = shared_state["confirm_trigger"]
input_trigger = shared_state["input_trigger"]
```
Variable input_trigger is not used.
```python
components["confirm_trigger"] = confirm_trigger
components["input_trigger"] = input_trigger
```
```python
get_sample_choices = shared_state["get_sample_choices"]
get_available_samples = shared_state["get_available_samples"]
load_sample_details = shared_state["load_sample_details"]
get_prompt_cache_path = shared_state["get_prompt_cache_path"]
```
Variable get_prompt_cache_path is not used.
```python
get_prompt_cache_path = shared_state["get_prompt_cache_path"]
```
```python
load_sample_details = shared_state["load_sample_details"]
get_prompt_cache_path = shared_state["get_prompt_cache_path"]
get_or_create_voice_prompt = shared_state["get_or_create_voice_prompt"]
refresh_samples = shared_state["refresh_samples"]
```
Variable refresh_samples is not used.
```python
refresh_samples = shared_state["refresh_samples"]
```
```python
    "get_configured_dir",
    "load_config",
    "save_config",
    "save_preference",
```
The name `save_preference` is exported by `__all__` but is not defined.
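This class of error is easy to catch in CI by checking that every name listed in a module's `__all__` actually resolves. A sketch; `undefined_exports` is a hypothetical helper, and in practice you would pass it the imported package or its dotted name.

```python
import importlib


def undefined_exports(module_or_name):
    """Return names listed in __all__ that the module does not actually define."""
    module = (
        importlib.import_module(module_or_name)
        if isinstance(module_or_name, str)
        else module_or_name
    )
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]
```

A test asserting `undefined_exports(...) == []` for each package would have flagged `save_preference` before review.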
```diff
             processed_lines.append(line)
         else:
             # Add default [1]: label
             processed_lines.append(f"[1]: {line}")
-    return '\n'.join(processed_lines)
+    return "\n".join(processed_lines)


 def extract_style_instructions(text):
     """Extract style instructions from parentheses."""
     import re
```
This import of module re is redundant, as it was previously imported on line 20.
Sorry about the formatting changes; I think VS Code was running Black on modified files, but I didn't do that on purpose. The work in this PR is an agent trying to reconcile my original luxtts branch with your dev branch and, later, resolving merge conflicts because you had implemented LuxTTS while the agent was working 😅 I was trying to catch my diverged luxtts branch back up to dev. Since I wasn't sure these changes would still be useful, this [draft] PR was meant more as a reference. I think it adds (and hopefully improves) a few things, but it looks like Copilot caught some weird merge artifacts. I tested the changes only in Docker, but they worked well.
I'll close this PR for now.
Agreed. 😀 Sorry for the confusion; the intent was more to share the code, not so much to merge it at this time. I will extract specific differences and bring them to your attention as appropriate.
Here is the short version of what is different about this PR versus the LuxTTS already in upstream-dev. Upstream-dev already ships LuxTTS (manager + tools + 48 kHz + transcript-based prompt caching). This PR does not re-add that. The actual deltas are:
So: the PR is mainly about LuxTTS prompt handling (audio-only), runtime controls, cache safety, and container readiness. The response above was written by the same agent that did the merge conflict resolution. I'm putting it here just for consideration and reference, but again, this draft PR was not meant to be merged as-is.
LuxTTS
A different take on the LuxTTS implementation, for consideration and reference
Summary
Key Changes
Core LuxTTS Integration
Tool UI And Workflow Updates
`enabled_engines`, exposes LuxTTS advanced parameters (including device and CPU threads), persists LuxTTS preferences, and requires transcripts only for Qwen in `modules/core_components/tools/voice_clone.py`.
`.pt` naming in `modules/core_components/tools/__init__.py`.
Docs And Help Text
Docker/Runtime And Dependencies
Notable Behavior Details
`<sample>_luxtts.pt` with parameters embedded in the cache metadata; cache validity is checked against the audio hash, `rms`, and `ref_duration`.
`encode_prompt` (no transcript required), while Qwen still requires transcripts for prompt caching.
Behavior Change Note (Upstream Alignment)
Issues/Concerns To Evaluate
`SAMPLE_RATE` consistency.