Upgrade transformers to 5.9 and huggingface-hub to 1.16 (#1472) #1506

Draft
dxqb wants to merge 1 commit into master from revert-1504-revert-transformers-v5

dxqb (Collaborator) commented Jun 5, 2026

reopens #1472

This PR needs significantly more work, because transformers broke any existing code that uses CLIP encoders beyond surface-level usage (such as applying LoRAs).

There are also minor issues to fix for T5 and the Qwen TEs, but the main one is CLIP. The projected work is listed below; the list may be incomplete:

Background: transformers 5.6 flattened CLIPTextModel — the text_model wrapper submodule is
gone; embeddings, encoder and final_layer_norm sit directly on the model. Checkpoints on disk
keep the old text_model.* key format, and from_pretrained translates via a new central
conversion registry. CLIPTextModelWithProjection still nests a text_model, so only
CLIPTextModel users are affected: SD1.x, SDXL TE1, Flux TE1, HunyuanVideo TE2, Würstchen.
OneTrainer breaks everywhere it bypasses from_pretrained or reaches into the module structure
directly.

1. Loading. HFModelLoaderMixin loads weights manually (for custom quantization and dtype
control) and only knows the v4-era conversion hooks, which v5 removed. It needs to apply the
renamings from transformers' own registry instead (get_model_conversion_mapping +
rename_source_key), with a fallback that keeps the original key when the rename doesn't match
the module — CLIPTextModelWithProjection shares the registry entry but still has the nested
layout. This will also restore old-checkpoint Qwen loading, whose v4 workaround silently died with
the upgrade. Weights must also be re-tied after manual loading: v5 only ties them in
from_pretrained, leaving tied params like T5's shared/embed_tokens on the meta device (the
Chroma failure).
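
The fallback in step 1 could look roughly like the sketch below. It treats the registry renaming as a plain function from old key to new key. `remap_keys_with_fallback`, `strip_text_model`, and the toy key lists are hypothetical stand-ins; the real loader would get the renaming from transformers' get_model_conversion_mapping and rename_source_key, whose signatures are not reproduced here.

```python
from typing import Callable, Dict, Iterable


def remap_keys_with_fallback(
    checkpoint_keys: Iterable[str],
    module_keys: Iterable[str],
    rename_key: Callable[[str], str],
) -> Dict[str, str]:
    """Map each checkpoint key to the key the instantiated module expects.

    rename_key stands in for the registry renaming (hypothetical callable).
    If the renamed key does not exist on the module, the original key is
    kept, which covers models that still use the nested layout.
    """
    expected = set(module_keys)
    mapping = {}
    for key in checkpoint_keys:
        renamed = rename_key(key)
        mapping[key] = renamed if renamed in expected else key
    return mapping


def strip_text_model(key: str) -> str:
    # Toy stand-in for the registry rule: drop a leading "text_model." segment.
    prefix = "text_model."
    return key[len(prefix):] if key.startswith(prefix) else key


# Flattened CLIPTextModel: the rename matches the module, so it is applied.
flat = remap_keys_with_fallback(
    ["text_model.final_layer_norm.weight"],
    ["final_layer_norm.weight"],
    strip_text_model,
)

# Nested CLIPTextModelWithProjection: the rename misses, so the key is kept.
nested = remap_keys_with_fallback(
    ["text_model.final_layer_norm.weight", "text_projection.weight"],
    ["text_model.final_layer_norm.weight", "text_projection.weight"],
    strip_text_model,
)
```

Checking against the module's own keys, rather than against the class name, keeps the loader independent of which classes were flattened in a given transformers release.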

2. diffusers upgrade. Bump the pin to current main to pull in diffusers #13843, which fixes
from_single_file for flattened CLIP. This covers single-file loading inside diffusers (used by
SDXL and others).

3. Attribute access. Drop .text_model at the sites where the encoder is a flattened
CLIPTextModel; add a clip_util.text_transformer() helper for the two genuinely polymorphic
sites (encode_clip, and the Würstchen prior, which is a CLIPTextModel for v2 but a
CLIPTextModelWithProjection for Stable Cascade).
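
One possible shape for the helper named in step 3, as a sketch only. It assumes nested models expose the transformer as `.text_model` and flattened models do not have that attribute at all; the `SimpleNamespace` objects are stand-ins, not real transformers classes.

```python
from types import SimpleNamespace


def text_transformer(text_encoder):
    """Return the object that holds embeddings, encoder and final_layer_norm.

    Assumption: nested models expose it as `.text_model`, while flattened
    models carry those submodules directly on the model.
    """
    return getattr(text_encoder, "text_model", text_encoder)


# Stand-ins for the two layouts (hypothetical, for illustration only).
flattened = SimpleNamespace(embeddings="emb", encoder="enc", final_layer_norm="ln")
nested = SimpleNamespace(
    text_model=SimpleNamespace(embeddings="emb", encoder="enc", final_layer_norm="ln")
)

# Both call sites can now reach the encoder the same way.
flat_encoder = text_transformer(flattened).encoder
nested_encoder = text_transformer(nested).encoder
```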

4. LoRA key compatibility. LoRA key names derive from module paths, so they will silently
change (lora_te1.encoder… instead of lora_te1.text_model.encoder…), breaking resume of
existing LoRAs and kohya/ComfyUI-compatible export. Wrap flattened text encoders with a
".text_model" prefix, which reproduces the previous key set exactly — conversion tables then
need no changes.
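
To illustrate why the wrapper in step 4 restores the old names, here is a small sketch that builds key names from module paths. `lora_keys` and its `wrap` parameter are hypothetical and only model the extra path segment a wrapper module would contribute; the dotted key form follows the examples above rather than any particular export format.

```python
from typing import Iterable, List


def lora_keys(prefix: str, module_paths: Iterable[str], wrap: str = "") -> List[str]:
    """Build LoRA key names as '<prefix>.<wrap>.<module path>'.

    `wrap` models the path segment added by a wrapper module; an empty
    string means the encoder is used unwrapped.
    """
    head = [prefix] + ([wrap] if wrap else [])
    return [".".join(head + [path]) for path in module_paths]


# Module paths as seen on a flattened CLIPTextModel.
flattened_paths = ["encoder.layers.0.self_attn.q_proj", "encoder.layers.0.mlp.fc1"]
# The same modules as seen through the old nested layout.
nested_paths = ["text_model." + p for p in flattened_paths]

old_keys = lora_keys("lora_te1", nested_paths)
unwrapped = lora_keys("lora_te1", flattened_paths)
wrapped = lora_keys("lora_te1", flattened_paths, wrap="text_model")
```

Here `wrapped` equals `old_keys`, while `unwrapped` does not, which is the silent rename the step describes.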

5. Saving. save_pretrained only writes old-format keys when the model carries the
_weight_conversions it got from from_pretrained; manually built models will write flattened
keys to disk, breaking external consumers of saved checkpoints. The loader must attach the
conversions it applied, so saved models keep the ecosystem-standard key format. The single-file
exporters (convert_sd/sdxl_diffusers_to_ckpt) need the text_model segment re-added to their
output keys.
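
For the single-file exporters, re-adding the segment might be as simple as the sketch below. `to_legacy_keys` and `FLATTENED_ROOTS` are hypothetical names; the list of roots is taken from the background above (embeddings, encoder, final_layer_norm) and is assumed to be complete.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

# Top-level submodules that used to live under text_model (assumed complete).
FLATTENED_ROOTS = ("embeddings.", "encoder.", "final_layer_norm.")


def to_legacy_keys(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Re-add the 'text_model.' segment to flattened CLIPTextModel keys."""
    out = {}
    for key, value in state_dict.items():
        if key.startswith(FLATTENED_ROOTS):
            key = "text_model." + key
        out[key] = value
    return out


flattened_sd = {"encoder.layers.0.mlp.fc1.weight": 1, "final_layer_norm.bias": 2}
legacy_sd = to_legacy_keys(flattened_sd)
```

Keys that already carry the segment do not start with any of the roots, so applying the function twice leaves the result unchanged.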

6. SD1.x single-file loading. This goes through diffusers' legacy converter, which upstream
did not fix. Normalize checkpoint keys to the flattened layout before conversion — making the old
NAI key fix implicit — and build the SD2 text encoder with the fixed modern single-file
implementation, injecting it into the legacy function to bypass its broken hardcoded conversion.
In passing, fix the .ckpt fallback, which is currently dead due to a missing argument.
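
The key normalization in step 6 could be sketched as follows. `normalize_to_flattened` is a hypothetical name, and the `cond_stage_model.transformer.` root is an assumption about where the text encoder sits in an SD1.x single-file checkpoint. Checkpoints with and without the `text_model.` segment end up with the same keys, which is why a separate fix for the segment-less variant would no longer be needed.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumed location of the text encoder inside an SD1.x single-file checkpoint.
ROOT = "cond_stage_model.transformer."
SEGMENT = "text_model."


def normalize_to_flattened(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Drop the 'text_model.' segment directly after ROOT, if present."""
    out = {}
    for key, value in state_dict.items():
        if key.startswith(ROOT + SEGMENT):
            key = ROOT + key[len(ROOT + SEGMENT):]
        out[key] = value
    return out


with_segment = {ROOT + "text_model.embeddings.token_embedding.weight": 0}
without_segment = {ROOT + "embeddings.token_embedding.weight": 0}

# Both variants converge on the flattened layout.
normalized_a = normalize_to_flattened(with_segment)
normalized_b = normalize_to_flattened(without_segment)
```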

dxqb mentioned this pull request Jun 6, 2026
dxqb added the preview label (merged in the preview branch) Jun 13, 2026