Upgrade transformers to 5.9 and huggingface-hub to 1.16 (#1472) #1506
Draft
dxqb wants to merge 1 commit into
reopens #1472
This PR needs significantly more work, because transformers has screwed up any existing code that uses CLIP encoders beyond a surface level (such as applying LoRAs).
There are also minor issues to fix for T5 and the Qwen TEs, but the main one is CLIP. This is the projected work, though the list may be incomplete:
Background: transformers 5.6 flattened `CLIPTextModel`: the `text_model` wrapper submodule is gone; `embeddings`, `encoder` and `final_layer_norm` sit directly on the model. Checkpoints on disk keep the old `text_model.*` key format, and `from_pretrained` translates via a new central conversion registry. `CLIPTextModelWithProjection` still nests a `text_model`, so only `CLIPTextModel` users are affected: SD1.x, SDXL TE1, Flux TE1, HunyuanVideo TE2, Würstchen.
OneTrainer breaks everywhere it bypasses `from_pretrained` or reaches into the module structure directly.
1. Loading. `HFModelLoaderMixin` loads weights manually (for custom quantization and dtype control) and only knows the v4-era conversion hooks, which v5 removed. It needs to apply the renamings from transformers' own registry instead (`get_model_conversion_mapping` + `rename_source_key`), with a fallback that keeps the original key when the rename doesn't match the module: `CLIPTextModelWithProjection` shares the registry entry but still has the nested layout. This will also restore old-checkpoint Qwen loading, whose v4 workaround silently died with the upgrade. Weights must also be re-tied after manual loading: v5 only ties them in `from_pretrained`, leaving tied params like T5's `shared`/`embed_tokens` on the meta device (the Chroma failure).
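
The rename-with-fallback part of the loading step could look roughly like the sketch below. It is a minimal illustration, not OneTrainer's loader: `rename_key` is a stand-in for whatever transformers' registry provides (the real signatures of `get_model_conversion_mapping` and `rename_source_key` are not reproduced here), and `remap_state_dict_keys`, `strip_text_model` and the toy key names are hypothetical.

```python
from typing import Callable, Iterable


def remap_state_dict_keys(
    checkpoint_keys: Iterable[str],
    module_keys: set[str],
    rename_key: Callable[[str], str],
) -> dict[str, str]:
    """Map each checkpoint key to the key the target module expects.

    A renamed key is used only if the module actually has a parameter of
    that name; otherwise the original key is kept (the fallback).
    """
    mapping = {}
    for key in checkpoint_keys:
        renamed = rename_key(key)
        mapping[key] = renamed if renamed in module_keys else key
    return mapping


def strip_text_model(key: str) -> str:
    # Hypothetical stand-in for the registry rename:
    # drop a leading "text_model." segment.
    return key.removeprefix("text_model.")


checkpoint_keys = ["text_model.encoder.layers.0.mlp.fc1.weight"]

# Flattened CLIPTextModel-style module: the rename matches and is applied.
flat_module = {"encoder.layers.0.mlp.fc1.weight"}
flat_mapping = remap_state_dict_keys(checkpoint_keys, flat_module, strip_text_model)

# Nested WithProjection-style module: the rename does not match, so the
# original key is kept.
nested_module = {"text_model.encoder.layers.0.mlp.fc1.weight"}
nested_mapping = remap_state_dict_keys(checkpoint_keys, nested_module, strip_text_model)
```

Checking the renamed key against the module's own parameter names is what lets a single registry entry serve both layouts without a per-class special case.
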
2. diffusers upgrade. Bump the pin to current main to pull in diffusers #13843, which fixes `from_single_file` for flattened CLIP. This covers single-file loading inside diffusers (used by SDXL and others).
3. Attribute access. Drop `.text_model` at the sites where the encoder is a flattened `CLIPTextModel`; add a `clip_util.text_transformer()` helper for the two genuinely polymorphic sites (`encode_clip`, and the Würstchen prior, which is a `CLIPTextModel` for v2 but a `WithProjection` for Stable Cascade).
4. LoRA key compatibility. LoRA key names derive from module paths, so they will silently change (`lora_te1.encoder…` instead of `lora_te1.text_model.encoder…`), breaking resume of existing LoRAs and kohya/ComfyUI-compatible export. Wrap flattened text encoders with a `".text_model"` prefix, which reproduces the previous key set exactly, so the conversion tables then need no changes.
5. Saving. `save_pretrained` only writes old-format keys when the model carries the `_weight_conversions` it got from `from_pretrained`; manually built models will write flattened keys to disk, breaking external consumers of saved checkpoints. The loader must attach the conversions it applied, so saved models keep the ecosystem-standard key format. The single-file exporters (`convert_sd/sdxl_diffusers_to_ckpt`) need the `text_model` segment re-added to their output keys.
6. SD1.x single-file loading. This goes through diffusers' legacy converter, which upstream did not fix. Normalize checkpoint keys to the flattened layout before conversion (making the old NAI key fix implicit), and build the SD2 text encoder with the fixed modern single-file implementation, injecting it into the legacy function to bypass its broken hardcoded conversion. Fix the `.ckpt` fallback, currently dead due to a missing argument, in passing.
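
Why normalizing to the flattened layout would make a separate NAI key fix unnecessary can be shown with a small sketch. It rests on two assumptions that the text above does not spell out: that SD1.x text encoder keys live under `cond_stage_model.transformer.`, and that NAI-style checkpoints differ by omitting the `text_model` segment. `normalize_te_key` is a hypothetical helper.

```python
# Assumed SD1.x text encoder prefix inside a single-file checkpoint.
TE_PREFIX = "cond_stage_model.transformer."


def normalize_te_key(key: str) -> str:
    """Drop the `text_model.` segment directly after the text encoder prefix."""
    nested = TE_PREFIX + "text_model."
    if key.startswith(nested):
        return TE_PREFIX + key[len(nested):]
    # Keys that are already flattened, or that belong to other parts of the
    # checkpoint, pass through unchanged.
    return key


standard = "cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight"
nai_style = "cond_stage_model.transformer.encoder.layers.0.mlp.fc1.weight"
unet_key = "model.diffusion_model.out.2.weight"

normalized_standard = normalize_te_key(standard)
normalized_nai = normalize_te_key(nai_style)
normalized_unet = normalize_te_key(unet_key)
```

Both key variants converge on the same flattened name, so the converter downstream only has to understand one layout.
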