Description
New architecture dots.llm1
Might be somewhat easy to support if Qwen3 and DeepSeek are already working, according to this:
It looks like it is a mixture of DeepSeek-V3 MoE modules and Qwen3 attention modules:
and uses the Qwen2 tokeniser:
https://huggingface.co/rednote-hilab/dots.llm1.inst/blob/main/tokenizer_config.json
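For example, a quick (unverified) check with `transformers` should show which tokenizer class the repo resolves to; the repo id below is taken from the link above:

```python
# Unverified sanity check: load the dots.llm1 tokenizer and print its class name.
# Assumes `transformers` is installed; extra flags (e.g. trust_remote_code) may or
# may not be needed depending on the repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("rednote-hilab/dots.llm1.inst")
print(type(tok).__name__)  # expected to report a Qwen2-style tokenizer class
```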
So it's probably just a case of gluing all this together?
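To make the "gluing" concrete, here is a rough, illustrative sketch in plain PyTorch of how such a decoder layer could be composed: a Qwen3-style attention block (grouped-query attention with per-head Q/K RMSNorm) feeding a DeepSeek-style MoE block (a shared expert plus top-k routed experts). This is *not* exllamav3's definition-file format and not the real dots.llm1 config; all dimensions, class names, and routing details are placeholders, RoPE is omitted, and the experts are plain MLPs rather than gated ones. Needs PyTorch ≥ 2.4 for `nn.RMSNorm`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Qwen3StyleAttention(nn.Module):
    """Grouped-query attention with RMSNorm on Q and K heads (Qwen3-style). RoPE omitted."""

    def __init__(self, dim=1024, n_heads=16, n_kv_heads=4, head_dim=64):
        super().__init__()
        self.n_heads, self.n_kv_heads, self.head_dim = n_heads, n_kv_heads, head_dim
        self.q_proj = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, dim, bias=False)
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, self.n_heads, self.head_dim))
        k = self.k_norm(self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim))
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        # Expand KV heads to match query heads (grouped-query attention).
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=2)
        v = v.repeat_interleave(rep, dim=2)
        attn = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        return self.o_proj(attn.transpose(1, 2).reshape(b, t, -1))


class DeepseekStyleMoE(nn.Module):
    """Shared expert plus top-k routed experts, DeepSeek-style (heavily simplified routing)."""

    def __init__(self, dim=1024, moe_dim=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.shared = self._mlp(dim, moe_dim)
        self.experts = nn.ModuleList([self._mlp(dim, moe_dim) for _ in range(n_experts)])

    @staticmethod
    def _mlp(dim, hidden):
        return nn.Sequential(nn.Linear(dim, hidden, bias=False), nn.SiLU(),
                             nn.Linear(hidden, dim, bias=False))

    def forward(self, x):
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = self.shared(x)
        # Naive dense dispatch for clarity: every expert sees every token, masked out if unrouted.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k : k + 1] * expert(x)
        return out


class DotsStyleDecoderLayer(nn.Module):
    """Pre-norm decoder layer gluing the two blocks together."""

    def __init__(self, dim=1024):
        super().__init__()
        self.attn_norm = nn.RMSNorm(dim)
        self.attn = Qwen3StyleAttention(dim)
        self.ffn_norm = nn.RMSNorm(dim)
        self.moe = DeepseekStyleMoE(dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        return x + self.moe(self.ffn_norm(x))


if __name__ == "__main__":
    layer = DotsStyleDecoderLayer()
    print(layer(torch.randn(1, 8, 1024)).shape)  # torch.Size([1, 8, 1024])
```

If the above is roughly right, an EXL3 definition would mostly be wiring existing Qwen3 attention and DeepSeek MoE code paths to dots.llm1's tensor names and config keys.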
There's a working Hugging Face patch and a llama.cpp patch in testing.
People are planning DeepSeek distills and other tunes with this model, since it has several base checkpoints released plus more world knowledge than anything outside 5x+ its size. At 143B parameters, dots.llm1 tunes could plausibly become the premier EXL3 models for people with dual GPUs who can't run any DeepSeek quant but want something better than Qwen3 32B.
And even if others don't, I will quantize and release EXL3 quants of this model if a working definition file gets created.