Description
New architecture dots.llm1
Might be somewhat easy to support if Qwen3 and DeepSeek are already working, according to this:
It looks like it is a mixture of DeepSeek-V3 MoE modules and Qwen3 attention modules:
and uses the Qwen2 tokeniser:
https://huggingface.co/rednote-hilab/dots.llm1.inst/blob/main/tokenizer_config.json
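For example, a quick (unverified) check with `transformers` should show which tokenizer class the repo resolves to; the repo id below is taken from the link above:

```python
# Unverified sanity check: load the dots.llm1 tokenizer and print its class name.
# Assumes `transformers` is installed; extra flags (e.g. trust_remote_code) may or
# may not be needed depending on the repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("rednote-hilab/dots.llm1.inst")
print(type(tok).__name__)  # expected to report a Qwen2-style tokenizer class
```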
So it's probably just a case of gluing all this together?
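To make the "gluing" concrete, here is a rough, illustrative sketch in plain PyTorch of how such a decoder layer could be composed: a Qwen3-style attention block (grouped-query attention with per-head Q/K RMSNorm) feeding a DeepSeek-style MoE block (a shared expert plus top-k routed experts). This is *not* exllamav3's definition-file format and not the real dots.llm1 config; all dimensions, class names, and routing details are placeholders, RoPE is omitted, and the experts are plain MLPs rather than gated ones. Needs PyTorch ≥ 2.4 for `nn.RMSNorm`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Qwen3StyleAttention(nn.Module):
    """Grouped-query attention with RMSNorm on Q and K heads (Qwen3-style). RoPE omitted."""

    def __init__(self, dim=1024, n_heads=16, n_kv_heads=4, head_dim=64):
        super().__init__()
        self.n_heads, self.n_kv_heads, self.head_dim = n_heads, n_kv_heads, head_dim
        self.q_proj = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, dim, bias=False)
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, self.n_heads, self.head_dim))
        k = self.k_norm(self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim))
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        # Expand KV heads to match query heads (grouped-query attention).
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=2)
        v = v.repeat_interleave(rep, dim=2)
        attn = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        return self.o_proj(attn.transpose(1, 2).reshape(b, t, -1))


class DeepseekStyleMoE(nn.Module):
    """Shared expert plus top-k routed experts, DeepSeek-style (heavily simplified routing)."""

    def __init__(self, dim=1024, moe_dim=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.shared = self._mlp(dim, moe_dim)
        self.experts = nn.ModuleList([self._mlp(dim, moe_dim) for _ in range(n_experts)])

    @staticmethod
    def _mlp(dim, hidden):
        return nn.Sequential(nn.Linear(dim, hidden, bias=False), nn.SiLU(),
                             nn.Linear(hidden, dim, bias=False))

    def forward(self, x):
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = self.shared(x)
        # Naive dense dispatch for clarity: every expert sees every token, masked out if unrouted.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k : k + 1] * expert(x)
        return out


class DotsStyleDecoderLayer(nn.Module):
    """Pre-norm decoder layer gluing the two blocks together."""

    def __init__(self, dim=1024):
        super().__init__()
        self.attn_norm = nn.RMSNorm(dim)
        self.attn = Qwen3StyleAttention(dim)
        self.ffn_norm = nn.RMSNorm(dim)
        self.moe = DeepseekStyleMoE(dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        return x + self.moe(self.ffn_norm(x))


if __name__ == "__main__":
    layer = DotsStyleDecoderLayer()
    print(layer(torch.randn(1, 8, 1024)).shape)  # torch.Size([1, 8, 1024])
```

If the above is roughly right, an EXL3 definition would mostly be wiring existing Qwen3 attention and DeepSeek MoE code paths to dots.llm1's tensor names and config keys.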
There's a working Hugging Face patch and a llama.cpp patch in testing.
People are planning DeepSeek distills and other tunes with this model, since it has several base checkpoints released plus more world knowledge than anything outside 5x+ its size. At 143B parameters, dots.llm1 tunes could plausibly become the premier EXL3 models for people with dual GPUs who can't run any DeepSeek quant but want something better than Qwen3 32B.
And even if others don't, I will quantize and release EXL3 quants of this model if a working definition file gets created.