-Works with **Cursor**, **Open WebUI**, **Continue**, and any OpenAI SDK client — just change `base_url`.
+Works with Cursor, Open WebUI, Continue — change `base_url` and it just works.

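Because the server is OpenAI-compatible, the only client-side change is the endpoint URL. A minimal sketch of building a Chat Completions request against it — the host, port, and model name below are assumptions about your deployment, not project defaults:

```python
# Sketch: pointing any OpenAI-style client at the local server is just a
# base_url change. Host/port and model name here are illustrative assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed local server address

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a Chat Completions request for any OpenAI-compatible endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request(BASE_URL, "qwen3", "hello")
print(req.full_url)  # http://localhost:8080/v1/chat/completions
```

Swapping in a remote OpenAI endpoint is the same call with a different `base_url` — nothing else in the client changes.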
### Web UI
@@ -131,7 +131,7 @@ models:
r1_layers: [25, 47]
```

-Memory guard built in — refuses to load if estimated usage exceeds `RAM × safety_factor`.
+Memory guard built in — refuses to load if it won't fit (`RAM × safety_factor`).

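The guard described above reduces to a single comparison. A hedged sketch — the function name, the 0.8 default, and the sizes are illustrative, not the project's actual API:

```python
# Minimal sketch of a memory guard: refuse to load a model whose estimated
# footprint exceeds RAM × safety_factor. All names/values are illustrative.
GB = 1024 ** 3

def fits_in_memory(estimated_bytes: int, ram_bytes: int,
                   safety_factor: float = 0.8) -> bool:
    """True if the estimated footprint fits within the safety budget."""
    return estimated_bytes <= ram_bytes * safety_factor

# On a 16 GB machine the budget is 12.8 GB at safety_factor=0.8:
print(fits_in_memory(10 * GB, 16 * GB))  # True: 10 GB fits the budget
print(fits_in_memory(14 * GB, 16 * GB))  # False: 14 GB would overcommit
```

Leaving headroom below physical RAM avoids swap thrashing, which on unified-memory Macs would stall every decode step.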
## Architecture
@@ -148,7 +148,7 @@ R0 (Mac Mini 1) R1 (Mac Mini 2)
### Why SD doesn't help Pipeline

Counter-intuitive but empirically verified: speculative decoding (including DFlash) does **not** accelerate pipeline inference. The bottleneck is R0's forward pass (~100ms/step). SD saves time on sampling, but verification still requires an R0 forward pass — so SD doesn't reduce the number of forward passes. Net result: slower than baseline (4.3 tok/s vs 6.8 tok/s).

> Pipeline solves the **memory** problem. SD solves the **speed** problem. They're orthogonal.
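The argument above is simple arithmetic. A back-of-envelope sketch — the 100ms forward cost comes from the text, but the drafting overhead is an assumed illustrative number, not the measured 4.3/6.8 tok/s figures:

```python
# Why SD can't win here: every accepted token still costs one R0 forward pass
# (verification), so speculation only adds drafting overhead on top.
forward_ms = 100.0        # R0 forward pass per step (the bottleneck, from the text)
draft_overhead_ms = 45.0  # assumed per-token drafting + coordination cost

baseline_tok_s = 1000.0 / forward_ms                  # forward-bound ceiling: 10.0 tok/s
sd_tok_s = 1000.0 / (forward_ms + draft_overhead_ms)  # same forwards + overhead

print(f"baseline ~{baseline_tok_s:.1f} tok/s, with SD ~{sd_tok_s:.1f} tok/s")
```

Since the forward-pass count is unchanged, any nonzero drafting overhead makes SD strictly slower — matching the measured regression.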