
Commit 81d5088

docs: anti-AI pass on README
1 parent: efe6d1e

1 file changed: README.md (5 additions & 5 deletions)
@@ -11,7 +11,7 @@ Distributed LLM inference on Apple Silicon. Pipeline parallelism across dual Mac
 
 ## Performance
 
-Real numbers from dual Mac Mini M4 (16 GB each). No benchmarks cherry-picked.
+Measured on dual Mac Mini M4 (16 GB each).
 
 | Model | Mode | Hardware | tok/s | Notes |
 |-------|------|----------|-------|-------|
@@ -76,7 +76,7 @@ Start the API server:
 python hippo_api.py --config hippo.conf.yaml
 ```
 
-Three endpoints, drop-in replacement for OpenAI SDK:
+Three endpoints, same shapes as OpenAI:
 
 ```bash
 # List models
@@ -91,7 +91,7 @@ curl http://localhost:8002/v1/chat/completions \
 curl http://localhost:8002/health
 ```
 
-Works with **Cursor**, **Open WebUI**, **Continue**, and any OpenAI SDK client — just change `base_url`.
+Works with Cursor, Open WebUI, Continue — change `base_url` and it just works.
 
 ### Web UI
 
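The diff above advertises OpenAI-shaped endpoints on port 8002. As a hedged sketch (only `/v1/chat/completions` and `/health` appear verbatim in this commit; the `/v1/models` path is an assumption inferred from the "List models" comment and the usual OpenAI URL scheme):

```python
def endpoints(base="http://localhost:8002"):
    """URLs a client would hit. /v1/models is assumed, not confirmed
    by this commit; the other two paths appear in the diff."""
    return {
        "models": f"{base}/v1/models",          # GET: list models
        "chat": f"{base}/v1/chat/completions",  # POST: chat completion
        "health": f"{base}/health",             # GET: liveness check
    }

urls = endpoints()
```

An OpenAI SDK client would then be pointed at the `/v1` prefix, i.e. `base_url="http://localhost:8002/v1"`.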
@@ -131,7 +131,7 @@ models:
   r1_layers: [25, 47]
 ```
 
-Memory guard built in — refuses to load if estimated usage exceeds `RAM × safety_factor`.
+Memory guard built in — refuses to load if it won't fit (`RAM × safety_factor`).
 
 ## Architecture
 
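The memory-guard line being edited describes a simple admission check. A minimal sketch, assuming a boolean helper — the function name and the 0.8 default are hypothetical, not Hippo's actual values:

```python
GiB = 1024 ** 3

def fits_in_memory(estimated_bytes, ram_bytes, safety_factor=0.8):
    # Refuse to load when estimated usage exceeds RAM * safety_factor,
    # per the README line above. safety_factor=0.8 is illustrative only.
    return estimated_bytes <= ram_bytes * safety_factor

# On a 16 GB node with safety_factor 0.8, the budget is 12.8 GB:
assert fits_in_memory(11 * GiB, 16 * GiB)       # 11 GB loads
assert not fits_in_memory(14 * GiB, 16 * GiB)   # 14 GB is refused
```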
@@ -148,7 +148,7 @@ R0 (Mac Mini 1) R1 (Mac Mini 2)
 
 ### Why SD doesn't help Pipeline
 
-Counter-intuitive but实测verified: speculative decoding (including DFlash) does **not** accelerate pipeline inference. The bottleneck is R0's forward pass (~100ms/step). SD saves time on sampling, but verification also requires R0 forward — so SD doesn't reduce forward passes. Net result: slower than baseline (4.3 tok/s vs 6.8 tok/s).
+Counter-intuitive but实测 verified: speculative decoding (including DFlash) does **not** accelerate pipeline inference. The bottleneck is R0's forward pass (~100ms/step). SD saves time on sampling, but verification also requires R0 forward — so SD doesn't reduce forward passes. Net result: slower than baseline (4.3 tok/s vs 6.8 tok/s).
 
 > Pipeline solves the **memory** problem. SD solves the **speed** problem. They're orthogonal.
 
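The paragraph in this last hunk argues from forward-pass counts. A toy cost model makes the arithmetic concrete — the ~100 ms forward figure comes from the diff, while the overhead value is an illustrative assumption:

```python
def tokens_per_second(forward_ms, forwards_per_token=1.0, overhead_ms=0.0):
    # Throughput if each emitted token costs `forwards_per_token` R0
    # forward passes plus fixed per-step overhead (draft + verify glue).
    step_ms = forward_ms * forwards_per_token + overhead_ms
    return 1000.0 / step_ms

# Baseline: one R0 forward per token, so ~100 ms/step caps throughput.
baseline = tokens_per_second(100)
# SD here still needs an R0 forward per step for verification
# (forwards_per_token stays at 1.0) and adds overhead, so it can
# only be slower — matching the observed 4.3 vs 6.8 tok/s direction.
with_sd = tokens_per_second(100, forwards_per_token=1.0, overhead_ms=50.0)
assert with_sd < baseline
```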