
Commit bfa212c

update-overviews (#20)

* Renaming readme files for each model to the same name used in Hub
* Fix smollm2 urls
* Update overviews (#18)
* Adds update script
* Adds build-model-table.sh script
* Updates all models
* Force param is not needed anymore
* Renaming model overviews to match with the model name in Hub (#17)
* Renaming readme files for each model to the same name used in Hub
* Fix smollm2 urls
* Use sentence case
* Adds initial Go script to update the table
* Moves the build-all-tables script to Go; parses GGUF without downloading it
* Uses authenticated requests (to avoid rate limits); fixes the Markdown update
* Tries `general.size_label` first for labels, falling back to the parameters metadata if not found
* Format context length
* VRAM estimation
* Allows updating only the specified file
* Removes unneeded scripts
* Fixes estimated VRAM for the embedding model
* Adds model inspect command
* Renames to model-cards-cli
* Updates model-cards
* Renames header to VRAM¹
* Adds the parsed GGUF file into ModelVariant, and includes a method to extract all metadata
* Includes GGUF metadata in inspect
* No need to use an interface for the registry client for now
* A ModelVariant has multiple tags
* Formats VRAM
* Formats context length
* Adds --all to include metadata
* Removes formatter
* Formats size
* Updates models
* Script not needed anymore
* Updates README.md

1 parent 5e6d1ae · commit bfa212c
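Several of the commits above describe how the table data is gathered: the CLI parses GGUF files without downloading them and uses authenticated requests to avoid rate limits. The CLI source is not part of this page, so the following Go sketch only illustrates that general idea by fetching the fixed 24-byte GGUF header with an authenticated HTTP range request; the URL and the `REGISTRY_TOKEN` variable are hypothetical placeholders, not names from the repository.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Hypothetical blob URL; the real CLI resolves this from the registry.
	url := "https://registry.example.com/blobs/smollm2.gguf"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// An authenticated request avoids anonymous rate limits.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("REGISTRY_TOKEN"))
	// Fetch only the GGUF header: magic (4 bytes), version (uint32),
	// tensor count (uint64), and metadata key/value count (uint64).
	req.Header.Set("Range", "bytes=0-23")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	header := make([]byte, 24)
	if _, err := io.ReadFull(resp.Body, header); err != nil {
		panic(err)
	}
	if string(header[:4]) != "GGUF" {
		panic("not a GGUF file")
	}
	version := binary.LittleEndian.Uint32(header[4:8])
	tensors := binary.LittleEndian.Uint64(header[8:16])
	kvPairs := binary.LittleEndian.Uint64(header[16:24])
	fmt.Printf("GGUF v%d: %d tensors, %d metadata key/value pairs\n", version, tensors, kvPairs)
}
```

Reading the metadata key/value pairs that follow the header (for example `general.size_label`, mentioned above) works the same way, with larger range reads.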

31 files changed (+1613 −96 lines)

.gitignore

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,2 +1,4 @@
 .idea
-.DS_Store
+.DS_Store
+
+bin
```

README.md

Lines changed: 42 additions & 4 deletions

````diff
@@ -24,7 +24,7 @@ Distilled LLaMA by DeepSeek, fast and optimized for real-world tasks.
 ![Gemma Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])
 
 📌 **Description:**
-Googles latest Gemma, small yet strong for chat and generation
+Google's latest Gemma, small yet strong for chat and generation
 
 📂 **Model File:** [`ai/gemma3.md`](ai/gemma3.md)
 
@@ -37,7 +37,7 @@ Google’s latest Gemma, small yet strong for chat and generation
 ![Meta Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])
 
 📌 **Description:**
-Metas LLaMA 3.1: Chat-focused, benchmark-strong, multilingual-ready.
+Meta's LLaMA 3.1: Chat-focused, benchmark-strong, multilingual-ready.
 
 📂 **Model File:** [`ai/llama3.1.md`](ai/llama3.1.md)
 
@@ -111,7 +111,7 @@ A state-of-the-art English language embedding model developed by Mixedbread AI.
 ![Microsoft Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])
 
 📌 **Description:**
-Microsofts compact model, surprisingly capable at reasoning and code.
+Microsoft's compact model, surprisingly capable at reasoning and code.
 
 📂 **Model File:** [`ai/phi4.md`](ai/phi4.md)
 
@@ -152,11 +152,49 @@ Experimental Qwen variant—lean, fast, and a bit mysterious.
 📌 **Description:**
 A compact language model, designed to run efficiently on-device while performing a wide range of language tasks
 
-📂 **Model File:** [`ai/smolllm2.md`](ai/smollm2.md)
+📂 **Model File:** [`ai/smollm2.md`](ai/smollm2.md)
 
 **URLs:**
 - https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
 - https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 
 ---
 
+## 🔧 CLI Usage
+
+The model-cards-cli tool provides commands to inspect and update model information:
+
+### Inspect Command
+```bash
+# Basic inspection
+make inspect REPOSITORY=ai/smollm2
+
+# Inspect specific tag
+make inspect REPOSITORY=ai/smollm2 TAG=360M-Q4_K_M
+
+# Show all metadata
+make inspect REPOSITORY=ai/smollm2 OPTIONS="--all"
+```
+
+### Update Command
+```bash
+# Update all models
+make run
+
+# Update specific model
+make run-single MODEL=ai/smollm2.md
+```
+
+### Available Options
+
+#### Inspect Command Options
+- `REPOSITORY`: (Required) The repository to inspect (e.g., `ai/smollm2`)
+- `TAG`: (Optional) Specific tag to inspect (e.g., `360M-Q4_K_M`)
+- `OPTIONS`: (Optional) Additional options:
+  - `--all`: Show all metadata fields
+  - `--log-level`: Set log level (debug, info, warn, error)
+
+#### Update Command Options
+- `MODEL`: (Required for run-single) Specific model file to update (e.g., `ai/smollm2.md`)
+- `--log-level`: Set log level (debug, info, warn, error)
+
````

ai/deepcoder-preview.md

Lines changed: 7 additions & 5 deletions

```diff
@@ -32,12 +32,14 @@ DeepCoder-14B is purpose-built for advanced code reasoning, programming task sol
 
 ## Available model variants
 
-| Model variant | Parameters | Quantization | Context window | VRAM | Size |
-|------------------------------|------------|--------------|----------------|--------|--------|
-| `deepcoder-preview:14B-F16` | 14.77B | F16 | 131,072 | 24GB¹ | 29.5GB |
-| `deepcoder-preview:14B:latest` <br><br> `deepcoder-preview:14B-Q4_K_M` | 14.77B | Q4_K_M | 131,072 | 8GB¹ | 9GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/deepcoder-preview:latest`<br><br>`ai/deepcoder-preview:14B-Q4_K_M` | 14B | IQ2_XXS/Q4_K_M | 131K tokens | 4.03 GB | 8.37 GB |
+| `ai/deepcoder-preview:14B-F16` | 14B | F16 | 131K tokens | 31.29 GB | 27.51 GB |
 
-¹: VRAM estimated based on GGUF model characteristics.
+¹: VRAM estimated based on model characteristics.
+
+> `latest` → `14B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```
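Every table in this commit now shares the footnote "VRAM estimated based on model characteristics," but the estimator itself is not included in the diff. The Go sketch below shows one plausible heuristic (quantized weight size plus an F16 KV cache sized for the full context window); the function and parameter names are assumptions for illustration, not the CLI's actual API.

```go
package main

import "fmt"

// estimateVRAMBytes is a hypothetical heuristic, not the CLI's formula:
// the quantized weights must fit in memory, plus a KV cache holding two
// F16 tensors (K and V) per layer for the whole context window.
func estimateVRAMBytes(weightBytes, layers, kvHeads, headDim, contextLen uint64) uint64 {
	kvCache := 2 * layers * kvHeads * headDim * contextLen * 2 // 2 bytes per F16
	return weightBytes + kvCache
}

func main() {
	// Illustrative numbers only, loosely shaped like an 8B Q4_K_M model.
	vram := estimateVRAMBytes(4_580_000_000, 32, 8, 128, 131_072)
	fmt.Printf("estimated VRAM: %.2f GB\n", float64(vram)/1e9)
}
```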

ai/deepseek-r1-distill-llama.md

Lines changed: 8 additions & 6 deletions

```diff
@@ -33,15 +33,17 @@ i: Estimated
 
 ## Available model variants
 
-| Model Variant | Parameters | Quantization | Context Window | VRAM | Size |
-|------------------------------------------------------------------------------------|----------- |----------------|---------------- |--------- |-------|
-| `ai/deepseek-r1-distill-llama:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 128K tokens | 42GB¹ | 42GB |
-| `ai/deepseek-r1-distill-llama:8B-F16` | 8B | F16 | 128K tokens | 19.2GB¹ | 16GB |
-| `ai/deepseek-r1-distill-llama:latest`<br><br>`ai/deepseek-r1-distill-llama:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 128K tokens | 4.5GB¹ | 5GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/deepseek-r1-distill-llama:latest`<br><br>`ai/deepseek-r1-distill-llama:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 131K tokens | 2.31 GB | 4.58 GB |
+| `ai/deepseek-r1-distill-llama:70B-Q4_0` | 70B | Q4_0 | 131K tokens | 44.00 GB | 37.22 GB |
+| `ai/deepseek-r1-distill-llama:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 131K tokens | 20.17 GB | 39.59 GB |
+| `ai/deepseek-r1-distill-llama:8B-F16` | 8B | F16 | 131K tokens | 17.88 GB | 14.96 GB |
+| `ai/deepseek-r1-distill-llama:8B-Q4_0` | 8B | Q4_0 | 131K tokens | 5.03 GB | 4.33 GB |
 
 ¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `70B-Q4_K_M`
+> `latest` → `8B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```

ai/gemma3-qat.md

Lines changed: 8 additions & 9 deletions

```diff
@@ -36,17 +36,16 @@ Gemma 3 4B model can be used for:
 
 ## Available model variants
 
-| Model variant | Parameters | Quantization | Context window | VRAM | Size |
-|-------------------------------------------------------- |----------- |----------------|--------------- |---------- |------- |
-| `ai/gemma3-qat:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 32K tokens | 0.892GB¹ | 0.95GB |
-| `ai/gemma3-qat:latest`<br><br>`ai/gemma3-qat:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 128K tokens | 3.4GB¹ | 2.93GB |
-| `ai/gemma3-qat:12B-Q4_K_M` | 12B | IQ2_XXS/Q4_K_M | 128K tokens | 8.7GB¹ | 7.52GB |
-| `ai/gemma3-qat:27B-Q4_K_M` | 27B | IQ2_XXS/Q4_K_M | 128K tokens | 21GB¹ | 16GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/gemma3-qat:latest`<br><br>`ai/gemma3-qat:4B-Q4_K_M` | 3.88 B | Q4_0 | 131K tokens | 5.44 GB | 2.93 GB |
+| `ai/gemma3-qat:1B-Q4_K_M` | 999.89 M | Q4_0 | 33K tokens | 5.02 GB | 950.82 MB |
+| `ai/gemma3-qat:27B-Q4_K_M` | 27.01 B | Q4_0 | 131K tokens | 20.28 GB | 16.04 GB |
+| `ai/gemma3-qat:12B-Q4_K_M` | 11.77 B | Q4_0 | 131K tokens | 9.80 GB | 7.51 GB |
 
-¹: VRAM extracted from Gemma documentation ([link](https://ai.google.dev/gemma/docs/core#128k-context)).
-These are rough estimations. QAT models should use much less memory compared to the standard Gemma3 models
+¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `4B-Q4_K_M`
+> `latest` → `4B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```

ai/gemma3.md

Lines changed: 9 additions & 8 deletions

```diff
@@ -30,16 +30,17 @@ Gemma 3 4B model can be used for:
 
 ## Available model variants
 
-| Model Variant | Parameters | Quantization | Context Window | VRAM | Size |
-|-------------------------------------------------|----------- |----------------|--------------- |---------- |------- |
-| `ai/gemma3:1B-F16` | 1B | F16 | 32K tokens | 1.5GB¹ | 1.86GB |
-| `ai/gemma3:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 32K tokens | 0.892GB¹ | 0.76GB |
-| `ai/gemma3:4B-F16` | 4B | F16 | 128K tokens | 6.4GB¹ | 7.23GB |
-| `ai/gemma3:latest`<br><br>`ai/gemma3:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 128K tokens | 3.4GB¹ | 2.31GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/gemma3:latest`<br><br>`ai/gemma3:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 131K tokens | 4.15 GB | 2.31 GB |
+| `ai/gemma3:4B-F16` | 4B | F16 | 131K tokens | 11.94 GB | 7.23 GB |
+| `ai/gemma3:4B-Q4_0` | 4B | Q4_0 | 131K tokens | 5.51 GB | 2.19 GB |
+| `ai/gemma3:1B-F16` | 1B | F16 | 33K tokens | 6.62 GB | 1.86 GB |
+| `ai/gemma3:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 33K tokens | 4.68 GB | 762.49 MB |
 
-¹: VRAM extracted from Gemma documentation ([link](https://ai.google.dev/gemma/docs/core#128k-context))
+¹: VRAM estimated based on model characteristics.
 
-`:latest` → `4B-Q4_K_M`
+> `latest` → `4B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```
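The reformatted cells in these tables ("131K tokens", "33K tokens", "762.49 MB") line up with the "Format context length" and "Format size" commits in the message above. Those helpers live in the CLI source rather than in this diff; the sketch below merely reproduces the visible behavior, assuming round-to-nearest-thousand token counts and decimal byte units.

```go
package main

import "fmt"

// formatContextLength rounds a raw token count to the nearest thousand,
// matching values like 131072 -> "131K tokens" and 32768 -> "33K tokens".
func formatContextLength(tokens uint64) string {
	return fmt.Sprintf("%dK tokens", (tokens+500)/1000)
}

// formatSize prints byte counts the way the Size column reads, assuming
// decimal units: MB below one GB, otherwise GB with two decimals.
func formatSize(bytes uint64) string {
	if bytes < 1_000_000_000 {
		return fmt.Sprintf("%.2f MB", float64(bytes)/1e6)
	}
	return fmt.Sprintf("%.2f GB", float64(bytes)/1e9)
}

func main() {
	fmt.Println(formatContextLength(131072)) // 131K tokens
	fmt.Println(formatContextLength(32768))  // 33K tokens
	fmt.Println(formatSize(950_820_000))     // 950.82 MB
}
```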

ai/llama3.1.md

Lines changed: 6 additions & 6 deletions

```diff
@@ -31,14 +31,14 @@
 
 ## Available model variants
 
-| Model variant | Parameters | Quantization | Context window | VRAM | Size |
-|----------------------------------------------------- |----------- |--------------- |--------------- |---------- |------- |
-| `ai/llama3.1:latest`<br><br>`ai/llama3.1:8B-Q4_K_M` | 8B | Q4_K_M | 128K | 4.8GB¹ | 5GB |
-| `ai/llama3.1:8B-F16` | 8B | F16 | 128K | 19.2GB¹ | 16GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/llama3.1:latest`<br><br>`ai/llama3.1:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 131K tokens | 2.31 GB | 4.58 GB |
+| `ai/llama3.1:8B-F16` | 8B | F16 | 131K tokens | 17.88 GB | 14.96 GB |
 
-¹: VRAM estimates based on model characteristics.
+¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `8B-Q4_K_M`
+> `latest` → `8B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```

ai/llama3.2.md

Lines changed: 9 additions & 7 deletions

```diff
@@ -29,16 +29,18 @@ Llama 3.2 instruct models are designed for:
 
 ## Available model variants
 
-| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
-|---------------------------------------------------- |------------|--------------|----------------|--------|-------|
-| `ai/llama3.2:3B-F16` | 3B | F16 | 128k tokens | 7.2GB¹ | 6GB |
-| `ai/llama3.2:latest`<br><br>`ai/llama3.2:3B-Q4_K_M` | 3B | Q4_K_M | 128K tokens | 1.8GB¹ | 1.8GB |
-| `ai/llama3.2:1B-F16` | 1B | F16 | 128K tokens | 2.4GB¹ | 2.3GB |
-| `ai/llama3.2:1B-Q8_0` | 1B | Q8_0 | 128K tokens | 1.2GB¹ | 1.2GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/llama3.2:latest`<br><br>`ai/llama3.2:3B-Q4_K_M` | 3B | IQ2_XXS/Q4_K_M | 131K tokens | 3.26 GB | 1.87 GB |
+| `ai/llama3.2:1B-Q8_0` | 1B | Q8_0 | 131K tokens | 1.19 GB | 1.22 GB |
+| `ai/llama3.2:3B-F16` | 3B | F16 | 131K tokens | 9.11 GB | 5.98 GB |
+| `ai/llama3.2:3B-Q4_0` | 3B | Q4_0 | 131K tokens | 4.29 GB | 1.78 GB |
+| `ai/llama3.2:1B-F16` | 1B | F16 | 131K tokens | 2.24 GB | 2.30 GB |
+| `ai/llama3.2:1B-Q4_0` | 1B | Q4_0 | 131K tokens | 0.63 GB | 727.75 MB |
 
 ¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `3B-Q4_K_M`
+> `latest` → `3B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```

ai/llama3.3.md

Lines changed: 6 additions & 5 deletions

```diff
@@ -33,13 +33,14 @@ Meta Llama 3.3 is a powerful 70B parameter multilingual language model designed
 
 ## Available model variants
 
-| Model variant | Parameters | Quantization | Context window | VRAM | Size |
-|----------------------------------------------------- |----------- |--------------- |--------------- |---------- |------- |
-| `ai/llama3.3:latest`<br><br>`ai/llama3.3:70B-Q4_K_M` | 70B | Q4_K_M | 128K | 42GB¹ | 42.5GB |
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/llama3.3:latest`<br><br>`ai/llama3.3:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 131K tokens | 20.17 GB | 39.59 GB |
+| `ai/llama3.3:70B-Q4_0` | 70B | Q4_0 | 131K tokens | 44.00 GB | 37.22 GB |
 
-¹: VRAM estimates based on model characteristics.
+¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `70B-Q4_K_M`
+> `latest` → `70B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```

ai/mistral-nemo.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -28,13 +28,13 @@ Mistral-Nemo-Instruct-2407 is designed for instruction-following tasks and multi
 
 ## Available model variants
 
-| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
-|--------------------------------------------------------------|------------|--------------|----------------|--------|-------|
-| `ai/mistral-nemo:latest`<br><br>`ai/mistral-nemo:12B-Q4_K_M` | 12B | Q4_K_M | 128k tokens | 7GB¹ | 7.1 GB|
+| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
+|---------------|------------|--------------|----------------|------|-------|
+| `ai/mistral-nemo:latest`<br><br>`ai/mistral-nemo:12B-Q4_K_M` | 12B | IQ2_XXS/Q4_K_M | 131K tokens | 3.46 GB | 6.96 GB |
 
 ¹: VRAM estimated based on model characteristics.
 
-> `:latest` → `12B-Q4_K_M`
+> `latest` → `12B-Q4_K_M`
 
 ## Use this AI model with Docker Model Runner
 
```
