Merged

32 commits
c3723e3
adds update script
ilopezluna Apr 26, 2025
3e86f07
adds build-model-table.sh script
ilopezluna Apr 26, 2025
86d7da2
Updates all models
ilopezluna Apr 26, 2025
71d9310
force param is not needed anymore
ilopezluna Apr 26, 2025
2610171
Renaming model overviews to match with the model name in Hub (#17)
ilopezluna Apr 28, 2025
b02ce9f
Merge branch 'main' into update-overviews
ilopezluna Apr 28, 2025
c07716e
Use sentence case
ilopezluna Apr 28, 2025
b0a6209
Adds initial go script to update table
ilopezluna Apr 30, 2025
d7d2ceb
- build-all tables script to Go
ilopezluna May 2, 2025
999a550
- Uses authenticated req (to avoid rate limit)
ilopezluna May 2, 2025
d2d7b55
Try to get labels from general.size_label first, if not found fallbac…
ilopezluna May 2, 2025
05d5a0a
Format context length
ilopezluna May 2, 2025
f9a0f26
VRAM estimation
ilopezluna May 2, 2025
422a910
Allow to update only the specified file
ilopezluna May 2, 2025
71ca927
Removes unneeded scripts
ilopezluna May 2, 2025
5003e10
Fix estimated VRAM for embedding model
ilopezluna May 2, 2025
60bfe1b
Adds model inspect command
ilopezluna May 2, 2025
9e2c7d9
Rename to model-cards-cli
ilopezluna May 2, 2025
05f64b0
Updates model-cards
ilopezluna May 2, 2025
0b3f33a
Rename header to VRAM¹
ilopezluna May 2, 2025
b5609cb
Adds parsed gguf file into ModelVariant, and includes method to extra…
ilopezluna May 5, 2025
96446b3
Includes gguf metadata into inspect
ilopezluna May 5, 2025
cb93c29
No need to use interface for registry client for now.
ilopezluna May 5, 2025
aacfa12
A ModelVariant has multiple tags
ilopezluna May 5, 2025
fca5091
Formats VRAM
ilopezluna May 5, 2025
c40c7e3
Formats context length
ilopezluna May 5, 2025
09c0595
Adds --all to include metadata
ilopezluna May 5, 2025
854ef60
Removes formatter
ilopezluna May 5, 2025
1c607e0
Format size
ilopezluna May 5, 2025
35bd4c1
Update models
ilopezluna May 5, 2025
097a121
Script not needed anymore
ilopezluna May 5, 2025
277331b
Updates README.md
ilopezluna May 5, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,2 +1,4 @@
.idea
.DS_Store
.DS_Store

bin
46 changes: 42 additions & 4 deletions README.md
@@ -24,7 +24,7 @@ Distilled LLaMA by DeepSeek, fast and optimized for real-world tasks.
![Gemma Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])

📌 **Description:**
Googles latest Gemma, small yet strong for chat and generation
Google's latest Gemma, small yet strong for chat and generation

📂 **Model File:** [`ai/gemma3.md`](ai/gemma3.md)

@@ -37,7 +37,7 @@ Google’s latest Gemma, small yet strong for chat and generation
![Meta Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])

📌 **Description:**
Metas LLaMA 3.1: Chat-focused, benchmark-strong, multilingual-ready.
Meta's LLaMA 3.1: Chat-focused, benchmark-strong, multilingual-ready.

📂 **Model File:** [`ai/llama3.1.md`](ai/llama3.1.md)

@@ -111,7 +111,7 @@ A state-of-the-art English language embedding model developed by Mixedbread AI.
![Microsoft Logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])

📌 **Description:**
Microsofts compact model, surprisingly capable at reasoning and code.
Microsoft's compact model, surprisingly capable at reasoning and code.

📂 **Model File:** [`ai/phi4.md`](ai/phi4.md)

@@ -152,11 +152,49 @@ Experimental Qwen variant—lean, fast, and a bit mysterious.
📌 **Description:**
A compact language model, designed to run efficiently on-device while performing a wide range of language tasks

📂 **Model File:** [`ai/smolllm2.md`](ai/smollm2.md)
📂 **Model File:** [`ai/smollm2.md`](ai/smollm2.md)

**URLs:**
- https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
- https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct

---

## 🔧 CLI Usage

The model-cards-cli tool provides commands to inspect and update model information:

### Inspect Command
```bash
# Basic inspection
make inspect REPOSITORY=ai/smollm2

# Inspect specific tag
make inspect REPOSITORY=ai/smollm2 TAG=360M-Q4_K_M

# Show all metadata
make inspect REPOSITORY=ai/smollm2 OPTIONS="--all"
```

### Update Command
```bash
# Update all models
make run

# Update specific model
make run-single MODEL=ai/smollm2.md
```

### Available Options

#### Inspect Command Options
- `REPOSITORY`: (Required) The repository to inspect (e.g., `ai/smollm2`)
- `TAG`: (Optional) Specific tag to inspect (e.g., `360M-Q4_K_M`)
- `OPTIONS`: (Optional) Additional options:
- `--all`: Show all metadata fields
- `--log-level`: Set log level (debug, info, warn, error)

#### Update Command Options
- `MODEL`: (Required for run-single) Specific model file to update (e.g., `ai/smollm2.md`)
- `--log-level`: Set log level (debug, info, warn, error)
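
The commit history for this PR mentions reading parameter labels from the GGUF `general.size_label` key, with a fallback when that key is missing. The Go sketch below illustrates that fallback pattern only; the function and variable names are hypothetical, and the actual model-cards-cli code is not part of this diff.

```go
package main

import "fmt"

// sizeLabel prefers the "general.size_label" GGUF metadata key and
// falls back to deriving a coarse label from the parameter count.
// Hypothetical sketch; not the CLI's actual implementation.
func sizeLabel(metadata map[string]any, paramCount float64) string {
	if v, ok := metadata["general.size_label"].(string); ok && v != "" {
		return v
	}
	if paramCount >= 1e9 {
		return fmt.Sprintf("%dB", int(paramCount/1e9))
	}
	return fmt.Sprintf("%dM", int(paramCount/1e6))
}

func main() {
	md := map[string]any{"general.size_label": "360M"}
	fmt.Println(sizeLabel(md, 361.8e6)) // "360M", taken from metadata
	fmt.Println(sizeLabel(nil, 7.62e9)) // "7B", derived from the count
}
```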

12 changes: 7 additions & 5 deletions ai/deepcoder-preview.md
@@ -32,12 +32,14 @@ DeepCoder-14B is purpose-built for advanced code reasoning, programming task sol

## Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM | Size |
|------------------------------|------------|--------------|----------------|--------|--------|
| `deepcoder-preview:14B-F16` | 14.77B | F16 | 131,072 | 24GB¹ | 29.5GB |
| `deepcoder-preview:14B:latest` <br><br> `deepcoder-preview:14B-Q4_K_M` | 14.77B | Q4_K_M | 131,072 | 8GB¹ | 9GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/deepcoder-preview:latest`<br><br>`ai/deepcoder-preview:14B-Q4_K_M` | 14B | IQ2_XXS/Q4_K_M | 131K tokens | 4.03 GB | 8.37 GB |
| `ai/deepcoder-preview:14B-F16` | 14B | F16 | 131K tokens | 31.29 GB | 27.51 GB |

¹: VRAM estimated based on GGUF model characteristics.
¹: VRAM estimated based on model characteristics.

> `latest` → `14B-Q4_K_M`

## Use this AI model with Docker Model Runner

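For context on the VRAM¹ column these updated tables introduce: the footnote says the value is estimated from model characteristics, which for GGUF models usually means weight bytes at the quantized width plus working memory such as the KV cache. The sketch below is one plausible way to compute such an estimate; the constants and names are assumptions, not the formula the CLI actually uses (its published numbers differ from this naive calculation).

```go
package main

import "fmt"

// bitsPerWeight holds rough effective bits-per-weight figures for the
// quantization schemes that appear in the tables. Approximate values.
var bitsPerWeight = map[string]float64{
	"F16":    16.0,
	"Q8_0":   8.5,
	"Q4_K_M": 4.8,
	"Q4_0":   4.5,
}

// estimateVRAMGB returns a rough VRAM need in GB: weight bytes at the
// quantized width, inflated by a flat overhead factor standing in for
// the KV cache, activations, and runtime buffers.
func estimateVRAMGB(params float64, quant string, overhead float64) float64 {
	weightBytes := params * bitsPerWeight[quant] / 8
	return weightBytes * (1 + overhead) / 1e9
}

func main() {
	// 14.77B parameters at Q4_K_M with 20% headroom, as an illustration.
	fmt.Printf("%.2f GB\n", estimateVRAMGB(14.77e9, "Q4_K_M", 0.2))
}
```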
14 changes: 8 additions & 6 deletions ai/deepseek-r1-distill-llama.md
@@ -33,15 +33,17 @@ i: Estimated

## Available model variants

| Model Variant | Parameters | Quantization | Context Window | VRAM | Size |
|------------------------------------------------------------------------------------|----------- |----------------|---------------- |--------- |-------|
| `ai/deepseek-r1-distill-llama:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 128K tokens | 42GB¹ | 42GB |
| `ai/deepseek-r1-distill-llama:8B-F16` | 8B | F16 | 128K tokens | 19.2GB¹ | 16GB |
| `ai/deepseek-r1-distill-llama:latest`<br><br>`ai/deepseek-r1-distill-llama:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 128K tokens | 4.5GB¹ | 5GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/deepseek-r1-distill-llama:latest`<br><br>`ai/deepseek-r1-distill-llama:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 131K tokens | 2.31 GB | 4.58 GB |
| `ai/deepseek-r1-distill-llama:70B-Q4_0` | 70B | Q4_0 | 131K tokens | 44.00 GB | 37.22 GB |
| `ai/deepseek-r1-distill-llama:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 131K tokens | 20.17 GB | 39.59 GB |
| `ai/deepseek-r1-distill-llama:8B-F16` | 8B | F16 | 131K tokens | 17.88 GB | 14.96 GB |
| `ai/deepseek-r1-distill-llama:8B-Q4_0` | 8B | Q4_0 | 131K tokens | 5.03 GB | 4.33 GB |

¹: VRAM estimated based on model characteristics.

> `:latest` → `70B-Q4_K_M`
> `latest` → `8B-Q4_K_M`

## Use this AI model with Docker Model Runner

17 changes: 8 additions & 9 deletions ai/gemma3-qat.md
@@ -36,17 +36,16 @@ Gemma 3 4B model can be used for:

## Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM | Size |
|-------------------------------------------------------- |----------- |----------------|--------------- |---------- |------- |
| `ai/gemma3-qat:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 32K tokens | 0.892GB¹ | 0.95GB |
| `ai/gemma3-qat:latest`<br><br>`ai/gemma3-qat:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 128K tokens | 3.4GB¹ | 2.93GB |
| `ai/gemma3-qat:12B-Q4_K_M` | 12B | IQ2_XXS/Q4_K_M | 128K tokens | 8.7GB¹ | 7.52GB |
| `ai/gemma3-qat:27B-Q4_K_M` | 27B | IQ2_XXS/Q4_K_M | 128K tokens | 21GB¹ | 16GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/gemma3-qat:latest`<br><br>`ai/gemma3-qat:4B-Q4_K_M` | 3.88 B | Q4_0 | 131K tokens | 5.44 GB | 2.93 GB |
| `ai/gemma3-qat:1B-Q4_K_M` | 999.89 M | Q4_0 | 33K tokens | 5.02 GB | 950.82 MB |
| `ai/gemma3-qat:27B-Q4_K_M` | 27.01 B | Q4_0 | 131K tokens | 20.28 GB | 16.04 GB |
| `ai/gemma3-qat:12B-Q4_K_M` | 11.77 B | Q4_0 | 131K tokens | 9.80 GB | 7.51 GB |

¹: VRAM extracted from Gemma documentation ([link](https://ai.google.dev/gemma/docs/core#128k-context)).
These are rough estimations. QAT models should use much less memory compared to the standard Gemma3 models
¹: VRAM estimated based on model characteristics.

> `:latest` → `4B-Q4_K_M`
> `latest` → `4B-Q4_K_M`

## Use this AI model with Docker Model Runner

17 changes: 9 additions & 8 deletions ai/gemma3.md
@@ -30,16 +30,17 @@ Gemma 3 4B model can be used for:

## Available model variants

| Model Variant | Parameters | Quantization | Context Window | VRAM | Size |
|-------------------------------------------------|----------- |----------------|--------------- |---------- |------- |
| `ai/gemma3:1B-F16` | 1B | F16 | 32K tokens | 1.5GB¹ | 1.86GB |
| `ai/gemma3:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 32K tokens | 0.892GB¹ | 0.76GB |
| `ai/gemma3:4B-F16` | 4B | F16 | 128K tokens | 6.4GB¹ | 7.23GB |
| `ai/gemma3:latest`<br><br>`ai/gemma3:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 128K tokens | 3.4GB¹ | 2.31GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/gemma3:latest`<br><br>`ai/gemma3:4B-Q4_K_M` | 4B | IQ2_XXS/Q4_K_M | 131K tokens | 4.15 GB | 2.31 GB |
| `ai/gemma3:4B-F16` | 4B | F16 | 131K tokens | 11.94 GB | 7.23 GB |
| `ai/gemma3:4B-Q4_0` | 4B | Q4_0 | 131K tokens | 5.51 GB | 2.19 GB |
| `ai/gemma3:1B-F16` | 1B | F16 | 33K tokens | 6.62 GB | 1.86 GB |
| `ai/gemma3:1B-Q4_K_M` | 1B | IQ2_XXS/Q4_K_M | 33K tokens | 4.68 GB | 762.49 MB |

¹: VRAM extracted from Gemma documentation ([link](https://ai.google.dev/gemma/docs/core#128k-context))
¹: VRAM estimated based on model characteristics.

`:latest`→ `4B-Q4_K_M`
> `latest` → `4B-Q4_K_M`

## Use this AI model with Docker Model Runner

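The "Format context length" and "Format size" commits in this PR account for the readable values in these tables, such as "131K tokens" and "950.82 MB". Below is a minimal sketch of that kind of formatting; the rounding rules are inferred from the table values and may not match the CLI exactly.

```go
package main

import "fmt"

// formatContext renders a context length as "NK tokens", rounding to
// the nearest thousand (131072 -> "131K tokens", 32768 -> "33K tokens").
func formatContext(tokens int) string {
	return fmt.Sprintf("%dK tokens", (tokens+500)/1000)
}

// formatSize renders a byte count with two decimals, using MB below
// one decimal gigabyte and GB otherwise (950820000 -> "950.82 MB").
func formatSize(bytes int64) string {
	const gb = 1_000_000_000
	if bytes < gb {
		return fmt.Sprintf("%.2f MB", float64(bytes)/1e6)
	}
	return fmt.Sprintf("%.2f GB", float64(bytes)/gb)
}

func main() {
	fmt.Println(formatContext(131072))     // "131K tokens"
	fmt.Println(formatSize(950_820_000))   // "950.82 MB"
	fmt.Println(formatSize(2_930_000_000)) // "2.93 GB"
}
```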
12 changes: 6 additions & 6 deletions ai/llama3.1.md
@@ -31,14 +31,14 @@

## Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM | Size |
|----------------------------------------------------- |----------- |--------------- |--------------- |---------- |------- |
| `ai/llama3.1:latest`<br><br>`ai/llama3.1:8B-Q4_K_M` | 8B | Q4_K_M | 128K | 4.8GB¹ | 5GB |
| `ai/llama3.1:8B-F16` | 8B | F16 | 128K | 19.2GB¹ | 16GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/llama3.1:latest`<br><br>`ai/llama3.1:8B-Q4_K_M` | 8B | IQ2_XXS/Q4_K_M | 131K tokens | 2.31 GB | 4.58 GB |
| `ai/llama3.1:8B-F16` | 8B | F16 | 131K tokens | 17.88 GB | 14.96 GB |

¹: VRAM estimates based on model characteristics.
¹: VRAM estimated based on model characteristics.

> `:latest` → `8B-Q4_K_M`
> `latest` → `8B-Q4_K_M`

## Use this AI model with Docker Model Runner

16 changes: 9 additions & 7 deletions ai/llama3.2.md
@@ -29,16 +29,18 @@ Llama 3.2 instruct models are designed for:

## Available model variants

| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
|---------------------------------------------------- |------------|--------------|----------------|--------|-------|
| `ai/llama3.2:3B-F16` | 3B | F16 | 128k tokens | 7.2GB¹ | 6GB |
| `ai/llama3.2:latest`<br><br>`ai/llama3.2:3B-Q4_K_M` | 3B | Q4_K_M | 128K tokens | 1.8GB¹ | 1.8GB |
| `ai/llama3.2:1B-F16` | 1B | F16 | 128K tokens | 2.4GB¹ | 2.3GB |
| `ai/llama3.2:1B-Q8_0` | 1B | Q8_0 | 128K tokens | 1.2GB¹ | 1.2GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/llama3.2:latest`<br><br>`ai/llama3.2:3B-Q4_K_M` | 3B | IQ2_XXS/Q4_K_M | 131K tokens | 3.26 GB | 1.87 GB |
| `ai/llama3.2:1B-Q8_0` | 1B | Q8_0 | 131K tokens | 1.19 GB | 1.22 GB |
| `ai/llama3.2:3B-F16` | 3B | F16 | 131K tokens | 9.11 GB | 5.98 GB |
| `ai/llama3.2:3B-Q4_0` | 3B | Q4_0 | 131K tokens | 4.29 GB | 1.78 GB |
| `ai/llama3.2:1B-F16` | 1B | F16 | 131K tokens | 2.24 GB | 2.30 GB |
| `ai/llama3.2:1B-Q4_0` | 1B | Q4_0 | 131K tokens | 0.63 GB | 727.75 MB |

¹: VRAM estimated based on model characteristics.

> `:latest` → `3B-Q4_K_M`
> `latest` → `3B-Q4_K_M`

## Use this AI model with Docker Model Runner

11 changes: 6 additions & 5 deletions ai/llama3.3.md
@@ -33,13 +33,14 @@ Meta Llama 3.3 is a powerful 70B parameter multilingual language model designed

## Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM | Size |
|----------------------------------------------------- |----------- |--------------- |--------------- |---------- |------- |
| `ai/llama3.3:latest`<br><br>`ai/llama3.3:70B-Q4_K_M` | 70B | Q4_K_M | 128K | 42GB¹ | 42.5GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/llama3.3:latest`<br><br>`ai/llama3.3:70B-Q4_K_M` | 70B | IQ2_XXS/Q4_K_M | 131K tokens | 20.17 GB | 39.59 GB |
| `ai/llama3.3:70B-Q4_0` | 70B | Q4_0 | 131K tokens | 44.00 GB | 37.22 GB |

¹: VRAM estimates based on model characteristics.
¹: VRAM estimated based on model characteristics.

> `:latest` → `70B-Q4_K_M`
> `latest` → `70B-Q4_K_M`

## Use this AI model with Docker Model Runner

8 changes: 4 additions & 4 deletions ai/mistral-nemo.md
@@ -28,13 +28,13 @@ Mistral-Nemo-Instruct-2407 is designed for instruction-following tasks and multi

## Available model variants

| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
|--------------------------------------------------------------|------------|--------------|----------------|--------|-------|
| `ai/mistral-nemo:latest`<br><br>`ai/mistral-nemo:12B-Q4_K_M` | 12B | Q4_K_M | 128k tokens | 7GB¹ | 7.1 GB|
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/mistral-nemo:latest`<br><br>`ai/mistral-nemo:12B-Q4_K_M` | 12B | IQ2_XXS/Q4_K_M | 131K tokens | 3.46 GB | 6.96 GB |

¹: VRAM estimated based on model characteristics.

> `:latest` → `12B-Q4_K_M`
> `latest` → `12B-Q4_K_M`

## Use this AI model with Docker Model Runner

13 changes: 7 additions & 6 deletions ai/mistral.md
@@ -35,14 +35,15 @@ i: Estimated

## Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM | Size |
|----------------------------------------------------|----------- |--------------- |----------------|---------|--------|
| `ai/mistral:latest`<br><br>`ai/mistral:7B-Q4_K_M` | 7B | IQ2_XXS/Q4_K_M | 32K | 4.2B¹ | 4.3GB |
| `ai/mistral:7B-F16` | 7B | F16 | 32K | 16.8¹ | 14.5GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/mistral:latest`<br><br>`ai/mistral:7B-Q4_K_M` | 7B | IQ2_XXS/Q4_K_M | 33K tokens | 2.02 GB | 4.07 GB |
| `ai/mistral:7B-F16` | 7B | F16 | 33K tokens | 15.65 GB | 13.50 GB |
| `ai/mistral:7B-Q4_0` | 7B | Q4_0 | 33K tokens | 4.40 GB | 3.83 GB |

¹: VRAM estimated based on model characteristics and quantization.
¹: VRAM estimated based on model characteristics.

> `:latest` → `7B-Q4_K_M`
> `latest` → `7B-Q4_K_M`

## Use this AI model with Docker Model Runner

10 changes: 5 additions & 5 deletions ai/mxbai-embed-large.md
@@ -27,13 +27,13 @@ mxbai-embed-large-v1 is designed for generating sentence embeddings suitable for

## Available model variants

| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
|-------------------------------------------------------------- |----------- |--------------- |--------------- |---------- |------- |
| `ai/mxbai-embed-large:latest`<br><br>`ai/mxbai-embed-large:335M-F16` | 335M | F16 | 512 tokens | 0.8GB¹ | 670MB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/mxbai-embed-large:latest`<br><br>`ai/mxbai-embed-large:335M-F16` | 334.09 M | F16 | 512 tokens | 0.80 GB | 638.85 MB |

¹: VRAM estimates based on model characteristics.
¹: VRAM estimated based on model characteristics.

> `:latest` → `mxbai-embed-large:335M-F16`
> `latest` → `335M-F16`

## Use this AI model with Docker Model Runner

13 changes: 7 additions & 6 deletions ai/phi4.md
@@ -27,14 +27,14 @@ Phi-4 is designed for:

## Available model variants

| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
|----------------------------------------------|----------- |----------------|--------------- |--------- |------- |
| `ai/phi4:14B-F16` | 14B | F16 | 16K tokens | 33.6GB¹ | 29.3GB |
| `ai/phi4:latest`<br><br>`ai/phi4:14B-Q4_K_M` | 14B | IQ2_XXS/Q4_K_M | 16K tokens | 8.4GB¹ | 9.GB |
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/phi4:latest`<br><br>`ai/phi4:14B-Q4_K_M` | 15B | IQ2_XXS/Q4_K_M | 16K tokens | 4.92 GB | 8.43 GB |
| `ai/phi4:14B-F16` | 15B | F16 | 16K tokens | 34.13 GB | 27.31 GB |
| `ai/phi4:14B-Q4_0` | 15B | Q4_0 | 16K tokens | 10.03 GB | 7.80 GB |

¹: VRAM estimates based on model characteristics.
¹: VRAM estimated based on model characteristics.

> `:latest` → `14B-Q4_K_M`
> `latest` → `14B-Q4_K_M`

## Use this AI model with Docker Model Runner

25 changes: 13 additions & 12 deletions ai/qwen2.5.md
@@ -30,18 +30,19 @@ Qwen2.5-7B-Instruct is designed to assist in various natural language processing

## Available model variants

| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
|--------------------------------------------------|------------|------------------|----------------|----------|--------|
| `ai/qwen2.5:0.5B-F16` | 0.5B | F16 | 32K tokens | ~1.2GB¹ | 0.99GB |
| `ai/qwen2.5:1.5B-F16` | 1.5B | F16 | 32K tokens | ~3.5GB¹ | 3.09GB |
| `ai/qwen2.5:3B-F16` | 3.09B | F16 | 32K tokens | ~7GB¹ | 6.18GB |
| `ai/qwen2.5:3B-Q4_K_M` | 3.09B | IQ2_XXS/Q4_K_M | 32K tokens | ~2.2GB¹ | 1.93GB |
| `ai/qwen2.5:7B-F16` | 7.62B | F16 | 32K tokens | ~16GB¹ | 15.24GB|
| `ai/qwen2.5:7B-Q4_K_M`<br><br>`ai/qwen2.5:latest`| 7.62B | IQ2_XXS/Q4_K_M | 32K tokens | ~4.7GB¹ | 4.68GB |

¹: VRAM estimates based on model characteristics.

> `:latest`→ `7B-Q4_K_M`
| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---------------|------------|--------------|----------------|------|-------|
| `ai/qwen2.5:latest`<br><br>`ai/qwen2.5:7B-Q4_K_M` | 7B | IQ2_XXS/Q4_K_M | 33K tokens | 2.32 GB | 4.36 GB |
| `ai/qwen2.5:0.5B-F16` | 0.5B | F16 | 33K tokens | 4.27 GB | 942.43 MB |
| `ai/qwen2.5:1.5B-F16` | 1.5B | F16 | 33K tokens | 4.85 GB | 2.88 GB |
| `ai/qwen2.5:3B-F16` | 3B | F16 | 33K tokens | 7.91 GB | 5.75 GB |
| `ai/qwen2.5:3B-Q4_K_M` | 3B | IQ2_XXS/Q4_K_M | 33K tokens | 2.06 GB | 1.79 GB |
| `ai/qwen2.5:7B-F16` | 7B | F16 | 33K tokens | 15.95 GB | 14.19 GB |
| `ai/qwen2.5:7B-Q4_0` | 7B | Q4_0 | 33K tokens | 4.70 GB | 4.12 GB |

¹: VRAM estimated based on model characteristics.

> `latest` → `7B-Q4_K_M`

## Use this AI model with Docker Model Runner
