@ilopezluna ilopezluna commented Apr 26, 2025

Introducing Model Cards CLI Tool and Model Documentation Updates

This PR introduces a new Model Cards CLI tool and updates model documentation across the repository. Key changes include:

  1. New Model Cards CLI Tool:
    • Command-line interface for updating model card markdown files
    • Model repository inspection capabilities
    • OCI registry integration for model metadata
    • GGUF file metadata extraction
    • Markdown file processing utilities
  2. Model Documentation Updates:
    • Updated model variant information across multiple model cards
    • Improved accuracy of parameters, quantization options, and VRAM estimates
    • Added new model variants and options
    • Enhanced clarity in model documentation
model-cards-cli % make help
Available targets:
  all              - Clean, build, and test
  build            - Build the binary
  clean            - Clean build artifacts
  lint             - Run linters
  run              - Run the binary to update all model files
  run-single       - Run the binary to update a single model file (Usage: make run-single MODEL=<model-file.md>)
  inspect          - Inspect a model repository (Usage: make inspect REPO=<repository> [TAG=<tag>] [OPTIONS=<options>])
                     Example: make inspect REPO=ai/smollm2
                     Example: make inspect REPO=ai/smollm2 TAG=360M-Q4_K_M
                     Example: make inspect REPO=ai/smollm2 OPTIONS="--parameters --vram --json"
  help             - Show this help message
   
make inspect REPO=ai/llama3.2 TAG=latest
Inspecting model: ai/llama3.2:latest
INFO[2025-05-02 17:18:01] Starting model inspector                     
INFO[2025-05-02 17:18:01] Inspecting ai/llama3.2:latest                
🔍 Model: ai/llama3.2:latest
   • Parameters   : 3B
   • Architecture : llama
   • Quantization : IQ2_XXS/Q4_K_M
   • Size         : 1.87 GiB
   • Context      : 131072 tokens
   • VRAM         : 4.08 GB
INFO[2025-05-02 17:18:04] Inspection completed successfully     

@ilopezluna

The context window (context length) appears to be part of the GGUF metadata: https://github.com/ggml-org/ggml/blob/master/docs/gguf.md#llm
I'm going to check whether it's present in the GGUF files we have on Hub; if so, I will include it as metadata in the config file.

|------------------------------|------------|--------------|----------------|--------|--------|
| `deepcoder-preview:14B-F16` | 14.77B | F16 | 131,072 | 24GB¹ | 29.5GB |
| `deepcoder-preview:14B:latest` <br><br> `deepcoder-preview:14B-Q4_K_M` | 14.77B | Q4_K_M | 131,072 | 8GB¹ | 9GB |
| Model Variant | Parameters | Quantization | Context window | VRAM | Size |

Suggested change
| Model Variant | Parameters | Quantization | Context window | VRAM | Size |
| Model variant | Parameters | Quantization | Context window | VRAM | Size |

Picky nit: can we change all of them to sentence case like this, please? TIA
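
One way to apply this across all the tables would be to convert the header row mechanically. The following Go sketch is illustrative only: `sentenceCase` and `sentenceCaseRow` are hypothetical helpers, not functions from this PR, and leaving all-uppercase words such as VRAM untouched is an assumption about the desired result.

```go
package main

import (
	"fmt"
	"strings"
)

// sentenceCase lowercases every word after the first. Words that are already
// entirely uppercase (for example "VRAM") are kept as-is so acronyms survive.
// The first word is left exactly as written.
func sentenceCase(header string) string {
	words := strings.Fields(header)
	for i, w := range words {
		if i == 0 || w == strings.ToUpper(w) {
			continue
		}
		words[i] = strings.ToLower(w)
	}
	return strings.Join(words, " ")
}

// sentenceCaseRow applies sentenceCase to each cell of a markdown table row.
func sentenceCaseRow(row string) string {
	cells := strings.Split(row, "|")
	for i, c := range cells {
		if t := strings.TrimSpace(c); t != "" {
			cells[i] = " " + sentenceCase(t) + " "
		}
	}
	return strings.Join(cells, "|")
}

func main() {
	fmt.Println(sentenceCaseRow("| Model Variant | Parameters | Quantization | Context Window | VRAM | Size |"))
	// prints: | Model variant | Parameters | Quantization | Context window | VRAM | Size |
}
```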

* Renaming readme files for each model to the same name used in Hub

* Fix smollm2 urls
@krissetto

krissetto commented Apr 28, 2025

> I couldn't find how we add the context window and VRAM; how can we automate that?

@ilopezluna Context length is generally model-specific and should be provided by the model creators. I'm not sure there's an easy way to automate that if the metadata isn't consistently included in the HF repo. Also, we should be aware of the context length limitations we currently have in DMR (I'm not sure if any progress has been made there). Maybe we should specify that instead of just removing all values? If not, we could also remove the column from the table instead of leaving it empty.

@ilopezluna

@krissetto I've just verified (thanks @jalonsogo for the hint) that it's included in the GGUF metadata as [llm].context_length.
I'm going to discuss including it in the config file with the team.
I will also update the current script to read this metadata and include it in the table.

> I'm not sure if any progress has been made there

Unfortunately, there is no progress here yet.
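
For illustration, here is a minimal Go sketch of the lookup described in this comment. In the GGUF spec, `[llm]` stands for the value of `general.architecture`, so a llama model stores the value under `llama.context_length`. The `map[string]any` shape, and the `contextLength` and `formatThousands` helpers, are assumptions made for this sketch; they are not the types or functions used by the tool in this PR. Both unsigned integer widths are accepted as a precaution.

```go
package main

import (
	"fmt"
	"strconv"
)

// contextLength looks up "<architecture>.context_length" in already-parsed
// GGUF metadata. The map representation is hypothetical.
func contextLength(meta map[string]any) (uint64, bool) {
	arch, ok := meta["general.architecture"].(string)
	if !ok {
		return 0, false
	}
	switch v := meta[arch+".context_length"].(type) {
	case uint32:
		return uint64(v), true
	case uint64:
		return v, true
	}
	return 0, false
}

// formatThousands inserts comma separators, for example 131072 -> "131,072".
func formatThousands(n uint64) string {
	s := strconv.FormatUint(n, 10)
	for i := len(s) - 3; i > 0; i -= 3 {
		s = s[:i] + "," + s[i:]
	}
	return s
}

func main() {
	// Values taken from the inspect output earlier in this thread.
	meta := map[string]any{
		"general.architecture": "llama",
		"llama.context_length": uint32(131072),
	}
	if n, ok := contextLength(meta); ok {
		fmt.Println(formatThousands(n)) // prints: 131,072
	}
}
```

If the key is missing, `contextLength` returns `false`, which leaves the caller free to skip the table cell rather than write a wrong value.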

@ilopezluna

@krissetto @jalonsogo I'm using this formula now: https://github.com/docker/model-cards/pull/18/files#diff-3ddaf77e1aeb6813c77ff54404fc4be8e4aa5bbff4bd6227bbea8d04155d4468R216

@ilopezluna

@jalonsogo I kept the previous scripts, but I think it would be better to remove them once we confirm that the current Go approach works as expected.

@krissetto

@ilopezluna noice 🫶

nit: don't forget the footnote notation (the little "1") in the VRAM calc parts of the tables when we generate them

@krissetto

Or maybe let's put it in the table header itself? 🤔

@ilopezluna

ilopezluna commented May 2, 2025

> nit: don't forget the footnote notation (the little "1") in the VRAM calc parts of the tables when we generate them

good catch, thanks! (added)

@ilopezluna ilopezluna merged commit e775016 into rename May 6, 2025
@ilopezluna ilopezluna mentioned this pull request May 6, 2025
ilopezluna added a commit that referenced this pull request May 6, 2025
* Renaming readme files for each model to the same name used in Hub

* Fix smollm2 urls

* Update overviews (#18)

* adds update script

* adds build-model-table.sh script

* Updates all models

* force param is not needed anymore

* Renaming model overviews to match with the model name in Hub (#17)

* Renaming readme files for each model to the same name used in Hub

* Fix smollm2 urls

* Use sentence case

* Adds initial go script to update table

* - build-all tables script to Go
- Parse gguf without downloading it

* - Uses authenticated req (to avoid rate limit)
- Fixes update of the markdown

* Try to get labels from general.size_label first, if not found fallback parameters metadata

* Format context length

* VRAM estimation

* Allow to update only the specified file

* Removes unneeded scripts

* Fix estimated VRAM for embedding model

* Adds model inspect command

* Rename to model-cards-cli

* Updates model-cards

* Rename header to VRAM¹

* Adds parsed gguf file into ModelVariant, and includes method to extract all metadata

* Includes gguf metadata into inspect

* No need to use interface for registry client for now.

* A ModelVariant has multiple tags

* Formats VRAM

* Formats context length

* Adds --all to include metadata

* Removes formatter

* Format size

* Update models

* Script not needed anymore

* Updates README.md