@sachaarbonel commented Jul 18, 2025

Add language probabilities feature flag to reduce latency

Adds a feature flag that makes language probabilities optional in the verbose JSON (verbose_json) response format. Computing language probabilities is an expensive operation that adds considerable latency to responses.

Changes:

  • Added language_probabilities boolean flag (default: true)
  • Added CLI flags: -nlp, --no-language-probabilities
  • Added HTTP parameter: no_language_probabilities
  • Modified JSON response to conditionally include probabilities
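The flag semantics listed above can be sketched in bash. This sketch is illustrative and is not the server's actual C++ code. It shows a boolean that defaults to true and is switched off by either CLI spelling. It also shows a response body that includes the probabilities only while the flag is on. The helpers parse_flags, emit_response and compute_language_probabilities are hypothetical. The JSON field names and probability values are placeholders. The sketch also assumes that the server skips the computation entirely when the flag is off, rather than just omitting the field. That fits the latency motivation, but the description does not state it outright.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the flag semantics; not the server's actual code.

# Stand-in for the expensive step. Placeholder values, not real model output.
compute_language_probabilities() {
  printf '{"en":0.9,"de":0.1}'
}

# language_probabilities defaults to true; either CLI spelling turns it off.
parse_flags() {
  language_probabilities=true
  local arg
  for arg in "$@"; do
    case "$arg" in
      -nlp|--no-language-probabilities) language_probabilities=false ;;
    esac
  done
}

# Build a verbose_json-like body (field names are assumptions). The
# probabilities are computed and included only while the flag is enabled,
# so disabling it skips the expensive step entirely.
emit_response() {
  local body='"language":"en","text":"placeholder"'
  if [[ "$language_probabilities" == true ]]; then
    body+=",\"language_probabilities\":$(compute_language_probabilities)"
  fi
  printf '{%s}\n' "$body"
}
```

For example, running `parse_flags --model models/ggml-base.en.bin -nlp` followed by `emit_response` prints `{"language":"en","text":"placeholder"}`, and the placeholder computation never runs.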

Usage

Command line:

# Disable language probabilities for faster response
./server --model models/ggml-base.en.bin -nlp

HTTP API:

curl 127.0.0.1:8080/inference \
  -F file="@audio.wav" \
  -F response_format="verbose_json" \
  -F no_language_probabilities="true"
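To see the latency difference on your own setup, you can wrap the curl call above in a small bash helper. The helper sends the same request with and without the no_language_probabilities field. It then prints curl's own total request time, using the real `-w '%{time_total}'` write-out variable. This is a sketch that assumes a server on 127.0.0.1:8080 and a local audio.wav, as in the example above. The function name time_inference and the SERVER_URL and AUDIO_FILE override variables are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: time /inference with and without language probabilities.
# Assumes a running server and a local audio file, as in the curl example.

SERVER_URL="${SERVER_URL:-127.0.0.1:8080}"   # hypothetical override variable
AUDIO_FILE="${AUDIO_FILE:-audio.wav}"        # hypothetical override variable

# $1 = "true" adds no_language_probabilities=true; anything else omits it.
# Prints the total request time in seconds; the response body is discarded.
time_inference() {
  local skip_probs="$1"
  local args=(
    -s -o /dev/null -w '%{time_total}\n'
    "${SERVER_URL}/inference"
    -F "file=@${AUDIO_FILE}"
    -F "response_format=verbose_json"
  )
  if [[ "$skip_probs" == "true" ]]; then
    args+=(-F "no_language_probabilities=true")
  fi
  curl "${args[@]}"
}
```

Call `time_inference false` and `time_inference true` a few times each and compare the printed times. Single requests can be noisy, so repeated runs give a fairer picture. Building the arguments as an array keeps the optional form field in one place and avoids quoting problems with file paths.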

@sachaarbonel

@danbev thanks for your reviews, I've applied them. Can you check again?

@danbev merged commit 1f5cf0b into ggml-org:master on Jul 21, 2025
55 checks passed