LLM-as-a-Judge: Categorical and Boolean Scores #4965

jannikmaierhoefer (Maintainer) · 2025-01-10T11:00:49Z

Describe the feature or potential improvement

Currently, LLM-as-a-Judge evaluations in Langfuse can only create numerical scores.

It would be helpful if this feature could also create categorical or boolean scores (e.g. for intent classification).

Additional information

No response

tangochris · 2025-02-10T20:40:10Z

Yes. As an example use case, in our legal information chatbot, we would like to use LLM-as-judge to categorize (or "score") a trace into one of these topics:

[
"Business & non-profits",
"Cars & getting around",
"Consumer",
"Crime",
"Families & children",
"Health",
"Home & neighbours",
"Money & debt",
"Plan for your future care",
"Resolving disputes",
"Rights & citizenship",
"Wills & estates",
"Work"
]
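For illustration, one way to hold a judge model to exactly these topics is to give it a JSON Schema whose label field is an enum of the categories. This is only a minimal sketch under assumptions: `build_topic_schema` and the `label` / `reasoning` field names are hypothetical and not part of any Langfuse API.

```python
import json

# The topic list from the comment above, copied verbatim.
TOPICS = [
    "Business & non-profits",
    "Cars & getting around",
    "Consumer",
    "Crime",
    "Families & children",
    "Health",
    "Home & neighbours",
    "Money & debt",
    "Plan for your future care",
    "Resolving disputes",
    "Rights & citizenship",
    "Wills & estates",
    "Work",
]


def build_topic_schema(topics):
    """Hypothetical helper: JSON Schema that admits exactly one of `topics`."""
    return {
        "type": "object",
        "properties": {
            # The enum restricts the judge's answer to the known categories.
            "label": {"type": "string", "enum": list(topics)},
            # Free-text justification, useful when reviewing the judge's choice.
            "reasoning": {"type": "string"},
        },
        "required": ["label", "reasoning"],
        "additionalProperties": False,
    }


schema = build_topic_schema(TOPICS)
print(json.dumps(schema, indent=2))
```

A schema like this could be passed to any model provider that supports schema-constrained structured output; which provider and how it is wired in is left open here.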

3 replies

marcklingen Feb 11, 2025
Maintainer

thanks for sharing!

bmeetzejanuary Nov 3, 2025

+1 to this. It would let us categorize contact types more easily and, I believe, automatically run different evaluations on different categorizations. Given this has been on the roadmap since January, is there a timing update?

Qanpi Nov 11, 2025

Agreed. This doesn't seem like a huge burden since score configs for categorical/boolean already exist, so it would be a question of supporting that as structured output from the LLMJ model.
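To make that concrete, here is a rough sketch of validating a judge's structured output against a categorical or boolean score config. Everything in it is an assumption for illustration: `ScoreConfig`, `parse_judge_output`, the `"value"` key, and the `"CATEGORICAL"` / `"BOOLEAN"` type strings are hypothetical stand-ins, not Langfuse's actual schema or API.

```python
import json
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class ScoreConfig:
    """Hypothetical stand-in for a score config (not the Langfuse model)."""
    name: str
    data_type: str  # assumed to be "CATEGORICAL" or "BOOLEAN"
    categories: Sequence[str] = ()


def parse_judge_output(raw: str, config: ScoreConfig) -> Union[str, bool]:
    """Parse the judge's JSON reply and check it against the config."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("judge output must be a JSON object")
    value = payload.get("value")

    if config.data_type == "BOOLEAN":
        # Accept only real JSON booleans, not strings like "true".
        if not isinstance(value, bool):
            raise ValueError(f"{config.name}: expected a boolean, got {value!r}")
        return value

    if config.data_type == "CATEGORICAL":
        # Reject labels the config does not define.
        if value not in config.categories:
            raise ValueError(f"{config.name}: unknown category {value!r}")
        return value

    raise ValueError(f"unsupported data type: {config.data_type}")


topic_config = ScoreConfig("topic", "CATEGORICAL", ("Health", "Work"))
on_topic_config = ScoreConfig("on_topic", "BOOLEAN")

print(parse_judge_output('{"value": "Health"}', topic_config))
print(parse_judge_output('{"value": true}', on_topic_config))
```

Rejecting out-of-config labels rather than coercing them keeps the resulting scores consistent with the categories already defined in the config.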
