Replies: 2 comments
-
Find a way to get feedback from all skill levels of prompting and technical expertise. A system that works great for one user might be unusable for another.
-
The "golden metric" for agent swarms is a genuinely hard problem — you want a single number that captures "this swarm is performing well," but the relevant dimensions (cost efficiency, task completion rate, quality of outputs, latency) don't collapse to one number cleanly. A few candidate metrics and why they fall short alone:

- Task completion rate — easy to game; an agent that only attempts easy tasks has a high rate. Needs to be paired with task difficulty weighting.
- Cost per successful output — directionally right but ignores quality variance. A cheap low-quality answer has low cost per output but high cost per useful output.
- User satisfaction signal — the gold standard for quality, but expensive to collect and delayed.
- Revenue per agent — the metric we've been focused on in KinthAI. Agents that charge for their services and get repeat clients are producing real value. "Earnings retention rate" (the fraction of clients who return) is a proxy for quality.

The metric we've found most useful: value-to-cost ratio = (quality score × task complexity) / (tokens spent × model tier cost). This is high when you're using the right model for the task (cheap model for simple tasks, expensive model for complex ones) and producing quality outputs.

For swarms specifically, you also want coordination efficiency = useful work done / total compute spent. A swarm where agents spend 40% of their tokens negotiating and only 60% on actual task content has poor coordination efficiency.

More on the economic model for agent performance metrics: https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents

What's driving the search for a golden metric — investor reporting, operator tuning, or something else?
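For concreteness, here's a minimal sketch of the two ratios. The `TaskRecord` schema (quality score, complexity weight, token counts, tier cost) is my own assumption for illustration — the exact units and weighting scheme would be up to the operator:

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    quality_score: float      # assumed 0-1 rubric or reviewer score
    task_complexity: float    # difficulty weight; 1.0 = baseline task
    tokens_spent: int         # total tokens the agent used on this task
    model_tier_cost: float    # cost per token for the model tier used
    coordination_tokens: int  # tokens spent negotiating, not producing output

def value_to_cost(t: TaskRecord) -> float:
    """value-to-cost = (quality x complexity) / (tokens x tier cost)."""
    return (t.quality_score * t.task_complexity) / (t.tokens_spent * t.model_tier_cost)

def coordination_efficiency(tasks: list[TaskRecord]) -> float:
    """Fraction of the swarm's total tokens that went to actual task content."""
    total = sum(t.tokens_spent for t in tasks)
    useful = total - sum(t.coordination_tokens for t in tasks)
    return useful / total if total else 0.0
```

A task where 400 of 1000 tokens went to negotiation yields a coordination efficiency of 0.6 — the "40% negotiating" case described above.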
-
We're here to discuss a crucial aspect of our project - the User-Task-Completion-Satisfaction (UTCS) rate. This golden metric is the pulse of Swarms, reflecting our commitment to users and measuring our success.
The UTCS rate gauges how reliably and swiftly Swarms can meet user demands.
But what does it mean to complete a task to the user's satisfaction? It's about quality, speed, and reliability. It's about meeting or exceeding user expectations.
The UTCS rate is a mirror of the user experience. A high UTCS rate means users are getting what they need from Swarms, quickly and reliably. It's also a measure of Swarms' efficiency and effectiveness.
Achieving a 95% UTCS rate is a challenging goal, but one worth striving for: it will drive us to improve, innovate, and deliver the best possible experience for our users.
We're implementing several strategies to reach this goal, including understanding user needs, improving system reliability, optimizing for speed, and iterating and improving.
But we want to hear from you.
Your insights and suggestions are invaluable to us. Let's discuss how we can cultivate our golden metric and make Swarms the best it can be.
Looking forward to your thoughts and ideas!