
Measure generate_time on iOS benchmark #8580


Merged: 1 commit merged on Feb 20, 2025
17 changes: 9 additions & 8 deletions .github/scripts/extract_benchmark_results.py
```diff
@@ -229,21 +229,22 @@ def extract_ios_metric(

     elif method == "forward":
         if metric_name == "Clock Monotonic Time, s":
-            benchmark_result["metric"] = (
-                "generate_time(ms)"
-                if "llama" in test_name
-                else "avg_inference_latency(ms)"
-            )
+            benchmark_result["metric"] = "avg_inference_latency(ms)"
             benchmark_result["actualValue"] = metric_value * 1000

         elif metric_name == "Memory Peak Physical, kB":
             # NB: Showing the value in mB is friendlier IMO
             benchmark_result["metric"] = "peak_inference_mem_usage(mb)"
             benchmark_result["actualValue"] = metric_value / 1024

-    elif method == "generate" and metric_name == "Tokens Per Second, t/s":
-        benchmark_result["metric"] = "token_per_sec"
-        benchmark_result["actualValue"] = metric_value
+    elif method == "generate":
+        if metric_name == "Clock Monotonic Time, s":
+            benchmark_result["metric"] = "generate_time(ms)"
+            benchmark_result["actualValue"] = metric_value * 1000
```
guangy10 (Contributor) commented on Feb 20, 2025:

I think in the code, TPS and generate_time are calculated independently. That's why I see some entries report a regression only on TPS but not on generate_time; see the screenshot below. @shoumikhin Should we consolidate the measurement (using the more reliable one, which seems to be generate_time?) and report only one metric instead? I'd prefer TPS, as it's more human-readable.

[Screenshot 2025-02-20 at 10 54 28 AM]

Contributor Author replied:

Oh, I think that's what this bug is about: the current generate_time is actually avg_inference_latency from forward. Once this change fixes the issue, we can consider keeping just TPS, I guess.

Contributor replied:

@huydhn Do you want to remove/hide generate_time from the dashboard in a follow-up PR? The Android side needs to be corrected first, I guess. cc: @kirklandsign

Contributor Author replied:

Yup, the PR for that is ready here: pytorch/test-infra#6314
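
To make the consolidation question concrete: token_per_sec and generate_time come from two separate XCTest instruments (the custom TokensPerSecondMetric and the newly added XCTClockMetric), so one can regress while the other holds steady. A minimal sketch of how the two numbers relate, with made-up values for illustration:

```python
# Illustrative numbers only: shows why the two generate metrics are
# independent measurements rather than one derived from the other.

token_count = 128        # tokens produced during the measured block (assumed)
clock_monotonic_s = 6.4  # wall-clock time XCTClockMetric would report (assumed)

# The extraction script converts the clock reading to milliseconds:
generate_time_ms = clock_monotonic_s * 1000  # 6400.0 ms

# If TPS were derived from the same clock, it would be:
derived_tps = token_count / clock_monotonic_s  # 20.0 t/s

# But TokensPerSecondMetric keeps its own timer, so its reported value
# (a made-up 21.3 t/s here) can drift from the derived one, which is why
# a regression can show on one metric and not the other.
reported_tps = 21.3

print(f"generate_time(ms): {generate_time_ms:.1f}")
print(f"token_per_sec derived={derived_tps:.1f} vs reported={reported_tps}")
```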


```diff
+
+        elif metric_name == "Tokens Per Second, t/s":
+            benchmark_result["metric"] = "token_per_sec"
+            benchmark_result["actualValue"] = metric_value
```
Comment on lines +241 to +247
Contributor commented:

Seems like they are measuring the same thing; we won't need both if that's the case. I left a comment in #8576 (comment) for @shoumikhin to clarify.


```diff
     return benchmark_result
```
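
Putting the hunk together, the corrected mapping can be sanity-checked with a small, self-contained sketch. map_ios_metric below is a simplified, hypothetical stand-in for extract_ios_metric (it covers only the branches this diff touches and ignores the rest of the record):

```python
# Minimal sketch of the metric mapping after this PR; a simplified stand-in
# for extract_ios_metric, not the repo's full function.
def map_ios_metric(method: str, metric_name: str, metric_value: float) -> dict:
    result = {}
    if method == "forward":
        if metric_name == "Clock Monotonic Time, s":
            # Forward latency is now always avg_inference_latency, never
            # mislabeled as generate_time for llama models.
            result = {"metric": "avg_inference_latency(ms)",
                      "actualValue": metric_value * 1000}
        elif metric_name == "Memory Peak Physical, kB":
            result = {"metric": "peak_inference_mem_usage(mb)",
                      "actualValue": metric_value / 1024}
    elif method == "generate":
        if metric_name == "Clock Monotonic Time, s":
            # The new XCTClockMetric measurement becomes generate_time.
            result = {"metric": "generate_time(ms)",
                      "actualValue": metric_value * 1000}
        elif metric_name == "Tokens Per Second, t/s":
            result = {"metric": "token_per_sec", "actualValue": metric_value}
    return result

assert map_ios_metric("forward", "Clock Monotonic Time, s", 0.25) == {
    "metric": "avg_inference_latency(ms)", "actualValue": 250.0}
assert map_ios_metric("generate", "Clock Monotonic Time, s", 6.4) == {
    "metric": "generate_time(ms)", "actualValue": 6400.0}
assert map_ios_metric("generate", "Tokens Per Second, t/s", 20.0) == {
    "metric": "token_per_sec", "actualValue": 20.0}
```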

```diff
@@ -82,7 +82,7 @@ @implementation LLaMATests
```
```diff
         return;
       }
       TokensPerSecondMetric *tokensPerSecondMetric = [TokensPerSecondMetric new];
-      [testCase measureWithMetrics:@[ tokensPerSecondMetric, [XCTMemoryMetric new] ]
+      [testCase measureWithMetrics:@[ tokensPerSecondMetric, [XCTClockMetric new], [XCTMemoryMetric new] ]
                              block:^{
                                tokensPerSecondMetric.tokenCount = 0;
                                const auto status = runner->generate(
```