test harness for quick checks by akshaydeo · Pull Request #3186 · maximhq/bifrost

akshaydeo · 2026-05-01T21:31:54Z

Summary

Briefly explain the purpose of this PR and the problem it solves.

Changes

What was changed and why
Any notable design decisions or trade-offs

Type of change

Affected areas

How to test

Describe the steps to validate this change. Include commands and expected outcomes.

# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build

If adding new configs or environment variables, document them here.

Screenshots/Recordings

If UI changes, add before/after screenshots or short clips.

Breaking changes

Yes
No

If yes, describe impact and migration instructions.

Related issues

Link related issues and discussions. Example: Closes #123

Security considerations

Note any security implications (auth, secrets, PII, sandboxing, etc.).

Checklist

I read docs/contributing/README.md and followed the guidelines
I added/updated tests where appropriate
I updated documentation where needed
I verified builds succeed (Go and UI)
I verified the CI pipeline passes locally if applicable

CLAassistant · 2026-05-01T21:32:00Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
coderabbitai · 2026-05-01T21:32:04Z

Warning

Rate limit exceeded

@akshaydeo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 52 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 845872fe-7539-42a3-afae-4b8d088ec891

📥 Commits

Reviewing files that changed from the base of the PR and between 6d08e1c and 5f06bd8.

📒 Files selected for processing (19)

Makefile
core/internal/llmtests/provider_feature_support_test.go
core/providers/anthropic/chat.go
core/providers/anthropic/chatservertools_test.go
core/providers/anthropic/payloadordering_test.go
core/providers/anthropic/requestbuilder.go
core/providers/anthropic/requestbuilder_test.go
core/providers/anthropic/utils.go
core/providers/anthropic/utils_test.go
core/providers/anthropic/validatechattools_test.go
core/providers/vertex/vertex.go
docs/docs.json
docs/providers/test-harness-coverage.mdx
tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md
tests/e2e/api/collections/provider-harness.json
tests/e2e/api/runners/analyze-failures.mjs
tests/e2e/api/runners/filter-collection.mjs
tests/e2e/api/runners/harness-viewer.mjs
tests/integrations/python/config.json

📝 Walkthrough

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is entirely a template with all sections left blank or unchecked (Summary, Changes, Type of change, Affected areas, How to test, etc.), providing no actual implementation details about the changes made. | Complete the PR description with actual content: fill in Summary, explain the Changes made (test harness, Makefile macros, analyzer tools, docs), check Type of change boxes, mark Affected areas, provide test instructions, and address Security/Related issues sections. |
| Linked Issues check | ⚠️ Warning | The linked issue (`#123`) requests Files API support for providers (upload files for RAG/fine-tuning); however, the PR changes focus on building a test harness infrastructure with Postman collections, Make targets, and analysis tools, not a Files API implementation. | Either implement the Files API support requirements from issue `#123`, or remove/update the linked issue to match the actual test harness objectives being delivered in this PR. |
| Out of Scope Changes check | ⚠️ Warning | Multiple changes appear out of scope: Anthropic tool normalization logic (utils.go, chat.go, requestbuilder.go) and Vertex provider modifications are not related to the test harness infrastructure objectives or the linked Files API issue. | Move Anthropic tool normalization changes (chat.go, utils.go, utils_test.go, requestbuilder.go, vertex.go) to a separate PR, keeping this PR focused solely on test harness tooling and documentation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'test harness for quick checks' refers to the primary focus of the changeset (adding a comprehensive Postman-driven test harness with tooling for provider integration testing), making it directly related to the main changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.33%, which is sufficient. The required threshold is 80.00%. |

akshaydeo · 2026-05-01T21:32:07Z

adds trace attribute flow #3219
anthropic computer use fixes across proivder #3195
gemini named content cache support #3194
test harness for quick checks #3186 (this PR)
main

This stack of pull requests is managed by Graphite.

github-actions · 2026-05-01T21:32:30Z

🧪 Test Suite Available

This PR can be tested by a repository admin.

Run tests for PR #3186

greptile-apps · 2026-05-01T21:34:37Z

Confidence Score: 5/5

Safe to merge; findings are P2 only and do not affect correctness of the core normalisation feature.

All findings are P2: one is dead code that never causes a wrong result (both branches already return the same value), and one is a normalisation gap on a path that was also un-normalised before this PR. The new Go logic is correct, well-tested, and the Vertex path is properly wired up.

core/providers/anthropic/utils.go — the Anthropic-native raw-body Responses path does not enable RemapToolVersions; tests/e2e/api/runners/analyze-failures.mjs — dead code in the 404 handler.

Important Files Changed

| Filename | Overview |
| --- | --- |
| core/providers/anthropic/utils.go | Adds ComputerUseGeneration, NormalizedToolSpec, computerUseBaseTool helpers and extends RemapRawToolVersionsForProvider to accept a model param for per-generation computer-use tool normalisation; normalisation is not enabled for the Anthropic-native raw-body Responses path. |
| core/providers/anthropic/chat.go | Extends convertServerToolToAnthropic to accept model string and normalise computer-use tool {type, name} pairs to the correct generation before sending to Anthropic. |
| core/providers/anthropic/requestbuilder.go | New file extracting the shared Anthropic-family request-body assembly pipeline into BuildAnthropicResponsesRequestBody; clean consolidation with config-driven feature flags. |
| core/providers/vertex/vertex.go | Passes request.Model to RemapRawToolVersionsForProvider in both ChatCompletion and ChatCompletionStream paths to enable generation-aware computer-use tool normalisation. |
| tests/e2e/api/runners/analyze-failures.mjs | New categorisation script for newman harness failures; contains dead code in the 404 branch where both if-taken and fallthrough paths return the same "model_not_found" value. |
| tests/e2e/api/runners/harness-viewer.mjs | New interactive HTML viewer for harness results; /api/resend now validates the target URL against an allowedTargets set derived from the newman report, addressing the previously noted SSRF concern. |
| Makefile | Adds EXPOSE_ENV macro for Infisical/dotenv secret loading, install-newman, and run-provider-harness-test targets; broad port-kill on VIEWER_PORT noted in prior review. |
| core/providers/anthropic/utils_test.go | Adds comprehensive tests for ComputerUseGeneration, NormalizedToolSpec, and RemapRawToolVersionsForProvider covering both upgrade and downgrade directions across multiple model variants. |

Reviews (4): Last reviewed commit: "test harness for quick checks"

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

tests/e2e/api/collections/provider-harness.json (1)

5-6: ⚡ Quick win

Add Files API smoke requests to align harness coverage with stack objective (#123).

This harness validates many provider surfaces but not file-upload/list/retrieve/delete flows, so the Files API objective is still uncovered by the quick-check path.

As per coding guidelines, “always check the stack if there is one for the current PR... see all changes in the light of the whole stack of PRs.”

Also applies to: 248-250
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/api/collections/provider-harness.json` around lines 5 - 6, The
harness JSON (provider-harness.json) omits Files API flows; add smoke requests
exercising file upload, list, retrieve, and delete to the collection so the
quick-check path covers the Files API objective; specifically add entries that
POST to the Files upload endpoint, GET the files list, GET a file by id, and
DELETE a file by id (use the same auth/baseUrl variable pattern used by existing
requests like the chat/completions entries) and wire them into the existing run
sequence so they run as part of the provider-harness smoke-test.

Makefile (1)

1536-1622: ⚡ Quick win

Refactor run-provider-harness-test body into a script to reduce Makefile complexity.

The target is doing orchestration, process lifecycle, and viewer control in one recipe block. Given the current length (and existing lint warning), extracting to a dedicated shell script will be easier to maintain and review.

🧩 Refactor direction

 run-provider-harness-test: install-newman ## ...
-	@mkdir -p tmp
-	@$(EXPOSE_ENV); \
-	...long body...
-	exit $$NEWMAN_EXIT
+	@mkdir -p tmp
+	@$(EXPOSE_ENV); \
+	$$CMD_PREFIX bash tests/e2e/api/runners/run-provider-harness.sh \
+	  "$(or $(BASE_URL),http://localhost:8080)" \
+	  "$(or $(APP_DIR),tests/integrations/python)" \
+	  "$(or $(VIEWER_PORT),8090)" \
+	  "$(ENV_FILE)" \
+	  "$(FOLDER)" \
+	  "$${CI:-$(CI)}"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 1536 - 1622, The run-provider-harness-test target body
is too large and should be moved into a dedicated executable shell script;
create a script (e.g., scripts/run-provider-harness-test.sh) that implements the
current orchestration including BASE_URL/APP_DIR/VIEWER_PORT defaults, the
cleanup() and preempt_viewer_port() logic, launching/stopping bifrost, running
newman with the same flags (including reporter exports to tmp/newman-report.*),
launching tests/e2e/api/runners/harness-viewer.mjs when not CI, and returning
newman exit code; then simplify the Makefile target run-provider-harness-test to
export any required env, call that script with forwarded
arguments/ENV_FILE/INFISICAL vars, and ensure the script is executable and
preserves the exact behaviors referenced by names in the diff (cleanup,
preempt_viewer_port, harness-viewer.mjs, tmp/bifrost-dev.pid,
tmp/harness-viewer.pid, tmp/newman-report.*).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/api/collections/provider-harness.json`:
- Around line 23-30: The test harness currently reuses openaiKey for Azure which
couples providers; add a dedicated azureKey entry and replace any uses of
{{openaiKey}} for Azure auth with {{azureKey}} (update the
azureDeployment/azureApiVersion bindings to reference the new azureKey), and
ensure any other references (including the occurrences noted around lines
179-180) and CI/test env variables are updated to provide the new azureKey
value.
- Around line 14-16: The test name "pm.test('Status code is 2xx', ...)" is
inconsistent with the assertion which allows any status < 400; update the
assertion in the pm.expect call to enforce a strict 2xx range (e.g., assert
pm.response.code is >= 200 and < 300) or alternatively rename the pm.test label
to reflect "< 400" if you intend to allow 3xx; modify the pm.test / pm.expect
block accordingly so the test name and condition match.
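
The mismatch described in the second item can be sketched outside Postman. The helper names `isStrict2xx` and `isBelow400` are hypothetical and do not come from the collection; inside a Postman test script the chosen condition would be applied to `pm.response.code`.

```javascript
// Hypothetical helpers (not from provider-harness.json) contrasting the two
// conditions discussed in the review comment.

// Strict check that matches the label "Status code is 2xx".
function isStrict2xx(code) {
  return Number.isInteger(code) && code >= 200 && code < 300;
}

// Looser check the review describes as currently in place: anything below 400.
function isBelow400(code) {
  return Number.isInteger(code) && code < 400;
}

// A 302 redirect is where the two conditions disagree.
for (const code of [200, 204, 302, 404]) {
  console.log(code, "strict2xx:", isStrict2xx(code), "below400:", isBelow400(code));
}
```

Either condition is defensible; the point of the comment is only that the test label and the condition should say the same thing.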

In `@tests/e2e/api/runners/harness-viewer.mjs`:
- Around line 287-295: The forwarded fetch call can hang because it has no
timeout; wrap the request with an AbortController (create controller, pass
controller.signal into fetch) and start a timeout (e.g., const timer =
setTimeout(() => controller.abort(), RESEND_TIMEOUT_MS)) before calling fetch;
after fetch completes clearTimeout(timer). Also catch abort/fetch errors and
return a proper timeout response (e.g., status 504 and elapsedMs) instead of
hanging. Update the fetch invocation that uses url, method, headerObj, body and
ensure the timer is cleared on success or failure so the request cannot hang
indefinitely.
- Around line 278-292: The /api/resend handler currently forwards any
client-supplied url/method; restrict it by validating the parsed url and method
against the preloaded harness items and allowed schemes before calling fetch. In
the POST branch that parses raw/JSON (the handler around req.method === "POST"
&& u.pathname === "/api/resend"), check that the parsed url's protocol is
http(s) and that the hostname/path (or a full URL string) exists in the
in-memory harness list (e.g., the collection used to render harness items) and
that the requested method is one of the allowed methods for that harness item;
if validation fails return a 400/403 without calling fetch. Apply the same
header reconstruction (headerObj) only after validation and keep the existing
GET/HEAD body logic.
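
A minimal sketch of the two harness-viewer suggestions above, assuming Node 18+ (global `fetch` and `AbortController`). The names `RESEND_TIMEOUT_MS` and `allowedTargets` appear in the review text, but the timeout value, the `"METHOD url"` key format, and the helpers `targetKey`, `validateResend`, and `resendWithTimeout` are assumptions made for illustration, not the code in harness-viewer.mjs.

```javascript
// Assumed default; the review names RESEND_TIMEOUT_MS but not its value.
const RESEND_TIMEOUT_MS = 30_000;

// Hypothetical key format: upper-cased method plus the normalized URL.
function targetKey(method, url) {
  return `${String(method).toUpperCase()} ${new URL(url).href}`;
}

// Returns null when the request may be forwarded, otherwise { status, error }.
// allowedTargets is assumed to be a Set of targetKey() strings built from the report.
function validateResend(rawUrl, method, allowedTargets) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { status: 400, error: "invalid url" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, error: "unsupported scheme" };
  }
  if (!allowedTargets.has(targetKey(method, parsed.href))) {
    return { status: 403, error: "target not present in harness report" };
  }
  return null;
}

// Forwards an already validated request and aborts it after timeoutMs.
async function resendWithTimeout(url, init, timeoutMs = RESEND_TIMEOUT_MS) {
  const controller = new AbortController();
  const started = Date.now();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { ...init, signal: controller.signal });
    const body = await res.text();
    return { status: res.status, body, elapsedMs: Date.now() - started };
  } catch (err) {
    const elapsedMs = Date.now() - started;
    if (err && err.name === "AbortError") {
      return { status: 504, error: "upstream timeout", elapsedMs };
    }
    return { status: 502, error: String(err && err.message), elapsedMs };
  } finally {
    clearTimeout(timer); // always cleared, on success or failure
  }
}
```

In a handler, `validateResend` would run first and `resendWithTimeout` would be called only when it returns `null`, so a rejected request never reaches `fetch`.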

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c7fe6ac-d2e3-4273-a53e-12be743ffcbf

📥 Commits

Reviewing files that changed from the base of the PR and between 734f02d and d7bc1b8.

📒 Files selected for processing (3)

Makefile
tests/e2e/api/collections/provider-harness.json
tests/e2e/api/runners/harness-viewer.mjs

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

Makefile (1)

386-392: ⚠️ Potential issue | 🟠 Major

Add trailing backslash to line 391.

Line 391 is missing a trailing backslash, causing line 392 to be treated as a separate shell command. When make run APP_DIR=... is called, the -app-dir argument won't be passed to the bifrost-http binary.

 	@./tmp/bifrost-http \
 		-host "$(HOST)" \
 		-port "$(PORT)" \
 		-log-style "$(LOG_STYLE)" \
 		-log-level "$(LOG_LEVEL)" \
-		$(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)")
+		$(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") \
 		$(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 386 - 392, The Makefile command invoking
./tmp/bifrost-http treats the $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))")
token as a separate shell command because the previous line (the $(if
$(PROMETHEUS_LABELS)... line) is missing a trailing backslash; fix by adding a
trailing backslash to the end of the $(if
$(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") line so the
-app-dir argument is passed as part of the same command to the bifrost-http
invocation.

🧹 Nitpick comments (2)

core/providers/anthropic/utils_test.go (2)

2190-2286: ⚡ Quick win

Add at least one non-Anthropic provider case to verify provider gating.

RemapRawToolVersionsForProvider is provider-aware, but this table only exercises schemas.Anthropic (Line 2285). A Vertex/Bedrock no-remap case would protect against accidental cross-provider rewriting.

Diff suggestion

 	cases := []struct {
 		name      string
+		provider  schemas.ModelProvider
 		model     string
 		inputBody string
 		expected  []expectedTool
 	}{
 		{
 			name:  "sonnet-4-6 with new-gen tools (no-op)",
+			provider: schemas.Anthropic,
 			model: "claude-sonnet-4-6",
 			inputBody: `{"model":"claude-sonnet-4-6","tools":[
 				{"type":"computer_20251124","name":"computer","display_width_px":1024,"display_height_px":768},
 				{"type":"text_editor_20250728","name":"str_replace_based_edit_tool"},
 				{"type":"bash_20250124","name":"bash"}
 			]}`,
@@
 		},
+		{
+			name:     "vertex provider does not remap anthropic computer-use versions",
+			provider: schemas.Vertex,
+			model:    "claude-sonnet-4-6",
+			inputBody: `{"model":"claude-sonnet-4-6","tools":[
+				{"type":"text_editor_20250124","name":"str_replace_editor"}
+			]}`,
+			expected: []expectedTool{
+				{"text_editor_20250124", "str_replace_editor"},
+			},
+		},
 	}
 	for _, tc := range cases {
 		t.Run(tc.name, func(t *testing.T) {
-			out, err := RemapRawToolVersionsForProvider([]byte(tc.inputBody), schemas.Anthropic, tc.model)
+			provider := tc.provider
+			if provider == "" {
+				provider = schemas.Anthropic
+			}
+			out, err := RemapRawToolVersionsForProvider([]byte(tc.inputBody), provider, tc.model)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/providers/anthropic/utils_test.go` around lines 2190 - 2286, The test
table only uses schemas.Anthropic so it doesn't verify provider gating; add at
least one case that calls RemapRawToolVersionsForProvider with a non-Anthropic
provider (e.g., schemas.Vertex or schemas.Bedrock) and a model string for that
provider and an inputBody containing a tools array, and assert the output is
unchanged (expected nil or the same tool types/names); place the new case inside
the existing cases slice alongside the other entries so the loop will exercise
provider-aware behavior and prevent accidental cross-provider remapping.

2289-2293: ⚡ Quick win

Tighten the “no tools array” assertion to catch silent shape regressions.

On Line 2291, the check currently passes if tools is present but empty (or present with wrong shape). For a true no-op expectation, assert absence of tools explicitly.

Diff suggestion

-			if tc.expected == nil {
-				if toolsResult.Exists() && toolsResult.IsArray() && len(toolsResult.Array()) > 0 {
-					t.Fatalf("expected no tools array, got %s", toolsResult.Raw)
-				}
-				return
-			}
+			if tc.expected == nil {
+				if toolsResult.Exists() {
+					t.Fatalf("expected no tools field for no-op case, got %s", toolsResult.Raw)
+				}
+				return
+			}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/providers/anthropic/utils_test.go` around lines 2289 - 2293, The test
currently allows a present-but-empty or wrongly-shaped "tools" field to pass;
change the assertion in the providerUtils.GetJSONField(out, "tools") branch so
it explicitly fails if toolsResult.Exists() (i.e., require the field to be
absent when tc.expected == nil). Update the check around toolsResult (used with
providerUtils.GetJSONField, toolsResult.Exists(), toolsResult.IsArray(),
toolsResult.Array()) to call t.Fatalf when toolsResult.Exists() is true,
ensuring the test catches any silent presence or shape regressions for the
"tools" field.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/anthropic/utils.go`:
- Line 1172: The test call to RemapRawToolVersionsForProvider now mismatches the
function signature: update the call in provider_feature_support_test (where it
currently passes []byte(tt.inputJSON), tt.provider) to include the model
argument as the third parameter, i.e. pass tt.model so the call becomes
RemapRawToolVersionsForProvider([]byte(tt.inputJSON), tt.provider, tt.model);
this ensures the test matches the function signature of
RemapRawToolVersionsForProvider.

In `@Makefile`:
- Around line 48-55: The Makefile block guarded by USE_INFISICAL_RESOLVED
currently sources the output of "infisical export" via process substitution
which can silently succeed with an empty environment if the export fails; change
it to run "infisical export --path \"$${INFISICAL_PATH_VAL}\" --format dotenv"
first, capture its exit status (or write its output to a temporary file), and if
the command fails print a clear error and exit 1; only when the export succeeds,
run "set -a; . <( ... )" or "set -a; . /tmp/<tempfile>" to source the variables
and then set +a and clean up the temp file; reference USE_INFISICAL_RESOLVED and
INFISICAL_PATH_VAL to locate the block to modify and ensure downstream targets
that rely on EXPOSE_ENV fail fast on export errors.

In `@tests/e2e/api/runners/analyze-failures.mjs`:
- Around line 555-570: renderMissingPerModel currently treats any feature with
cells[f].total === 0 as "missing" even when that feature is not applicable to
the model/provider; update the computation of tested and missing so they first
filter COVERAGE_FEATURES by applicability to the model (e.g., use an existing
feature-to-provider map or a per-cell flag like cells[f].applicable, or add a
helper isFeatureApplicableToModel(feature, key)), then compute tested =
applicableFeatures.filter(f => cells[f].total > 0) and missing =
applicableFeatures.filter(f => cells[f].total === 0); keep the rest of the
message logic but base counts and the shown/rest lists on those filtered arrays
so provider-inapplicable features are excluded from per-model gap counts.
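
The applicability filter suggested above could look roughly like the following sketch. `COVERAGE_FEATURES`, `cells`, and `isFeatureApplicableToModel` are named in the review comment, but the feature list, the `provider/model` key format, the `NOT_APPLICABLE` map, and `splitCoverage` are hypothetical stand-ins rather than the contents of analyze-failures.mjs.

```javascript
// Hypothetical feature list; the real COVERAGE_FEATURES is not shown in this thread.
const COVERAGE_FEATURES = ["chat", "streaming", "embeddings", "files"];

// Hypothetical applicability map: features a provider does not offer at all.
const NOT_APPLICABLE = { providerA: new Set(["embeddings"]) };

// Assumes keys look like "provider/model".
function isFeatureApplicableToModel(feature, key) {
  const provider = key.split("/")[0];
  const excluded = NOT_APPLICABLE[provider];
  return !(excluded && excluded.has(feature));
}

// Computes tested and missing features only over the applicable ones.
function splitCoverage(key, cells) {
  const applicable = COVERAGE_FEATURES.filter((f) => isFeatureApplicableToModel(f, key));
  const tested = applicable.filter((f) => (cells[f]?.total ?? 0) > 0);
  const missing = applicable.filter((f) => (cells[f]?.total ?? 0) === 0);
  return { applicable, tested, missing };
}

const exampleCells = {
  chat: { total: 3 },
  streaming: { total: 0 },
  embeddings: { total: 0 },
  files: { total: 1 },
};
console.log(splitCoverage("providerA/model-x", exampleCells));
```

With this split, `embeddings` no longer counts as a gap for `providerA`, while `streaming` still does.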

In `@tests/integrations/python/config.json`:
- Around line 243-250: The config enables Bedrock batch usage
("use_for_batch_api": true) but the expected role ARN is missing; fix by either
setting "use_for_batch_api" to false for this provider or reintroducing the role
ARN fields the tests and loader expect (add a role_arn and/or batch_role_arn
entry under bedrock_key_config with the appropriate env reference so tests like
test_bedrock.py and utilities that read config["role_arn"] can find the batch
role).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 46c03a79-f207-432a-b6d9-a87cdd1c4a6f

📥 Commits

Reviewing files that changed from the base of the PR and between d7bc1b8 and 6d4e381.

📒 Files selected for processing (18)

Makefile
core/providers/anthropic/chat.go
core/providers/anthropic/chatservertools_test.go
core/providers/anthropic/payloadordering_test.go
core/providers/anthropic/requestbuilder.go
core/providers/anthropic/requestbuilder_test.go
core/providers/anthropic/utils.go
core/providers/anthropic/utils_test.go
core/providers/anthropic/validatechattools_test.go
core/providers/vertex/vertex.go
docs/docs.json
docs/providers/test-harness-coverage.mdx
tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md
tests/e2e/api/collections/provider-harness.json
tests/e2e/api/runners/analyze-failures.mjs
tests/e2e/api/runners/filter-collection.mjs
tests/e2e/api/runners/harness-viewer.mjs
tests/integrations/python/config.json

✅ Files skipped from review due to trivial changes (1)

docs/providers/test-harness-coverage.mdx

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 1532-1534: The Makefile currently computes BASE_URL_VAL but does
not propagate the resolved host/port into the auto-start path, causing `make
dev` to start on the default HOST/PORT (8080) while callers may have set
BASE_URL to another host/port; update the logic around BASE_URL_VAL and the
auto-start invocation of `make dev` so you parse BASE_URL_VAL to extract host
and port (fallback to localhost:8080 when absent) and export those as HOST and
PORT (or pass HOST=$(HOST_VAL) PORT=$(PORT_VAL)) into the `make dev` command;
additionally, if BASE_URL_VAL is a non-local/remote URL, skip the auto-start and
fail fast with a clear message; apply the same change to the repeated block that
computes VIEWER_PORT_VAL (the other occurrence mentioned).
- Line 397: The Makefile currently breaks the ./tmp/bifrost-http command because
the previous line does not end with a line-continuation, so the $(if
$(APP_DIR),-app-dir "$(abspath $(APP_DIR))") argument is treated as a separate
shell command; fix it by adding a trailing backslash to the end of the preceding
Makefile command so that the APP_DIR conditional (the expression referencing
APP_DIR and abspath) is passed as an argument to ./tmp/bifrost-http.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7569032-8cc1-4485-9bd8-09d660bbd6f9

📥 Commits

Reviewing files that changed from the base of the PR and between 6d4e381 and 6d08e1c.

📒 Files selected for processing (19)

Makefile
core/internal/llmtests/provider_feature_support_test.go
core/providers/anthropic/chat.go
core/providers/anthropic/chatservertools_test.go
core/providers/anthropic/payloadordering_test.go
core/providers/anthropic/requestbuilder.go
core/providers/anthropic/requestbuilder_test.go
core/providers/anthropic/utils.go
core/providers/anthropic/utils_test.go
core/providers/anthropic/validatechattools_test.go
core/providers/vertex/vertex.go
docs/docs.json
docs/providers/test-harness-coverage.mdx
tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md
tests/e2e/api/collections/provider-harness.json
tests/e2e/api/runners/analyze-failures.mjs
tests/e2e/api/runners/filter-collection.mjs
tests/e2e/api/runners/harness-viewer.mjs
tests/integrations/python/config.json

✅ Files skipped from review due to trivial changes (4)

core/providers/vertex/vertex.go
tests/e2e/api/runners/filter-collection.mjs
docs/providers/test-harness-coverage.mdx
core/providers/anthropic/utils_test.go

🚧 Files skipped from review as they are similar to previous changes (6)

core/providers/anthropic/chat.go
core/providers/anthropic/utils.go
tests/e2e/api/runners/harness-viewer.mjs
tests/e2e/api/runners/analyze-failures.mjs
docs/docs.json
tests/integrations/python/config.json

akshaydeo · 2026-05-05T06:47:41Z

Merge activity

May 5, 6:47 AM UTC: A user started a stack merge that includes this pull request via Graphite.
May 5, 6:48 AM UTC: @akshaydeo merged this pull request with Graphite.

akshaydeo marked this pull request as ready for review May 1, 2026 21:32

akshaydeo requested a review from a team as a code owner May 1, 2026 21:32

greptile-apps Bot reviewed May 1, 2026

Comment thread tests/e2e/api/collections/provider-harness.json

coderabbitai Bot requested changes May 1, 2026

Comment thread tests/e2e/api/collections/provider-harness.json Outdated

Comment thread tests/e2e/api/collections/provider-harness.json

Comment thread tests/e2e/api/runners/harness-viewer.mjs Outdated

Comment thread tests/e2e/api/runners/harness-viewer.mjs Outdated

akshaydeo force-pushed the 05-02-test_harness_for_quick_checks branch from d7bc1b8 to 6d4e381 Compare May 2, 2026 19:22

coderabbitai Bot requested changes May 2, 2026

Comment thread core/providers/anthropic/utils.go

Comment thread Makefile Outdated

Comment thread tests/e2e/api/runners/analyze-failures.mjs

Comment thread tests/integrations/python/config.json

This was referenced May 3, 2026

gemini named content cache support #3194

Merged

anthropic computer use fixes across proivder #3195

Merged

akshaydeo force-pushed the 05-02-test_harness_for_quick_checks branch from 6d4e381 to 6d08e1c Compare May 4, 2026 05:24

coderabbitai Bot requested review from Pratham-Mishra04, danpiths and roroghost17 May 4, 2026 05:25

coderabbitai Bot requested changes May 4, 2026

Comment thread Makefile

Comment thread Makefile

test harness for quick checks

5f06bd8

akshaydeo force-pushed the 05-02-test_harness_for_quick_checks branch from 6d08e1c to 5f06bd8 Compare May 4, 2026 15:25

coderabbitai Bot approved these changes May 4, 2026

akshaydeo mentioned this pull request May 5, 2026

adds trace attribute flow #3219

Merged

akshaydeo merged commit 68bf0c7 into main May 5, 2026
13 of 16 checks passed

akshaydeo deleted the 05-02-test_harness_for_quick_checks branch May 5, 2026 06:48
