Refactor eval type choice logic #559
Conversation
Walkthrough
Refactors evaluation mode handling: the CLI now only normalizes the deprecated "ofborg" value to "github" and passes the eval type through to Review as a string.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant CLI as CLI pr_command
participant Review as Review
participant GH as GithubClient
participant Local as Local Eval
User->>CLI: run pr (with --eval)
CLI->>Review: Review(eval_type=args.eval, api_token, ...)
Note over Review: compute _use_github_eval from eval_type, token, only_packages, config
alt _use_github_eval = true
Review->>GH: use token / trigger GitHub Actions
GH-->>Review: evaluation results
else _use_github_eval = false
Review->>Local: run local evaluation
Local-->>Review: evaluation results
end
Review-->>CLI: build_pr outcome
CLI-->>User: report results
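For orientation, here is a minimal sketch of the CLI-side normalization this diagram assumes. The function name and the exact warning text are illustrative; only the "ofborg" → "github" mapping and the pass-through of the eval string are taken from the walkthrough and review comments below.

```python
# Illustrative sketch of the normalization described above; not the actual pr.py code.
def normalize_eval_type(eval_arg: str) -> str:
    """Map the deprecated "ofborg" value to "github"; leave everything else unchanged."""
    if eval_arg == "ofborg":
        print("warning: `--eval ofborg` is deprecated, use `--eval github` instead")
        return "github"
    return eval_arg  # "auto", "local", or "github"

# The CLI then passes the normalized string on, e.g.:
# review = Review(eval_type=normalize_eval_type(args.eval), api_token=args.token, ...)
```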
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
nixpkgs_review/review.py (1)
168-215: Side-effectful property; improve config normalization and messaging
- Properties that warn/exit are surprising; prefer a method (e.g., decide_eval_mode()) for side effects.
- Treat empty/whitespace-only --extra-nixpkgs-config as default to avoid unnecessary local fallback.
- When --eval github and no token, print a clear tip before exiting.

Patch 1 — clearer error when token missing:

  if not self.api_token:
      warn("No GitHub token provided via GITHUB_TOKEN variable.")
-     match self.eval_type:
-         case "auto":
-             warn(
-                 "Falling back to local evaluation.\n"
-                 "Tip: Install the `gh` command line tool and run `gh auth login` to authenticate."
-             )
-             return False
-         case "github":
-             sys.exit(1)
+     match self.eval_type:
+         case "auto":
+             warn(
+                 "Falling back to local evaluation.\n"
+                 "Tip: run `gh auth login` or pass --token."
+             )
+             return False
+         case "github":
+             warn("Cannot use `--eval github` without a GitHub token. Tip: run `gh auth login` or pass --token.")
+             sys.exit(1)

Patch 2 — robust default-config detection:

- normalized_config = self.extra_nixpkgs_config.replace(" ", "")
+ normalized_config = "".join(self.extra_nixpkgs_config.split())
- if normalized_config == "{}":
+ if normalized_config in ("", "{}"):
      return True

Optional follow-up: rename _use_github_eval to a method and call it; keep behavior identical but avoid side effects in a property.
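A rough sketch of what such a method could look like. The names NixpkgsReviewError, warn, and the parameters are inferred from this review; the real decision also considers only_packages (per the sequence diagram above), which is omitted here for brevity.

```python
# Sketch of the suggested decide_eval_mode() refactor — not the actual review.py implementation.
class NixpkgsReviewError(Exception):
    """Assumed project error type."""

def warn(msg: str) -> None:
    print(f"warning: {msg}")

def decide_eval_mode(eval_type: str, api_token: str | None, extra_nixpkgs_config: str) -> bool:
    """Return True to fetch eval results from GitHub Actions, False to evaluate locally."""
    if eval_type == "local":
        return False
    if not api_token:
        if eval_type == "github":
            raise NixpkgsReviewError("`--eval github` requires a GitHub token")
        warn("No GitHub token found, falling back to local evaluation")
        return False
    # A non-default --extra-nixpkgs-config cannot be applied to GitHub's precomputed eval.
    if "".join(extra_nixpkgs_config.split()) not in ("", "{}"):
        warn("Custom nixpkgs config given, falling back to local evaluation")
        return False
    return True
```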
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- nixpkgs_review/cli/pr.py (2 hunks)
- nixpkgs_review/review.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nixpkgs_review/cli/pr.py (3)
tests/test_pr.py (3)
test_pr_ofborg_eval (226-282)
test_pr_github_action_eval (286-367)
test_pr_only_packages_does_not_trigger_an_eval (372-415)
nixpkgs_review/github.py (2)
get_github_action_eval_result (172-217)
GithubClient (38-217)
nixpkgs_review/cli/__init__.py (1)
pr_flags (45-101)
nixpkgs_review/review.py (3)
nixpkgs_review/github.py (2)
GithubClient (38-217)
get_github_action_eval_result (172-217)
tests/test_pr.py (2)
test_pr_ofborg_eval (226-282)
test_pr_github_action_eval (286-367)
nixpkgs_review/cli/__init__.py (1)
pr_flags (45-101)
🔇 Additional comments (4)
nixpkgs_review/cli/pr.py (2)
43-46: Normalize legacy --eval=ofborg to github — LGTM. Mapping "ofborg" to "github" with a deprecation warning is fine.
104-104: Pass eval_type through to Review — LGTM. The constructor change is correctly reflected here.
nixpkgs_review/review.py (2)
138-141: GitHub client/token wiring — LGTM. Storing api_token and constructing GithubClient here aligns with centralizing eval logic.
474-501: Correct branching on GitHub eval with retry loop — LGTM. Uses the central decision, waits with a sane timeout, and skips local eval when results are fetched.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nixpkgs_review/review.py (1)
474-501: Set GITHUB_TOKEN in test_pr_github_action_eval
tests/test_pr.py:286's test_pr_github_action_eval must inject a GITHUB_TOKEN (e.g. os.environ["GITHUB_TOKEN"] = "foo" or via monkeypatch.setenv) so it continues to exercise remote GitHub eval under the new auth-required policy.
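A hedged sketch of the suggested test change; the fixture usage is standard pytest, but the surrounding test body is assumed rather than copied from tests/test_pr.py.

```python
# Sketch only — the real test_pr_github_action_eval has more setup than shown here.
import pytest

def test_pr_github_action_eval(monkeypatch: pytest.MonkeyPatch) -> None:
    # Inject a dummy token so Review still chooses the GitHub eval path
    # under the new "github eval requires authentication" policy.
    monkeypatch.setenv("GITHUB_TOKEN", "foo")
    # ... existing test body that runs the pr command with --eval github ...
```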
♻️ Duplicate comments (1)
nixpkgs_review/review.py (1)
108-109: Default eval_type to "auto" to avoid breaking existing call sites
This was flagged earlier; making the parameter optional preserves backwards compatibility and avoids a TypeError in callers that have not been updated.

- eval_type: str,
+ eval_type: str = "auto",
🧹 Nitpick comments (4)
nixpkgs_review/review.py (4)
138-141: Normalize and validate eval_type early
Lowercase and validate against the allowed set to fail fast with a clear error.

  self.github_client = GithubClient(api_token)
- self.eval_type = eval_type
+ self.eval_type = (eval_type or "auto").lower()
+ if self.eval_type not in {"auto", "local", "github"}:
+     raise NixpkgsReviewError(f"Invalid eval_type: {self.eval_type!r}")
177-189: Clarify the token hint; keep the 'auto' fallback, but improve the UX message
Mention both common env vars and a helpful command.

- warn("No GitHub token provided via GITHUB_TOKEN variable.")
+ warn("No GitHub token configured (set GITHUB_TOKEN or GH_TOKEN, or run `gh auth login`).")
188-189: Prefer raising NixpkgsReviewError over sys.exit inside library code
This keeps control at the caller, eases testing, and matches existing error handling patterns.

- sys.exit(1)
+ raise NixpkgsReviewError("`--eval github` requires authentication (GITHUB_TOKEN/GH_TOKEN).")

- warn("Invalid eval_type")
- sys.exit(1)
+ raise NixpkgsReviewError(f"Invalid eval_type: {self.eval_type!r}")

Also applies to: 211-214
480-491: Use gentle backoff when polling for artifacts and avoid a hard exit
Improves robustness and reduces API pressure; also raise instead of exiting on timeout.

- timeout_s: int = 10
- print(f"...Results are not (yet) available. Retrying in {timeout_s}s")
- waiting_time_s: int = 0
- while packages_per_system is None:
-     waiting_time_s += timeout_s
-     print(".", end="")
-     sys.stdout.flush()
-     time.sleep(timeout_s)
-     packages_per_system = (
-         self.github_client.get_github_action_eval_result(pr)
-     )
-     if waiting_time_s > 10 * 60:
-         warn(
-             "\nTimeout exceeded: No evaluation seems to be available on GitHub."
-             "\nLook for an eventual evaluation error issue on the PR web page."
-             "\nAlternatively, use `--eval local` to do the evaluation locally."
-         )
-         sys.exit(1)
+ print("...Results are not (yet) available. Retrying with backoff up to 600s")
+ sleep_s = 5
+ waited = 0
+ while packages_per_system is None and waited < 600:
+     time.sleep(sleep_s)
+     waited += sleep_s
+     print(".", end="")
+     sys.stdout.flush()
+     packages_per_system = self.github_client.get_github_action_eval_result(pr)
+     sleep_s = min(sleep_s * 2, 60)
+ if packages_per_system is None:
+     raise NixpkgsReviewError(
+         "Timeout exceeded: No evaluation available on GitHub. "
+         "Check the PR's workflow logs or use `--eval local`."
+     )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- nixpkgs_review/cli/pr.py (2 hunks)
- nixpkgs_review/review.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- nixpkgs_review/cli/pr.py
🧰 Additional context used
🧬 Code graph analysis (1)
nixpkgs_review/review.py (2)
nixpkgs_review/github.py (2)
GithubClient (38-217)
get_github_action_eval_result (172-217)
tests/test_pr.py (2)
test_pr_ofborg_eval (226-282)
test_pr_github_action_eval (286-367)
🔇 Additional comments (1)
nixpkgs_review/review.py (1)
954-956: LGTM: forcing local eval for local revision review
Explicitly setting eval_type="local" here is correct and avoids an accidental remote fetch.
if self._use_github_eval:
    assert all(system in PLATFORMS for system in self.systems)
    print("-> Fetching eval results from GitHub actions")
🛠️ Refactor suggestion
Replace assert with user-facing error for unsupported systems
Asserts can be stripped with -O and aren’t user-friendly.
- assert all(system in PLATFORMS for system in self.systems)
+ if not all(system in PLATFORMS for system in self.systems):
+     raise NixpkgsReviewError(
+         f"GitHub eval only supports {sorted(PLATFORMS)}; got {sorted(self.systems)}"
+     )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- if self._use_github_eval:
-     assert all(system in PLATFORMS for system in self.systems)
-     print("-> Fetching eval results from GitHub actions")
+ if self._use_github_eval:
+     if not all(system in PLATFORMS for system in self.systems):
+         raise NixpkgsReviewError(
+             f"GitHub eval only supports {sorted(PLATFORMS)}; got {sorted(self.systems)}"
+         )
+     print("-> Fetching eval results from GitHub actions")
🤖 Prompt for AI Agents
In nixpkgs_review/review.py around lines 474 to 476, replace the bare assert
that checks all requested systems are in PLATFORMS with a user-facing error:
perform the same membership check with an if statement and when it fails either
raise a ValueError (or print an explanatory message and exit non‑zero) that
lists the unsupported systems and the allowed PLATFORMS, so users see a clear,
non-strippable error instead of an assert that may be removed by -O.
Asserts are indeed an issue. I would like to enable the ruff lint rule that bans those from non-test code.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if they have ruff configuration files
echo "=== Checking for existing ruff configuration ==="
fd -t f "pyproject.toml|ruff.toml|.ruff.toml" -x cat {}
echo -e "\n=== Checking for existing assert statements in non-test code ==="
# Find assert statements, excluding test directories
rg -n "^\s*assert\b" --type py -g '!**/test*/**' -g '!**/tests/**' -g '!test_*' -g '!*_test.py'
Length of output: 2964
🏁 Script executed:
#!/bin/bash
# Check their current ruff configuration
echo "=== Checking for existing ruff configuration ==="
if [ -f "pyproject.toml" ]; then
echo "Found pyproject.toml:"
cat pyproject.toml
elif [ -f "ruff.toml" ]; then
echo "Found ruff.toml:"
cat ruff.toml
elif [ -f ".ruff.toml" ]; then
echo "Found .ruff.toml:"
cat .ruff.toml
else
echo "No ruff configuration file found."
fi
echo -e "\n=== Searching for assert statements in non-test code ==="
# Find assert statements, excluding test directories and common test file patterns
rg -n --no-heading "^\s*assert\b" --type py \
-g '!**/test*/**' \
-g '!**/tests/**' \
-g '!test_*' \
-g '!*_test.py' \
-g '!test*.py' \
-g '!conftest.py' \
| head -20
echo -e "\n=== Count of assert statements in non-test code ==="
rg -c "^\s*assert\b" --type py \
-g '!**/test*/**' \
-g '!**/tests/**' \
-g '!test_*' \
-g '!*_test.py' \
-g '!test*.py' \
-g '!conftest.py' \
| wc -l
Length of output: 3182
@Mic92, great idea! Ruff has the perfect rule for this: S101 (from the flake8-bandit linter), which detects uses of the assert keyword. This rule flags assert statements because they're removed when Python runs with the -O optimization flag, making them unreliable for production code validation.
Based on the codebase analysis, I can see there are several assert statements in your non-test code that this rule would catch, including the one we just discussed in nixpkgs_review/review.py.
Here's how to configure it in your project:
Add to your pyproject.toml:
[tool.ruff.lint]
extend-select = ["S101"] # Enable assert detection
[tool.ruff.lint.per-file-ignores]
# Allow asserts in test files where they belong
"**/test*/**" = ["S101"]
"**/tests/**" = ["S101"]
"test_*.py" = ["S101"]
"*_test.py" = ["S101"]
"conftest.py" = ["S101"]Alternative approach - Enable broader security rules:
[tool.ruff.lint]
extend-select = ["S"] # Enable all flake8-bandit security rules (includes S101)
[tool.ruff.lint.per-file-ignores]
# Exclude test files from security rules where asserts are legitimate
"**/test*/**" = ["S101"]
"**/tests/**" = ["S101"]
"test_*.py" = ["S101"]
"*_test.py" = ["S101"]
"conftest.py" = ["S101"]This will help prevent future problematic assert usage in production code while still allowing legitimate assertions in your test suite. The rule will catch cases like the one we just discussed where assert statements can be optimized away, potentially breaking runtime behavior.
You'll want to address the existing assert statements in your codebase first, then enable the rule to prevent new ones from being introduced.
🏁 Script executed:
#!/bin/bash
# Check if they have ruff configuration files
echo "=== Checking for existing ruff configuration ==="
fd -t f "pyproject.toml|ruff.toml|.ruff.toml" -x echo "Found: {}" -x cat {}
echo -e "\n=== Searching for assert statements in non-test code ==="
# Find assert statements, excluding test directories and common test file patterns
rg -n --no-heading "^\s*assert\b" --type py \
-g '!**/test*/**' \
-g '!**/tests/**' \
-g '!test_*' \
-g '!*_test.py' \
-g '!test*.py' \
-g '!conftest.py' \
| head -20
echo -e "\n=== Count of assert statements in non-test code ==="
rg "^\s*assert\b" --type py \
-g '!**/test*/**' \
-g '!**/tests/**' \
-g '!test_*' \
-g '!*_test.py' \
-g '!test*.py' \
-g '!conftest.py' \
| wc -l
echo -e "\n=== Check if ruff is already being used ==="
if [ -f "pyproject.toml" ]; then
echo "Checking for ruff in pyproject.toml:"
grep -n -A5 -B2 "ruff" pyproject.toml || echo "No ruff configuration found in pyproject.toml"
fi
Length of output: 980
Excellent! I can see you already have ruff configured and are aware of the S101 rule. Looking at your current pyproject.toml, you have:
[tool.ruff]
target-version = "py312"
line-length = 88
lint.select = ["ALL"]
lint.ignore = [
# ... other rules ...
# Use of `assert` detected
"S101",
# ... other rules ...
]

You're currently globally ignoring S101, but based on your comment, you want to selectively ban asserts from non-test code while still allowing them in tests.
Here's how to modify your configuration:
Modify your pyproject.toml to remove S101 from the global ignore and add per-file ignores:
lint.ignore = [
# ... other rules ...
- # Use of `assert` detected
- "S101",
# ... other rules ...
]
+[tool.ruff.lint.per-file-ignores]
+# Allow asserts in test files where they belong
+"**/test*/**" = ["S101"]
+"**/tests/**" = ["S101"]
+"test_*.py" = ["S101"]
+"*_test.py" = ["S101"]
+"conftest.py" = ["S101"]This change will:
- Enable S101 for production code - catching problematic assert usage like the one we just discussed
- Keep S101 disabled for test files - where assertions are legitimate and expected
- Force you to replace production asserts with proper error handling using raise statements
This approach gives you the best of both worlds: strict enforcement in production code while maintaining test flexibility.
🏁 Script executed:
#!/bin/bash
# Let's search for assert statements with a broader approach
echo "=== Searching for assert statements (broader search) ==="
rg -n "assert\s" --type py | head -20
echo -e "\n=== Searching specifically in the review.py file we're discussing ==="
rg -n "assert" nixpkgs_review/review.py
echo -e "\n=== Count all assert statements in Python files ==="
rg "assert\s" --type py | wc -l
Length of output: 1742
Perfect! Now I can see the full picture. You have 18 assert statements total, with 7 in production code that would need to be addressed:
Production code asserts that need fixing:
- nixpkgs_review/review.py:475 - the one we're discussing (system validation)
- nixpkgs_review/review.py:642 - assert attrs is not None
- nixpkgs_review/review.py:755 - assert attr.path is not None
- nixpkgs_review/review.py:808 - assert isinstance(changed_packages, set)
- nixpkgs_review/github.py:210 - assert isinstance(path, dict)
- nixpkgs_review/cli/pr.py:128 - assert review is not None
- nixpkgs_review/cli/pr.py:148 - assert builddir is not None
The rest are appropriately in test files.
To implement your S101 rule properly:
- Remove S101 from your global ignore list in pyproject.toml:
lint.ignore = [
# ... other rules ...
- # Use of `assert` detected
- "S101",
# ... other rules ...
]

- Add per-file ignores for test files:
[tool.ruff.lint.per-file-ignores]
# Allow asserts in test files where they belong
"tests/**" = ["S101"]
"conftest.py" = ["S101"]- Address the 7 production asserts by replacing them with proper error handling using
NixpkgsReviewErroror appropriate exceptions.
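For illustration, a hedged sketch of the assert-to-exception rewrite this step asks for. The NixpkgsReviewError name is taken from the review above; the helper function itself is invented for the example and is not the real review.py code.

```python
# Illustrative only — shows the assert-to-exception pattern, not the actual project code.
class NixpkgsReviewError(Exception):
    """Assumed to be the project's user-facing error type."""

def ensure_attrs(attrs: list[str] | None) -> list[str]:
    # Before: `assert attrs is not None` — silently stripped under `python -O`.
    if attrs is None:
        raise NixpkgsReviewError("Expected a list of changed attributes, got None")
    return attrs
```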
This will enforce the rule going forward while requiring you to clean up the existing problematic assertions. The test assertions (11 out of 18) will remain untouched since they're legitimate.
Would you like me to help create an issue to track the cleanup of these existing assert statements, or would you prefer to handle them as part of this PR?
✏️ Learnings added
Learnt from: Mic92
PR: Mic92/nixpkgs-review#559
File: nixpkgs_review/review.py:474-476
Timestamp: 2025-09-21T08:58:33.690Z
Learning: The user Mic92 prefers to use ruff linting rules to enforce code quality standards, specifically wanting to ban assert statements from non-test code using the S101 rule.
Learnt from: Mic92
PR: Mic92/nixpkgs-review#559
File: nixpkgs_review/review.py:474-476
Timestamp: 2025-09-21T08:58:33.691Z
Learning: The nixpkgs-review project already has ruff configured with "ALL" rules selected but currently globally ignores S101 (assert detection). The user Mic92 wants to change this to selectively allow asserts only in test files while banning them from production code.
Learnt from: Mic92
PR: Mic92/nixpkgs-review#559
File: nixpkgs_review/review.py:474-476
Timestamp: 2025-09-21T08:58:33.691Z
Learning: The nixpkgs-review codebase currently has 18 assert statements total: 7 in production code that need to be replaced with proper error handling, and 11 in test files that should remain. The production asserts are in review.py (4), github.py (1), and cli/pr.py (2).
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
I was able to test it successfully on my machine.
This PR addresses #556 (comment).
It introduces the following new behavior (the rest should remain the same as before):
In #556, I forced local evaluation in this case, but some users wanted to be able to use the GitHub eval results even though they might be wrong.
To implement this, I centralized all the --eval handling logic in review.py. Feel free to suggest another way.
Summary by CodeRabbit
New Features
Refactor