fix(lambda-filter): sandbox model-generated lambda in Smart Transform#13736
fix(lambda-filter): sandbox model-generated lambda in Smart Transform#13736Jiangrong-W wants to merge 2 commits into
Conversation
The Smart Transform component (LambdaFilterComponent) asked the LLM to
return a Python lambda and passed the model-emitted string straight to
`eval()` with full interpreter builtins. Validation only checked that the
text started with `lambda` and contained `:`, so a model that was steered
(e.g. via prompt injection in the data being transformed) into emitting
`lambda x: __import__("os").system(...)` would run arbitrary code inside
the Langflow server process on the authenticated build/run path.
Harden the evaluation instead of removing the feature:
- Parse the generated lambda with `ast.parse` and require it to be a
single `Lambda` expression.
- Walk the AST and reject imports, dunder attribute/name access (the
classic `().__class__...__subclasses__()` escape), and an explicit
denylist of dangerous builtin names (`__import__`, `eval`, `exec`,
`open`, `getattr`, `globals`, ...).
- Compile and evaluate with a restricted `__builtins__` namespace that
exposes only safe data-shaping helpers (`len`, `sorted`, comprehensions,
etc.) and no module globals.
Legitimate data-transformation lambdas (filtering, mapping, slicing,
attribute/method calls on the input) keep working unchanged; the existing
integration tests pass. Errors remain `ValueError` so existing handlers
and flow validation continue to catch them.
Adds regression tests pinning both the allowed and rejected lambda sets.
Signed-off-by: christop <825583681@qq.com>
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughReplaces direct ChangesLambda Evaluation Sandbox
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 8 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/backend/tests/unit/components/llm_operations/test_lambda_filter.py (1)
124-137: 💤 Low valueConsider adding a format string edge case to test coverage.
Python's format string protocol can access attributes on arguments without triggering the AST dunder check (the dunder is inside a string constant). While this can't execute code (format can't call methods), it could leak class information:
'lambda x: "{0.__class__.__name__}".format(x)'Adding this to either
LEGITIMATE_LAMBDAS(if info disclosure is acceptable) orMALICIOUS_LAMBDAS(if you want to block it) would document the expected behavior and prevent future regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/tests/unit/components/llm_operations/test_lambda_filter.py` around lines 124 - 137, Add a format string edge case test to document expected behavior for attribute access via Python's format string protocol. Include the lambda expression that uses format string notation to access class information (e.g., "{0.__class__.__name__}".format(x)) to either the LEGITIMATE_LAMBDAS or MALICIOUS_LAMBDAS list depending on whether information disclosure is acceptable in your security policy. This test case prevents future regressions and clarifies how the lambda filter handles format string-based attribute access that bypasses direct dunder notation checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lfx/src/lfx/components/llm_operations/lambda_filter.py`:
- Around line 94-113: The _FORBIDDEN_LAMBDA_NAMES frozenset is missing
"breakpoint" as a forbidden builtin, which is a security concern since the
breakpoint builtin (Python 3.7+) can invoke an interactive debugger and allow
inspection of process state. Add "breakpoint" to the _FORBIDDEN_LAMBDA_NAMES
frozenset to maintain consistency with the explicit parse-time rejection pattern
and keep the security contract auditable.
---
Nitpick comments:
In `@src/backend/tests/unit/components/llm_operations/test_lambda_filter.py`:
- Around line 124-137: Add a format string edge case test to document expected
behavior for attribute access via Python's format string protocol. Include the
lambda expression that uses format string notation to access class information
(e.g., "{0.__class__.__name__}".format(x)) to either the LEGITIMATE_LAMBDAS or
MALICIOUS_LAMBDAS list depending on whether information disclosure is acceptable
in your security policy. This test case prevents future regressions and
clarifies how the lambda filter handles format string-based attribute access
that bypasses direct dunder notation checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8157796f-39f3-4225-88e0-23dac243bb3a
📒 Files selected for processing (2)
src/backend/tests/unit/components/llm_operations/test_lambda_filter.pysrc/lfx/src/lfx/components/llm_operations/lambda_filter.py
| # Builtin/global names that must never be referenced from a generated lambda. | ||
| # Even though they are already absent from ``_SAFE_LAMBDA_BUILTINS`` (so they | ||
| # would raise ``NameError`` at runtime), rejecting them at parse time gives a | ||
| # clear, fail-fast error and keeps the security contract explicit and auditable. | ||
| _FORBIDDEN_LAMBDA_NAMES: frozenset[str] = frozenset( | ||
| { | ||
| "__import__", | ||
| "compile", | ||
| "delattr", | ||
| "eval", | ||
| "exec", | ||
| "getattr", | ||
| "globals", | ||
| "input", | ||
| "locals", | ||
| "open", | ||
| "setattr", | ||
| "vars", | ||
| } | ||
| ) |
There was a problem hiding this comment.
Consider adding breakpoint to the forbidden names list.
The breakpoint builtin (Python 3.7+) can invoke an interactive debugger, potentially allowing an attacker to inspect state or interact with the process. While it's not in _SAFE_LAMBDA_BUILTINS (so it would raise NameError at runtime), adding it here maintains the pattern of explicit parse-time rejection and keeps the security contract auditable.
🛡️ Suggested addition
_FORBIDDEN_LAMBDA_NAMES: frozenset[str] = frozenset(
{
"__import__",
+ "breakpoint",
"compile",
"delattr",
"eval",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lfx/src/lfx/components/llm_operations/lambda_filter.py` around lines 94 -
113, The _FORBIDDEN_LAMBDA_NAMES frozenset is missing "breakpoint" as a
forbidden builtin, which is a security concern since the breakpoint builtin
(Python 3.7+) can invoke an interactive debugger and allow inspection of process
state. Add "breakpoint" to the _FORBIDDEN_LAMBDA_NAMES frozenset to maintain
consistency with the explicit parse-time rejection pattern and keep the security
contract auditable.
The Smart Transform component (LambdaFilterComponent) asked the LLM to return a Python lambda and passed the model-emitted string straight to
eval()with full interpreter builtins. Validation only checked that the text started withlambdaand contained:, so a model that was steered (e.g. via prompt injection in the data being transformed) into emittinglambda x: __import__("os").system(...)would run arbitrary code inside the Langflow server process on the authenticated build/run path.Harden the evaluation instead of removing the feature:
ast.parseand require it to be a singleLambdaexpression.().__class__...__subclasses__()escape), and an explicit denylist of dangerous builtin names (__import__,eval,exec,open,getattr,globals, ...).__builtins__namespace that exposes only safe data-shaping helpers (len,sorted, comprehensions, etc.) and no module globals.Legitimate data-transformation lambdas (filtering, mapping, slicing, attribute/method calls on the input) keep working unchanged; the existing integration tests pass. Errors remain
ValueErrorso existing handlers and flow validation continue to catch them.Adds regression tests pinning both the allowed and rejected lambda sets.
Summary by CodeRabbit