-
Notifications
You must be signed in to change notification settings - Fork 1k
Improve error handling in htmlToText function #2309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Enhance htmlToText function to handle panics and errors safely.
panic: html: open stack of elements exceeds 512 nodes
goroutine 5523922 [running]:
github.com/projectdiscovery/httpx/common/pagetypeclassifier.htmlToText(...)
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:36
github.com/projectdiscovery/httpx/common/pagetypeclassifier.(*PageTypeClassifier).Classify(0xc0005164d8, {0xc0ba03a000?, 0xd?})
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:26 +0x6f
github.com/projectdiscovery/httpx/runner.(*Runner).analyze(_, _, {_, _}, {{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, ...)
/home/runner/work/httpx/httpx/runner/runner.go:2349 +0x7555
github.com/projectdiscovery/httpx/runner.(*Runner).process.func1({{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, {0x1686161?, 0x10?}, {0x16ace2d, 0xa})
/home/runner/work/httpx/httpx/runner/runner.go:1444 +0x125
created by github.com/projectdiscovery/httpx/runner.(*Runner).process in goroutine 1
/home/runner/work/httpx/httpx/runner/runner.go:1442 +0x8a6
WalkthroughThe Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant htmlToText
participant Converter
participant Recovery
Note over htmlToText: Previous behavior
Caller->>htmlToText: call htmlToText(html)
htmlToText->>Converter: attempt conversion
alt conversion succeeds
Converter-->>htmlToText: return text
htmlToText-->>Caller: return text
else conversion fails
Converter->>htmlToText: panic
htmlToText->>Caller: propagate panic
end
Note over htmlToText: New behavior
Caller->>htmlToText: call htmlToText(html)
htmlToText->>Recovery: defer recover()
htmlToText->>Converter: attempt conversion
alt conversion succeeds
Converter-->>htmlToText: return text
htmlToText-->>Caller: return text
else conversion fails or panics
Converter->>Recovery: panic
Recovery->>htmlToText: catch panic
htmlToText->>htmlToText: log error (optional)
htmlToText-->>Caller: return "" (empty string)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
common/pagetypeclassifier/pagetypeclassifier.go (1)
35-40: Consider implementing the suggested logging.The panic recovery mechanism correctly prevents crashes. However, silently swallowing panics can make debugging difficult in production. Consider implementing the logging mentioned in the comment to improve observability.
If
utilsprovides a logger, apply this diff:defer func() { if r := recover(); r != nil { - // Optionally log this event, e.g., log.Printf("Recovered in htmlToText: %v", r) + // Log panic for debugging + // log.Printf("Recovered panic in htmlToText: %v", r) text = "" } }()Uncomment and adapt the logging line if a logger is available in your project context.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
common/pagetypeclassifier/pagetypeclassifier.go(1 hunks)
🔇 Additional comments (2)
common/pagetypeclassifier/pagetypeclassifier.go (2)
33-34: Good addition of panic protection.The named return value is the correct approach for the defer-recover pattern. The comment clearly explains the purpose of the function.
41-46: LGTM! Error handling is consistent and maintains backward compatibility.The error handling correctly returns an empty string for both conversion errors and panics, which is then handled appropriately by the
Classifymethod (line 27-28). The naked return is clean and appropriate with the named return value.
Mzack9999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
|
@coderabbitai create an issue for this PR: panics without self-recovery should not be present in sub-packages |
Enhance htmlToText function to handle panics and errors safely.
panic: html: open stack of elements exceeds 512 nodes
goroutine 5523922 [running]:
github.com/projectdiscovery/httpx/common/pagetypeclassifier.htmlToText(...)
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:36
github.com/projectdiscovery/httpx/common/pagetypeclassifier.(*PageTypeClassifier).Classify(0xc0005164d8, {0xc0ba03a000?, 0xd?})
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:26 +0x6f
github.com/projectdiscovery/httpx/runner.(*Runner).analyze(_, , {, _}, {{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, ...)
/home/runner/work/httpx/httpx/runner/runner.go:2349 +0x7555
github.com/projectdiscovery/httpx/runner.(*Runner).process.func1({{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, {0x1686161?, 0x10?}, {0x16ace2d, 0xa})
/home/runner/work/httpx/httpx/runner/runner.go:1444 +0x125
created by github.com/projectdiscovery/httpx/runner.(*Runner).process in goroutine 1
/home/runner/work/httpx/httpx/runner/runner.go:1442 +0x8a6
Summary by CodeRabbit