@GDATTACKER-RESEARCHER
Contributor

@GDATTACKER-RESEARCHER GDATTACKER-RESEARCHER commented Nov 17, 2025

Enhance htmlToText function to handle panics and errors safely.

panic: html: open stack of elements exceeds 512 nodes

goroutine 5523922 [running]:
github.com/projectdiscovery/httpx/common/pagetypeclassifier.htmlToText(...)
	/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:36
github.com/projectdiscovery/httpx/common/pagetypeclassifier.(*PageTypeClassifier).Classify(0xc0005164d8, {0xc0ba03a000?, 0xd?})
	/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:26 +0x6f
github.com/projectdiscovery/httpx/runner.(*Runner).analyze(_, _, {_, _}, {{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, ...)
	/home/runner/work/httpx/httpx/runner/runner.go:2349 +0x7555
github.com/projectdiscovery/httpx/runner.(*Runner).process.func1({{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, {0x1686161?, 0x10?}, {0x16ace2d, 0xa})
	/home/runner/work/httpx/httpx/runner/runner.go:1444 +0x125
created by github.com/projectdiscovery/httpx/runner.(*Runner).process in goroutine 1
	/home/runner/work/httpx/httpx/runner/runner.go:1442 +0x8a6

Summary by CodeRabbit

  • Refactor
    • Improved error handling for HTML text conversion to prevent application crashes when conversion errors occur.

@auto-assign auto-assign bot requested a review from Mzack9999 November 17, 2025 03:26
@coderabbitai

coderabbitai bot commented Nov 17, 2025

Walkthrough

The htmlToText function in pagetypeclassifier.go has been refactored to implement panic recovery. Previously, the function would panic on conversion errors. Now it uses a deferred recover block to catch panics, optionally log them, and return an empty string on failure, while preserving normal execution paths when no errors occur.

Changes

Cohort / File(s) | Summary
Error handling refactor: common/pagetypeclassifier/pagetypeclassifier.go | Updated htmlToText function signature to use named return value (text string). Added deferred panic recovery that logs errors optionally and returns empty string on conversion failure or panic, replacing previous panic behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant htmlToText
    participant Converter
    participant Recovery

    Note over htmlToText: Previous behavior
    Caller->>htmlToText: call htmlToText(html)
    htmlToText->>Converter: attempt conversion
    alt conversion succeeds
        Converter-->>htmlToText: return text
        htmlToText-->>Caller: return text
    else conversion fails
        Converter->>htmlToText: panic
        htmlToText->>Caller: propagate panic
    end

    Note over htmlToText: New behavior
    Caller->>htmlToText: call htmlToText(html)
    htmlToText->>Recovery: defer recover()
    htmlToText->>Converter: attempt conversion
    alt conversion succeeds
        Converter-->>htmlToText: return text
        htmlToText-->>Caller: return text
    else conversion fails or panics
        Converter->>Recovery: panic
        Recovery->>htmlToText: catch panic
        htmlToText->>htmlToText: log error (optional)
        htmlToText-->>Caller: return "" (empty string)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Panic recovery implementation: Verify the defer-recover pattern is correctly placed and that all panic paths are appropriately caught
  • Fallback behavior appropriateness: Confirm that returning an empty string is the intended behavior for all error cases and that callers handle empty strings correctly
  • Logging strategy: Review whether optional logging is implemented correctly and whether log levels/messages are appropriate
  • Caller impact: Scan call sites to ensure they don't depend on panic propagation or expect non-empty strings without validation

Poem

🐰 Where panics once would make code flee,
A gentle recover now sets it free!
With empty strings and logs so kind,
Resilient paths the function will find.
No more crashing—just graceful grace,
Safety blooms in this refactored place! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title directly and clearly describes the main change: improving error handling in the htmlToText function, which aligns with the primary objective of handling panics and errors safely.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
common/pagetypeclassifier/pagetypeclassifier.go (1)

35-40: Consider implementing the suggested logging.

The panic recovery mechanism correctly prevents crashes. However, silently swallowing panics can make debugging difficult in production. Consider implementing the logging mentioned in the comment to improve observability.

If utils provides a logger, apply this diff:

 	defer func() {
 		if r := recover(); r != nil {
-			// Optionally log this event, e.g., log.Printf("Recovered in htmlToText: %v", r)
+			// Log panic for debugging
+			// log.Printf("Recovered panic in htmlToText: %v", r)
 			text = ""
 		}
 	}()

Uncomment and adapt the logging line if a logger is available in your project context.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d370e9e and 7c94f19.

📒 Files selected for processing (1)
  • common/pagetypeclassifier/pagetypeclassifier.go (1 hunks)
🔇 Additional comments (2)
common/pagetypeclassifier/pagetypeclassifier.go (2)

33-34: Good addition of panic protection.

The named return value is the correct approach for the defer-recover pattern. The comment clearly explains the purpose of the function.


41-46: LGTM! Error handling is consistent and maintains backward compatibility.

The error handling correctly returns an empty string for both conversion errors and panics, which is then handled appropriately by the Classify method (lines 27-28). The naked return is clean and appropriate with the named return value.
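A rough sketch of how a caller can treat the empty string as a sentinel is shown below. The fallback label `"other"` and the `model` callback are assumptions made for the example; the thread does not quote what Classify actually returns in that case.

```go
package main

import "fmt"

// fallbackLabel is an assumed label for pages whose text could not be extracted.
const fallbackLabel = "other"

// classifyText skips the model entirely when text extraction produced nothing.
// model is a hypothetical stand-in for the real classifier.
func classifyText(text string, model func(string) string) string {
	if text == "" {
		return fallbackLabel
	}
	return model(text)
}

func main() {
	model := func(string) string { return "login" }
	fmt.Println(classifyText("Sign in to continue", model))
	fmt.Println(classifyText("", model))
}
```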

Member

@Mzack9999 Mzack9999 left a comment


nice catch!

@Mzack9999 Mzack9999 added the Type: Bug label Nov 18, 2025
@Mzack9999
Member

@coderabbitai create an issue for this PR: panics without self-recovery should not be present in sub-packages
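As a sketch of what self-recovery at a sub-package boundary could look like in general, the helper below turns a panic into an ordinary error. `safeCall` is a hypothetical name used for illustration, not an existing httpx utility.

```go
package main

import "fmt"

// safeCall runs fn and converts any panic into an error, so that a
// sub-package never lets a panic escape to its callers.
func safeCall(fn func() (string, error)) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = ""
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	_, err := safeCall(func() (string, error) {
		panic("html: open stack of elements exceeds 512 nodes")
	})
	fmt.Println(err)
}
```

Returning an error rather than an empty string lets the caller decide whether to log, retry, or fall back, at the cost of a wider function signature.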

@Mzack9999 Mzack9999 linked an issue Nov 18, 2025 that may be closed by this pull request
@Mzack9999 Mzack9999 merged commit c6948ed into projectdiscovery:dev Nov 18, 2025
13 of 14 checks passed

Labels

Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.


Development

Successfully merging this pull request may close these issues.

Panics in page classifier

2 participants