Skip to content

Conversation

@Serizao
Copy link
Contributor

@Serizao Serizao commented Jul 23, 2025

Add logging of side request call

Summary by CodeRabbit

  • New Features

    • Capture detailed network activity during page screenshots and include the collected requests, statuses, and simplified error categories in analysis output.
    • Automatically accept in-page JavaScript dialogs and apply provided extra headers during page loads.
  • Bug Fixes

    • Improved reporting and inclusion of network data when navigation, loading, or screenshot operations fail.

@coderabbitai
Copy link

coderabbitai bot commented Jul 23, 2025

Walkthrough

Added network request capture to Browser.ScreenshotWithBody: records per-request metadata and simplified error types during page load, returns []NetworkRequest alongside screenshot and HTML; Runner.analyze now collects that slice and stores it in Result.LinkRequest; new NetworkRequest type and getSimpleErrorType helper added.

Changes

Cohort / File(s) Change Summary
Browser network capture & screenshot
runner/headless.go
Added public NetworkRequest struct; changed Browser.ScreenshotWithBody signature to return []NetworkRequest; enabled DevTools Network domain; added concurrent handlers for WillBeSent/Response/Finished/Failed to build/sync requests list; auto-accept JS dialogs; apply extra headers; added getSimpleErrorType helper; introduced synchronized map/slice utilities.
Runner analysis update
runner/runner.go
Capture third return value from ScreenshotWithBody into local linkRequest []NetworkRequest and include it as LinkRequest in the returned Result.
Result type update
runner/types.go
Added LinkRequest []NetworkRequest field to Result with JSON/CSV/mapstructure tag "link_request".

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Runner
    participant Browser
    participant DevTools as DevTools/Network

    Caller->>Runner: analyze(...)
    Runner->>Browser: ScreenshotWithBody(url, timeout, idle, headers, fullPage)
    Browser->>DevTools: Enable Network domain
    DevTools-->>Browser: Network events: WillBeSent / ResponseReceived / LoadingFinished / LoadingFailed
    Browser->>Browser: update requestsMap (by RequestID) and append to networkRequests slice
    Browser-->>Runner: screenshot, html, [NetworkRequest], error
    Runner->>Runner: set Result.LinkRequest = received []NetworkRequest
    Runner-->>Caller: Result (includes LinkRequest)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~40 minutes

In my burrow I tally each thread and request,
Headers like carrots, each error a jest.
I hop through events, capturing the race,
Storing small footprints of each webbed chase.
A rabbit's small ledger—network logs at rest. 🐇✨

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Add network logging" is concise and accurately reflects the primary change in the PR—adding NetworkRequest capture and exposing network activity (changes to ScreenshotWithBody and Result.LinkRequest)—so it clearly summarizes the main intent of the changeset.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (5)
runner/runner.go (1)

1835-1841: Move variable declaration closer to usage

The linkRequest variable is declared here but only used within the screenshot block starting at line 2191. Consider moving the declaration closer to where it's actually used for better code readability.

 var (
 	serverResponseRaw  string
 	request            string
 	rawResponseHeaders string
 	responseHeaders    map[string]interface{}
-	linkRequest        []NetworkRequest
 )

Then declare it within or just before the screenshot block:

+var linkRequest []NetworkRequest
 if scanopts.Screenshot {
 	var err error
 	screenshotBytes, headlessBody, linkRequest, err = r.browser.ScreenshotWithBody(fullURL, scanopts.ScreenshotTimeout, scanopts.ScreenshotIdle, r.options.CustomHeaders, scanopts.IsScreenshotFullPage())
runner/headless.go (4)

138-138: Fix typo in error message

There's a typo: "RESSOURCE" should be "RESOURCE".

-			ErrorType:  "QUIT_BEFORE_RESSOURCE_LOADING_END",
+			ErrorType:  "QUIT_BEFORE_RESOURCE_LOADING_END",

165-165: Remove non-English comment

The comment is in French. Please use English for consistency with the rest of the codebase.

-			req.StatusCode = 0 // Marquer comme échec
+			req.StatusCode = 0 // Mark as failed

167-169: Remove redundant URL scheme check

This HTTP(S) check is redundant as the same check is already performed in the NetworkRequestWillBeSent handler (lines 127-129). Since only HTTP(S) requests are added to requestsMap, this check is unnecessary.

-			if strings.HasPrefix(req.URL, "http://") || strings.HasPrefix(req.URL, "https://") {
-				networkRequests = append(networkRequests, *req)
-			}
+			networkRequests = append(networkRequests, *req)

112-207: Consider adding request filtering and limits

The current implementation captures all network requests without any filtering or limits. For pages with many resources, this could result in:

  • Large memory usage
  • Excessive data in the output
  • Performance impact

Consider adding options to:

  • Filter by request type (e.g., only capture failed requests, or specific resource types)
  • Limit the number of requests captured
  • Add request size information

Example enhancement:

type NetworkCaptureOptions struct {
    MaxRequests      int
    CaptureOnlyErrors bool
    ResourceTypes    []string // e.g., ["document", "xhr", "fetch"]
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbba454 and 25caac8.

📒 Files selected for processing (3)
  • runner/headless.go (5 hunks)
  • runner/runner.go (3 hunks)
  • runner/types.go (1 hunks)
🧬 Code Graph Analysis (3)
runner/types.go (1)
runner/headless.go (1)
  • NetworkRequest (19-26)
runner/runner.go (1)
runner/headless.go (1)
  • NetworkRequest (19-26)
runner/headless.go (1)
common/httpx/response.go (1)
  • Response (12-30)
🧰 Additional context used
🧬 Code Graph Analysis (3)
runner/types.go (1)
runner/headless.go (1)
  • NetworkRequest (19-26)
runner/runner.go (1)
runner/headless.go (1)
  • NetworkRequest (19-26)
runner/headless.go (1)
common/httpx/response.go (1)
  • Response (12-30)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
runner/headless.go (1)

19-26: Address the duplicate field name issue.

This is the same issue previously identified - the struct has a field named NetworkRequest which duplicates the struct name and creates confusion.

🧹 Nitpick comments (1)
runner/headless.go (1)

126-140: Fix typo in error message.

There's a typo in the error type string on line 137.

-		ErrorType:  "QUIT_BEFORE_RESSOURCE_LOADING_END",
+		ErrorType:  "QUIT_BEFORE_RESOURCE_LOADING_END",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25caac8 and 5e26986.

📒 Files selected for processing (1)
  • runner/headless.go (4 hunks)
🧬 Code Graph Analysis (1)
runner/headless.go (1)
common/httpx/response.go (1)
  • Response (12-30)
🧰 Additional context used
🧬 Code Graph Analysis (1)
runner/headless.go (1)
common/httpx/response.go (1)
  • Response (12-30)
🔇 Additional comments (5)
runner/headless.go (5)

7-7: LGTM! Import addition is appropriate.

The sync import is correctly added to support the mutex used for thread-safe network request handling.


112-112: Method signature change looks appropriate.

The addition of []NetworkRequest as a return value properly extends the functionality to include network logging data.


118-124: Network setup and data structures are well-designed.

The network domain enabling and data structure setup (mutex, map, slice) provide a solid foundation for concurrent network request tracking.


126-171: Network event handlers are well-implemented with proper concurrency.

The concurrent event handlers with mutex protection correctly handle the different phases of network requests. The URL filtering and status code tracking logic is sound.


214-264: Excellent error classification implementation.

The getSimpleErrorType function provides comprehensive error categorization with a logical hierarchy: blocked reasons first, then error text patterns, then error types, with appropriate fallbacks. This will greatly help with debugging and monitoring network issues.

@ehsandeep ehsandeep changed the base branch from main to dev July 23, 2025 18:16
@Mzack9999 Mzack9999 self-requested a review September 12, 2025 10:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
runner/headless.go (4)

125-167: Ensure EachEvent listeners are stopped to avoid goroutine leaks

Capture and defer the stop functions so listeners terminate deterministically on return/early-errors, not just when page.Close happens.

-	go page.EachEvent(func(e *proto.NetworkRequestWillBeSent) { ... })()
+	stopReqSent := page.EachEvent(func(e *proto.NetworkRequestWillBeSent) { ... })
+	go stopReqSent()
+	defer stopReqSent()
# repeat for ResponseReceived, LoadingFinished, LoadingFailed

191-205: Consider waiting for network-idle before returning on errors

Early returns (Navigate/WaitLoad/Screenshot failures) may miss tail events still in flight. Optionally wait for brief network-idle (e.g., page.WaitRequestIdle) before capturing networkRequests.Slice on error paths to improve completeness.


21-27: Stabilize serialization with struct tags (JSON/CSV) and add a brief doc comment

To keep output schema stable and friendly for CSV/JSON, add tags now that this is exported and included in results.

-type NetworkRequest struct {
-	RequestID  string
-	URL        string
-	Method     string
-	StatusCode int
-	ErrorType  string
-}
+// NetworkRequest holds per-request metadata captured during a page load.
+type NetworkRequest struct {
+	RequestID  string `json:"request_id" csv:"request_id" mapstructure:"request_id"`
+	URL        string `json:"url" csv:"url" mapstructure:"url"`
+	Method     string `json:"method" csv:"method" mapstructure:"method"`
+	StatusCode int    `json:"status_code" csv:"status_code" mapstructure:"status_code"`
+	ErrorType  string `json:"error_type" csv:"error_type" mapstructure:"error_type"`
+}

220-270: Minor: normalize and extend error classification

Optional improvements:

  • Normalize errorText once (to avoid case sensitivity drift).
  • Consider adding common cases like ERR_TOO_MANY_REDIRECTS, ERR_HTTP2_PROTOCOL_ERROR, ERR_NETWORK_CHANGED.
  • Promote ErrorType strings to typed consts for compile-time safety.

Low priority; current mapping is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7690a1f and 858b155.

📒 Files selected for processing (2)
  • runner/headless.go (3 hunks)
  • runner/runner.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • runner/runner.go
🧰 Additional context used
🪛 GitHub Check: Lint Test
runner/headless.go

[failure] 137-137:
Error return value of requestsMap.Set is not checked (errcheck)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: release-test
  • GitHub Check: Analyze (go)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Functional Test (windows-latest)
🔇 Additional comments (1)
runner/headless.go (1)

176-184: Double-check SetExtraHeaders call; set headers in one call and handle error

Set all headers in a single call (either map[string]string or a []string key/value slice depending on your go-rod version) and handle the returned error/cleanup instead of calling SetExtraHeaders per header. (runner/headless.go:176-184)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
runner/headless.go (1)

176-184: SetExtraHeaders is called repeatedly; only the last header survives

SetExtraHeaders replaces headers; looping per header drops previous ones. Build the list once and call SetExtraHeaders a single time.

-	for _, header := range headers {
-		headerParts := strings.SplitN(header, ":", 2)
-		if len(headerParts) != 2 {
-			continue
-		}
-		key := strings.TrimSpace(headerParts[0])
-		value := strings.TrimSpace(headerParts[1])
-		_, _ = page.SetExtraHeaders([]string{key, value})
-	}
+	headerPairs := make([]string, 0, len(headers)*2)
+	for _, header := range headers {
+		headerParts := strings.SplitN(header, ":", 2)
+		if len(headerParts) != 2 {
+			continue
+		}
+		key := strings.TrimSpace(headerParts[0])
+		value := strings.TrimSpace(headerParts[1])
+		headerPairs = append(headerPairs, key, value)
+	}
+	if len(headerPairs) > 0 {
+		_, _ = page.SetExtraHeaders(headerPairs)
+	}
♻️ Duplicate comments (2)
runner/headless.go (2)

122-166: Fix event subscription: handlers are immediately canceled and request tracking races on shared pointers

  • go page.EachEvent(... )() registers the handler then instantly calls the returned stop() in a goroutine, effectively unsubscribing almost immediately. You’ll miss nearly all events.
  • Concurrent mutation of shared *NetworkRequest across handlers causes data races; also Has→Get is TOCTOU and Set errors are ignored.

Refactor to:

  • keep subscriptions alive with stop := page.EachEvent(...); defer stop()
  • store NetworkRequest by value in the map, mutate a copy, then Set it back while checking the error
  • use Get only (no Has)

Apply this diff (pattern shown for all handlers):

-	networkRequests := sliceutil.NewSyncSlice[NetworkRequest]()
-	requestsMap := mapsutil.NewSyncLockMap[string, *NetworkRequest]()
+	networkRequests := sliceutil.NewSyncSlice[NetworkRequest]()
+	requestsMap := mapsutil.NewSyncLockMap[string, NetworkRequest]()

 	// Intercept outbound requests
-	go page.EachEvent(func(e *proto.NetworkRequestWillBeSent) {
+	stopWillBeSent := page.EachEvent(func(e *proto.NetworkRequestWillBeSent) {
 		if !stringsutil.HasPrefixAnyI(e.Request.URL, "http://", "https://") {
 			return
 		}
-		req := &NetworkRequest{
+		req := NetworkRequest{
 			RequestID:  string(e.RequestID),
 			URL:        e.Request.URL,
 			Method:     e.Request.Method,
 			StatusCode: -1,
 			ErrorType:  "QUIT_BEFORE_RESOURCE_LOADING_END",
 		}
-		_ = requestsMap.Set(string(e.RequestID), req)
-	})()
+		if err := requestsMap.Set(string(e.RequestID), req); err != nil {
+			// optionally log
+		}
+	})
+	defer stopWillBeSent()

 	// Intercept inbound responses
-	go page.EachEvent(func(e *proto.NetworkResponseReceived) {
-		if requestsMap.Has(string(e.RequestID)) {
-			req, _ := requestsMap.Get(string(e.RequestID))
-			req.StatusCode = e.Response.Status
-		}
-	})()
+	stopRespReceived := page.EachEvent(func(e *proto.NetworkResponseReceived) {
+		if req, ok := requestsMap.Get(string(e.RequestID)); ok {
+			req.StatusCode = e.Response.Status
+			_ = requestsMap.Set(string(e.RequestID), req)
+		}
+	})
+	defer stopRespReceived()

 	// Intercept network end requests
-	go page.EachEvent(func(e *proto.NetworkLoadingFinished) {
-		if requestsMap.Has(string(e.RequestID)) {
-			req, _ := requestsMap.Get(string(e.RequestID))
-			if req.StatusCode > 0 {
-				req.ErrorType = ""
-			}
-			networkRequests.Append(*req)
-		}
-	})()
+	stopLoadFinished := page.EachEvent(func(e *proto.NetworkLoadingFinished) {
+		if req, ok := requestsMap.Get(string(e.RequestID)); ok {
+			if req.StatusCode > 0 {
+				req.ErrorType = ""
+			}
+			networkRequests.Append(req)
+		}
+	})
+	defer stopLoadFinished()

 	// Intercept failed request
-	go page.EachEvent(func(e *proto.NetworkLoadingFailed) {
-		if requestsMap.Has(string(e.RequestID)) {
-			req, _ := requestsMap.Get(string(e.RequestID))
-			req.StatusCode = 0 // mark to zero
-			req.ErrorType = getSimpleErrorType(e.ErrorText, string(e.Type), string(e.BlockedReason))
-			if stringsutil.HasPrefixAnyI(req.URL, "http://", "https://") {
-				networkRequests.Append(*req)
-			}
-		}
-	})()
+	stopLoadFailed := page.EachEvent(func(e *proto.NetworkLoadingFailed) {
+		if req, ok := requestsMap.Get(string(e.RequestID)); ok {
+			req.StatusCode = 0 // mark to zero
+			req.ErrorType = getSimpleErrorType(e.ErrorText, string(e.Type), string(e.BlockedReason))
+			if stringsutil.HasPrefixAnyI(req.URL, "http://", "https://") {
+				networkRequests.Append(req)
+			}
+			_ = requestsMap.Set(string(e.RequestID), req)
+		}
+	})
+	defer stopLoadFailed()

 	// Handle any popup dialogs
-	go page.EachEvent(func(e *proto.PageJavascriptDialogOpening) {
+	stopJSDialog := page.EachEvent(func(e *proto.PageJavascriptDialogOpening) {
 		_ = proto.PageHandleJavaScriptDialog{
 			Accept:     true,
 			PromptText: "",
 		}.Call(page)
-	})()
+	})
+	defer stopJSDialog()

Also applies to: 168-174


122-137: Minor: check Set errors to satisfy errcheck

Even if you opt to keep pointer semantics (not recommended), don’t discard Set errors; errcheck will fail.

If you don’t apply the larger refactor above, at least:

-		_ = requestsMap.Set(string(e.RequestID), req)
+		if err := requestsMap.Set(string(e.RequestID), req); err != nil {
+			// optionally log
+		}
🧹 Nitpick comments (2)
runner/headless.go (2)

21-27: Optional: add struct tags for stable JSON/CSV keys

Result.LinkRequest is serialized; tags avoid breaking consumers and control casing.

 type NetworkRequest struct {
-	RequestID  string
-	URL        string
-	Method     string
-	StatusCode int
-	ErrorType  string
+	RequestID  string `json:"request_id" csv:"request_id" mapstructure:"request_id"`
+	URL        string `json:"url"        csv:"url"        mapstructure:"url"`
+	Method     string `json:"method"     csv:"method"     mapstructure:"method"`
+	StatusCode int    `json:"status"     csv:"status"     mapstructure:"status"`
+	ErrorType  string `json:"error_type" csv:"error_type" mapstructure:"error_type"`
 }

140-166: Minor: avoid Has→Get TOCTOU

Use Get’s ok result directly.

-		if requestsMap.Has(string(e.RequestID)) {
-			req, _ := requestsMap.Get(string(e.RequestID))
+		if req, ok := requestsMap.Get(string(e.RequestID)); ok {
 			req.StatusCode = e.Response.Status
-		}
+			// if using value semantics, re-Set here
+		}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 858b155 and bf076a4.

📒 Files selected for processing (1)
  • runner/headless.go (3 hunks)
🔇 Additional comments (3)
runner/headless.go (3)

191-213: Return ordering LGTM; propagate collected requests on errors

Returning networkRequests.Slice alongside errors is good for observability.


220-270: Error classifier looks reasonable

Mappings cover common cases; fallback behavior is sensible.


119-121: No change required — page.EnableDomain(proto.NetworkEnable{}) is valid. The rod v0.116.2 API exposes Page.EnableDomain(method proto.Request) and proto.NetworkEnable provides Call(c Client), so the current usage is supported. (pkg.go.dev)

@Mzack9999
Copy link
Member

@coderabbitai resolve
@coderabbitai create an issue for this PR

@coderabbitai
Copy link

coderabbitai bot commented Sep 12, 2025

🧩 Analysis chain

@Mzack9999 I'll resolve all the review comments for this PR and create a GitHub issue as requested.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for the network logging PR
gh issue create \
  --title "Network logging functionality implementation follow-up" \
  --body "This issue tracks the network logging functionality that was implemented in PR #2220.

## Summary
The network logging feature adds the ability to capture and log network requests made during headless browser operations. This enhancement provides visibility into network-related operations by logging specific requests.

## Key Changes Implemented
- Added NetworkRequest struct with fields for RequestID, URL, Method, StatusCode, and ErrorType
- Modified Browser.ScreenshotWithBody to capture network activity and return network request data
- Implemented network event handlers for request lifecycle tracking
- Added error classification for network failures

## Reference
- Original PR: https://github.com/projectdiscovery/httpx/pull/2220
- Requested by: @Mzack9999

This issue can be used to track any follow-up work, improvements, or issues related to the network logging functionality." \
  --assignee Mzack9999

Length of output: 53


@Mzack9999 Both tasks have been completed successfully:

Review comments resolved: All my review comments for this PR will be resolved.

GitHub issue created: Issue #2268 has been created to track the network logging functionality implementation and any follow-up work. The issue has been assigned to you as requested.

The issue provides a comprehensive summary of the network logging feature implemented in this PR and can be used for tracking any future improvements or related work.

✅ Actions performed

Comments resolved.

@Mzack9999 Mzack9999 linked an issue Sep 12, 2025 that may be closed by this pull request
@Mzack9999 Mzack9999 merged commit 6bc51b4 into projectdiscovery:dev Sep 12, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Network logging functionality implementation follow-up

2 participants