Skip to content

refactor!: Split BrowserType literal into two different literals based on context #1070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Pijukatel
Copy link
Collaborator

Description

Split BrowserType literal into two different literals based on context.

This avoids some confusion and some implicit string manipulation in favor of explicit name mapping between the two different literals.

In Playwright: 'chromium', 'firefox', 'webkit'
In browser fingerprints context it is : 'chrome', 'firefox', 'safari', 'edge'

Two similar, but different contexts are `Playwright` and browser fingerprints
@github-actions github-actions bot added this to the 110th sprint - Tooling team milestone Mar 11, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Mar 11, 2025
@Pijukatel Pijukatel requested review from vdusek and B4nan March 11, 2025 09:48
@Pijukatel Pijukatel added the adhoc Ad-hoc unplanned task added during the sprint. label Mar 13, 2025
@Pijukatel
Copy link
Collaborator Author

This is slightly breaking change, so probably wait until more braking changes accumulate.

@Pijukatel
Copy link
Collaborator Author

New release is coming, so let's add this change in now.

@Pijukatel Pijukatel marked this pull request as ready for review June 19, 2025 12:38
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, could you please resolve conflicts?

@Pijukatel Pijukatel requested a review from vdusek July 2, 2025 14:42
@B4nan B4nan requested review from Copilot and barjin and removed request for B4nan July 2, 2025 17:31
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors browser type handling by separating Playwright context types (chromium, firefox, webkit) from fingerprinting context types (chrome, firefox, safari, edge), and updates all related code, tests, and documentation to use explicit mapping.

  • Introduced fingerprint_browser_type_from_playwright_browser_type and updated SupportedBrowserType literal.
  • Updated HeaderGenerator, BrowserforgeAdapter, and crawler code to use the new fingerprint types.
  • Revised tests and docs/examples to replace chromium/webkit with chrome/safari.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/fingerprint_suite/test_header_generator.py Updated parameterized browser list and test calls from 'chromium'/'webkit' to 'chrome'/'safari'.
tests/unit/crawlers/_playwright/test_playwright_crawler.py Swapped 'chromium' to 'chrome' in fingerprint generator options and keyword assertions.
src/crawlee/fingerprint_suite/_types.py Changed SupportedBrowserType literal to fingerprint context types.
src/crawlee/fingerprint_suite/_header_generator.py Added mapping function, updated defaults and validation sets to new types.
src/crawlee/fingerprint_suite/_consts.py Renamed keys in BROWSER_TYPE_HEADER_KEYWORD to match fingerprint types.
src/crawlee/fingerprint_suite/_browserforge_adapter.py Refactored adapter logic to use new fingerprint types and updated patched generator.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Integrated mapping function for header generator options.
src/crawlee/browsers/_playwright_browser_controller.py Applied type mapping when generating sec-ch-ua headers.
docs/upgrading/upgrading_to_v1.md Documented breaking change for browser literal contexts.
docs/examples/code_examples/playwright_crawler_with_fingerprint_generator.py Updated example from ['chromium'] to ['chrome'].

assert user_agent in get_available_header_values(header_network, {'user-agent', 'User-Agent'})
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD[browser_type]), user_agent
assert user_agent in get_available_header_values(header_network, {'user-agent', 'User-Agent'}), user_agent
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD['chrome']), user_agent
Copy link
Preview

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test statically checks Chrome keywords regardless of the actual browser under test. It should derive the fingerprint browser type (e.g., via fingerprint_browser_type_from_playwright_browser_type) and use that key for BROWSER_TYPE_HEADER_KEYWORD.

Suggested change
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD['chrome']), user_agent
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD[browser_type]), user_agent

Copilot uses AI. Check for mistakes.

return single_name
# select from, so narrowing it to any of them is still a valid action as we are going to pick just one anyway.

if isinstance(browser, Iterable):
Copy link
Preview

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings are iterable and will be treated as a sequence of characters here. Exclude str from this check (e.g., isinstance(browser, Iterable) and not isinstance(browser, str)) to avoid iterating over individual characters.

Suggested change
if isinstance(browser, Iterable):
if isinstance(browser, Iterable) and not isinstance(browser, str):

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants