-
Notifications
You must be signed in to change notification settings - Fork 394
refactor!: Split BrowserType
literal into two different literals based on context
#1070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Two similar, but different contexts are `Playwright` and browser fingerprints
This is slightly breaking change, so probably wait until more braking changes accumulate. |
New release is coming, so let's add this change in now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, could you please resolve conflicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors browser type handling by separating Playwright context types (chromium
, firefox
, webkit
) from fingerprinting context types (chrome
, firefox
, safari
, edge
), and updates all related code, tests, and documentation to use explicit mapping.
- Introduced
fingerprint_browser_type_from_playwright_browser_type
and updatedSupportedBrowserType
literal. - Updated
HeaderGenerator
,BrowserforgeAdapter
, and crawler code to use the new fingerprint types. - Revised tests and docs/examples to replace
chromium
/webkit
withchrome
/safari
.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
tests/unit/fingerprint_suite/test_header_generator.py | Updated parameterized browser list and test calls from 'chromium'/'webkit' to 'chrome'/'safari' . |
tests/unit/crawlers/_playwright/test_playwright_crawler.py | Swapped 'chromium' to 'chrome' in fingerprint generator options and keyword assertions. |
src/crawlee/fingerprint_suite/_types.py | Changed SupportedBrowserType literal to fingerprint context types. |
src/crawlee/fingerprint_suite/_header_generator.py | Added mapping function, updated defaults and validation sets to new types. |
src/crawlee/fingerprint_suite/_consts.py | Renamed keys in BROWSER_TYPE_HEADER_KEYWORD to match fingerprint types. |
src/crawlee/fingerprint_suite/_browserforge_adapter.py | Refactored adapter logic to use new fingerprint types and updated patched generator. |
src/crawlee/crawlers/_playwright/_playwright_crawler.py | Integrated mapping function for header generator options. |
src/crawlee/browsers/_playwright_browser_controller.py | Applied type mapping when generating sec-ch-ua headers. |
docs/upgrading/upgrading_to_v1.md | Documented breaking change for browser literal contexts. |
docs/examples/code_examples/playwright_crawler_with_fingerprint_generator.py | Updated example from ['chromium'] to ['chrome'] . |
assert user_agent in get_available_header_values(header_network, {'user-agent', 'User-Agent'}) | ||
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD[browser_type]), user_agent | ||
assert user_agent in get_available_header_values(header_network, {'user-agent', 'User-Agent'}), user_agent | ||
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD['chrome']), user_agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test statically checks Chrome keywords regardless of the actual browser under test. It should derive the fingerprint browser type (e.g., via fingerprint_browser_type_from_playwright_browser_type
) and use that key for BROWSER_TYPE_HEADER_KEYWORD
.
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD['chrome']), user_agent | |
assert any(keyword in user_agent for keyword in BROWSER_TYPE_HEADER_KEYWORD[browser_type]), user_agent |
Copilot uses AI. Check for mistakes.
return single_name | ||
# select from, so narrowing it to any of them is still a valid action as we are going to pick just one anyway. | ||
|
||
if isinstance(browser, Iterable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings are iterable and will be treated as a sequence of characters here. Exclude str
from this check (e.g., isinstance(browser, Iterable) and not isinstance(browser, str)
) to avoid iterating over individual characters.
if isinstance(browser, Iterable): | |
if isinstance(browser, Iterable) and not isinstance(browser, str): |
Copilot uses AI. Check for mistakes.
Description
Split
BrowserType
literal into two different literals based on context.This avoids some confusion and some implicit string manipulation in favor of explicit name mapping between the two different literals.
In Playwright:
'chromium', 'firefox', 'webkit'
In browser fingerprints context it is :
'chrome', 'firefox', 'safari', 'edge'