Skip to content

Latest commit

 

History

History
541 lines (415 loc) · 16.6 KB

File metadata and controls

541 lines (415 loc) · 16.6 KB

Enhanced CloudScraper Features Documentation

This document describes the comprehensive enhancements made to CloudScraper to bypass the majority of Cloudflare-protected websites.

🚀 Overview of Enhancements

7: The enhanced CloudScraper includes 11 major new systems that work together to provide sophisticated anti-bot detection evasion: 8: 9: 1. Hybrid Engine - The ultimate weapon: TLS-Chameleon + Py-Parkour Browser Bridge 10: 2. Enhanced TLS Fingerprinting - JA3 randomization and cipher rotation 11: 3. Advanced Anti-Detection - Traffic pattern obfuscation and payload spoofing 3. ML-Based Fingerprint Resistance - Machine learning-based detection evasion 4. Intelligent Challenge Detection - Automated challenge recognition and response 5. Adaptive Timing Algorithms - Human-like behavior simulation 6. Enhanced WebGL & Canvas Spoofing - Coordinated fingerprint generation 7. Request Signing & Payload Obfuscation - Advanced request manipulation 8. ML-Based Bypass Optimization - Learning from success/failure patterns 9. Automation Bypass - Masking Playwright/Chromium indicators 10. Behavioral Patterns - Integrated mouse/scroll simulation 11. Comprehensive Testing Framework - Full test coverage for all features 12. Enhanced Error Handling - Sophisticated retry and recovery mechanisms

� Free vs. Paid Features

CloudScraper gives you the best of both worlds: robust free tools for most cases, and optional paid integrations for extreme scenarios.

🆓 Free Features (Built-in)

These features run locally on your machine and cost nothing:

  • The Hybrid Engine: Uses your local Chrome browser via playwright to bypass challenges. No API keys required.
  • Local AI: ai_ocr.py uses local machine learning models to solve simple text/math captchas.
  • Protocol Bypasses: TLS Fingerprinting, Anti-Detection, and all core logic are 100% free.

💳 Paid Features (Optional)

These are purely optional 3rd-party integrations for solving commercially protected captchas (e.g., reCAPTCHA, Turnstile) without a browser context:

  • Captcha Solvers: Integration with 2Captcha, Anti-Captcha, CapSolver, etc. These require your own API key and subscription with those providers.

�📋 Feature Details

1. The Hybrid Engine (hybrid_engine.py)

Purpose: The most powerful bypass mechanism available, combining the speed of HTTP requests with the capability of a real browser.

Key Components:

  • TLS-Chameleon (curl_cffi): Provides low-level TLS fingerprint spoofing (JA3/JA4) that mimics real browsers perfectly at the packet level.
  • Py-Parkour (playwright): Acts as a "Browser Bridge". It remains dormant until a complex challenge is detected.
  • HybridEngine: Coordinates the handoff. If TLS-Chameleon hits a wall, HybridEngine wakes effects Py-Parkour, solves the challenge in a headless browser, extracts the cf_clearance cookie, and hands it back to the scraper.

Features:

  • Best of Both Worlds: Speed of requests + Power of Chrome.
  • Zero Configuration: Just set interpreter='hybrid'.
  • Auto-Fallback: Only uses the browser when absolutely necessary.

Usage:

scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    impersonate='chrome120'
)

2. Enhanced TLS Fingerprinting (tls_fingerprinting.py)

Purpose: Avoid TLS-based detection by rotating JA3 fingerprints and cipher suites.

Key Components:

  • JA3Generator: Creates realistic JA3 fingerprints for different browsers
  • CipherSuiteManager: Manages cipher suite rotation
  • TLSFingerprintingManager: Coordinates TLS fingerprint rotation

Features:

  • Real JA3 fingerprints from Chrome, Firefox, Safari, Edge
  • Automatic rotation based on request count
  • Browser-specific cipher suite preferences
  • TLS timing simulation

Usage:

scraper = cloudscraper.create_scraper(
    enable_tls_fingerprinting=True,
    enable_tls_rotation=True,
    browser='chrome'
)

2. Advanced Anti-Detection (anti_detection.py)

Purpose: Obfuscate traffic patterns and request characteristics to avoid pattern-based detection.

Key Components:

  • TrafficPatternObfuscator: Analyzes and obfuscates request patterns
  • BurstController: Prevents request bursts that trigger rate limits
  • RequestHeaderObfuscator: Modifies headers to avoid detection
  • PayloadObfuscator: Obfuscates request payloads and parameters

Features:

  • Request timing pattern analysis
  • Burst detection and prevention
  • Header randomization and obfuscation
  • Payload parameter manipulation
  • Tracking parameter injection

Usage:

scraper = cloudscraper.create_scraper(
    enable_anti_detection=True
)

3. ML-Based Fingerprint Resistance (advanced_fingerprinting.py)

Purpose: Use machine learning techniques to detect and evade fingerprinting attempts.

Key Components:

  • CanvasFingerprinter: Generates realistic Canvas fingerprints
  • WebGLFingerprinter: Creates WebGL fingerprints with variations
  • DeviceFingerprinter: Generates comprehensive device fingerprints
  • MLBasedFingerprintResistance: ML-based detection and evasion

Features:

  • Realistic Canvas and WebGL fingerprint generation
  • Device characteristic simulation
  • ML-based uniqueness detection
  • Adaptive fingerprint modification
  • Browser-specific variations

4. Intelligent Challenge Detection (intelligent_challenge_system.py)

Purpose: Automatically detect and respond to various Cloudflare challenge types.

Key Components:

  • IntelligentChallengeDetector: Pattern-based challenge detection
  • ChallengeResponseGenerator: Automated response generation
  • IntelligentChallengeSystem: Main coordination system

Features:

  • Pattern-based challenge recognition
  • Adaptive pattern learning
  • Multiple response strategies
  • Success rate tracking
  • Custom pattern support

Usage:

scraper = cloudscraper.create_scraper(
    enable_intelligent_challenges=True
)

# Add custom challenge pattern
scraper.intelligent_challenge_system.add_custom_pattern(
    domain='example.com',
    pattern_name='Custom Challenge',
    patterns=[r'custom.pattern'],
    challenge_type='custom',
    response_strategy='delay_retry'
)

5. Adaptive Timing Algorithms (adaptive_timing.py)

Purpose: Simulate realistic human browsing behavior through adaptive timing.

Key Components:

  • HumanBehaviorSimulator: Simulates realistic human behavior patterns
  • AdaptiveTimingController: Learns optimal timing for each domain
  • CircadianTimingAdjuster: Adjusts timing based on time of day
  • SmartTimingOrchestrator: Coordinates all timing systems

Features:

  • Multiple behavior profiles (casual, focused, research, mobile)
  • Adaptive learning from success/failure rates
  • Circadian rhythm simulation
  • Reading time estimation
  • Attention span simulation

Usage:

scraper = cloudscraper.create_scraper(
    enable_adaptive_timing=True,
    behavior_profile='casual'  # or 'focused', 'research', 'mobile'
)

6. Enhanced WebGL & Canvas Spoofing (enhanced_spoofing.py)

Purpose: Generate coordinated, realistic fingerprints for Canvas and WebGL APIs.

Key Components:

  • CanvasSpoofingEngine: Advanced Canvas fingerprint spoofing
  • WebGLSpoofingEngine: WebGL fingerprint spoofing with noise injection
  • SpoofingCoordinator: Ensures consistency between fingerprints

Features:

  • Realistic noise injection
  • Browser-specific rendering variations
  • Consistency levels (low, medium, high)
  • Domain-specific caching
  • Coordinated fingerprint generation

Usage:

scraper = cloudscraper.create_scraper(
    enable_enhanced_spoofing=True,
    spoofing_consistency_level='medium'  # or 'low', 'high'
)

7. ML-Based Bypass Optimization (ml_optimization.py)

Purpose: Learn from success/failure patterns to optimize bypass strategies.

Key Components:

  • SimpleMLOptimizer: Statistical learning from bypass attempts
  • AdaptiveStrategySelector: Selects optimal strategies based on context
  • MLBypassOrchestrator: Coordinates ML-based optimization

Features:

  • Success pattern learning
  • Context-aware strategy selection
  • Feature importance weighting
  • Domain-specific optimization
  • Strategy performance tracking

Usage:

scraper = cloudscraper.create_scraper(
    enable_ml_optimization=True
)

# Get optimization insights
report = scraper.ml_optimizer.get_optimization_report('example.com')

8. Enhanced Error Handling (enhanced_error_handling.py)

Purpose: Provide sophisticated error handling and recovery mechanisms.

Key Components:

  • ErrorClassifier: Classifies errors and determines severity
  • RetryCalculator: Calculates optimal retry delays
  • ProxyRotationManager: Manages proxy rotation for error recovery
  • SessionManager: Handles session refresh and recovery

Features:

  • Error pattern recognition
  • Adaptive retry strategies
  • Proxy failure handling
  • Session recovery
  • Error severity classification

Usage:

scraper = cloudscraper.create_scraper(
    enable_enhanced_error_handling=True
)

# Get error statistics
stats = scraper.enhanced_error_handler.get_error_statistics()

🔧 Configuration Options

Basic Enhanced Configuration

import cloudscraper

# Create scraper with all enhanced features
scraper = cloudscraper.create_scraper(
    # Core settings
    debug=True,
    browser='chrome',
    
    # Enhanced features (all enabled by default)
    enable_tls_fingerprinting=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    enable_intelligent_challenges=True,
    enable_adaptive_timing=True,
    enable_ml_optimization=True,
    enable_enhanced_error_handling=True,
    
    # Feature-specific settings
    behavior_profile='casual',
    spoofing_consistency_level='medium',
    
    # Stealth mode
    enable_stealth=True,
    stealth_options={
        'min_delay': 1.0,
        'max_delay': 4.0,
        'human_like_delays': True,
        'randomize_headers': True
    }
)

Maximum Stealth Configuration

# Maximum stealth for difficult websites
scraper = cloudscraper.create_scraper(
    debug=True,
    browser='chrome',
    
    # All enhanced features enabled
    enable_tls_fingerprinting=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    enable_intelligent_challenges=True,
    enable_adaptive_timing=True,
    enable_ml_optimization=True,
    enable_enhanced_error_handling=True,
    
    # Maximum stealth settings
    behavior_profile='research',  # Slowest, most careful
    spoofing_consistency_level='high',
    
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 8.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True,
        'simulate_viewport': True,
        'behavioral_patterns': True
    }
)

# Enable maximum stealth mode
scraper.enable_maximum_stealth()

9. Advanced Automation Bypass (stealth.py)

Purpose: Evade browser-engine profiling by masking automation-specific indicators.

Features:

  • Argument Injection: Comprehensive list of Chromium switches (e.g., --disable-blink-features=AutomationControlled).
  • Dynamic Masking: Injects JavaScript to spoof navigator.webdriver, chrome.runtime, and permission APIs.
  • Leak Prevention: Disables background networking and telemetry flags that signal automation.

10. Behavioral Pattern Simulation (behavioral_simulation.py)

Purpose: Mimic human-like interaction patterns to bypass behavioral analysis.

Features:

  • Interaction Hook: Integrated directly into Playwright solve loops.
  • Realistic Movements: Bezier-curve based mouse movements with jitter and natural delays.
  • Natural Scrolling: Simulates reading patterns (variable speed, back-scrolling).
  • Sync & Async Support: Works across all Playwright bypass modes.

📊 Monitoring and Statistics

Get Comprehensive Statistics

# Get enhanced statistics from all systems
stats = scraper.get_enhanced_statistics()

print("=== Enhanced CloudScraper Statistics ===")
for system, data in stats.items():
    print(f"\n{system.upper()}:")
    if isinstance(data, dict):
        for key, value in data.items():
            print(f"  {key}: {value}")
    else:
        print(f"  {data}")

Domain-Specific Optimization

# Optimize all systems for a specific domain
scraper.optimize_for_domain('example.com')

# Get domain-specific ML insights
ml_insights = scraper.ml_optimizer.get_optimization_report('example.com')
print("ML Optimization Insights:", ml_insights)

Error Monitoring

# Get error handling statistics
error_stats = scraper.enhanced_error_handler.get_error_statistics()
print("Error Statistics:", error_stats)

🧪 Testing

Run the Test Suite

# Run comprehensive test suite
python tests/test_enhanced_features.py

Manual Testing

# Run the demonstration script
python examples/enhanced_bypass_demo.py

🔄 Adaptive Learning

The enhanced CloudScraper learns from every request to improve bypass success rates:

Automatic Learning

  • Success Patterns: Learns which strategies work best for each domain
  • Timing Optimization: Adapts request timing based on success rates
  • Fingerprint Effectiveness: Tracks which fingerprints avoid detection
  • Error Recovery: Learns from errors to improve recovery strategies

Manual Optimization

# Force optimization for a domain after learning period
scraper.optimize_for_domain('difficult-site.com')

# Reset learning data if needed
scraper.reset_all_systems()

# Get optimization insights
insights = scraper.ml_optimizer.get_optimization_report('difficult-site.com')

🎯 Best Practices

1. Gradual Approach

Start with basic settings and gradually increase stealth levels:

# Start conservative
scraper = cloudscraper.create_scraper(
    behavior_profile='research',
    spoofing_consistency_level='low'
)

# If facing challenges, increase stealth
scraper.enable_maximum_stealth()

2. Domain-Specific Optimization

Let the system learn domain patterns:

# Make several requests to let the system learn
for i in range(10):
    response = scraper.get('https://target-site.com/page' + str(i))

# Then optimize
scraper.optimize_for_domain('target-site.com')

3. Monitor Statistics

Regularly check system performance:

stats = scraper.get_enhanced_statistics()
ml_stats = stats.get('ml_optimization', {})
success_rate = ml_stats.get('global_success_rate', 0)

if success_rate < 0.8:
    scraper.enable_maximum_stealth()

4. Error Handling

Use the enhanced error handling for robust operations:

try:
    response = scraper.get('https://challenging-site.com')
except Exception as e:
    error_stats = scraper.enhanced_error_handler.get_error_statistics()
    print(f"Error occurred: {e}")
    print(f"Recent errors: {error_stats['recent_errors']}")

🔧 Troubleshooting

Common Issues and Solutions

  1. High Detection Rates

    # Enable maximum stealth
    scraper.enable_maximum_stealth()
    
    # Reset fingerprints
    scraper.reset_all_systems()
  2. Slow Performance

    # Use focused behavior profile for faster requests
    scraper.timing_orchestrator.set_behavior_profile('focused')
  3. Proxy Issues

    # Check proxy statistics
    error_stats = scraper.enhanced_error_handler.get_error_statistics()
    proxy_failures = error_stats.get('proxy_failures', {})
    print("Proxy failures:", proxy_failures)
  4. Memory Usage

    # Clear caches periodically
    scraper.reset_all_systems()

🔮 Future Enhancements

Potential areas for future development:

  1. Deep Learning Models: Integration with neural networks for pattern recognition
  2. Blockchain-Based Proxies: Decentralized proxy networks
  3. Real-Time Adaptation: Faster adaptation to new protection mechanisms
  4. Cross-Domain Learning: Learn patterns across multiple domains
  5. Enhanced Captcha Solving: Integration with advanced captcha solvers

📞 Support

For issues, questions, or contributions:

  1. Check the test suite for examples: tests/test_enhanced_features.py
  2. Run the demo script: examples/enhanced_bypass_demo.py
  3. Review the statistics to understand system behavior
  4. Use the debugging features with debug=True

⚠️ Legal Notice

This enhanced CloudScraper is for educational and legitimate security testing purposes only. Users are responsible for ensuring compliance with applicable laws, terms of service, and ethical guidelines when using this software.


Enhanced CloudScraper v3.1.0+ - Advanced Cloudflare Bypass Capabilities