πŸ—ΊοΈ Google Maps Scraper API

A high-performance FastAPI service for scraping Google Maps data with parallel processing and comprehensive data extraction. Ideal for n8n users and automation workflows.



✨ Key Features

  • ⚡ 3x faster with parallel processing (3 workers)
  • 📊 Two extraction modes: quick list or complete details
  • 📞 Full contact info: phone, website, address, opening hours
  • 🛡️ Anti-detection: random delays, realistic user agent, WebDriver masking
  • 🐳 Docker optimized: shm_size, single-process mode for stability
  • 🌍 Multi-language support: en, fr, es, de, and more
  • 🔄 n8n compatible: ready to use in automation workflows

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • 2 GB RAM minimum
  • Port 8001 available

Installation

# Clone the repository
git clone https://github.com/conor-is-my-name/google-maps-scraper.git
cd google-maps-scraper

# Start with Docker
docker compose up -d

# Test the API
curl "http://localhost:8001/health"

Expected response:

{
  "status": "healthy",
  "service": "google-maps-scraper"
}

📖 API Usage

Endpoints

  • GET /scrape-get - Main endpoint for scraping (recommended for n8n)
  • POST /scrape - Alternative POST endpoint with JSON body
  • GET /health - Health check endpoint
  • GET / - Service information
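
For the POST variant, the JSON body presumably mirrors the GET query parameters; a minimal Python sketch (field names assumed from the parameter list below — check main_api.py for the exact schema):

```python
# Sketch: calling POST /scrape from Python.
# Assumes the JSON body mirrors the GET parameters
# (query, max_places, lang, headless, details).
import json


def build_scrape_payload(query, max_places=10, lang="en",
                         headless=True, details=False):
    """Assemble the request body for POST /scrape."""
    return {
        "query": query,
        "max_places": max_places,
        "lang": lang,
        "headless": headless,
        "details": details,
    }


payload = build_scrape_payload("restaurant paris", max_places=20, details=True)
print(json.dumps(payload))

# To actually send it (requires the service to be running):
#   import requests
#   resp = requests.post("http://localhost:8001/scrape", json=payload, timeout=300)
#   results = resp.json()["results"]
```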

Parameters

Parameter   Type    Default  Required  Description
query       string  -        ✅        Search query (e.g., "hotels in 98392")
max_places  int     10       ❌        Maximum results (1-100)
lang        string  "en"     ❌        Language code (en, fr, es, de, etc.)
headless    bool    true     ❌        Run browser in headless mode
details     bool    false    ❌        Extract full details (phone, website, etc.)
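
For illustration, clamping max_places to the documented 1-100 range could look like this (hypothetical helper, not the project's actual validation code):

```python
# Sketch: enforcing the documented 1-100 range for max_places.
# validate_max_places is a hypothetical helper, not part of the API.
def validate_max_places(value, default=10):
    try:
        n = int(value)
    except (TypeError, ValueError):
        return default  # fall back to the documented default
    return min(max(n, 1), 100)  # clamp into [1, 100]


print(validate_max_places(250))   # clamped to 100
print(validate_max_places(None))  # falls back to the default, 10
```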

🎯 Extraction Modes

Quick Mode (details=false)

Fast extraction of basic information (~2 seconds per place)

Returns:

  • ✅ Name
  • ✅ URL
  • ✅ Rating
  • ✅ Review count
  • ✅ Category

curl "http://localhost:8001/scrape-get?query=restaurant%20paris&max_places=20&details=false"

Use cases: Rankings, comparisons, quick lists


Full Details Mode (details=true)

Complete extraction including contact information (~2-3 seconds per place)

Returns:

  • ✅ Name, URL, Rating, Review count, Category
  • ✅ Phone number
  • ✅ Website
  • ✅ Full address
  • ✅ Opening hours

curl "http://localhost:8001/scrape-get?query=restaurant%20paris&max_places=10&details=true"

Use cases: Directory creation, CRM import, contact lists


📊 Performance

Places   Quick Mode   Full Details   Sequential (old)
5        ~10s         ~15s           ~45s
10       ~20s         ~30s           ~90s
20       ~40s         ~60s           ~180s

Speedup: 2-3x faster than sequential scraping thanks to 3 parallel workers.


💡 Example Requests

Quick List (Basic Info)

# Get 20 restaurants with ratings
curl "http://localhost:8001/scrape-get?query=restaurant%20paris&max_places=20&details=false&lang=en"

Response:

{
  "success": true,
  "query": "restaurant paris",
  "total_results": 20,
  "results": [
    {
      "name": "Le Meurice",
      "url": "https://www.google.com/maps/place/Le+Meurice/...",
      "rating": "4.8",
      "reviews_count": "1234",
      "category": "French restaurant"
    }
  ]
}
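
For the rankings use case, quick-mode results can be sorted client-side. Note that the API returns rating and reviews_count as strings, so they must be converted first (sample data below adds a hypothetical second entry for illustration):

```python
# Sketch: ranking quick-mode results by rating, then review count.
# rating and reviews_count arrive as strings and need conversion.
response = {
    "success": True,
    "query": "restaurant paris",
    "total_results": 2,
    "results": [
        {"name": "Bistro A", "rating": "4.5", "reviews_count": "321"},
        {"name": "Le Meurice", "rating": "4.8", "reviews_count": "1234"},
    ],
}

ranked = sorted(
    response["results"],
    key=lambda p: (float(p["rating"]), int(p["reviews_count"])),
    reverse=True,  # highest-rated first
)
print([p["name"] for p in ranked])
```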

Full Details (Complete Info)

# Get 10 restaurants with contact details
curl "http://localhost:8001/scrape-get?query=restaurant%20paris&max_places=10&details=true&lang=fr"

Response:

{
  "success": true,
  "query": "restaurant paris",
  "total_results": 10,
  "results": [
    {
      "name": "Le Meurice",
      "url": "https://www.google.com/maps/place/Le+Meurice/...",
      "rating": "4.8",
      "reviews_count": "1234",
      "category": "French restaurant",
      "phone": "+33 1 44 58 10 10",
      "website": "https://www.dorchestercollection.com/paris/le-meurice",
      "address": "228 Rue de Rivoli, 75001 Paris",
      "hours": "Open Β· Closes 10:30 pm"
    }
  ]
}
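
For the CRM-import use case, full-details results flatten naturally into CSV with the standard library (a minimal sketch using the fields from the response above):

```python
# Sketch: flattening full-details results into a CSV for CRM import.
import csv
import io

results = [
    {
        "name": "Le Meurice",
        "phone": "+33 1 44 58 10 10",
        "website": "https://www.dorchestercollection.com/paris/le-meurice",
        "address": "228 Rue de Rivoli, 75001 Paris",
        "rating": "4.8",
    },
]

fields = ["name", "phone", "website", "address", "rating"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
writer.writeheader()   # header row first
writer.writerows(results)
print(buf.getvalue())
```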

🔗 n8n Integration

HTTP Request Node Configuration

Method: GET
URL: http://gmaps_scraper_api_service:8001/scrape-get

Query Parameters:
  - query: {{ $json.search_query }}
  - max_places: 20
  - details: true
  - lang: en
  - headless: true

Workflow Example

  1. Trigger → Schedule or Webhook
  2. HTTP Request → Google Maps Scraper
  3. Code → Parse and filter results
  4. Database → Store in PostgreSQL/MySQL
  5. Notification → Send to Slack/Email

Designed for: n8n-autoscaling


πŸ‹ Docker Commands

# Start service
docker compose up -d

# View logs
docker compose logs -f

# Stop service
docker compose down

# Rebuild after changes
docker compose down
docker compose build --no-cache
docker compose up -d

# Quick restart
docker compose restart

βš™οΈ Configuration

Docker Compose Settings

docker-compose.yml includes optimizations for stability:

services:
  gmaps_scraper_api_service:
    shm_size: '2gb'          # Prevents "Page crashed" errors
    environment:
      - DISPLAY=:99          # Xvfb display
      - PYTHONUNBUFFERED=1
    volumes:
      - /dev/shm:/dev/shm    # Shared memory for Chromium
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G

Modify Worker Count

To adjust parallel processing (default: 3 workers):

Edit gmaps_scraper_server/scraper.py line ~217:

max_workers=3  # Increase to 5 max (higher = risk of detection)

⚠️ Warning: More than 5 workers may trigger Google's bot detection.
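
The effect of the worker cap can be sketched with a toy thread pool (a simplified illustration, not the actual code in scraper.py): max_workers bounds how many places are processed at once, and a random delay spaces out the requests.

```python
# Simplified illustration of bounded parallelism (not the real
# scraper.py code): max_workers caps concurrent extractions.
import random
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_place(place_url):
    """Stand-in for the real per-place extraction."""
    time.sleep(random.uniform(0.01, 0.03))  # random delay (1-3 s in production)
    return {"url": place_url, "ok": True}


urls = [f"https://maps.example/place/{i}" for i in range(9)]  # dummy URLs
with ThreadPoolExecutor(max_workers=3) as pool:  # 3 workers, as shipped
    results = list(pool.map(scrape_place, urls))
print(len(results))  # 9
```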


πŸ›‘οΈ Anti-Detection Features

The scraper includes multiple protections:

  • βœ… Realistic User-Agent (Chrome 120)
  • βœ… WebDriver masking (navigator.webdriver = undefined)
  • βœ… Random delays between actions (1-3 seconds)
  • βœ… Chromium stealth arguments (--disable-blink-features=AutomationControlled)
  • βœ… Limited concurrency (3 workers max for normal behavior)

Recommended limits:

  • Max 500 places/day per IP
  • Min 60 seconds between requests
  • Max 3-5 parallel workers
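
A client-side throttle enforcing these limits might look like this (hypothetical helper, not part of the project):

```python
# Sketch of a client-side throttle for the recommended limits:
# a minimum gap between requests and a daily cap on scraped places.
# ScrapeThrottle is a hypothetical helper, not shipped with the project.
import time


class ScrapeThrottle:
    def __init__(self, min_interval=60.0, daily_cap=500):
        self.min_interval = min_interval  # seconds between requests
        self.daily_cap = daily_cap        # places per day
        self.last_request = 0.0
        self.places_today = 0

    def acquire(self, places):
        """Block until a request for `places` results is allowed."""
        if self.places_today + places > self.daily_cap:
            raise RuntimeError("daily place cap reached")
        wait = self.min_interval - (time.monotonic() - self.last_request)
        if wait > 0:
            time.sleep(wait)
        self.last_request = time.monotonic()
        self.places_today += places


throttle = ScrapeThrottle(min_interval=0.01, daily_cap=500)  # short gap for demo
throttle.acquire(20)
print(throttle.places_today)  # 20
```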

🔧 Local Development

Without Docker

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Run the API
uvicorn gmaps_scraper_server.main_api:app --reload --host 0.0.0.0 --port 8001

The API will be available at http://localhost:8001

With Docker (recommended)

docker compose up --build

πŸ“ Project Structure

google-maps-scraper/
β”œβ”€β”€ docker-compose.yml          # Docker configuration
β”œβ”€β”€ Dockerfile                  # Docker image build
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ start.sh                    # Container startup script
β”œβ”€β”€ gmaps_scraper_server/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ main_api.py            # FastAPI endpoints
β”‚   └── scraper.py             # Scraping logic (parallel)
└── debug_screenshots/          # Debug screenshots

πŸ› Troubleshooting

"Page crashed" Error

Cause: Insufficient shared memory

Solution:

# In docker-compose.yml
shm_size: '2gb'  # Increase to 4gb if needed

No Results Returned

Possible causes:

  1. Google structure changed → Update CSS selectors
  2. CAPTCHA detected → Reduce workers, wait 24h
  3. Invalid query → Check URL encoding
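
URL encoding is easy to get right with the standard library; a quick sketch of building a correctly encoded /scrape-get URL:

```python
# Sketch: URL-encoding the query before calling /scrape-get.
from urllib.parse import quote, urlencode

query = "hotels in 98392"
url = "http://localhost:8001/scrape-get?" + urlencode(
    {"query": query, "max_places": 10, "details": "true"}
)
print(url)

# urlencode encodes spaces as '+', which FastAPI decodes the same way
# as %20; quote() gives the %20 form used in the curl examples above.
print(quote(query))  # hotels%20in%2098392
```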

View Logs

docker compose logs -f

⚠️ Important Notes

  • ⚠️ Rate limiting: Respect Google's terms. Max 500 places/day recommended.
  • ⚠️ Legal compliance: Check local laws regarding web scraping.
  • ⚠️ Responsible use: This tool is for educational and legitimate business purposes.
  • ⚠️ No guarantees: Google may change their structure at any time.

📈 Changelog

v2.0.0 (Latest)

  • ✨ Added parallel processing (3 workers) - 3x faster
  • ✨ Added full details mode (phone, website, address, hours)
  • ✨ Added random delays for anti-detection
  • 🔧 Fixed "Page crashed" with --single-process
  • 🔧 Improved data extraction reliability
  • 📚 Complete documentation and examples

v1.0.0

  • 🎉 Initial release with basic scraping

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

MIT License - Free to use, modify, and distribute.


πŸ™ Credits

Original project structure by @conor-is-my-name

Enhancements:

  • Parallel processing implementation
  • Full details extraction mode
  • Anti-detection improvements
  • Docker optimizations
  • Comprehensive documentation

📧 Support


Built with ❤️ using FastAPI and Playwright
