# G2 Reviews Crawler

Crawls product reviews from G2.com and saves them as structured JSON. Uses your real Chrome browser (via CDP) so anti-bot protections are handled naturally — no proxies, cookies, or stealth tricks needed.

## How It Works

```
┌──────────────────────────────────────────────────────────────────┐
│  1. You launch Chrome with --remote-debugging-port=9222          │
│  2. The script connects to Chrome via CDP (DevTools Protocol)    │
│  3. It navigates Chrome to each G2 review page                   │
│  4. Chrome renders the page normally (handles DataDome/anti-bot) │
│  5. The script reads the rendered HTML and parses reviews        │
│  6. Reviews are deduplicated and saved incrementally to JSON     │
└──────────────────────────────────────────────────────────────────┘
```
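Chrome's remote-debugging mode exposes its open tabs over a local HTTP endpoint (`http://localhost:9222/json`); a CDP client picks a page target from that list and speaks the protocol over its WebSocket URL. A minimal sketch of the target-selection step (the function name is illustrative, not taken from the script):

```python
import json

def pick_page_target(targets_json: str):
    """Given the JSON body of Chrome's /json endpoint, return the
    WebSocket debugger URL of the first page-type target (a real tab,
    as opposed to extensions or service workers), or None."""
    for target in json.loads(targets_json):
        if target.get("type") == "page":
            return target.get("webSocketDebuggerUrl")
    return None
```

The actual script would then send CDP commands such as `Page.navigate` over that WebSocket connection.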

## Architecture

```
  Script (CDP client)              Chrome Browser
  ┌──────────────────┐            ┌──────────────────┐
  │ Page.navigate    │───────────▶│ loads G2 page    │
  │ to each review   │            │ passes anti-bot  │
  │ page URL         │            │ renders HTML     │
  │                  │◀───────────│                  │
  │ reads outerHTML  │            │ port 9222        │
  └──────────────────┘            └──────────────────┘
           │
           ▼
  ┌──────────────────┐
  │ Scrapling parser │
  │ CSS selectors    │──▶ reviews.json
  │ (itemprop based) │
  └──────────────────┘
```

G2 uses DataDome anti-bot protection which blocks headless browsers and HTTP clients. By driving your real Chrome, the script avoids this entirely — Chrome is already a trusted browser with a valid session.
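The parsing stage in the diagram above selects fields by schema.org `itemprop` attributes. This README doesn't show the Scrapling selectors themselves, but the idea can be sketched with nothing beyond the standard library (the HTML structure here is hypothetical and simplified — it ignores self-closing tags and repeated properties):

```python
from html.parser import HTMLParser

class ItempropExtractor(HTMLParser):
    """Collects the text content of every element that carries an
    itemprop attribute, keyed by the itemprop name."""

    def __init__(self):
        super().__init__()
        self._stack = []   # itemprop name per open tag (None if absent)
        self.fields = {}   # itemprop name -> accumulated text

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        self._stack.append(name)
        if name:
            self.fields.setdefault(name, "")

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text counts toward every itemprop element currently open,
        # so nested markup still lands in the right field.
        for name in self._stack:
            if name:
                self.fields[name] += data
```

Usage: `p = ItempropExtractor(); p.feed(html)`, then read `p.fields`.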

## What Gets Extracted

Each review is saved with these fields (see example_output.json):

| Field | Description |
| --- | --- |
| `review_id` | G2's internal review identifier |
| `author` | Reviewer name (as shown on G2) |
| `job_title` | Reviewer's job title |
| `company_size` | Company size bucket |
| `date` | Publication date (ISO format) |
| `rating` | Star rating (1.0–5.0) |
| `title` | Review title |
| `likes` | "What do you like best" section |
| `dislikes` | "What do you dislike" section |
| `problems_solving` | "What problems is it solving" section |
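The field list above maps directly onto a record type; a sketch of the schema as a dataclass (the class name is ours, not from the script):

```python
from dataclasses import dataclass, asdict

@dataclass
class Review:
    """One G2 review, matching the fields listed above."""
    review_id: str        # G2's internal review identifier
    author: str           # reviewer name as shown on G2
    job_title: str
    company_size: str     # e.g. a bucket like "51-200"
    date: str             # publication date, ISO format
    rating: float         # star rating, 1.0-5.0
    title: str
    likes: str            # "What do you like best" section
    dislikes: str         # "What do you dislike" section
    problems_solving: str # "What problems is it solving" section
```

`asdict(review)` yields exactly the dict shape that lands in the output JSON.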

## Setup

### 1. Install dependencies

```bash
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### 2. Launch Chrome with remote debugging

```bash
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222
```

### 3. Run the crawler

```bash
python crawl_g2_reviews.py
```

The script will:

  1. Connect to Chrome via CDP on port 9222
  2. Navigate to each review page (Chrome handles anti-bot naturally)
  3. Read the rendered HTML and parse reviews with CSS selectors
  4. Save reviews incrementally to a JSON file (safe to interrupt and resume)
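The README doesn't spell out how the per-page URLs are built. G2 review listings commonly paginate with a `?page=N` query parameter, so a plausible helper might look like this (both the URL pattern and the function name are assumptions, not taken from the script):

```python
def review_page_urls(slug: str, pages: int):
    """Build the list of G2 review-page URLs for a product slug,
    assuming the common ?page=N pagination scheme (page 1 is bare)."""
    base = f"https://www.g2.com/products/{slug}/reviews"
    return [base if n == 1 else f"{base}?page={n}"
            for n in range(1, pages + 1)]
```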

## Configuration

All settings can be overridden with environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `G2_PRODUCT_SLUG` | `wati` | G2 product slug from the URL |
| `G2_OUTPUT_FILE` | `{slug}_reviews.json` | Output JSON file path |
| `G2_CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `G2_PAGE_WAIT` | `4` | Seconds to wait for page render |

Example: crawl a different product:

```bash
G2_PRODUCT_SLUG=slack python crawl_g2_reviews.py
```
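Reading these variables with their documented defaults is a few lines of stdlib code; a sketch of what the script's config loading likely amounts to (the function name and dict shape are ours):

```python
import os

def load_config(env=os.environ):
    """Resolve crawler settings from environment variables,
    falling back to the documented defaults."""
    slug = env.get("G2_PRODUCT_SLUG", "wati")
    return {
        "slug": slug,
        # default output file is derived from the slug
        "output_file": env.get("G2_OUTPUT_FILE", f"{slug}_reviews.json"),
        "cdp_port": int(env.get("G2_CDP_PORT", "9222")),
        "page_wait": float(env.get("G2_PAGE_WAIT", "4")),
    }
```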

## Resumability

The crawler supports incremental crawling. If interrupted, just run it again — it loads previously saved reviews from the output JSON and skips duplicates based on review_id.
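The load-merge-save cycle behind this is straightforward; a minimal sketch of the dedup-by-`review_id` logic (the exact persistence code in the script isn't shown here, and these function names are illustrative):

```python
import json
from pathlib import Path

def merge_reviews(existing, new):
    """Merge two review lists, keeping the first-seen record
    for each review_id so re-crawled duplicates are skipped."""
    by_id = {r["review_id"]: r for r in existing}
    for r in new:
        by_id.setdefault(r["review_id"], r)
    return list(by_id.values())

def save_incrementally(new_reviews, path):
    """Load previously saved reviews, merge in the new batch, and
    rewrite the file. Flushing after every batch makes the crawl
    safe to interrupt and resume."""
    p = Path(path)
    existing = json.loads(p.read_text()) if p.exists() else []
    merged = merge_reviews(existing, new_reviews)
    p.write_text(json.dumps(merged, indent=2, ensure_ascii=False))
    return merged
```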

## Files

| File | Tracked | Description |
| --- | --- | --- |
| `crawl_g2_reviews.py` | Yes | Main crawler script |
| `requirements.txt` | Yes | Python dependencies |
| `example_output.json` | Yes | Sample output showing the data format |
| `*_reviews.json` | No | Crawled review data |
