derekmeegan/browserbase-lambda-playwright

🚀 Serverless Browser Agents with Playwright + Lambda + Browserbase

Spin up headless browsers on AWS in under a minute: no layers, no EC2, no pain.

Star ⭐ this repo if it saves you hours, and hit Fork to make it yours in seconds.

⚡ TL;DR Quick-Start

Option A: Local Deployment

# 1. Clone this repository
git clone https://github.com/your-username/browserbase-lambda-playwright.git
cd browserbase-lambda-playwright

# 2. Deploy infrastructure
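# Export credentials only if they are not already present in the environment
env | grep -q '^AWS_ACCESS_KEY_ID=' || { export AWS_ACCESS_KEY_ID=...; export AWS_SECRET_ACCESS_KEY=...; }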
cd infra && pip install -r requirements.txt && cdk deploy --all --require-approval never

# 3. Fetch API details from CloudFormation outputs and export them for the steps below
export API_ENDPOINT_URL="$(aws cloudformation describe-stacks \
  --stack-name BrowserbaseLambdaStack \
  --query 'Stacks[0].Outputs[?OutputKey==`ApiEndpointUrl`].OutputValue' \
  --output text)"

export API_KEY="$(aws apigateway get-api-key \
  --api-key "$(aws cloudformation describe-stacks --stack-name BrowserbaseLambdaStack \
      --query 'Stacks[0].Outputs[?OutputKey==`ApiKeyId`].OutputValue' --output text)" \
  --include-value \
  --query 'value' \
  --output text)"

# 4. Install example dependencies and run quick start (from the repository root)
cd ..
pip install -r examples/requirements.txt
python examples/quick_start.py

Option B: GitHub Actions Deployment

# 1. Fork or push this repo to your GitHub account
# 2. Add repository secrets under Settings → Secrets & variables → Actions:
#    - AWS_ACCESS_KEY
#    - AWS_SECRET_ACCESS_KEY
# 3. Create Browserbase secrets in AWS Secrets Manager (see infra/stack.py env names)
# 4. Push to main → GitHub Actions triggers CDK deploy

You now have a Lambda that opens a Browserbase session and runs Playwright code from lambdas/scraper/scraper.py.
Invoke it with:

curl -X POST "$API_ENDPOINT_URL" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"url":"https://news.ycombinator.com/"}' \
  -v

# ...then poll status:
curl -H "x-api-key: $API_KEY" "$API_ENDPOINT_URL/<jobId>"

OR

pip install -r examples/requirements.txt
python examples/quick_start.py

🔄 Serverless Async Architecture

  1. POST /scrape returns 202 Accepted immediately.
  2. Job metadata is stored in DynamoDB (JobStatusTable) with status updates (PENDING → RUNNING → SUCCESS/FAILED).
  3. GET /scrape/{jobId} polls DynamoDB for the latest job result.
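
The submit-then-poll flow above can be sketched as a small client-side loop. This is a minimal illustration, not the code in examples/quick_start.py: the helper names `http_get_json` and `poll_job` are hypothetical, and it assumes the GET response is a JSON object with a `status` field holding one of the states listed above.

```python
import json
import time
import urllib.request
from typing import Callable

# Terminal states taken from the status flow described above.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def http_get_json(url: str, api_key: str) -> dict:
    """GET a URL with the x-api-key header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"x-api-key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_job(fetch: Callable[[], dict], interval_s: float = 2.0,
             max_attempts: int = 30) -> dict:
    """Call fetch() until the job reaches a terminal status or attempts run out.

    Assumption: the job record exposes its state under a "status" key.
    """
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_attempts} polls")


# Example wiring (needs a deployed stack, so it is left as a comment):
# job = poll_job(lambda: http_get_json(f"{API_ENDPOINT_URL}/{job_id}", API_KEY))
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP library, so it can be exercised with canned responses before pointing it at a real endpoint.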

🚀 Why use this template?

  • Zero binary juggling – Playwright lives in the Lambda image; Chrome runs remotely on Browserbase.
  • Cold-start ≈ 2 s – no browser download, just connect-over-CDP.
  • Pay-per-run – pure Lambda pricing; scale by upgrading Browserbase, not infra.
  • Async, serverless – fire-and-forget POST, durable job tracking via DynamoDB.
  • Built-in CI/CD – GitHub Actions deploys on every push to main/staging.

🏗️ High-Level Architecture

     ┌────────────┐  CDP (WebSocket)  ┌────────────┐
     │ AWS Lambda │ ────────────────▶ │ Browserbase│
     └────────────┘                   └────────────┘
           │              Logs
           ▼
     AWS CloudWatch
           │
           ▼
     Amazon DynamoDB (JobStatusTable)
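
The Lambda-to-Browserbase leg of the diagram might look roughly like the sketch below. It is not the code in lambdas/scraper/scraper.py. The function names are hypothetical, `connect_url` stands for whatever WebSocket endpoint Browserbase issues for a session, and the event shape assumes an API Gateway proxy integration that delivers the POST body as a JSON string.

```python
import json


def extract_target_url(event: dict) -> str:
    """Read the 'url' field from the POST body (assumed proxy-style event)."""
    body = json.loads(event.get("body") or "{}")
    url = body.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("request body must contain an http(s) 'url'")
    return url


def fetch_page_title(connect_url: str, target_url: str) -> str:
    """Attach to a remote Chromium over CDP and return the page title."""
    # Imported lazily so the module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # No local browser is launched; Playwright drives the remote one.
        browser = p.chromium.connect_over_cdp(connect_url)
        try:
            # Reuse the remote session's default context and page if present.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.pages[0] if context.pages else context.new_page()
            page.goto(target_url, wait_until="domcontentloaded")
            return page.title()
        finally:
            browser.close()
```

Because the browser binary lives on the remote side, the Lambda image only needs the Playwright Python package, which is what keeps the cold start short.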

📦 Project Layout

.
├── .github/workflows/deploy.yaml
├── examples/
│   ├── quick_start.py
│   └── requirements.txt
├── infra/
│   ├── app.py
│   ├── cdk.json
│   ├── requirements.txt
│   └── stack.py
├── lambdas/
│   ├── getter/
│   │   ├── Dockerfile
│   │   ├── getter.py
│   │   └── requirements.txt
│   └── scraper/
│       ├── Dockerfile
│       ├── scraper.py
│       └── requirements.txt
├── .gitignore
├── README.md
└── LICENSE

Full Setup & Prerequisites

Requirements

| Tool                | Version      |
|---------------------|--------------|
| AWS CLI             | any 2.x      |
| Docker              | ≥ 20.10      |
| Node & npm          | any LTS      |
| Python              | 3.12+        |
| Browserbase account | free tier OK |

1. Install the AWS CLI

# macOS (Homebrew)
brew install awscli

(See AWS docs for Windows/Linux.)

2. Configure AWS

aws configure  # supply keys & default region, e.g. us-east-1

3. Add Browserbase secrets to AWS Secrets Manager

# Double quotes let the shell expand the variables; single quotes would store the literal text "$BROWSERBASE_API_KEY"
aws secretsmanager create-secret \
  --name BrowserbaseLambda/BrowserbaseApiKey \
  --secret-string "{\"BROWSERBASE_API_KEY\":\"$BROWSERBASE_API_KEY\"}"

aws secretsmanager create-secret \
  --name BrowserbaseLambda/BrowserbaseProjectId \
  --secret-string "{\"BROWSERBASE_PROJECT_ID\":\"$BROWSERBASE_PROJECT_ID\"}"
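
To show how a Lambda could read these secrets back at runtime, here is a minimal sketch using boto3. The helper names `parse_secret` and `load_secret` are hypothetical and this is not necessarily how infra/stack.py or the scraper wires things up. The check for a leading `$` is only a heuristic for catching a value that was stored as an unexpanded shell variable.

```python
import json


def parse_secret(secret_string: str, key: str) -> str:
    """Pull one key out of a JSON secret and sanity-check the value."""
    value = json.loads(secret_string).get(key)
    if not value:
        raise KeyError(f"secret has no value for {key!r}")
    if value.startswith("$"):
        # Heuristic: the shell never expanded the variable at creation time.
        raise ValueError(f"{key!r} looks like an unexpanded shell variable")
    return value


def load_secret(secret_id: str, key: str) -> str:
    """Fetch a secret from AWS Secrets Manager and return one of its keys."""
    # Imported lazily so parse_secret can be used without boto3 installed.
    import boto3

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return parse_secret(response["SecretString"], key)


# Example wiring (needs AWS credentials, so it is left as a comment):
# api_key = load_secret("BrowserbaseLambda/BrowserbaseApiKey", "BROWSERBASE_API_KEY")
```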

4. (Optional) Local Playwright install

pip install playwright && python -m playwright install

❓ FAQ

Question Answer
Browserbase free tier? Yesβ€”1 concurrent session; creation rate‑limited.
Cold‑starts? Typical <Β 2β€―s (CDP connect, no browser download).
Add extra Python libs? Add to `lambdas/<getter
API returns 202 Acceptedβ€”how to track status? Poll GET /scrape/{jobId} to read status/results from DynamoDB.

⚙️ Performance & Optimization (Optional)

If you need faster cold-starts or shorter CI deploys, consider:

  • Provisioned Concurrency: Keep your Lambda warm to skip container startup.

  • Browserbase Keep-Alive: Paid sessions remove free-tier spin-up overhead.

  • CI Caching: Use actions/cache for pip and npm in GitHub Actions to shave minutes off each deploy.

🀝 Contributing

Pull requests are welcome! Please open an issue first if you plan a large change.

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

About

Resilient headless browser automation on AWS Lambda using pure-Python Playwright, Docker, and Browserbase.
