
timelessco/pdf-url-screenshot


📄 PDF Screenshot Service

Turn any PDF URL into a PNG thumbnail, stored on Cloudflare R2 and ready to serve. 🖼️☁️

Ships in two flavors: a long-running 🐎 Fastify server you can self-host (e.g. on Hetzner) and a ⚡ Vercel serverless function. Both expose the same API contract, so pick the one that fits your hosting.

🚀 Deployment options

| Path | What it is | When to use |
|---|---|---|
| 🐎 Fastify on Hetzner (index.ts, routes/) | Long-running Fastify server, PM2-managed, runs on your own box. | Dedicated host, lowest per-request latency, full Swagger UI, predictable bill. |
| ⚡ Vercel serverless (vercel/) | Vercel Function (Node 22): pdfjs-dist + canvas, R2 via @aws-sdk/client-s3, optional Upstash rate limiting. | Zero ops, scale-to-zero, deploy on git push. ~2–3s cold start; sub-second warm. |

🔁 Both paths render the same way (pdfjs-dist + canvas) and upload to the same R2 bucket, so a thumbnail produced by one is interchangeable with the other.

✨ Features

  • 🖼️ PNG thumbnail generated from the first page of any PDF URL
  • ☁️ Upload to Cloudflare R2 using the S3 API
  • 🔐 Bearer-token API authentication
  • 🛡️ Rate limiting (built-in Fastify plugin; Upstash on Vercel)
  • 📚 Interactive Swagger/OpenAPI documentation (Fastify path)
  • 💎 TypeScript end-to-end

🐎 Path A β€” Fastify (Hetzner / self-host)

📋 Prerequisites

  • Node.js 18+
  • PM2 (production)
  • Cloudflare R2 credentials

📦 Install

git clone <repository-url>
cd pdf-url-screenshot
npm install
cp env.example .env
# edit .env with your credentials
npm run build

▢️ Run

npm run dev              # dev mode
npm run pm2:start        # production with PM2
npm run pm2:logs         # tail logs
npm run pm2:restart      # restart after deploy

Server listens on PORT (default 3000).

🔧 Required environment

| Variable | Notes |
|---|---|
| API_KEYS | Comma-separated bearer tokens (required) |
| R2_ACCOUNT_ID | Cloudflare R2 account ID |
| R2_ACCESS_KEY_ID | R2 access key |
| R2_SECRET_ACCESS_KEY | R2 secret key |
| R2_MAIN_BUCKET_NAME | R2 bucket name |
| R2_PUBLIC_BUCKET_URL | Public bucket URL, e.g. https://media.example.com |
| PORT | Server port (optional, default 3000) |
| SERVER_URL | Full server URL for Swagger (optional, default http://localhost:3000) |
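As a rough illustration of how these variables fit together, here is a minimal TypeScript sketch of fail-fast startup validation. `loadConfig` and `ServerConfig` are hypothetical names; the project's actual checks live in env.schema.ts and may differ in shape and error messages.

```typescript
// Variables the table above marks as required (everything except PORT and SERVER_URL).
const REQUIRED_VARS = [
  "API_KEYS",
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_MAIN_BUCKET_NAME",
  "R2_PUBLIC_BUCKET_URL",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];

// Hypothetical typed view of the environment.
interface ServerConfig {
  apiKeys: string[];
  r2: {
    accountId: string;
    accessKeyId: string;
    secretAccessKey: string;
    bucketName: string;
    publicBucketUrl: string;
  };
  port: number;
  serverUrl: string;
}

// Throws at startup if anything required is missing or malformed.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Every required variable is known to be a non-empty string at this point.
  const get = (name: RequiredVar): string => (env[name] ?? "").trim();

  // API_KEYS is comma-separated; blanks (e.g. from a trailing comma) are dropped.
  const apiKeys = get("API_KEYS")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (apiKeys.length === 0) throw new Error("API_KEYS must contain at least one key");

  // PORT and SERVER_URL are optional and fall back to the documented defaults.
  const port = env.PORT ? Number(env.PORT) : 3000;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer from 1 to 65535, got "${env.PORT}"`);
  }

  return {
    apiKeys,
    r2: {
      accountId: get("R2_ACCOUNT_ID"),
      accessKeyId: get("R2_ACCESS_KEY_ID"),
      secretAccessKey: get("R2_SECRET_ACCESS_KEY"),
      bucketName: get("R2_MAIN_BUCKET_NAME"),
      // Strip trailing slashes so public URLs can be joined with "/" safely.
      publicBucketUrl: get("R2_PUBLIC_BUCKET_URL").replace(/\/+$/, ""),
    },
    port,
    serverUrl: env.SERVER_URL ?? "http://localhost:3000",
  };
}
```

In a Node entry point this would be called once as `loadConfig(process.env)`, so a misconfigured deploy fails immediately instead of on the first request.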

🔑 Generating an API key

node -e "console.log('sk_live_' + require('crypto').randomBytes(32).toString('hex'))"

Add it to API_KEYS (comma-separated if multiple).
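A sketch of how a comma-separated API_KEYS value might be checked against an incoming `Authorization: Bearer <key>` header. `parseApiKeys` and `isAuthorized` are hypothetical helpers, not the project's own (the Vercel path keeps its check in api/_lib/auth.ts). Hashing both sides before `crypto.timingSafeEqual` is one common way to get a constant-time comparison when the strings may differ in length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Split API_KEYS on commas, trimming whitespace and dropping empty entries.
function parseApiKeys(raw: string): string[] {
  return raw
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

// SHA-256 yields fixed 32-byte digests, which timingSafeEqual requires.
function digest(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

// True if the header is "Bearer <token>" and the token matches a configured key.
function isAuthorized(authorizationHeader: string | undefined, apiKeys: string[]): boolean {
  if (!authorizationHeader) return false;
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;

  const presented = digest(match[1]);
  // Compare against every configured key instead of returning on the first hit.
  let ok = false;
  for (const key of apiKeys) {
    if (timingSafeEqual(presented, digest(key))) ok = true;
  }
  return ok;
}
```

A request that fails this check would map to the 401 response listed under Errors.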

📚 Swagger UI

Once the server is running:

http://localhost:3000/docs

Click 🔒 Authorize, paste your API key, and try endpoints directly.


⚑ Path B β€” Vercel serverless

The Vercel implementation lives entirely under vercel/ and is self-contained (own package.json).

🛠️ One-time setup

cd vercel
npm install
npm i -g vercel          # if you don't have the CLI yet
vercel login
vercel link --yes --project pdf-screenshot

🔧 Configure environment

Required (preview + production scopes):

| Variable | Notes |
|---|---|
| API_KEYS | Comma-separated bearer tokens |
| R2_ACCOUNT_ID | Cloudflare R2 account ID |
| R2_ACCESS_KEY_ID | R2 access key |
| R2_SECRET_ACCESS_KEY | R2 secret key |
| R2_MAIN_BUCKET_NAME | R2 bucket name |
| R2_PUBLIC_BUCKET_URL | Public bucket URL |

Optional (rate limiting; if these are unset, the function logs a warning and skips rate limiting):

| Variable | Notes |
|---|---|
| UPSTASH_REDIS_REST_URL | Upstash Redis REST endpoint |
| UPSTASH_REDIS_REST_TOKEN | Upstash Redis REST token |

Add each one interactively, so the value never enters shell history or LLM context:

vercel env add API_KEYS preview
vercel env add R2_ACCOUNT_ID preview
# …repeat for every variable above, then for `production` scope when ready

🚒 Deploy

npm run deploy           # preview deploy
npm run deploy:prod      # production deploy

🔒 Vercel "Deployment Protection": by default, preview URLs require Vercel SSO. Either disable it (Project Settings → Deployment Protection) before testing from external clients, or use vercel curl (the CLI handles SSO automatically).

🧪 Local validation without deploying

cd vercel
node scripts/smoke-render.mjs           # renderer-only sanity, no R2/Vercel needed
npx tsx scripts/smoke-handler.mjs       # handler guard-branch checks
npx tsc --noEmit                        # typecheck
npm run dev                             # `vercel dev` against linked project + env

🔌 API

Both deploy paths expose the same endpoint.

POST /upload/pdf-screenshot (Fastify) / POST /api/upload/pdf-screenshot (Vercel)

📨 Headers

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

📦 Body

{ "url": "https://example.com/document.pdf", "userId": "alice" }
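A minimal, dependency-free sketch of checking this body before any work is done. The Vercel path uses zod schemas (api/_lib/schemas.ts) for this, so treat `validateBody` as hypothetical. Restricting `userId` to letters, digits, `_` and `-` is an assumption, motivated by `userId` becoming a segment of the R2 object key.

```typescript
interface ScreenshotRequest {
  url: string;
  userId: string;
}

// Either a validated body or a 400 with a human-readable reason.
type ValidationResult =
  | { ok: true; value: ScreenshotRequest }
  | { ok: false; status: 400; error: string };

function validateBody(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "Body must be a JSON object" };
  }
  const { url, userId } = body as Record<string, unknown>;

  if (typeof url !== "string") {
    return { ok: false, status: 400, error: "`url` must be a string" };
  }
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { ok: false, status: 400, error: "`url` is not a valid URL" };
  }
  // Only fetch over HTTP(S).
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, status: 400, error: "`url` must use http or https" };
  }

  // Assumed rule: keep userId safe to embed in an object key (no "/" or "..").
  if (typeof userId !== "string" || !/^[A-Za-z0-9_-]+$/.test(userId)) {
    return { ok: false, status: 400, error: "`userId` must contain only letters, digits, _ or -" };
  }
  return { ok: true, value: { url, userId } };
}
```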

✅ Success (200)

{
  "success": true,
  "path": "pdf_thumbnails/alice/thumb-document.png",
  "publicUrl": "https://media.example.com/pdf_thumbnails/alice/thumb-document.png"
}

❌ Errors

| Code | Meaning |
|---|---|
| 400 | Invalid request body or PDF could not be fetched |
| 401 | Missing or invalid bearer token |
| 429 | Rate limit exceeded |
| 500 | Render or upload failure (details in body) |

🌀 Curl

Fastify:

curl -X POST http://localhost:3000/upload/pdf-screenshot \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/document.pdf","userId":"alice"}'

Vercel:

curl -X POST https://<your-deployment>.vercel.app/api/upload/pdf-screenshot \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/document.pdf","userId":"alice"}'
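The same call from TypeScript, as a sketch using the global `fetch` available in Node 18+. `requestThumbnail` and `FetchLike` are hypothetical names; the `fetchImpl` parameter only exists so the function can be exercised without a live server.

```typescript
// Shape of the 200 response documented above.
interface ScreenshotSuccess {
  success: true;
  path: string;
  publicUrl: string;
}

// Minimal subset of fetch this client relies on (hypothetical type, for testability).
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown>; text(): Promise<string> }>;

async function requestThumbnail(
  endpoint: string,
  apiKey: string,
  pdfUrl: string,
  userId: string,
  fetchImpl: FetchLike = fetch,
): Promise<ScreenshotSuccess> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: pdfUrl, userId }),
  });
  if (!res.ok) {
    // 400, 401, 429 and 500 all end up here; the body carries the details.
    const detail = await res.text();
    throw new Error(`pdf-screenshot request failed (${res.status}): ${detail}`);
  }
  return (await res.json()) as ScreenshotSuccess;
}
```

Point `endpoint` at either `http://localhost:3000/upload/pdf-screenshot` (Fastify) or `https://<your-deployment>.vercel.app/api/upload/pdf-screenshot` (Vercel); the request and response shapes are the same.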

🛡️ Rate limiting

  • 🐎 Fastify (Hetzner) β€” @fastify/rate-limit. Global: 100 req/min/IP. PDF endpoint: 10 req/hour/IP. Localhost is whitelisted.
  • ⚑ Vercel β€” @upstash/ratelimit against Upstash Redis. Global: 100 req/min/IP. PDF endpoint: 10 req/min/IP. If Upstash env vars are unset, the limiter logs a warning and allows all traffic.

Rate-limit headers (x-ratelimit-limit, x-ratelimit-remaining, x-ratelimit-reset) are present on the Fastify path.
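To make the numbers concrete, here is an in-memory fixed-window limiter sketch applying the Fastify PDF-endpoint policy (10 requests per hour per IP, localhost exempt). It is an illustration only: it is not how @fastify/rate-limit or @upstash/ratelimit work internally, and because its state lives in one process it would not hold across serverless instances, which is presumably why the Vercel path uses an external store. `FixedWindowLimiter`, `rateLimitHeaders`, and reporting `x-ratelimit-reset` in seconds are all assumptions.

```typescript
interface RateLimitDecision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetMs: number; // time left in the current window
}

// Hypothetical fixed-window limiter keyed by client IP.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly allowList: ReadonlySet<string>;
  // Note: entries are never evicted here; a real limiter would prune stale IPs.
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number, allowList: Iterable<string> = []) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.allowList = new Set(allowList);
  }

  check(ip: string, now: number = Date.now()): RateLimitDecision {
    if (this.allowList.has(ip)) {
      return { allowed: true, limit: this.limit, remaining: this.limit, resetMs: 0 };
    }
    let w = this.windows.get(ip);
    // Start a fresh window on first sight or once the previous one has expired.
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 };
      this.windows.set(ip, w);
    }
    const resetMs = w.start + this.windowMs - now;
    if (w.count >= this.limit) {
      return { allowed: false, limit: this.limit, remaining: 0, resetMs };
    }
    w.count += 1;
    return { allowed: true, limit: this.limit, remaining: this.limit - w.count, resetMs };
  }
}

// Map a decision onto the x-ratelimit-* response headers.
function rateLimitHeaders(d: RateLimitDecision): Record<string, string> {
  return {
    "x-ratelimit-limit": String(d.limit),
    "x-ratelimit-remaining": String(d.remaining),
    "x-ratelimit-reset": String(Math.ceil(d.resetMs / 1000)),
  };
}
```

A blocked decision would be answered with the 429 listed under Errors.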


🗂️ Project structure

pdf-url-screenshot/
├── index.ts                 # Fastify entry
├── routes/                  # Fastify route handlers
│   ├── root.ts
│   └── upload/
│       └── pdf-screenshot.ts
├── schemas/                 # Shared response schemas
├── r2Client.ts              # S3-API client for R2
├── env.schema.ts            # Env validation (Fastify)
├── swagger.config.ts        # OpenAPI + Swagger UI config
├── rate-limit.config.ts     # Rate limit config (Fastify)
├── types.ts
├── ecosystem.config.js      # PM2 config
├── tsconfig.json
├── package.json
│
├── vercel/                  # Vercel serverless implementation
│   ├── api/
│   │   ├── upload/pdf-screenshot.ts    # handler (default export)
│   │   └── _lib/
│   │       ├── pdf-render.ts           # pdfjs-dist + canvas renderer
│   │       ├── r2-upload.ts            # S3 client wrapper
│   │       ├── auth.ts                 # bearer token check
│   │       ├── rate-limit.ts           # Upstash-backed limiter
│   │       └── schemas.ts              # zod schemas
│   ├── scripts/             # smoke tests (renderer, handler)
│   ├── package.json
│   ├── vercel.json
│   ├── tsconfig.json
│   └── .env.example
│
└── docs/
    └── serverless-migration.md         # detailed log of the move to Vercel

⚙️ Technical details

🖼️ PDF rendering

  • pdfjs-dist 3.4.120 (legacy build) + canvas 3.2.0 (Cairo).
  • Renders page 1 at 1.5× scale by default.
  • Output is a PNG buffer.
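A quick worked example of what the 1.5× default means for output size. For an unrotated page, pdf.js's `page.getViewport({ scale: 1 })` measures the page in PDF points (1/72 inch), so pixel dimensions are simply points times scale. `thumbnailPixelSize` is a hypothetical helper for illustration, not part of pdf-render.ts.

```typescript
// Default render scale described above.
const DEFAULT_SCALE = 1.5;

// Page size in PDF points (1 pt = 1/72 inch), as an unrotated
// pdf.js viewport reports it at scale 1.
interface PageSizePt {
  widthPt: number;
  heightPt: number;
}

// Hypothetical helper: pixel size of the rendered PNG at a given scale.
// Canvas dimensions are whole pixels, so fractional results are floored.
function thumbnailPixelSize(
  page: PageSizePt,
  scale: number = DEFAULT_SCALE,
): { width: number; height: number } {
  if (!(scale > 0)) throw new RangeError("scale must be a positive number");
  return {
    width: Math.floor(page.widthPt * scale),
    height: Math.floor(page.heightPt * scale),
  };
}
```

At the default scale, a US Letter page (612 × 792 pt) renders to 918 × 1188 px and an A4 page (595 × 842 pt) to 892 × 1263 px.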

📦 Storage

  • Cloudflare R2 via AWS S3 SDK (@aws-sdk/client-s3).
  • Object key shape: pdf_thumbnails/{userId}/thumb-{filename}.png.
  • Public URL returned in the response.
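A sketch of how the object key, public URL, and R2 client settings could be derived. The filename rule (last URL path segment with a trailing `.pdf` removed, query string ignored) and the `document` fallback are assumptions inferred from the example response, not read from the project's code. The account-scoped endpoint format and `region: "auto"` follow Cloudflare's R2 S3-compatibility documentation. All three function names are hypothetical.

```typescript
// Hypothetical: build pdf_thumbnails/{userId}/thumb-{filename}.png from the PDF URL.
function thumbnailKey(userId: string, pdfUrl: string): string {
  // URL.pathname excludes the query string and fragment.
  const pathname = new URL(pdfUrl).pathname;
  const lastSegment = decodeURIComponent(pathname.split("/").pop() || "document.pdf");
  // Assumed rule: drop a trailing ".pdf" (case-insensitive).
  const base = lastSegment.replace(/\.pdf$/i, "") || "document";
  return `pdf_thumbnails/${userId}/thumb-${base}.png`;
}

// Join R2_PUBLIC_BUCKET_URL and the key without doubling slashes.
function publicUrlFor(publicBucketUrl: string, key: string): string {
  return `${publicBucketUrl.replace(/\/+$/, "")}/${key}`;
}

// Settings in the shape @aws-sdk/client-s3's S3Client constructor accepts.
function r2ClientConfig(accountId: string, accessKeyId: string, secretAccessKey: string) {
  return {
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: { accessKeyId, secretAccessKey },
  };
}
```

The config object can be passed to `new S3Client(...)` from @aws-sdk/client-s3, with the key used as `Key` in a `PutObjectCommand` that also sets `Bucket` and `ContentType: "image/png"`.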

🩺 Troubleshooting

🐎 Fastify (Hetzner)

npm run pm2:status     # check process state
npm run pm2:logs       # tail live logs
npm run pm2:restart    # restart after deploy
npm run pm2:delete && npm run build && npm run pm2:start   # full reset

⚑ Vercel

vercel ls                                       # list recent deployments
vercel inspect --logs <deployment-url>          # build logs
vercel logs <deployment-url> --json             # runtime logs
vercel curl "<deployment-url>/api/..." -- ...   # auth-aware curl through deployment protection

📖 For the full history of how the Vercel path was put together (what broke, what got pinned, why we ended up on Node 22 + canvas instead of other combos), see docs/serverless-migration.md.

About

A Node server that takes a PDF URL, screenshots the first page, and uploads the image to Cloudflare R2.
