feat(bots): expand AI_BOT_PATTERN to cover 26 more crawlers #12

Merged
Gdewilde merged 1 commit into main from feat/expand-bot-coverage
Apr 28, 2026

Conversation

@Gdewilde
Contributor

Summary

  • Expand `AI_BOT_PATTERN` from 23 → 42 tokens, covering Claude-SearchBot/Claude-Web, plain Applebot, Google-CloudVertexBot / Google-Agent / GoogleAgent-Mariner / Gemini-Deep-Research, Amzn-SearchBot / NovaAct, AzureAI-SearchBot, meta-externalfetcher / meta-webindexer, DeepSeek, PanguBot, Webzio-Extended / omgili(bot), Timpibot, all Grok / xAI variants, Manus-User, quillbot, MyCentralAIScraperBot, cohere-training-data-crawler, and Ai2Bot-Dolma.
  • Fix the plain `Applebot` gap (regex now uses `Applebot`, still substring-matches `Applebot-Extended`).
  • Add `parseBotName` labels for the new vendors (DeepSeek, Huawei, Webz.io, Timpi, xAI, Manus, QuillBot, Microsoft, MyCentralAI, plus new Claude / Google / Amazon / Meta variants).
  • `Claude-Code` is intentionally still excluded from `AI_BOT_PATTERN` (per existing doc comment) — it's a coding-agent UA handled by the HTTP-library heuristic.
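
The Applebot fix relies on unanchored, case-insensitive matching: a bare `Applebot` token also matches `Applebot-Extended`, while `Claude-Code` stays unmatched as long as no bare `Claude` token is present. A minimal sketch of that behavior, assuming a much smaller pattern than the real `AI_BOT_PATTERN`; the `*Sketch` names, the label strings, and the function bodies are hypothetical and may not match the repository's code.

```typescript
// Illustrative subset only; the real AI_BOT_PATTERN has 42 tokens.
const AI_BOT_PATTERN_SKETCH =
  /Applebot|Claude-SearchBot|Claude-Web|DeepSeek|PanguBot|Timpibot/i;

// Hypothetical token-to-vendor table; the first matching entry wins.
const BOT_LABELS_SKETCH: ReadonlyArray<readonly [RegExp, string]> = [
  [/Applebot/i, "Apple"],
  [/Claude-SearchBot|Claude-Web/i, "Anthropic"],
  [/DeepSeek/i, "DeepSeek"],
  [/PanguBot/i, "Huawei"],
  [/Timpibot/i, "Timpi"],
];

// True when the user agent contains any token from the pattern.
function isAiBotSketch(userAgent: string): boolean {
  return AI_BOT_PATTERN_SKETCH.test(userAgent);
}

// Returns the vendor label for the first matching token, or null.
function parseBotNameSketch(userAgent: string): string | null {
  for (const [pattern, label] of BOT_LABELS_SKETCH) {
    if (pattern.test(userAgent)) return label;
  }
  return null;
}
```

With this sketch, both `Applebot/0.1` and `Applebot-Extended/1.0` are classified as AI bots, and a `claude-code/...` user agent is not, which leaves it to whatever heuristic handles HTTP-library clients.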

Brings coverage to 47/48 of the bots tracked by Peec AI's reference list.

Test plan

  • `npx vitest run` — 161/161 passing (added 27 new `isAiBot` cases + 25 `parseBotName` cases).
  • `npx tsc --noEmit` clean.
  • Spot-check a real-world UA log after release to confirm no regressions in browser/HTTP-client classification.
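
One way to run the post-release spot-check is to diff the old and new patterns over a sample of logged user agents and review only the entries whose classification changed. The sketch below is a hypothetical helper, not part of this PR; the two patterns and the sample user agents are stand-ins chosen for illustration, and the "old" pattern is an assumption about the pre-PR state.

```typescript
// Hypothetical helper: lists user agents whose AI-bot classification
// differs between two patterns.
interface ClassificationChange {
  userAgent: string;
  before: boolean;
  after: boolean;
}

// Strip the stateful g/y flags so RegExp.test() does not depend on lastIndex.
function stateless(pattern: RegExp): RegExp {
  return new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
}

function diffClassification(
  userAgents: readonly string[],
  oldPattern: RegExp,
  newPattern: RegExp,
): ClassificationChange[] {
  const oldRe = stateless(oldPattern);
  const newRe = stateless(newPattern);
  const changes: ClassificationChange[] = [];
  // De-duplicate so each distinct user agent is reported once.
  for (const userAgent of Array.from(new Set(userAgents))) {
    const before = oldRe.test(userAgent);
    const after = newRe.test(userAgent);
    if (before !== after) changes.push({ userAgent, before, after });
  }
  return changes;
}

// Stand-in patterns (assumed): old matched only Applebot-Extended.
const oldPatternSample = /Applebot-Extended|ClaudeBot/i;
const newPatternSample = /Applebot|ClaudeBot/i;

// Stand-in log sample: one plain browser, one plain Applebot, one duplicate.
const sampleUserAgents = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (compatible; Applebot/0.1; +http://www.apple.com/go/applebot)",
  "Mozilla/5.0 (compatible; Applebot/0.1; +http://www.apple.com/go/applebot)",
];

const sampleChanges = diffClassification(
  sampleUserAgents,
  oldPatternSample,
  newPatternSample,
);
```

Any browser or HTTP-client user agent that shows up in the result with `after: true` would be a regression worth investigating; newly matched crawler user agents are the expected change.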

🤖 Generated with Claude Code

Adds detection for Claude-SearchBot/Claude-Web, plain Applebot,
Google-CloudVertexBot/Google-Agent/GoogleAgent-Mariner/Gemini-Deep-Research,
Amzn-SearchBot/NovaAct, AzureAI-SearchBot, meta-externalfetcher/meta-webindexer,
DeepSeek, PanguBot, Webzio-Extended/omgili(bot), Timpibot, Grok variants,
Manus-User, quillbot, MyCentralAIScraperBot, cohere-training-data-crawler,
and Ai2Bot-Dolma. Also adds parseBotName labels for new vendors. Claude-Code
remains intentionally excluded — it's a coding-agent UA handled by the
HTTP-library heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gdewilde merged commit 9a611c7 into main on Apr 28, 2026
3 checks passed
