
feat: receive Telegram documents for agent processing #70

Merged: aviadr1 merged 2 commits into main from worktree-260317-1914-2cf1 on Mar 17, 2026

aviadr1 commented Mar 17, 2026

Summary

  • Download incoming Telegram documents to groups/{folder}/uploads/ so agents can read and process them (mirrors existing photo download pattern)
  • Sanitize filenames and prefix with message ID to avoid collisions
  • Add 7-day TTL cleanup of stale uploads at startup to prevent unbounded disk growth (all uploads are temporary by default; dev agents should copy needed files to their case workspace)

Closes Garsson-io/kaizen#49

Test plan

  • 8 new tests for document download: happy path, caption, filename sanitization, download failure, missing file_path, missing filename, unregistered chat rejection, and the real-world "engagement landscape (1).pdf" filename
  • 5 new tests for stale upload cleanup: age-based removal, multiple groups, missing directories, recent file preservation
  • All 511 existing tests pass
  • Smoke test: send a PDF document over Telegram and verify the agent receives and can read it

🤖 Generated with Claude Code

Documents sent over Telegram are now downloaded to groups/{folder}/uploads/
and made available to agents at /workspace/group/uploads/. Filenames are
sanitized and prefixed with message ID to avoid collisions. Stale uploads
are cleaned up at startup (7-day TTL) to prevent unbounded disk growth.

Closes Garsson-io/kaizen#49

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aviadr1 commented Mar 17, 2026

Self-review checklist

Correctness

  • Document handler mirrors the proven photo download pattern (lines 370-404) — same getFile() → download → placeholder flow
  • doc!.file_id non-null assertion is safe: handler fires on message:document so ctx.message.document is always present
  • Filename sanitization (/[^a-zA-Z0-9._-]/g → _) prevents path traversal and shell injection via crafted filenames
  • Message ID prefix prevents filename collisions when multiple users send files with the same name
  • Graceful fallback on download failure — agent still sees the document was sent, just can't read it
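The sanitization and prefixing steps above can be sketched as follows. This is a minimal illustration, not the code from src/channels/telegram.ts: the helper name `buildUploadName` and the `"file"` fallback for a missing filename are assumptions; only the regex and the `{messageId}-{name}` shape come from this PR.

```typescript
// Hypothetical helper; the name and the "file" fallback are assumptions.
function buildUploadName(messageId: number, fileName?: string): string {
  // Fall back to a generic name when Telegram omits file_name (assumed fallback).
  const original = fileName && fileName.length > 0 ? fileName : "file";
  // Regex quoted in the review: anything outside [a-zA-Z0-9._-] becomes "_".
  // Path separators are outside that set, so they are replaced as well.
  const safeName = original.replace(/[^a-zA-Z0-9._-]/g, "_");
  // Prefixing the message ID keeps same-named files from overwriting each other.
  return `${messageId}-${safeName}`;
}

console.log(buildUploadName(328, "engagement landscape.pdf"));
// → 328-engagement_landscape.pdf
console.log(buildUploadName(329, "../../etc/passwd"));
// → 329-.._.._etc_passwd
```

Dots survive the regex, but with every slash replaced the result stays a single path component inside the uploads folder.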

Security

  • No path traversal risk: resolveGroupFolderPath() validates and constrains the folder, safeName strips special chars
  • Bot token in download URL is expected (same pattern as photo handler, required by Telegram Bot API)
  • Cleanup only deletes regular files (checks stat.isFile()), won't follow symlinks or delete directories
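A rough sketch of the startup cleanup described above, under stated assumptions: the function name, signature, and return value are hypothetical and the real `cleanupStaleUploads()` in src/group-folder.ts may differ. The sketch uses `lstatSync` so that a symlink is reported as a link rather than as its target; the PR only states that `stat.isFile()` is checked.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical signature; the real cleanupStaleUploads() may differ.
function cleanupStaleUploadsSketch(
  groupsDir: string,
  maxAgeMs: number = SEVEN_DAYS_MS,
  now: number = Date.now(),
): number {
  let removed = 0;
  if (!fs.existsSync(groupsDir)) return removed;
  for (const group of fs.readdirSync(groupsDir)) {
    const uploadsDir = path.join(groupsDir, group, "uploads");
    // Groups that never received a document have no uploads directory.
    if (!fs.existsSync(uploadsDir) || !fs.statSync(uploadsDir).isDirectory()) continue;
    for (const entry of fs.readdirSync(uploadsDir)) {
      const filePath = path.join(uploadsDir, entry);
      // lstat describes the entry itself, so symlinks and directories fail isFile().
      const stat = fs.lstatSync(filePath);
      if (!stat.isFile()) continue;
      // Delete only files whose last modification is older than the TTL.
      if (now - stat.mtimeMs > maxAgeMs) {
        fs.unlinkSync(filePath);
        removed++;
      }
    }
  }
  return removed;
}
```

Passing `now` as a parameter is a design choice for this sketch only: it lets a test simulate file age without waiting or mocking the clock.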

Tests

  • 8 document tests: happy path, caption, sanitization, network failure, missing file_path, missing filename, unregistered chat, and the "engagement landscape (1).pdf" real-world filename
  • 5 cleanup tests: age-based removal, multi-group, missing dirs, recent file preservation
  • All 511 tests pass — no regressions

Scope

  • Minimal changes: only the document handler, cleanup function, and startup wiring
  • No unnecessary refactoring of surrounding code

Smoke test needed

  • Yes: Send a PDF over Telegram and verify the agent can read it at /workspace/group/uploads/. This is the untested end-to-end path — the unit tests cover download/placeholder logic but not the full Telegram API → container mount chain. Should be tested before declaring production-ready.

Telegram Bot API limitation

  • Note: Telegram Bot API has a 20MB file size limit for downloads. Documents larger than 20MB will fail to download and fall back to the placeholder. This is acceptable for v1 — most documents (PDFs, CSVs, text files) are well under this limit. Could add a size check with user-friendly error message in a follow-up.
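The follow-up size check mentioned above might look roughly like this. It is only a sketch: the helper name, the message wording, and the exact byte value used for "20MB" are assumptions. It relies on the optional `file_size` field that Telegram includes on document objects.

```typescript
// Limit described in the note above; the exact byte boundary is an assumption.
const MAX_BOT_DOWNLOAD_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: returns a user-facing notice, or null if a download can be attempted.
function oversizeNotice(fileName: string, fileSize?: number): string | null {
  // file_size is optional, so an unknown size is left for the download attempt to decide.
  if (fileSize === undefined || fileSize <= MAX_BOT_DOWNLOAD_BYTES) return null;
  const sizeMb = (fileSize / (1024 * 1024)).toFixed(1);
  return `"${fileName}" is ${sizeMb} MB; bots can only download files up to 20 MB.`;
}
```

A handler could call this before `getFile()` and reply with the notice instead of silently falling back to the placeholder.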

✅ Code review complete. Pending smoke test.

Ensures cleanupStaleUploads is properly mocked when testing index.ts
imports, satisfying the test coverage policy for the startup wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aviadr1 commented Mar 17, 2026

Self-review round 2

What changed

Added group-folder.js mock to index-response-deps.test.ts — mocks cleanupStaleUploads and resolveGroupFolderPath so the test file can import index.ts without hitting real filesystem paths. This satisfies the test coverage hook for the src/index.ts change.

Checklist

  • Correctness: Mock returns match existing patterns in the file (vi.fn() for cleanup, path template for resolve)
  • No regressions: All 511 tests still pass
  • Scope: 4 lines added, all in mock setup — no logic changes

✅ Review complete.


aviadr1 commented Mar 17, 2026

Self-review (full PR — rounds 1+2)

Files changed

  1. src/channels/telegram.ts — Document handler now downloads files (mirrors photo pattern)
  2. src/channels/telegram.test.ts — 8 new document tests replacing 2 placeholder tests
  3. src/group-folder.ts — Added cleanupStaleUploads() with 7-day TTL
  4. src/group-folder.test.ts — 5 new cleanup tests
  5. src/index.ts — Wire cleanupStaleUploads() into startup
  6. src/index-response-deps.test.ts — Add group-folder mock for index.ts test imports

Correctness

  • Document download mirrors proven photo pattern (getFile → download → placeholder)
  • doc!.file_id assertion safe: handler only fires on message:document
  • Filename sanitization prevents path traversal and shell injection
  • Message ID prefix prevents filename collisions
  • Graceful fallback on download failure
  • Cleanup only deletes regular files older than 7 days, skips directories/symlinks
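The getFile → download → placeholder flow with its fallback can be illustrated as below. This is a sketch under assumptions, not the merged handler: `DocumentDeps`, `receiveDocument`, and the injected functions are hypothetical, the success placeholder is shortened, and reusing the older `[Document: filename]` form as the failure fallback is a guess. The file URL shape is the one the Telegram Bot API uses for downloads.

```typescript
// All names here are hypothetical stand-ins for the real handler's dependencies.
interface DocumentDeps {
  // Resolves a file_id to a file_path, as the Bot API's getFile method does.
  getFilePath(fileId: string): Promise<string | undefined>;
  // Fetches the bytes at a URL; resolves to null when the download fails.
  download(url: string): Promise<Uint8Array | null>;
  // Persists the bytes under the group's uploads folder.
  save(name: string, data: Uint8Array): Promise<void>;
}

async function receiveDocument(
  deps: DocumentDeps,
  botToken: string,
  fileId: string,
  safeName: string,
  originalName: string,
): Promise<string> {
  // Assumed fallback text: the agent still learns that a document was sent.
  const fallback = `[Document: ${originalName}]`;
  try {
    const filePath = await deps.getFilePath(fileId);
    if (!filePath) return fallback;
    // Telegram serves file contents from this URL, bot token included.
    const url = `https://api.telegram.org/file/bot${botToken}/${filePath}`;
    const data = await deps.download(url);
    if (!data) return fallback;
    await deps.save(safeName, data);
    // Shortened version of the placeholder that points at the container path.
    return `[Document: /workspace/group/uploads/${safeName}]`;
  } catch {
    // Network or filesystem errors degrade to the placeholder instead of throwing.
    return fallback;
  }
}
```

Injecting the three operations keeps the sketch testable without a bot token or network access.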

Security

  • resolveGroupFolderPath() validates folder, safeName strips special chars — no path traversal
  • Bot token in download URL is required by Telegram Bot API (same as photo handler)

Tests

  • 13 new tests total (8 document + 5 cleanup)
  • All 511 tests pass — no regressions

Smoke test

  • Send a PDF over Telegram and verify agent receives it at /workspace/group/uploads/ — pending post-merge deploy

LGTM ✅

aviadr1 merged commit 7dd8a54 into main on Mar 17, 2026
1 check passed

aviadr1 commented Mar 17, 2026

Smoke test ✅ PASSED

Sent engagement landscape.pdf (54KB) to the Garsson Print Development Telegram group.

Results:

  • File downloaded to groups/telegram_garsson_print/uploads/328-engagement_landscape.pdf
  • Filename sanitized: spaces → underscores, message ID prefixed
  • DB message stored with full container path: [Document: /workspace/group/uploads/328-engagement_landscape.pdf — original name: "engagement landscape.pdf", use Read tool or Bash to process]
  • Service restarted cleanly, no errors in logs
  • Previous documents (pre-deploy) show old [Document: filename] format, confirming new code is live

End-to-end pipeline verified: Telegram → download → sanitize → store → agent-readable path.

Development

Successfully merging this pull request may close these issues.

Telegram: agent cannot receive incoming documents (only images)