feat: add /add-telegram-voice skill for local whisper.cpp transcription #718

Closed
vweaver wants to merge 2 commits into qwibitai:main from vweaver:skill/add-telegram-voice

vweaver commented Mar 5, 2026

Summary

Adds a new skill (/add-telegram-voice) that upgrades the Telegram channel with local voice message transcription using whisper.cpp. Follows the project's "skills over features" philosophy — no changes to core code.

  • Channel-agnostic transcription module: transcribeAudio(Buffer): Promise<string | null>, usable by any channel
  • Local whisper.cpp — no cloud API, no API key, no cost. Uses ffmpeg + whisper-cli on-device
  • Depends on the telegram skill; conflicts with voice-transcription, whose src/transcription.ts API is Baileys-coupled rather than channel-agnostic
  • Voice notes arrive as [Voice: <transcript>] instead of [Voice message] placeholders
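
A minimal sketch of what a Buffer-in, text-out module along these lines could look like, for readers who want a concrete picture. Only the `transcribeAudio(Buffer): Promise<string | null>` shape and the use of ffmpeg plus whisper-cli come from this PR. The `Runner` seam, the `WHISPER_MODEL` environment variable, the default model path, and the temp-file names are assumptions made for illustration and are not taken from the skill's actual code.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical seam (not from the PR): runs a command and resolves with its stdout.
// Injecting it lets tests replace ffmpeg and whisper-cli with fakes.
export type Runner = (cmd: string, args: string[]) => Promise<string>;

const defaultRunner: Runner = async (cmd, args) => {
  const { stdout } = await execFileAsync(cmd, args);
  return stdout;
};

// Assumed model location; the real skill may configure this differently.
const MODEL_PATH = process.env.WHISPER_MODEL ?? "models/ggml-base.en.bin";

export async function transcribeAudio(
  audio: Buffer,
  run: Runner = defaultRunner,
): Promise<string | null> {
  const dir = await mkdtemp(join(tmpdir(), "voice-"));
  const input = join(dir, "input.ogg");
  const wav = join(dir, "input.wav");
  try {
    await writeFile(input, audio);
    // whisper.cpp expects 16 kHz mono 16-bit PCM WAV input.
    await run("ffmpeg", ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]);
    // -nt drops timestamps and -np suppresses progress prints, leaving only the text on stdout.
    const stdout = await run("whisper-cli", ["-m", MODEL_PATH, "-f", wav, "-nt", "-np"]);
    const text = stdout.trim();
    return text.length > 0 ? text : null;
  } catch {
    // Missing binaries, a bad model path, or undecodable audio all map to null.
    return null;
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Returning `null` on any failure, rather than throwing, would let a channel fall back to its placeholder text without extra error handling.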

Skill contents

| File | Purpose |
| --- | --- |
| SKILL.md | Setup: install deps, download model, apply, verify |
| manifest.yaml | Metadata, deps, conflicts |
| add/src/transcription.ts | Channel-agnostic whisper.cpp module |
| modify/src/channels/telegram.ts | Async voice handler with download + transcribe |
| modify/src/channels/telegram.test.ts | 5 new voice tests + fetch/transcription mocks |
| tests/telegram-voice.test.ts | 15 skill package validation tests |
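
To make the "download + transcribe" step more concrete, here is a rough sketch of how a voice handler might turn a Telegram `file_id` into message content. It assumes the download goes through the Bot API's `getFile` method and file endpoint with plain `fetch`; the PR only mentions fetch mocks, so the real handler may be wired differently. The names `voiceToContent`, `FetchLike`, and `Transcriber` are hypothetical.

```typescript
// Hypothetical minimal view of fetch, so tests can pass a fake.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Same shape as the transcribeAudio signature described in the PR summary.
type Transcriber = (audio: Buffer) => Promise<string | null>;

const PLACEHOLDER = "[Voice message]";

export async function voiceToContent(
  token: string,
  fileId: string,
  transcribe: Transcriber,
  fetchFn: FetchLike = fetch,
): Promise<string> {
  try {
    // Step 1: resolve the file_id to a download path via the Bot API getFile method.
    const metaRes = await fetchFn(
      `https://api.telegram.org/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`,
    );
    if (!metaRes.ok) return PLACEHOLDER;
    const meta = (await metaRes.json()) as { ok: boolean; result?: { file_path?: string } };
    const filePath = meta.result?.file_path;
    if (!meta.ok || !filePath) return PLACEHOLDER;

    // Step 2: download the audio bytes.
    const fileRes = await fetchFn(`https://api.telegram.org/file/bot${token}/${filePath}`);
    if (!fileRes.ok) return PLACEHOLDER;
    const audio = Buffer.from(await fileRes.arrayBuffer());

    // Step 3: transcribe, falling back to the placeholder when nothing comes back.
    const transcript = await transcribe(audio);
    return transcript ? `[Voice: ${transcript}]` : PLACEHOLDER;
  } catch {
    return PLACEHOLDER;
  }
}
```

Every failure path in this sketch degrades to the existing `[Voice message]` placeholder, so a missing model or network error would not drop the message.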

Built and tested on Linux (Ubuntu) with whisper.cpp built from source (-DBUILD_SHARED_LIBS=OFF). SKILL.md includes Linux-specific build instructions and the static linking gotcha.

Test plan

  • 15 skill package tests pass (manifest, file presence, intent files, API shape, no Baileys deps)
  • 54 Telegram channel tests pass (including 5 new voice transcription tests)
  • Full suite passes (374 tests)
  • Verified end-to-end: voice note in Telegram -> [Voice: Can you hear me now?]

🤖 Generated with Claude Code

Adds a skill that upgrades the Telegram channel with local voice message
transcription using whisper.cpp. Voice notes arrive as [Voice: <transcript>]
instead of placeholders. No cloud API, no cost — runs entirely on-device.

- Channel-agnostic transcription module (Buffer in, text out)
- Depends on telegram skill, conflicts with voice-transcription (different API)
- 15 skill package tests, 5 new voice transcription integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TomGranot (Collaborator) commented

There's already a use-local-whisper skill that landed in #702. How does this differ? If it's the same thing, we should close this one.

Andy-NanoClaw-AI added the labels "PR: Skill" (skill package or skill-related changes) and "Status: Needs Review" (ready for maintainer review) on Mar 5, 2026

vweaver (Author) commented Mar 5, 2026

They solve different problems: use-local-whisper swaps the OpenAI backend for whisper.cpp on WhatsApp, while this PR adds voice transcription to Telegram, which has no voice support today.

That said, use-local-whisper already notes that Telegram just needs audio download logic added. Rather than a separate skill, I could rework this as an update to use-local-whisper that:

  1. Adds a channel-agnostic transcribeAudio(Buffer) export to src/transcription.ts
  2. Keeps the existing transcribeAudioMessage(WAMessage, WASocket) as a wrapper so WhatsApp isn't broken
  3. Adds the Telegram voice handler and tests

Would you prefer that approach, or is there a different way you'd like to see it structured?
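
As a rough illustration of steps 1 and 2, the channel-specific entry point could be a thin wrapper that downloads the audio and delegates to the channel-agnostic core. The sketch uses generic type parameters in place of Baileys' `WAMessage` and `WASocket` so it stands alone; `makeMessageTranscriber` and `Downloader` are hypothetical names, and how use-local-whisper actually downloads WhatsApp audio is not shown in this thread.

```typescript
// Channel-agnostic core, matching the transcribeAudio(Buffer) shape from step 1.
export type Transcriber = (audio: Buffer) => Promise<string | null>;

// Hypothetical stand-in for a channel-specific media download. For WhatsApp,
// Msg and Sock would be WAMessage and WASocket.
export type Downloader<Msg, Sock> = (msg: Msg, sock: Sock) => Promise<Buffer | null>;

// Builds a (message, socket) => transcript function on top of the Buffer-based
// core, so existing callers keep their signature while new channels call the
// core directly.
export function makeMessageTranscriber<Msg, Sock>(
  download: Downloader<Msg, Sock>,
  transcribe: Transcriber,
): (msg: Msg, sock: Sock) => Promise<string | null> {
  return async (msg, sock) => {
    try {
      const audio = await download(msg, sock);
      if (!audio || audio.length === 0) return null;
      return await transcribe(audio);
    } catch {
      // A failed download or transcription yields null, as in the core function.
      return null;
    }
  };
}
```

With this split, the WhatsApp path would keep its existing call sites, and the Telegram handler would only need its own download step before calling the shared core.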

vweaver (Author) commented Mar 5, 2026

Closing in favor of a new PR that adds Telegram support to the existing use-local-whisper skill instead of creating a separate skill.
