
feat(skills): update voice-transcription for OpenAI-compatible endpoints #4

Merged
glifocat merged 0 commit into main from
upstream/skill-voice-transcription-openai-compatible
Mar 7, 2026

@glifocat (Owner) commented Mar 7, 2026

Type of Change

  • Skill - adds a new skill in .claude/skills/
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code

Description

Updates the add-voice-transcription skill to support any OpenAI-compatible /v1/audio/transcriptions endpoint instead of hardcoding OpenAI's Whisper API.
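To illustrate what "any OpenAI-compatible endpoint" means on the wire, here is a minimal sketch that sends one audio buffer to {baseUrl}/audio/transcriptions as multipart/form-data with file and model fields and a Bearer token, then reads the text field of the JSON reply. It is a hypothetical standalone helper, not the skill's actual transcription.ts: the TranscriptionConfig shape, the buildTranscriptionUrl and transcribe names, and the convention that baseUrl already ends in /v1 (as in the OpenAI SDKs) are all assumptions. It needs Node 18+ for the global fetch, FormData and Blob.

```typescript
// Hypothetical config shape; field names mirror the TRANSCRIPTION_* env vars
// but are an assumption, not the skill's actual types.
interface TranscriptionConfig {
  baseUrl: string; // assumed to already include /v1, e.g. "http://localhost:8000/v1"
  model: string;
  apiKey: string; // self-hosted servers often ignore it; OpenAI needs a real key
}

// Join the base URL and the standard transcription path,
// tolerating trailing slashes on the configured base URL.
function buildTranscriptionUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;
}

// POST the audio as multipart/form-data and return the transcript text.
async function transcribe(
  audio: ArrayBuffer,
  filename: string,
  mimeType: string,
  config: TranscriptionConfig,
): Promise<string> {
  const form = new FormData();
  form.append("file", new Blob([audio], { type: mimeType }), filename);
  form.append("model", config.model);

  const res = await fetch(buildTranscriptionUrl(config.baseUrl), {
    method: "POST",
    // No manual Content-Type: fetch sets multipart/form-data with its boundary.
    headers: { Authorization: `Bearer ${config.apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  }
  // With the default response_format ("json") the body is { "text": "..." }.
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

A caller holding a Node Buffer can pass buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength) to get an exact ArrayBuffer. Switching between OpenAI and a self-hosted server then only changes baseUrl, model and apiKey.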

Changes:

  • transcription.ts — configurable TRANSCRIPTION_BASE_URL, TRANSCRIPTION_MODEL, TRANSCRIPTION_API_KEY (optional, defaults to not-needed). Adds convertOggToWav() via ffmpeg for endpoint compatibility (WhatsApp sends ogg/opus which many ASR servers cannot decode directly).
  • manifest.yaml — updated env_additions and description
  • SKILL.md — rewritten with examples for Parakeet, Whisper (Speaches), and OpenAI direct
  • tests/ — assertions updated for new env vars and function names
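The two transcription.ts changes in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the merged code: loadTranscriptionConfig, ffmpegOggToWavArgs and this convertOggToWav body are hypothetical, the sketch treats TRANSCRIPTION_BASE_URL and TRANSCRIPTION_MODEL as required because their defaults are not stated here, and 16 kHz mono PCM is a common ASR input format rather than a documented requirement of any particular server. Only the not-needed fallback for TRANSCRIPTION_API_KEY comes from the description. It assumes the ffmpeg binary is on PATH.

```typescript
import { spawn } from "node:child_process";

interface TranscriptionEnvConfig {
  baseUrl: string;
  model: string;
  apiKey: string;
}

// Read the three env vars; only the API key is optional (assumption for the other two).
function loadTranscriptionConfig(
  env: Record<string, string | undefined> = process.env,
): TranscriptionEnvConfig {
  const baseUrl = env.TRANSCRIPTION_BASE_URL;
  const model = env.TRANSCRIPTION_MODEL;
  if (!baseUrl || !model) {
    throw new Error("TRANSCRIPTION_BASE_URL and TRANSCRIPTION_MODEL must be set");
  }
  // Fall back to the placeholder key when none (or an empty string) is configured.
  return { baseUrl, model, apiKey: env.TRANSCRIPTION_API_KEY || "not-needed" };
}

// ffmpeg arguments: read ogg/opus from stdin, write 16 kHz mono WAV to stdout.
function ffmpegOggToWavArgs(): string[] {
  return [
    "-hide_banner", "-loglevel", "error",
    "-f", "ogg", "-i", "pipe:0", // input: ogg container on stdin
    "-ac", "1", "-ar", "16000",  // downmix to mono, resample to 16 kHz
    "-f", "wav", "pipe:1",       // output: WAV (16-bit PCM by default) on stdout
  ];
}

// Pipe the ogg buffer through ffmpeg and collect the WAV bytes.
// Caveat: on a pipe ffmpeg cannot seek back to fill in the WAV size fields;
// if a server rejects such files, write to a temporary file instead.
function convertOggToWav(ogg: Buffer): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", ffmpegOggToWavArgs());
    const out: Buffer[] = [];
    const err: Buffer[] = [];
    proc.stdout.on("data", (chunk: Buffer) => out.push(chunk));
    proc.stderr.on("data", (chunk: Buffer) => err.push(chunk));
    proc.on("error", reject);       // e.g. ffmpeg is not installed
    proc.stdin.on("error", reject); // e.g. EPIPE if ffmpeg exits early
    proc.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`ffmpeg exited with ${code}: ${Buffer.concat(err).toString()}`));
    });
    proc.stdin.end(ogg);
  });
}
```

Converting before upload costs one extra ffmpeg process per voice note, in exchange for working with servers that cannot decode ogg/opus themselves.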

Channel support: WhatsApp only (same as before). The transcription module uses Baileys types for audio download.

Tested live with NVIDIA Parakeet TDT 0.6B v3 (~2s latency, good Spanish support).

For Skills

  • I have not made any changes to source code
  • My skill contains instructions for Claude to follow (not pre-built code)
  • I tested this skill on a fresh clone

Test plan

  • Skill package tests pass (8/8)
  • Clean clone e2e: apply whatsapp + voice-transcription → build + 41 tests pass
  • Live test with Parakeet TDT 0.6B v3 on ai-inference VM
  • git diff --stat origin/main — only .claude/skills/add-voice-transcription/ paths

🤖 Generated with Claude Code

@glifocat merged this pull request into main Mar 7, 2026
5 checks passed
glifocat pushed a commit that referenced this pull request Apr 23, 2026
Additive change — existing code paths still run via inline fallbacks.
Prepares core for per-module extractions in PR #3 onward.

Four registries added with empty defaults:
  - delivery action handlers (delivery.ts)
  - router inbound gate (router.ts)
  - response dispatcher (index.ts)
  - MCP tool self-registration (container/agent-runner/src/mcp-tools/server.ts)

Default modules moved to src/modules/ for signaling:
  - src/modules/typing/       (extracted from delivery.ts)
  - src/modules/mount-security/ (moved from src/mount-security.ts)

Both are imported directly by core — no hook, no registry. Removal
requires editing core imports.

Migrator now keys applied rows by name (uniqueness) so module
migrations can pick arbitrary version numbers. Stored version column
is auto-assigned as an applied-order sequence.

sqlite_master guards added around core calls into module-owned tables
(user_roles, agent_destinations, pending_questions). No-ops today;
load-bearing after the owning modules are extracted.

MODULE-HOOK markers placed at scheduling's two skill-edit sites
(host-sweep.ts recurrence call, poll-loop.ts pre-task gate). PR #4
replaces the marked blocks when scheduling moves to its module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
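Two mechanisms in the commit message above can be made concrete with a small sketch: a registry that starts empty so core falls back to its inline code path, and a migrator that records applied migrations by unique name and auto-assigns the stored version in application order. Everything here (Registry, deliver, Migration, AppliedRow, applyPending, the in-memory row list, and ordering pending migrations by declared version then name) is a hypothetical illustration, not code from the repository; a real migrator would persist these rows in SQLite.

```typescript
// A registry with empty defaults: core looks up a handler and falls back
// to its existing inline code path when no module has registered one.
class Registry<H> {
  private handlers = new Map<string, H>();

  register(key: string, handler: H): void {
    if (this.handlers.has(key)) throw new Error(`duplicate handler: ${key}`);
    this.handlers.set(key, handler);
  }

  get(key: string): H | undefined {
    return this.handlers.get(key);
  }
}

type DeliveryHandler = (payload: string) => string;
const deliveryActions = new Registry<DeliveryHandler>();

function deliver(action: string, payload: string): string {
  const handler = deliveryActions.get(action);
  // Inline fallback keeps today's behavior until a module registers itself.
  return handler ? handler(payload) : `inline:${payload}`;
}

// Migrations are identified by unique name, so modules can pick arbitrary versions.
interface Migration {
  name: string;
  version: number; // chosen by the module; used here only to order pending ones
}

interface AppliedRow {
  name: string;
  version: number; // auto-assigned: position in the applied-order sequence
}

function applyPending(all: Migration[], applied: AppliedRow[]): AppliedRow[] {
  const done = new Set(applied.map((row) => row.name));
  let next = applied.reduce((max, row) => Math.max(max, row.version), 0) + 1;
  const rows = [...applied];
  // Deterministic order for pending migrations: declared version, then name.
  const pending = all
    .filter((m) => !done.has(m.name))
    .sort((a, b) => a.version - b.version || a.name.localeCompare(b.name));
  for (const m of pending) {
    // ...execute the migration's SQL here...
    rows.push({ name: m.name, version: next++ });
  }
  return rows;
}
```

Because applied rows are matched by name, a module can ship a migration numbered 1000 next to core's 1 without colliding, and re-running applyPending on its own output adds nothing.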
glifocat pushed a commit that referenced this pull request Apr 23, 2026
refactor: extract scheduling as registry-based module (PR #4)