feat(skills): update voice-transcription for OpenAI-compatible endpoints by glifocat · Pull Request #4 · glifocat/nanoclaw-glifocat

glifocat · 2026-03-07T18:40:49Z

Type of Change

Skill - adds a new skill in .claude/skills/
Fix - bug fix or security fix to source code
Simplification - reduces or simplifies source code

Description

Updates the add-voice-transcription skill to support any OpenAI-compatible /v1/audio/transcriptions endpoint instead of hardcoding OpenAI's Whisper API.
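A minimal sketch of a call to an OpenAI-compatible transcription endpoint, in TypeScript to match transcription.ts. It follows OpenAI's documented request shape: multipart form fields file and model, plus a Bearer token. It also assumes the base URL ends at /v1, as in https://api.openai.com/v1. The names transcriptionUrl, buildTranscriptionForm, and transcribe are hypothetical, not the skill's actual functions. The sketch relies on the global fetch, FormData, and Blob available in Node 18+.

```typescript
// Hypothetical sketch, not the skill's transcription.ts: one POST to an
// OpenAI-compatible /v1/audio/transcriptions endpoint (Node 18+ globals).

export interface TranscriptionConfig {
  baseUrl: string; // assumed to end at /v1, e.g. "https://api.openai.com/v1"
  model: string;
  apiKey: string;
}

// Append the endpoint path, dropping any trailing slashes on the base URL.
export function transcriptionUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;
}

// The OpenAI-style request is multipart/form-data with `file` and `model` fields.
export function buildTranscriptionForm(
  audio: Blob,
  filename: string,
  model: string,
): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", model);
  return form;
}

export async function transcribe(
  cfg: TranscriptionConfig,
  audio: Blob,
): Promise<string> {
  const res = await fetch(transcriptionUrl(cfg.baseUrl), {
    method: "POST",
    // Servers that do not check auth simply ignore this header.
    headers: { Authorization: `Bearer ${cfg.apiKey}` },
    // No explicit Content-Type: fetch sets the multipart boundary itself.
    body: buildTranscriptionForm(audio, "voice.wav", cfg.model),
  });
  if (!res.ok) {
    throw new Error(`Transcription request failed: HTTP ${res.status}`);
  }
  // With the default response_format (json) the body is { "text": "..." }.
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

Only the base URL, model, and key differ between providers. The same request should therefore work against OpenAI directly or against a self-hosted server that implements the same route.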

Changes:

transcription.ts — configurable TRANSCRIPTION_BASE_URL, TRANSCRIPTION_MODEL, and TRANSCRIPTION_API_KEY (optional; defaults to the placeholder value not-needed). Adds convertOggToWav(), which uses ffmpeg to convert voice notes to WAV for endpoint compatibility: WhatsApp sends ogg/opus, which many ASR servers cannot decode directly.
manifest.yaml — updated env_additions and description
SKILL.md — rewritten with examples for Parakeet, Whisper (Speaches), and OpenAI direct
tests/ — assertions updated for new env vars and function names
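The configuration and conversion changes listed above can be sketched as follows. Only the three variable names and the not-needed default come from the PR. Several details are assumptions for illustration: requiring the other two variables, the 16 kHz mono WAV target, and the helper names readTranscriptionEnv, oggToWavArgs, and convertOggToWavSketch. The sketch shells out to an ffmpeg binary on PATH.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

export interface TranscriptionEnv {
  baseUrl: string;
  model: string;
  apiKey: string;
}

// Read the three variables named in the PR. Only the API key has a stated
// default ("not-needed"); this sketch requires the other two.
export function readTranscriptionEnv(
  env: Record<string, string | undefined> = process.env,
): TranscriptionEnv {
  const baseUrl = env.TRANSCRIPTION_BASE_URL;
  const model = env.TRANSCRIPTION_MODEL;
  if (!baseUrl || !model) {
    throw new Error("TRANSCRIPTION_BASE_URL and TRANSCRIPTION_MODEL must be set");
  }
  return { baseUrl, model, apiKey: env.TRANSCRIPTION_API_KEY || "not-needed" };
}

// ffmpeg arguments: decode the ogg/opus input, downmix to mono, resample to
// 16 kHz (an assumed, common ASR input format), and write a .wav file.
export function oggToWavArgs(input: string, output: string): string[] {
  return [
    "-hide_banner", "-loglevel", "error", "-y",
    "-i", input,
    "-ac", "1",
    "-ar", "16000",
    output,
  ];
}

// Hypothetical helper in the spirit of convertOggToWav(). It writes to a temp
// file rather than a pipe so ffmpeg can seek back and finalize the WAV header.
export async function convertOggToWavSketch(ogg: Buffer): Promise<Buffer> {
  const dir = await mkdtemp(join(tmpdir(), "voice-"));
  const input = join(dir, "in.ogg");
  const output = join(dir, "out.wav");
  try {
    await writeFile(input, ogg);
    await execFileAsync("ffmpeg", oggToWavArgs(input, output));
    return await readFile(output);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

The temp-file choice trades a little disk I/O for a well-formed file. When ffmpeg streams WAV to a non-seekable pipe, it cannot go back and fill in the RIFF size fields, and some decoders reject the result.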

Channel support: WhatsApp only (same as before). The transcription module uses Baileys types for audio download.

Tested live with NVIDIA Parakeet TDT 0.6B v3 (~2s latency, good Spanish support).

For Skills

I have not made any changes to source code
My skill contains instructions for Claude to follow (not pre-built code)
I tested this skill on a fresh clone

Test plan

Skill package tests pass (8/8)
Clean clone e2e: apply whatsapp + voice-transcription → build + 41 tests pass
Live test with Parakeet TDT 0.6B v3 on ai-inference VM
git diff --stat origin/main — only .claude/skills/add-voice-transcription/ paths

🤖 Generated with Claude Code

Additive change: existing code paths still run via inline fallbacks. Prepares core for per-module extractions in PR #3 onward.

Four registries added with empty defaults:
- delivery action handlers (delivery.ts)
- router inbound gate (router.ts)
- response dispatcher (index.ts)
- MCP tool self-registration (container/agent-runner/src/mcp-tools/server.ts)

Default modules moved to src/modules/ for signaling:
- src/modules/typing/ (extracted from delivery.ts)
- src/modules/mount-security/ (moved from src/mount-security.ts)

Both are imported directly by core, with no hook and no registry; removing them requires editing core imports.

Migrator now keys applied rows by name (uniqueness), so module migrations can pick arbitrary version numbers. The stored version column is auto-assigned as an applied-order sequence.

sqlite_master guards added around core calls into module-owned tables (user_roles, agent_destinations, pending_questions). These are no-ops today and become load-bearing once the owning modules are extracted.

MODULE-HOOK markers placed at scheduling's two skill-edit sites (the host-sweep.ts recurrence call and the poll-loop.ts pre-task gate). PR #4 replaces the marked blocks when scheduling moves to its module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
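The registry-with-inline-fallback pattern in this commit message can be sketched generically. Registry, DeliveryHandler, deliveryActions, and handleDeliveryAction are hypothetical names, not the project's API. The sketch only shows the stated behavior: a registry starts empty, so core keeps running its inline code path until a module registers a handler for that key.

```typescript
// Hypothetical registry sketch: empty by default, with lookups falling back
// to the caller's inline code path until a module registers a handler.
export class Registry<H> {
  private readonly handlers = new Map<string, H>();
  private readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  // Modules self-register; a duplicate key is treated as a wiring error.
  register(key: string, handler: H): void {
    if (this.handlers.has(key)) {
      throw new Error(`${this.label}: "${key}" is already registered`);
    }
    this.handlers.set(key, handler);
  }

  // Return the registered handler, or the inline fallback supplied by core.
  resolve(key: string, fallback: H): H {
    return this.handlers.get(key) ?? fallback;
  }
}

export type DeliveryHandler = (payload: string) => string;

// Empty default: core works before any module has registered anything.
export const deliveryActions = new Registry<DeliveryHandler>("delivery actions");

// Core call site: the pre-extraction logic stays inline as the fallback.
export function handleDeliveryAction(kind: string, payload: string): string {
  const inline: DeliveryHandler = (p) => `inline:${kind}:${p}`;
  return deliveryActions.resolve(kind, inline)(payload);
}
```

Once a module registers a handler for a key, the inline branch for that key becomes dead code and can be deleted from core. That is what makes the change additive in the sense the commit describes.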
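The name-keyed migrator described in this commit message can be sketched in memory, without SQLite. The Migration and AppliedRow shapes and the runMigrations function are hypothetical. Ordering pending migrations by their declared version, with ties keeping list order, is also an assumption. What the sketch shows is that identity comes from the name, so two modules may both declare version 1, while the stored version is assigned in applied order.

```typescript
// In-memory sketch of a name-keyed migrator (hypothetical shapes).
export interface Migration {
  name: string; // unique identity of the migration
  version: number; // author-chosen; may collide across modules
  up: () => void;
}

export interface AppliedRow {
  name: string;
  version: number; // auto-assigned applied-order sequence
}

export function runMigrations(
  migrations: Migration[],
  applied: AppliedRow[],
): AppliedRow[] {
  const seen = new Set(applied.map((r) => r.name));
  let next = applied.reduce((max, r) => Math.max(max, r.version), 0) + 1;
  const rows = [...applied];
  // Array.prototype.sort is stable, so equal versions keep list order.
  const pending = [...migrations].sort((a, b) => a.version - b.version);
  for (const m of pending) {
    if (seen.has(m.name)) continue; // already applied: skip by name
    m.up();
    rows.push({ name: m.name, version: next++ });
    seen.add(m.name);
  }
  return rows;
}
```

Re-running with the returned rows is a no-op, because every name is already recorded.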

refactor: extract scheduling as registry-based module (PR #4)

glifocat merged this pull request into main Mar 7, 2026
5 checks passed

glifocat pushed a commit that referenced this pull request Apr 23, 2026

Merge pull request qwibitai#1842 from qwibitai/refactor/pr4-scheduling

e75af5e

glifocat merged 0 commits into main from upstream/skill-voice-transcription-openai-compatible

glifocat commented Mar 7, 2026


1 participant
