feat(voice): audio transcription for voice messages with Local Whisper support #1476
Conversation
Discord voice messages are `.ogg` files, but agentscope's `OpenAIChatFormatter` only accepts wav and mp3 extensions, causing "Unsupported audio file extension" errors. This fix:
- Adds `.ogg`, `.flac`, `.m4a`, `.aac` to the `_media_type_from_path` mapping
- Adds `_convert_audio_to_wav()` using ffmpeg to convert unsupported audio formats before sending to the formatter
- Gracefully falls back to the original file if ffmpeg is unavailable
https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
Discord/Telegram voice messages (.ogg) fail because the agentscope
OpenAIChatFormatter only accepts wav/mp3, and most models (Ollama/Qwen)
can't process audio at all.
Add a transcription layer using the OpenAI-compatible
/v1/audio/transcriptions endpoint (which accepts ogg natively):
- New audio_transcription.py: finds an OpenAI-compatible provider
and transcribes audio via whisper-1
- New audio_mode config ("auto"|"transcribe"|"native"):
- auto (default): try transcription, fall back to native audio
- transcribe: always convert audio to text
- native: send audio blocks directly (needs ffmpeg for ogg→wav)
- Refactor message_processing.py to route audio blocks through
the appropriate path based on config
https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
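The routing this commit describes might be sketched as follows. The block shapes, the `transcribe` callback, and the placeholder text are illustrative assumptions, not the actual `message_processing.py` implementation:

```python
from typing import Callable, Optional


def route_audio_block(
    audio_mode: str,
    local_path: str,
    transcribe: Callable[[str], Optional[str]],
) -> dict:
    """Turn an audio block into a text block (transcription) or a native
    audio block, depending on the configured audio_mode."""
    if audio_mode in ("auto", "transcribe"):
        text = transcribe(local_path)
        if text is not None:
            return {"type": "text", "text": text}
        if audio_mode == "transcribe":
            # transcribe mode never falls back to native audio
            return {"type": "text", "text": "(transcription unavailable)"}
    # native mode, or auto mode with no usable transcription provider
    return {"type": "audio", "path": local_path}
```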
Add user-facing configuration for audio_mode (auto/transcribe/native):
- API: GET/PUT `/agent/audio-mode` endpoints
- CLI: audio mode prompt in `copaw init`
- Console: new Voice Transcription settings page under Settings
- i18n: English and Chinese translations
https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
Code Review
This pull request introduces a significant feature for handling audio messages, particularly .ogg files from Discord, by adding transcription capabilities. It adds a new audio_mode configuration to control whether audio is transcribed, sent natively, or handled automatically. The changes span the backend, CLI, and the console UI, including a new settings page.
The implementation is well-structured. I've provided a few suggestions to improve maintainability and debuggability:
- Making the transcription model name configurable instead of hardcoding it.
- Improving the extensibility of finding transcription providers.
- Enhancing error logging for ffmpeg conversion failures and for API calls in the frontend.
Overall, this is a great addition that significantly improves the agent's ability to handle multimedia messages.
Pull request overview
Adds configurable handling for incoming voice/audio messages (notably Discord .ogg) by introducing auto transcription via an OpenAI-compatible Whisper endpoint, plus UI/CLI/API surfaces to configure behavior.
Changes:
- Introduces `agents.audio_mode` (auto/transcribe/native) and exposes it via the CLI init flow and new agent API endpoints.
- Updates message media processing to support more audio MIME types, attempt transcription, and optionally convert audio to `.wav` via ffmpeg for native-audio forwarding.
- Adds a Console UI settings page (plus navigation + i18n) for configuring voice transcription/audio handling.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds agents.audio_mode config field. |
| src/copaw/cli/init_cmd.py | Prompts for audio_mode during interactive init. |
| src/copaw/app/routers/agent.py | Adds GET/PUT /agent/audio-mode endpoints. |
| src/copaw/agents/utils/message_processing.py | Adds audio transcription flow and ffmpeg .ogg→.wav conversion fallback; expands audio MIME mapping. |
| src/copaw/agents/utils/audio_transcription.py | New utility to call OpenAI-compatible /v1/audio/transcriptions. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New settings page to view/update audio mode. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styles for the new settings page. |
| console/src/locales/en.json | Adds nav label + page strings for Voice Transcription. |
| console/src/locales/zh.json | Adds nav label + page strings for Voice Transcription. |
| console/src/layouts/Sidebar.tsx | Adds sidebar entry + route mapping for Voice Transcription. |
| console/src/layouts/MainLayout/index.tsx | Adds route + selection mapping for Voice Transcription page. |
| console/src/api/modules/agent.ts | Adds getAudioMode / updateAudioMode API calls. |
- Add `console.error()` logging in frontend catch blocks for debuggability
- Add Japanese (ja.json) and Russian (ru.json) translations for the voiceTranscription nav key and settings page strings
- Include ffmpeg stderr output in audio conversion error logs
https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
Pull request overview
Adds configurable handling for incoming voice/audio messages so Discord .ogg voice notes can work across model backends by transcribing to text via an OpenAI-compatible Whisper endpoint, with an optional ffmpeg conversion path for native-audio models.
Changes:
- Introduces `agents.audio_mode` (auto/transcribe/native), surfaced via config and `copaw init`.
- Adds `/agent/audio-mode` GET/PUT endpoints and a Console settings page to view/update the setting.
- Extends message media processing to (a) transcribe audio to text and (b) optionally convert unsupported audio formats to `.wav` with ffmpeg.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds agents.audio_mode config field. |
| src/copaw/cli/init_cmd.py | Prompts for audio mode during interactive init. |
| src/copaw/app/routers/agent.py | Adds API endpoints to get/set audio mode. |
| src/copaw/agents/utils/message_processing.py | Implements audio transcription + ffmpeg conversion behavior for audio blocks. |
| src/copaw/agents/utils/audio_transcription.py | New Whisper transcription helper using OpenAI-compatible /v1/audio/transcriptions. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New Console UI page for selecting audio mode. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styles for the new settings page. |
| console/src/api/modules/agent.ts | Adds getAudioMode / updateAudioMode client calls. |
| console/src/layouts/Sidebar.tsx | Adds nav entry/route mapping for Voice Transcription. |
| console/src/layouts/MainLayout/index.tsx | Wires the new route to the new page. |
| console/src/locales/en.json | Adds nav + page strings. |
| console/src/locales/zh.json | Adds nav + page strings. |
| console/src/locales/ru.json | Adds nav + page strings. |
| console/src/locales/ja.json | Adds nav + page strings. |
| console/src/pages/Settings/Models/components/modals/ProviderConfigModal.tsx | Formatting-only change. |
| console/src/pages/Settings/Models/components/cards/RemoteProviderCard.tsx | Formatting-only change. |
| console/src/pages/Control/Sessions/index.tsx | Formatting-only change. |
| console/src/components/MarkdownCopy/MarkdownCopy.tsx | Formatting-only change. |
- Wrap `_convert_audio_to_wav` calls with `asyncio.to_thread` so the blocking subprocess (up to a 30 s timeout) doesn't stall the event loop
- Change `AgentsConfig.audio_mode` from `str` to `Literal["auto", "transcribe", "native"]` for load-time validation
https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
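Both fixes in this commit can be illustrated with a small sketch. `slow_convert` is a stand-in for the real blocking ffmpeg call, not the project's function:

```python
import asyncio
import time
from typing import Literal

# Literal narrows the config field so invalid values fail at load time
# instead of surfacing later as silent misbehavior.
AudioMode = Literal["auto", "transcribe", "native"]


def slow_convert(path: str) -> str:
    """Stand-in for the blocking ffmpeg subprocess (up to 30 s)."""
    time.sleep(0.05)  # simulate blocking work
    return path.rsplit(".", 1)[0] + ".wav"


async def process_audio(path: str) -> str:
    # Run the blocking conversion in a worker thread so the event loop
    # keeps serving other messages while ffmpeg runs.
    return await asyncio.to_thread(slow_convert, path)


result = asyncio.run(process_audio("voice.ogg"))
```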
Pull request overview
Adds configurable handling for incoming voice/audio messages (notably Discord .ogg) by introducing transcription support and exposing an audio_mode setting across backend, CLI, API, and Console UI.
Changes:
- Add `audio_mode` config (auto/transcribe/native) and expose it via CLI init prompts and new FastAPI endpoints.
- Implement audio block processing that can transcribe via an OpenAI-compatible Whisper endpoint and optionally convert audio via ffmpeg.
- Add Console UI settings page + i18n strings for "Voice Transcription".
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds AgentsConfig.audio_mode setting. |
| src/copaw/cli/init_cmd.py | Prompts for audio_mode during copaw init interactive flow. |
| src/copaw/app/routers/agent.py | Adds GET/PUT /agent/audio-mode endpoints. |
| src/copaw/agents/utils/message_processing.py | Adds transcription/conversion pipeline for audio blocks and expands supported audio media types. |
| src/copaw/agents/utils/audio_transcription.py | New utility to transcribe audio via OpenAI-compatible /audio/transcriptions. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New settings page to view/edit audio mode. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styling for the new settings page. |
| console/src/pages/Settings/Models/components/modals/ProviderConfigModal.tsx | Formatting-only change. |
| console/src/pages/Settings/Models/components/cards/RemoteProviderCard.tsx | Formatting-only change. |
| console/src/pages/Control/Sessions/index.tsx | Formatting-only change. |
| console/src/locales/en.json | Adds nav + page strings for Voice Transcription. |
| console/src/locales/zh.json | Adds nav + page strings for Voice Transcription. |
| console/src/locales/ru.json | Adds nav + page strings for Voice Transcription. |
| console/src/locales/ja.json | Adds nav + page strings for Voice Transcription. |
| console/src/layouts/Sidebar.tsx | Adds sidebar entry/route mapping for Voice Transcription. |
| console/src/layouts/MainLayout/index.tsx | Adds route for the Voice Transcription page. |
| console/src/components/MarkdownCopy/MarkdownCopy.tsx | Formatting-only change. |
| console/src/api/modules/agent.ts | Adds getAudioMode / updateAudioMode API calls. |
Comments suppressed due to low confidence (1)
src/copaw/agents/utils/message_processing.py:343
- For audio blocks that get transcribed/replaced with a text block (or replaced with the "(transcription unavailable)" placeholder), `_process_single_block` still returns `local_path`. This later triggers `process_file_and_media_blocks_in_message` to insert a "User uploaded a file, downloaded to …" text block, which is noisy and can leak local filesystem paths even though the model no longer needs the audio file. Consider returning `None` (or a separate flag) when the audio block is converted to text, and only adding entries to `downloaded_files` for blocks that remain as downloadable media.
```python
"Updated %s block with local path: %s",
block_type,
local_path,
)
return local_path
```
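The fix the reviewer suggests could look like this sketch. The function name and block shape are illustrative, not the project's actual types:

```python
from typing import Optional


def downloaded_path_for_block(block: dict, local_path: str) -> Optional[str]:
    """Record a path in downloaded_files only for blocks that remain
    downloadable media. Transcribed audio was replaced by a text block,
    so returning None avoids the noisy "User uploaded a file, downloaded
    to ..." message and keeps local paths out of the prompt."""
    if block.get("type") == "text":
        # The audio became a transcription (or a placeholder); the model
        # no longer needs the underlying file.
        return None
    return local_path
```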
Remove formatting-only diffs in files not related to voice transcription to keep the PR focused. https://claude.ai/code/session_01HG6R9iZT7aGtYpQvkk1Dtb
Show an ffmpeg installation status alert when native audio mode is selected, similar to the existing dependency check for Local Whisper. Helps users verify ffmpeg is available before selecting this mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds end-to-end voice message handling by introducing configurable audio handling modes and transcription backends (remote Whisper API or local openai-whisper), plus Console UI and API endpoints to configure and monitor the feature.
Changes:
- Add agent config + CLI init prompts for audio mode and transcription provider type.
- Implement audio block handling in message processing (transcribe in `auto`, send audio in `native` with ffmpeg conversion fallback).
- Add backend API endpoints and Console settings page (incl. Local Whisper dependency checks) for voice transcription configuration.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds audio_mode + transcription-related config fields. |
| src/copaw/cli/init_cmd.py | Adds interactive prompts for audio mode and transcription provider type. |
| src/copaw/app/routers/agent.py | Adds REST endpoints for audio/transcription settings and status checks. |
| src/copaw/app/channels/dingtalk/content_utils.py | Emits AudioContent for DingTalk voice messages. |
| src/copaw/app/channels/dingtalk/constants.py | Maps DingTalk voice type to audio. |
| src/copaw/app/channels/base.py | Lets audio-only messages bypass no-text debounce. |
| src/copaw/agents/utils/message_processing.py | Implements transcription/native-audio handling and ffmpeg conversion for audio blocks. |
| src/copaw/agents/utils/audio_transcription.py | New utility implementing Whisper API + Local Whisper transcription flows. |
| pyproject.toml | Adds optional dependency extra copaw[whisper]. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New Console settings page to configure voice transcription and check provider status. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styling for the new settings page. |
| console/src/layouts/Sidebar.tsx | Adds navigation entry for Voice Transcription settings. |
| console/src/layouts/MainLayout/index.tsx | Adds route for /voice-transcription. |
| console/src/api/modules/agent.ts | Adds API client methods for new backend endpoints. |
| console/src/locales/en.json | Adds nav label + full Voice Transcription translations. |
| console/src/locales/zh.json | Adds nav label + full Voice Transcription translations. |
| console/src/locales/ru.json | Adds nav label + full Voice Transcription translations. |
| console/src/locales/ja.json | Adds nav label + full Voice Transcription translations. |
Ensure OpenAI-compatible provider base URLs end with /v1 before passing to AsyncOpenAI. Fixes transcription failures for providers configured without the /v1 suffix (e.g. DeepSeek). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
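The fix in this commit amounts to a small normalization step before constructing the client. This is a sketch; the actual helper name in the codebase is an assumption:

```python
def ensure_v1_suffix(base_url: str) -> str:
    """Normalize an OpenAI-compatible base URL so it ends with /v1,
    letting AsyncOpenAI resolve the /audio/transcriptions route even for
    providers configured without the suffix (e.g. DeepSeek)."""
    url = base_url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```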
Pull request overview
This PR adds end-to-end voice message support by introducing configurable audio handling (transcribe vs native audio), a transcription backend abstraction (remote Whisper API or local openai-whisper), backend APIs to manage these settings, and a Console UI settings page to configure and validate the setup.
Changes:
- Add agent configuration for audio handling mode and transcription backend selection.
- Implement audio block processing: secure local-path handling, optional Whisper transcription, and ffmpeg-based conversion for native audio.
- Add Console settings page + backend API endpoints to manage and inspect voice transcription configuration/status.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds config schema fields for audio_mode and transcription settings. |
| src/copaw/cli/init_cmd.py | Adds copaw init prompts for audio mode and transcription provider type. |
| src/copaw/app/routers/agent.py | Adds REST endpoints to get/set audio mode, transcription provider type/provider ID, and local whisper dependency status. |
| src/copaw/app/channels/dingtalk/content_utils.py | Switches DingTalk voice to an AudioContent block (vs file). |
| src/copaw/app/channels/dingtalk/constants.py | Maps DingTalk voice messages to the audio content type. |
| src/copaw/app/channels/base.py | Bypasses no-text debounce buffering for messages containing audio blocks. |
| src/copaw/agents/utils/message_processing.py | Implements audio-mode aware audio processing (transcribe vs native + conversion) and expands media allowlist roots. |
| src/copaw/agents/utils/audio_transcription.py | New transcription utility supporting Whisper API and local openai-whisper. |
| pyproject.toml | Adds optional whisper extra and includes it in full. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New Console settings page for voice transcription configuration and status. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styling for the new settings page. |
| console/src/locales/{en,zh,ru,ja}.json | Adds UI strings for Voice Transcription page + nav entry. |
| console/src/layouts/Sidebar.tsx | Adds navigation entry for Voice Transcription. |
| console/src/layouts/MainLayout/index.tsx | Adds route for the Voice Transcription settings page. |
| console/src/api/modules/agent.ts | Adds client functions for the new backend endpoints. |
Pull request overview
Adds configurable voice-message handling so audio attachments (e.g., Discord/Telegram .ogg) can work across model backends by either transcribing to text (remote Whisper API or local openai-whisper) or sending audio natively (with ffmpeg conversion), plus Console UI and API/CLI configuration surfaces.
Changes:
- Introduces `audio_mode` and transcription provider configuration in backend config + CLI init prompts.
- Adds backend audio block processing (transcription and ffmpeg conversion) and a new `audio_transcription` utility.
- Adds Console Settings page + navigation + API bindings for configuring voice transcription.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/copaw/config/config.py | Adds new agent config fields for audio mode + transcription provider selection. |
| src/copaw/cli/init_cmd.py | Prompts for audio mode and transcription provider during copaw init. |
| src/copaw/app/routers/agent.py | Adds REST endpoints for reading/updating audio/transcription settings and status. |
| src/copaw/app/channels/dingtalk/content_utils.py | Switches DingTalk voice payloads to runtime AudioContent. |
| src/copaw/app/channels/dingtalk/constants.py | Maps DingTalk voice message type to audio. |
| src/copaw/app/channels/base.py | Bypasses no-text debounce for audio-only messages so voice messages are processed immediately. |
| src/copaw/agents/utils/message_processing.py | Adds audio-specific processing: transcription in auto mode and ffmpeg conversion + native send in native mode. |
| src/copaw/agents/utils/audio_transcription.py | New module providing Whisper API and local-whisper transcription backends + provider listing/status helpers. |
| pyproject.toml | Adds copaw[whisper] extra and includes it in copaw[full]. |
| console/src/pages/Settings/VoiceTranscription/index.tsx | New Console settings page for audio mode and transcription provider configuration. |
| console/src/pages/Settings/VoiceTranscription/index.module.less | Styles for the new settings page. |
| console/src/locales/en.json | Adds navigation label and page strings for Voice Transcription settings. |
| console/src/locales/zh.json | Adds navigation label and page strings for Voice Transcription settings. |
| console/src/locales/ru.json | Adds navigation label and page strings for Voice Transcription settings. |
| console/src/locales/ja.json | Adds navigation label and page strings for Voice Transcription settings. |
| console/src/layouts/Sidebar.tsx | Adds Settings nav entry for Voice Transcription. |
| console/src/layouts/MainLayout/index.tsx | Adds /voice-transcription route to render the new page. |
| console/src/api/modules/agent.ts | Adds frontend API wrappers for the new backend endpoints. |
```python
return AudioContent(
    type=ContentType.AUDIO,
    data=url,
    format="amr",
```

Description

Voice messages from channels (Discord, Telegram, DingTalk, etc.) are `.ogg` files, which most LLM backends cannot process directly. This PR adds audio transcription support so voice messages work with all model backends, along with a configurable audio handling mode and a Console UI settings page.

Key changes:
- Transcription via `/v1/audio/transcriptions` (Whisper API) or the locally installed `openai-whisper` library
- Configurable audio handling mode: `auto` (transcribe if a provider is available, else show a file placeholder) and `native` (send audio directly to the model)
- `openai-whisper` available as an optional dependency via `copaw[whisper]`
- `copaw init` prompts for audio mode and transcription provider

Related Issue: Fixes Discord voice message support
Security Considerations: N/A
Type of Change
Component(s) Affected
Checklist
- Ran `pre-commit run --all-files` locally and it passes
- Added/updated tests (`pytest` or as relevant) and they pass

Testing
- `audio_mode: "native"` with ffmpeg installed → audio is converted to wav and sent natively to the model
- `copaw init` → verify audio mode and transcription provider prompts appear

Local Verification Evidence