MCP Server Plugin Info | MCP Service Details | OpenClaw Study

Voice Server

Voice input/output for Claude Desktop: real-time speech-to-text via faster-whisper with biquad noise filtering and rule-based emotion detection, paired with text-to-speech via edge-tts. Runs as a local HTTP server on localhost:5123 and exposes MCP tools for Claude Desktop integration.

Features

- Speech-to-text via faster-whisper (base model, int8 quantized)
- Biquad noise filtering: 80 Hz highpass + 7.5 kHz lowpass removes hum and hiss
- Emotion detection: rule-based classifier from audio features (energy, pitch variance, ZCR, spectral centroid)
- Triple beep indicator when recording starts
- Real-time level monitoring: RMS printed per chunk
- Configurable: silence timeout, RMS threshold, min speech duration via query params or TOML config
- End-phrase stripping: automatically removes "send this", "done", "stop", etc.
- Response emotion analysis: text-based hedge/excitement/engagement scoring

Requirements

- Python 3.11+
- PortAudio (system library, required by PyAudio)
- ffmpeg (for MP3→WAV conversion in TTS playback)
- A working microphone

Installation

    pip install -r requirements.txt

PortAudio

- Windows: pip install pyaudio usually works. If not, download from PyAudio wheels.
- macOS: brew install portaudio && pip install pyaudio
- Linux: sudo apt install portaudio19-dev && pip install pyaudio

ffmpeg

- Windows: winget install Gyan.FFmpeg or download from ffmpeg.org. Ensure ffmpeg is on your PATH, or set the VOICE_FFMPEG_PATH environment variable.
- macOS: brew install ffmpeg
- Linux: sudo apt install ffmpeg

Quick Start

    # Start the voice server
    python voice_server.py

    # Server runs on http://localhost:5123
    # Endpoints:
    #   GET  /status              - Health check
    #   POST /listen?timeout=30   - Record + transcribe + emotion
    #     &skip_emotion=true        - Skip emotion detection
    #     &skip_filter=true         - Skip noise filtering
    #     &silence_timeout=4.0      - Silence cutoff seconds
    #     &min_speech_duration=3.0  - Min speech before checking silence
    #     &rms_threshold=100        - Loudness floor (20-500)
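The /listen query parameters from the Quick Start can be assembled by a small client. The following is a minimal sketch using only the Python standard library, not part of the project itself. The helper names `build_listen_url` and `listen` are hypothetical, the client-side check of `rms_threshold` against the documented 20-500 range is an assumption (the server may validate differently), and the response format is not documented here, so the sketch tries JSON and falls back to raw text.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5123"  # address stated in the README


def build_listen_url(timeout=30, *, skip_emotion=False, skip_filter=False,
                     silence_timeout=None, min_speech_duration=None,
                     rms_threshold=None, base_url=BASE_URL):
    """Build a /listen URL from the query parameters listed in the Quick Start."""
    # Assumption: reject values outside the documented 20-500 range up front.
    if rms_threshold is not None and not 20 <= rms_threshold <= 500:
        raise ValueError("rms_threshold must be between 20 and 500")

    params = {"timeout": timeout}
    if skip_emotion:
        params["skip_emotion"] = "true"
    if skip_filter:
        params["skip_filter"] = "true"
    if silence_timeout is not None:
        params["silence_timeout"] = silence_timeout
    if min_speech_duration is not None:
        params["min_speech_duration"] = min_speech_duration
    if rms_threshold is not None:
        params["rms_threshold"] = rms_threshold
    return f"{base_url}/listen?{urllib.parse.urlencode(params)}"


def listen(timeout=30, **options):
    """POST to /listen and return the parsed response.

    Assumption: the body is JSON; if it is not, the raw text is returned.
    """
    url = build_listen_url(timeout, **options)
    request = urllib.request.Request(url, data=b"", method="POST")
    # Allow extra socket time beyond the recording timeout for transcription
    # (the 15-second margin is an arbitrary choice for this sketch).
    with urllib.request.urlopen(request, timeout=timeout + 15) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

With the server running, `listen(timeout=30, skip_emotion=True, silence_timeout=4.0)` would record from the microphone and return whatever the server sends back. Keeping URL construction separate from the network call makes the parameter handling easy to check without a microphone or a running server.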

This page is part of the OpenClaw Skills learning track, which covers skill installation, category navigation, and hands-on links.
