# Piper TTS Server (Local Text-to-Speech)
This repository contains scripts that generate high-quality neural TTS audio using **Piper** and stream it directly to **Snapcast** for multi-room audio announcements.
## ⚡ Functionality
The primary script `speak_direct.sh` performs the following pipeline:
1. **Input:** Text string, Voice Model, Speed.
2. **Generation:** Calls the `piper` binary to generate a `.wav` file locally.
3. **Processing:** Uses `sox` to resample the audio to 48 kHz stereo (matching the Snapserver format).
4. **Playback:** Pipes the raw audio data directly to Snapserver, via the `/tmp/snapfifo` pipe or a TCP port.
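The four steps above can be sketched as a small shell function. This is a minimal sketch, not the actual `speak_direct.sh`: it assumes `piper` and `sox` are on `PATH`, voices live in `/data`, Snapserver reads `/tmp/snapfifo`, and that your Piper build accepts `--length_scale` (treated here as the inverse of the speed argument) — verify all of these against your setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Piper slows speech as length_scale grows, so treat it as the
# inverse of the requested speed factor (assumption: verify locally).
length_scale() { awk -v s="$1" 'BEGIN { printf "%.3f", 1 / s }'; }

speak() {
  local text="$1" model="${2:-voice.onnx}" speed="${3:-1.0}" wav
  wav="$(mktemp --suffix=.wav)"

  # 1+2. Generate a WAV from stdin text with the chosen voice model.
  echo "$text" | piper --model "/data/$model" \
    --length_scale "$(length_scale "$speed")" --output_file "$wav"

  # 3+4. Resample to 48 kHz stereo signed 16-bit raw PCM and
  #      stream it straight into the Snapserver FIFO.
  sox "$wav" -r 48000 -c 2 -b 16 -e signed-integer -t raw - > /tmp/snapfifo
  rm -f "$wav"
}
```

Writing raw PCM to the FIFO only works if the format matches what Snapserver was configured with, which is why the `sox` resample step is not optional.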
## 📂 File Manifest
* **`speak_direct.sh`**: Main entry point called by Home Assistant via SSH.
    * Usage: `./speak_direct.sh "Text to speak" "model_name.onnx" "1.0"`
* **`test_snapcast.sh`**: Debug tool to verify the Snapcast pipe connection (generates a sine wave).
* **`play_wav_to_snapcast.sh`**: Helper utility to stream existing WAV files to the speakers.
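To illustrate what a sine-wave check like `test_snapcast.sh` can look like, here is a hedged sketch using `sox`'s `synth` effect; the tone parameters, PCM format, and FIFO path are assumptions to adjust for your Snapserver configuration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write two seconds of a 440 Hz test tone, in the 48 kHz stereo
# signed 16-bit raw format Snapserver expects, into the FIFO.
test_tone() {
  local fifo="${1:-/tmp/snapfifo}"
  sox -n -r 48000 -c 2 -b 16 -e signed-integer -t raw - synth 2 sine 440 > "$fifo"
}
```

If the tone plays on the speakers, the pipe connection and audio format are correct, which isolates any remaining problems to the Piper/`sox` generation side.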
## 🛠️ Integration with Home Assistant
Home Assistant triggers these scripts using a `shell_command` over SSH.
**Home Assistant YAML:**
```yaml
shell_command:
  tts_direct_piper: >
    ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13
    '/home/user/speech_piper/speak_direct.sh "{{ text }}" "{{ voice }}" "{{ speed }}"'
```
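A hypothetical automation that calls this `shell_command` might look like the following; the trigger entity, message text, and alias are illustrative placeholders, not part of this repository:

```yaml
automation:
  - alias: "Announce doorbell over Snapcast"
    trigger:
      - platform: state
        entity_id: binary_sensor.front_door
        to: "on"
    action:
      - service: shell_command.tts_direct_piper
        data:
          text: "Someone is at the front door"
          voice: "en_US-hal_6409-medium.onnx"
          speed: "1.0"
```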
## 📦 Models
Voices are stored in the `/data` directory (ignored by Git due to size).
Required files for each voice:
1. `voice_name.onnx` (Binary model)
2. `voice_name.onnx.json` (Config file)
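Since `/data` is not in Git, each voice's two files have to be fetched separately. A sketch of a downloader against the `rhasspy/piper-voices` repository on Hugging Face follows; the URL path layout (`lang/lang_REGION/name/quality/file`) is an assumption to check against the repo, and `lessac` in the test is just an example voice name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a download URL for the rhasspy/piper-voices repo on Hugging Face.
# The path layout (lang/lang_REGION/name/quality/file) is an assumption --
# check it against the repository before relying on it.
voice_url() {
  local lang="$1" region="$2" name="$3" quality="$4" file="$5"
  printf 'https://huggingface.co/rhasspy/piper-voices/resolve/main/%s/%s_%s/%s/%s/%s' \
    "$lang" "$lang" "$region" "$name" "$quality" "$file"
}

# Fetch both files a voice needs: the .onnx model and its .onnx.json config.
fetch_voice() {
  local url; url="$(voice_url "$@")"
  curl -fL -o "/data/$5" "$url"
  curl -fL -o "/data/$5.json" "$url.json"
}
```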
**Common Voices:**
* `en_US-hal_6409-medium` (HAL 9000 style)
* `en_US-trump-medium` (Danny - Parody)
* `en_US-picard_7399-medium` (Picard)