# Piper TTS Server (Local Text-to-Speech)
This repository contains scripts to generate high-quality Neural TTS audio using **Piper** and stream it directly to **Snapcast** for multi-room audio announcements.
## ⚡ Functionality
The primary script, `speak_direct.sh`, runs the following pipeline (sketched in the shell example below):
1. **Input:** Text string, voice model, and speed.
2. **Generation:** Calls the `piper` binary to generate a `.wav` file locally.
3. **Processing:** Uses `sox` to resample the audio to 48 kHz stereo (matching the Snapserver stream format).
4. **Playback:** Pipes the raw audio data directly into the Snapserver stream input (the `/tmp/snapfifo` pipe or a TCP port).
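
A minimal sketch of this pipeline is shown below. It assumes `piper` and `sox` are on the `PATH`, that the Snapserver pipe source reads raw 48 kHz / 16-bit / stereo PCM from `/tmp/snapfifo`, and that the speed argument maps directly onto Piper's `--length_scale`; the actual `speak_direct.sh` may differ in its details.

```bash
#!/usr/bin/env bash
# Sketch of the text -> piper -> sox -> Snapserver pipeline.
set -euo pipefail

TEXT="$1"            # text to speak
MODEL="$2"           # e.g. data/en_US-hal_6409-medium.onnx
SPEED="${3:-1.0}"    # assumed to map onto piper's --length_scale (>1.0 = slower)

TMP_WAV="$(mktemp --suffix=.wav)"
trap 'rm -f "$TMP_WAV"' EXIT

# Generation: Piper reads the text on stdin and writes a WAV file.
echo "$TEXT" | piper --model "$MODEL" --length_scale "$SPEED" --output_file "$TMP_WAV"

# Processing + playback: resample to 48 kHz stereo signed 16-bit
# and write the raw PCM straight into the Snapserver pipe.
sox "$TMP_WAV" -t raw -r 48000 -c 2 -b 16 -e signed-integer - > /tmp/snapfifo
```
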
## 📂 File Manifest
* **`speak_direct.sh`**: Main entry point called by Home Assistant via SSH.
  * Usage: `./speak_direct.sh "Text to speak" "model_name.onnx" "1.0"`
* **`test_snapcast.sh`**: Debug tool to verify the Snapcast pipe connection by generating a sine wave (see the sketch after this list).
* **`play_wav_to_snapcast.sh`**: Helper utility to stream existing WAV files to the speakers.
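
The sine-wave check is not reproduced here; a minimal equivalent, assuming the same 48 kHz / 16-bit / stereo pipe at `/tmp/snapfifo`, could look like this (tone frequency and duration are illustrative, not necessarily what `test_snapcast.sh` uses):

```bash
# Play two seconds of a 440 Hz tone through the Snapcast pipe.
# If the speakers stay silent, the pipe or the Snapserver stream config is the problem.
sox -n -t raw -r 48000 -c 2 -b 16 -e signed-integer - synth 2 sine 440 > /tmp/snapfifo
```
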
## 🛠️ Integration with Home Assistant
Home Assistant triggers these scripts using a `shell_command` over SSH.

**Home Assistant YAML:**
```yaml
shell_command:
  tts_direct_piper: >
    ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13
    '/home/user/speech_piper/speak_direct.sh "{{ text }}" "{{ voice }}" "{{ speed }}"'
```
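
Before wiring this into an automation, the SSH call can be tested manually from the Home Assistant host. The sketch below reuses the key, address, and paths from the example above; adjust them to your setup.

```bash
# One-off test of the SSH call that Home Assistant will make.
ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13 \
  '/home/user/speech_piper/speak_direct.sh "Test announcement" "en_US-hal_6409-medium.onnx" "1.0"'
```
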
## 📦 Models
Voices are stored in the `/data` directory (ignored by Git due to size).

Required files for each voice:
1. `voice_name.onnx` (Binary model)
2. `voice_name.onnx.json` (Config file)
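
A small helper to confirm that every model has its matching config, assuming the voices live in `data/` inside the repository:

```bash
# Flag any .onnx model in data/ that is missing its .onnx.json companion.
for model in data/*.onnx; do
  [ -e "$model" ] || continue                     # no models downloaded yet
  if [ -f "${model}.json" ]; then
    echo "OK       $(basename "$model")"
  else
    echo "MISSING  $(basename "$model").json"
  fi
done
```
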
**Common Voices:**
* `en_US-hal_6409-medium` (HAL 9000 style)
* `en_US-trump-medium` (Danny - Parody)
* `en_US-picard_7399-medium` (Picard)