
Piper TTS Server (Local Text-to-Speech)

This repository contains scripts to generate high-quality Neural TTS audio using Piper and stream it directly to Snapcast for multi-room audio announcements.

Functionality

The primary script speak_direct.sh performs the following pipeline (a minimal sketch follows the list):

  1. Input: Text string, Voice Model, Speed.
  2. Generation: Calls the piper binary to generate a .wav file locally.
  3. Processing: Uses sox to resample the audio to 48 kHz stereo, matching the Snapserver stream format.
  4. Playback: Pipes the raw audio directly into the Snapserver input (the /tmp/snapfifo pipe or a TCP port).
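
A minimal sketch of this pipeline, assuming piper and sox are on the PATH, models live under data/, and Snapserver reads from /tmp/snapfifo (flags, paths, and the speed mapping are assumptions; the real speak_direct.sh may differ):

#!/bin/sh
# Sketch only: variable names, the data/ model path and the fifo path are assumptions.
TEXT="$1"                  # text to speak
MODEL="$2"                 # e.g. en_US-hal_6409-medium.onnx
SPEED="${3:-1.0}"          # passed to piper's --length_scale here (note: larger = slower)

WAV="/tmp/piper_announce.wav"

# Steps 1-2: generate a WAV with piper
echo "$TEXT" | piper --model "data/$MODEL" --length_scale "$SPEED" --output_file "$WAV"

# Steps 3-4: resample to 48 kHz stereo 16-bit and pipe the raw audio into the Snapserver fifo
sox "$WAV" -t raw -r 48000 -c 2 -b 16 -e signed-integer - > /tmp/snapfifo

rm -f "$WAV"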

📂 File Manifest

  • speak_direct.sh: Main entry point called by Home Assistant via SSH.
    • Usage: ./speak_direct.sh "Text to speak" "model_name.onnx" "1.0"
  • test_snapcast.sh: Debug tool to verify the Snapcast pipe connection (generates a sine wave); see the sketch after this list.
  • play_wav_to_snapcast.sh: Helper utility to stream existing WAV files to the speakers.
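
For reference, a sine-wave test in the spirit of test_snapcast.sh could look like the sketch below (assuming the default /tmp/snapfifo pipe; the actual script may differ):

#!/bin/sh
# Sketch: write two seconds of a 440 Hz sine tone, already in the Snapserver
# format (48 kHz, stereo, 16-bit signed), straight into the fifo.
sox -n -t raw -r 48000 -c 2 -b 16 -e signed-integer - synth 2 sine 440 > /tmp/snapfifo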

🛠️ Integration with Home Assistant

Home Assistant triggers these scripts using a shell_command over SSH.

Home Assistant YAML:

shell_command:
  tts_direct_piper: >
    ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13 
    '/home/user/speech_piper/speak_direct.sh "{{ text }}" "{{ voice }}" "{{ speed }}"'
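
The same command can be run by hand from the Home Assistant host to verify the SSH key, path, and arguments before wiring up the shell_command (the example text and voice below are placeholders):

# Manual test from the Home Assistant host
ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13 \
  '/home/user/speech_piper/speak_direct.sh "Dinner is ready" "en_US-hal_6409-medium.onnx" "1.0"'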

📦 Models

Voices are stored in the /data directory (ignored by Git due to size). Each voice requires the two files below; a quick sanity-check sketch follows the list:

  1. voice_name.onnx (Binary model)
  2. voice_name.onnx.json (Config file)
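
A hedged sketch for confirming that a voice is complete (DATA_DIR and the voice name are assumptions; adjust to your layout):

#!/bin/sh
# Sketch: verify that the .onnx model and its .onnx.json config both exist.
VOICE="en_US-hal_6409-medium"
DATA_DIR="/home/user/speech_piper/data"

for f in "$DATA_DIR/$VOICE.onnx" "$DATA_DIR/$VOICE.onnx.json"; do
  [ -f "$f" ] || { echo "Missing: $f" >&2; exit 1; }
done
echo "Voice $VOICE is complete."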

Common Voices:

  • en_US-hal_6409-medium (HAL 9000 style)
  • en_US-trump-medium (Danny - Parody)
  • en_US-picard_7399-medium (Picard)