
Piper TTS Server (Local Text-to-Speech)

This repository contains scripts to generate high-quality Neural TTS audio using Piper and stream it directly to Snapcast for multi-room audio announcements.

Functionality

The primary script speak_direct.sh performs the following pipeline (a minimal sketch follows the list):

  1. Input: Text string, Voice Model, Speed.
  2. Generation: Calls the piper binary to generate a .wav file locally.
  3. Processing: Uses sox to resample the audio to 48 kHz stereo, matching the Snapserver stream format.
  4. Playback: Pipes the raw audio directly into the Snapserver input (the /tmp/snapfifo pipe or a TCP port).
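
A minimal sketch of this pipeline, assuming piper and sox are on the PATH, models live under data/, and Snapserver reads from /tmp/snapfifo (flags, paths, and the speed mapping are assumptions; the real speak_direct.sh may differ):

#!/bin/sh
# Sketch only: variable names, the data/ model path and the fifo path are assumptions.
TEXT="$1"                  # text to speak
MODEL="$2"                 # e.g. en_US-hal_6409-medium.onnx
SPEED="${3:-1.0}"          # passed to piper's --length_scale here (note: larger = slower)

WAV="/tmp/piper_announce.wav"

# Steps 1-2: generate a WAV with piper
echo "$TEXT" | piper --model "data/$MODEL" --length_scale "$SPEED" --output_file "$WAV"

# Steps 3-4: resample to 48 kHz stereo 16-bit and pipe the raw audio into the Snapserver fifo
sox "$WAV" -t raw -r 48000 -c 2 -b 16 -e signed-integer - > /tmp/snapfifo

rm -f "$WAV"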

📂 File Manifest

  • speak_direct.sh: Main entry point called by Home Assistant via SSH.
    • Usage: ./speak_direct.sh "Text to speak" "model_name.onnx" "1.0"
  • test_snapcast.sh: Debug tool to verify the Snapcast pipe connection (generates a sine wave); see the sketch after this list.
  • play_wav_to_snapcast.sh: Helper utility to stream existing WAV files to the speakers.
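
For reference, a sine-wave test in the spirit of test_snapcast.sh could look like the sketch below (assuming the default /tmp/snapfifo pipe; the actual script may differ):

#!/bin/sh
# Sketch: write two seconds of a 440 Hz sine tone, already in the Snapserver
# format (48 kHz, stereo, 16-bit signed), straight into the fifo.
sox -n -t raw -r 48000 -c 2 -b 16 -e signed-integer - synth 2 sine 440 > /tmp/snapfifo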

🛠️ Integration with Home Assistant

Home Assistant triggers these scripts using a shell_command over SSH.

Home Assistant YAML:

shell_command:
  tts_direct_piper: >
    ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13 
    '/home/user/speech_piper/speak_direct.sh "{{ text }}" "{{ voice }}" "{{ speed }}"'
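
The same command can be run by hand from the Home Assistant host to verify the SSH key, path, and arguments before wiring up the shell_command (the example text and voice below are placeholders):

# Manual test from the Home Assistant host
ssh -i /config/ssh/id_rsa -o StrictHostKeyChecking=no user@192.168.20.13 \
  '/home/user/speech_piper/speak_direct.sh "Dinner is ready" "en_US-hal_6409-medium.onnx" "1.0"'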

📦 Models

Voices are stored in the /data directory (ignored by Git due to size). Each voice requires the two files below; a quick sanity-check sketch follows the list:

  1. voice_name.onnx (Binary model)
  2. voice_name.onnx.json (Config file)
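
A hedged sketch for confirming that a voice is complete (DATA_DIR and the voice name are assumptions; adjust to your layout):

#!/bin/sh
# Sketch: verify that the .onnx model and its .onnx.json config both exist.
VOICE="en_US-hal_6409-medium"
DATA_DIR="/home/user/speech_piper/data"

for f in "$DATA_DIR/$VOICE.onnx" "$DATA_DIR/$VOICE.onnx.json"; do
  [ -f "$f" ] || { echo "Missing: $f" >&2; exit 1; }
done
echo "Voice $VOICE is complete."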

Common Voices:

  • en_US-hal_6409-medium (HAL 9000 style)
  • en_US-trump-medium (Danny - Parody)
  • en_US-picard_7399-medium (Picard)