Frontend: ESP32-S3 acts as a dumb "microphone-to-MQTT" bridge.
Backend: Docker container performs Wake Word detection (OpenWakeWord) and Speech-to-Text (Faster-Whisper).
Controller: Home Assistant receives clean text commands for execution.

🏗️ Architecture

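The system uses a Dual-Broker strategy: raw audio stays on a dedicated Mosquitto broker (.13), and only the transcribed text commands are published to the Home Assistant broker (.30). This keeps Home Assistant from being flooded with raw audio data.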

graph LR
    Mic[INMP441] -->|I2S| ESP32
    ESP32 -->|"Raw Audio (Voice Broker)"| MQTT_Voice["Mosquitto .13"]

    subgraph Docker["Docker Server .13"]
        MQTT_Voice -->|Sub Audio| Bridge[Python Bridge]
        Bridge -->|HTTP| Whisper[Whisper API]
        Whisper -->|Text| Bridge
    end

    Bridge -->|"Text Command (HA Broker)"| MQTT_HA["Mosquitto .30"]
    MQTT_HA -->|Trigger| HA[Home Assistant]
    HA -->|Action| Actions["Lights/Scripts"]
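
The broker split in the diagram can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: the 192.168.20.30 address is an assumption (only the .13 address appears in full in this README), the voice/command topic name is hypothetical, and credentials are left out. It uses publish.single from paho-mqtt, imported lazily so the routing logic can be read and run without the library installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    host: str
    port: int = 1883


# Voice Broker address as used in the debug command of this README.
VOICE_BROKER = Broker("192.168.20.13")
# ASSUMPTION: the HA Broker ".30" lives on the same subnet.
HA_BROKER = Broker("192.168.20.30")

AUDIO_TOPIC = "voice/audio_stream"
COMMAND_TOPIC = "voice/command"  # hypothetical topic name


def broker_for(topic: str) -> Broker:
    """Raw audio stays on the Voice Broker; everything else goes to HA."""
    return VOICE_BROKER if topic == AUDIO_TOPIC else HA_BROKER


def publish_command(text: str) -> None:
    """Publish a transcribed command to the HA Broker (no auth shown)."""
    import paho.mqtt.publish as publish  # third-party: paho-mqtt

    broker = broker_for(COMMAND_TOPIC)
    publish.single(COMMAND_TOPIC, payload=text,
                   hostname=broker.host, port=broker.port)
```

Keeping the routing decision in one small function makes it hard to publish audio to the Home Assistant broker by accident.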

📂 File Structure

1. Firmware (Client)

Location: ~/Documents/Arduino/voice_assistant/

voice_assistant.ino: Main ESP32 firmware. Handles I2S reading, VAD (Voice Activity Detection), and MQTT streaming.
Dependencies: PubSubClient, Freenove_WS2812_Lib_for_ESP32.

2. Backend (Server)

Location: ~/voice_bridge/ (On Server .13)

docker-compose.yml: Orchestrates the Whisper and Bridge containers.
mqtt_audio_bridge.py: The "Brain".
- Subscribes to the audio stream on the Voice Broker (.13).
- Buffers audio and checks for the Wake Word ("Hey Jarvis").
- Sends audio that passed the Wake Word check to Whisper.
- Publishes the resulting text to the Home Assistant Broker (.30).
app.py: A lightweight Flask wrapper for faster-whisper (running in separate container).
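
The bridge steps listed above can be sketched roughly as follows. This is not the contents of mqtt_audio_bridge.py: the class name and the three injected callables (detect_wake_word, transcribe, publish) are hypothetical stand-ins for OpenWakeWord, the Whisper HTTP call, and the MQTT publish, and it assumes the wake word check runs once on the whole buffer when the end of stream arrives.

```python
from typing import Callable, List, Optional


class AudioBridge:
    """Buffer audio chunks, then run wake word -> STT -> publish."""

    def __init__(self,
                 detect_wake_word: Callable[[bytes], bool],
                 transcribe: Callable[[bytes], str],
                 publish: Callable[[str], None]) -> None:
        self._detect = detect_wake_word
        self._transcribe = transcribe
        self._publish = publish
        self._chunks: List[bytes] = []

    def on_audio(self, payload: bytes) -> None:
        """Called for every audio message received from the Voice Broker."""
        self._chunks.append(payload)

    def on_end_of_stream(self) -> Optional[str]:
        """Called when the ESP32 signals silence; returns the published text."""
        audio = b"".join(self._chunks)
        self._chunks.clear()
        # Drop the utterance if it is empty or the wake word is missing.
        if not audio or not self._detect(audio):
            return None
        text = self._transcribe(audio).strip()
        if not text:
            return None
        self._publish(text)
        return text


# Usage with stubs in place of the real models and brokers.
sent: List[str] = []
bridge = AudioBridge(lambda audio: True,
                     lambda audio: " turn on the lights ",
                     sent.append)
bridge.on_audio(b"\x00\x01")
bridge.on_audio(b"\x02\x03")
result = bridge.on_end_of_stream()

ignored = AudioBridge(lambda audio: False, lambda audio: "noise", sent.append)
ignored.on_audio(b"\x00\x01")
ignored_result = ignored.on_end_of_stream()
```

Injecting the three callables keeps the flow testable without a microphone, a model download, or a running broker.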

🚦 LED Status Indicators (ESP32)

The LED Ring provides real-time feedback on the device state.

Color	State	Trigger
Off	Idle	Listening for sound (VAD).
Green	Recording	Sound detected above threshold. Streaming audio.
Blue	Processing	Silence detected. End of stream sent to server.
Rainbow	Acknowledged	"Wake Word" detected by server OR Command executed.
Red	Error	Wi-Fi or MQTT connection loss.
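
The table above is effectively a small state machine. The sketch below models it in Python for illustration only; the firmware itself is an Arduino sketch, and the event names are hypothetical labels for the triggers in the table.

```python
from enum import Enum


class Led(Enum):
    OFF = "Idle"
    GREEN = "Recording"
    BLUE = "Processing"
    RAINBOW = "Acknowledged"
    RED = "Error"


# Hypothetical event names, one per trigger in the table.
EVENT_TO_LED = {
    "idle": Led.OFF,
    "sound_detected": Led.GREEN,
    "silence_detected": Led.BLUE,
    "wake_word_detected": Led.RAINBOW,
    "command_executed": Led.RAINBOW,
    "connection_lost": Led.RED,
}


def next_led(current: Led, event: str) -> Led:
    """Return the LED state for an event; unknown events change nothing."""
    return EVENT_TO_LED.get(event, current)


# Walk through one successful voice command.
state = Led.OFF
history = []
for event in ("sound_detected", "silence_detected", "wake_word_detected", "idle"):
    state = next_led(state, event)
    history.append(state.name)
```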

⚙️ Configuration Details

Hardware Config (ESP32-S3)

Mic SCK: GPIO 4
Mic WS: GPIO 5
Mic SD: GPIO 6
LED Pin: GPIO 5
VAD Threshold: 1000 (Adjust based on noise floor)
Soft Gain: x6 (Multiplies raw samples in software to boost the input level before it is sent to Whisper)
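
To make the two numeric settings concrete, here is the arithmetic in Python. It is an illustration, not the firmware code, and it rests on assumptions: samples are signed 16-bit, the gain saturates instead of wrapping, the VAD compares the mean absolute amplitude of a frame against the threshold, and the check runs after the gain is applied.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767
VAD_THRESHOLD = 1000  # from the config above
SOFT_GAIN = 6         # from the config above


def apply_gain(samples, gain=SOFT_GAIN):
    """Multiply each sample and clamp it to the int16 range."""
    return [max(INT16_MIN, min(INT16_MAX, s * gain)) for s in samples]


def is_voice(samples, threshold=VAD_THRESHOLD):
    """ASSUMED VAD rule: mean absolute amplitude above the threshold."""
    if not samples:
        return False
    return sum(abs(s) for s in samples) / len(samples) > threshold


def to_pcm16le(samples):
    """Pack samples as little-endian 16-bit PCM, ready for an MQTT payload."""
    return struct.pack("<%dh" % len(samples), *samples)


quiet = [100, -120, 90, -110]   # background noise
loud = [300, -250, 6000, -280]  # one sample clips after gain

quiet_gained = apply_gain(quiet)  # [600, -720, 540, -660], mean abs 630
loud_gained = apply_gain(loud)    # [1800, -1500, 32767, -1680]
```

With these numbers the quiet frame stays below the threshold even after the x6 gain, while the loud frame passes it. The clamp matters: without it, 6000 * 6 would overflow a 16-bit sample.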

Docker Environment

Network: Internal bridge network between Bridge and Whisper.
Models:
- Wake Word: hey_jarvis (OpenWakeWord)
- STT: small.en (Faster-Whisper, int8 quantization)
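
For reference, loading the STT model with these settings through the faster-whisper Python API looks roughly like the sketch below. It is not the project's app.py: the function names are made up for this example, device="cpu" is an assumption, and the import is done lazily so the text-joining helper can be tried without downloading a model.

```python
from functools import lru_cache
from types import SimpleNamespace


@lru_cache(maxsize=1)
def load_model():
    """Load small.en once with int8 quantization (device is an assumption)."""
    from faster_whisper import WhisperModel  # third-party: faster-whisper

    return WhisperModel("small.en", device="cpu", compute_type="int8")


def segments_to_text(segments) -> str:
    """Join the text of the transcription segments into one command string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_file(path: str) -> str:
    """Transcribe an audio file and return plain text."""
    segments, _info = load_model().transcribe(path, language="en")
    return segments_to_text(segments)


# Stand-in segments to show the joining step without loading a model.
demo = segments_to_text([
    SimpleNamespace(text=" turn on"),
    SimpleNamespace(text=" the lights "),
])
```

Caching the model avoids reloading the weights on every request, which would otherwise dominate the latency of short commands.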

🚀 Setup & Commands

1. Flash ESP32

arduino-cli upload -v -p /dev/ttyACM0 --fqbn esp32:esp32:esp32s3:CDCOnBoot=cdc,USBMode=hwcdc,PSRAM=disabled,FlashMode=dio,DebugLevel=info --input-dir ./build

2. Start Backend

cd ~/voice_bridge
docker compose up -d

3. Monitor Logs

docker compose logs -f voice_bridge

4. Monitor Audio Stream (Debug)

mosquitto_sub -h 192.168.20.13 -u "mqtt-user" -P "pass" -t "voice/audio_stream" -F "%t: %l bytes"