# AI Voice Assistant Gateway (ESP32-S3 + Docker)

## 🎯 Project Objective

A private, local voice assistant that separates high-bandwidth audio streaming from smart home command logic.

- **Frontend:** The ESP32-S3 acts as a dumb "microphone-to-MQTT" bridge.
- **Backend:** A Docker container performs wake-word detection (OpenWakeWord) and speech-to-text (Faster-Whisper).
- **Controller:** Home Assistant receives clean text commands and executes them.

## 🏗️ Architecture

The system uses a **Dual-Broker** strategy so that Home Assistant is never flooded with raw audio. Audio travels over a dedicated Mosquitto instance on `.13`, and only the transcribed text is published to the Home Assistant broker on `.30`.

```mermaid
graph LR
    Mic[INMP441] -->|I2S| ESP32
    ESP32 -->|"Raw Audio (Voice Broker)"| MQTT_Voice[Mosquitto .13]
    subgraph Docker["Docker Server (.13)"]
        MQTT_Voice -->|Sub Audio| Bridge[Python Bridge]
        Bridge -->|HTTP| Whisper[Whisper API]
        Whisper -->|Text| Bridge
    end
    Bridge -->|"Text Command (HA Broker)"| MQTT_HA[Mosquitto .30]
    MQTT_HA -->|Trigger| HA[Home Assistant]
    HA -->|Action| Actions[Lights / Scripts]
```

## 📂 File Structure

### 1. Firmware (Client)

Location: `~/Documents/Arduino/voice_assistant/`

* **`voice_assistant.ino`**: Main ESP32 firmware. Handles I2S reading, VAD (voice activity detection), and MQTT streaming.
* **Dependencies**: `PubSubClient`, `Freenove_WS2812_Lib_for_ESP32`.

### 2. Backend (Server)

Location: `~/voice_bridge/` (on server `.13`)

* **`docker-compose.yml`**: Orchestrates the Whisper and Bridge containers.
* **`mqtt_audio_bridge.py`**: The "brain":
  * Subscribes to the audio stream on the `.13` broker.
  * Buffers audio and checks for the wake word ("Hey Jarvis").
  * Sends valid audio to Whisper.
  * Publishes the resulting text to the Home Assistant broker on `.30`.
* **`app.py`**: A lightweight Flask wrapper for `faster-whisper`, running in a separate container.

## 🚦 LED Status Indicators (ESP32)

The LED ring provides real-time feedback on the device state.

| Color | State | Trigger |
| :--- | :--- | :--- |
| **Off** | Idle | Listening for sound (VAD). |
| **Green** | Recording | Sound detected above the VAD threshold; audio is streaming. |
| **Blue** | Processing | Silence detected; end-of-stream sent to the server. |
| **Rainbow** | Acknowledged | Wake word detected by the server, or command executed. |
| **Red** | Error | Wi-Fi or MQTT connection lost. |

## ⚙️ Configuration Details

### Hardware Config (ESP32-S3)

* **Mic SCK:** GPIO 4
* **Mic WS:** GPIO 5
* **Mic SD:** GPIO 6
* **LED Pin:** GPIO 5
* **VAD Threshold:** `1000` (adjust to the room's noise floor)
* **Soft Gain:** `x6` (software multiplier applied to the raw samples so the input level suits Whisper)

### Docker Environment

* **Network:** Internal bridge network between the Bridge and Whisper containers.
* **Models:**
  * Wake word: `hey_jarvis` (OpenWakeWord)
  * STT: `small.en` (Faster-Whisper, int8 quantization)

## 🚀 Setup & Commands

**1. Flash ESP32**

This uploads binaries already compiled into `./build` (for example with `arduino-cli compile --output-dir ./build` and the same `--fqbn`).

```bash
arduino-cli upload -v -p /dev/ttyACM0 \
  --fqbn esp32:esp32:esp32s3:CDCOnBoot=cdc,USBMode=hwcdc,PSRAM=disabled,FlashMode=dio,DebugLevel=info \
  --input-dir ./build
```

**2. Start Backend**

```bash
cd ~/voice_bridge
docker compose up -d
```

**3. Monitor Logs**

```bash
docker compose logs -f voice_bridge
```

**4. Monitor Audio Stream (Debug)**

A steady stream of dots means audio packets are arriving on the voice broker. `fflush()` makes each dot appear immediately instead of waiting for awk's output buffer to fill.

```bash
mosquitto_sub -h 192.168.20.13 -u "mqtt-user" -P "pass" -t "voice/audio_stream" | awk '{printf "."; fflush()}'
```
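To put numbers on the Dual-Broker rationale from the Architecture section: assuming the ESP32 streams 16 kHz, 16-bit mono PCM (the sample format is not stated in this README, so treat it as an assumption) in hypothetical 1024-byte MQTT payloads, the voice broker absorbs 32,000 bytes/s, or about 31 publishes per second, while the Home Assistant broker only sees one short text message per command.

```python
SAMPLE_RATE_HZ = 16_000   # assumed capture rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit signed PCM
CHANNELS = 1              # a single INMP441 is a mono microphone
CHUNK_BYTES = 1024        # hypothetical MQTT payload size per publish


def audio_bytes_per_second(rate: int = SAMPLE_RATE_HZ,
                           width: int = BYTES_PER_SAMPLE,
                           channels: int = CHANNELS) -> int:
    """Raw PCM throughput the voice broker has to carry."""
    return rate * width * channels


def publishes_per_second(chunk_bytes: int = CHUNK_BYTES) -> float:
    """Number of MQTT messages per second that throughput turns into."""
    return audio_bytes_per_second() / chunk_bytes


if __name__ == "__main__":
    print(audio_bytes_per_second(), "bytes/s")
    print(publishes_per_second(), "messages/s")
```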
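The "buffers audio, then sends it to Whisper" step of `mqtt_audio_bridge.py` needs the raw PCM chunks wrapped in a container the HTTP service can decode. Below is a minimal standard-library sketch, assuming 16 kHz 16-bit mono PCM and a WAV upload; `UtteranceBuffer` is a hypothetical name, not taken from the actual bridge.

```python
import io
import wave

SAMPLE_RATE = 16_000  # assumed; openWakeWord and Whisper both work on 16 kHz mono audio
SAMPLE_WIDTH = 2      # assumed 16-bit PCM, in bytes per sample


class UtteranceBuffer:
    """Collects raw PCM payloads from the audio topic until end-of-stream."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []

    def add(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def duration_s(self) -> float:
        """Seconds of audio currently buffered."""
        total = sum(len(c) for c in self._chunks)
        return total / (SAMPLE_RATE * SAMPLE_WIDTH)

    def to_wav(self) -> bytes:
        """Wrap the buffered PCM in a mono WAV container and reset the buffer."""
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(b"".join(self._chunks))
        self._chunks.clear()
        return out.getvalue()
```

In an MQTT message callback the bridge would call `add()` for each audio payload and `to_wav()` when the end-of-stream signal arrives; how that signal is encoded on `voice/audio_stream` is not documented here.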
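The decision logic of `mqtt_audio_bridge.py` (wake-word check, then Whisper, then publish to `.30`) can be expressed as one function with the heavy components injected. This is an illustrative sketch: `handle_utterance`, the `0.5` score threshold and the `voice/command` topic are all hypothetical, and the real bridge may check the wake word on a sliding window rather than on the whole utterance.

```python
from typing import Callable, Optional

WAKE_THRESHOLD = 0.5  # assumed wake-word score cut-off; tune per room


def handle_utterance(
    pcm: bytes,
    wake_score: Callable[[bytes], float],  # e.g. a wrapper around the hey_jarvis model
    transcribe: Callable[[bytes], str],    # e.g. an HTTP call to the Whisper container
    publish: Callable[[str, str], None],   # publishes on the Home Assistant broker (.30)
    command_topic: str = "voice/command",  # hypothetical topic name
) -> Optional[str]:
    """Gate one buffered utterance on the wake word, transcribe it, publish the text."""
    if wake_score(pcm) < WAKE_THRESHOLD:
        return None                   # no wake word: audio is dropped, HA never sees it
    text = transcribe(pcm).strip()
    if not text:
        return None                   # Whisper returned nothing usable
    publish(command_topic, text)      # only clean text crosses to the HA broker
    return text
```

Injecting the three callables keeps the gate testable without models or brokers. In a deployment, `publish` could be the `publish` method of a `paho-mqtt` client connected to `.30`, since that method takes the topic and payload as its first two arguments.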
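The LED Status Indicators table implies a small state machine. The sketch below models those transitions in Python with hypothetical event names (`sound_detected`, `silence_detected`, `server_ack`, `timeout`, `animation_done`, `connection_lost`, `reconnected`). It also assumes that the rainbow animation and a processing timeout both return the ring to Idle, which the table does not state. The actual firmware is Arduino C++ and may be structured differently.

```python
from enum import Enum


class LedState(Enum):
    IDLE = "off"
    RECORDING = "green"
    PROCESSING = "blue"
    ACKNOWLEDGED = "rainbow"
    ERROR = "red"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (LedState.IDLE, "sound_detected"): LedState.RECORDING,          # above VAD threshold
    (LedState.RECORDING, "silence_detected"): LedState.PROCESSING,  # end-of-stream sent
    (LedState.PROCESSING, "timeout"): LedState.IDLE,                # assumed: no wake word
    (LedState.ACKNOWLEDGED, "animation_done"): LedState.IDLE,       # assumed return to idle
    (LedState.ERROR, "reconnected"): LedState.IDLE,                 # Wi-Fi and MQTT back
}


def next_state(state: LedState, event: str) -> LedState:
    if event == "connection_lost":
        return LedState.ERROR                      # red overrides every other state
    if event == "server_ack" and state is not LedState.ERROR:
        return LedState.ACKNOWLEDGED               # wake word detected or command executed
    return TRANSITIONS.get((state, event), state)  # unlisted events leave the ring as is
```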
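The VAD Threshold and Soft Gain from the Hardware Config interact. If the gain is applied before the threshold test (an assumption, since the order is not stated), the effective threshold on the raw microphone samples is 1000 / 6 ≈ 167. The Python model below mirrors that per-frame logic on 16-bit samples, saturating instead of wrapping on overflow. The mean-absolute-amplitude energy measure is also an assumption; `voice_assistant.ino` may use RMS or peak amplitude instead.

```python
from array import array

VAD_THRESHOLD = 1000  # from the Hardware Config section
SOFT_GAIN = 6         # from the Hardware Config section
INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples: array, gain: int = SOFT_GAIN) -> array:
    """Multiply each 16-bit sample by `gain`, clamping to the int16 range."""
    out = array("h")
    for s in samples:
        out.append(max(INT16_MIN, min(INT16_MAX, s * gain)))
    return out


def frame_energy(samples: array) -> float:
    """Mean absolute amplitude of one frame (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def is_speech(samples: array, threshold: int = VAD_THRESHOLD) -> bool:
    """True when the (already gained) frame is loud enough to start recording."""
    return frame_energy(samples) > threshold
```

For example, a frame hovering around ±100 raw becomes ±600 after gain and stays Idle, while ±300 raw becomes ±1800 and would turn the ring green.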