# AI Voice Assistant Gateway (ESP32-S3 + Docker)

## 🎯 Project Objective
A private, local voice assistant system that separates high-bandwidth audio streaming from smart home command logic. 
- **Frontend:** ESP32-S3 acts as a thin "microphone-to-MQTT" bridge. It runs only local VAD and streams raw audio.
- **Backend:** Docker container performs Wake Word detection (OpenWakeWord) and Speech-to-Text (Faster-Whisper).
- **Controller:** Home Assistant receives clean text commands for execution.

## 🏗️ Architecture
The system uses a **Dual-Broker** strategy to prevent flooding Home Assistant with raw audio data.

```mermaid
graph LR
    Mic[INMP441] -->|I2S| ESP32
    ESP32 -->|"Raw Audio (Voice Broker)"| MQTT_Voice[Mosquitto .13]

    subgraph Docker["Docker Server .13"]
        MQTT_Voice -->|Sub Audio| Bridge[Python Bridge]
        Bridge -->|HTTP| Whisper[Whisper API]
        Whisper -->|Text| Bridge
    end

    Bridge -->|"Text Command (HA Broker)"| MQTT_HA[Mosquitto .30]
    MQTT_HA -->|Trigger| HA[Home Assistant]
    HA -->|Action| Actions[Lights/Scripts]
```

## 📂 File Structure

### 1. Firmware (Client)
Location: `~/Documents/Arduino/voice_assistant/`
*   **`voice_assistant.ino`**: Main ESP32 firmware. Handles I2S reading, VAD (Voice Activity Detection), and MQTT streaming.
*   **Dependencies**: `PubSubClient`, `Freenove_WS2812_Lib_for_ESP32`.

### 2. Backend (Server)
Location: `~/voice_bridge/` (On Server .13)
*   **`docker-compose.yml`**: Orchestrates the Whisper and Bridge containers.
*   **`mqtt_audio_bridge.py`**: The "Brain".
    *   Listens to Audio on `.13`.
    *   Buffers audio and checks for Wake Word ("Hey Jarvis").
    *   Sends valid audio to Whisper.
    *   Publishes resulting text to Home Assistant Broker `.30`.
*   **`app.py`**: A lightweight Flask wrapper for `faster-whisper` (running in separate container).
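
The bridge steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `mqtt_audio_bridge.py`: the class name `AudioBridge`, its method names, and the three injected callables are hypothetical, and the real MQTT, OpenWakeWord, and Whisper calls are replaced by stubs so the flow can be read in isolation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AudioBridge:
    """Hypothetical sketch: buffer audio, gate on wake word, transcribe, publish."""

    detect_wake_word: Callable[[bytes], bool]   # stand-in for OpenWakeWord
    transcribe: Callable[[bytes], str]          # stand-in for the Whisper HTTP call
    publish_text: Callable[[str], None]         # stand-in for publishing to the HA broker
    _buffer: bytearray = field(default_factory=bytearray)

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Called for every audio message received from the voice broker.
        self._buffer.extend(chunk)

    def on_end_of_stream(self) -> Optional[str]:
        # Called when the ESP32 signals that the utterance is finished.
        audio = bytes(self._buffer)
        self._buffer.clear()
        if not audio or not self.detect_wake_word(audio):
            return None  # no wake word: drop the audio, nothing reaches Home Assistant
        text = self.transcribe(audio).strip()
        if not text:
            return None
        self.publish_text(text)
        return text


# Usage with stub callables (all assumptions, for illustration only).
published = []
bridge = AudioBridge(
    detect_wake_word=lambda audio: audio.startswith(b"JARVIS"),
    transcribe=lambda audio: " turn on the lights ",
    publish_text=published.append,
)
bridge.on_audio_chunk(b"JARVIS")
bridge.on_audio_chunk(b"\x00\x01\x02")
result = bridge.on_end_of_stream()
```

Keeping the wake word check ahead of the transcription call means that audio without the wake word never leaves the bridge, which is the point of separating the two brokers.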

## 🚦 LED Status Indicators (ESP32)
The LED Ring provides real-time feedback on the device state.

| Color | State | Trigger |
| :--- | :--- | :--- |
| **Off** | Idle | Listening for sound (VAD). |
| **Green** | Recording | Sound detected above threshold. Streaming audio. |
| **Blue** | Processing | Silence detected. End of stream sent to server. |
| **Rainbow** | Acknowledged | Wake word detected by the server, or command executed. |
| **Red** | Error | Wi-Fi or MQTT connection loss. |
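
The table can be read as a small state machine. The sketch below is written in Python for illustration only (the firmware itself is an Arduino sketch). The event names, and the transitions back to idle after an acknowledgement or a reconnect, are assumptions that the table does not state.

```python
from enum import Enum


class State(Enum):
    # The value is the LED color shown in that state, taken from the table.
    IDLE = "off"
    RECORDING = "green"
    PROCESSING = "blue"
    ACKNOWLEDGED = "rainbow"
    ERROR = "red"


# Event names are hypothetical labels for the triggers in the table.
TRANSITIONS = {
    (State.IDLE, "sound_detected"): State.RECORDING,
    (State.RECORDING, "silence_detected"): State.PROCESSING,
    (State.PROCESSING, "server_ack"): State.ACKNOWLEDGED,
    (State.ACKNOWLEDGED, "animation_done"): State.IDLE,  # assumed return to idle
    (State.ERROR, "reconnected"): State.IDLE,            # assumed recovery path
}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`; unknown events leave the state unchanged."""
    if event == "connection_lost":
        return State.ERROR  # Wi-Fi or MQTT loss overrides every other state
    return TRANSITIONS.get((state, event), state)
```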

## ⚙️ Configuration Details

### Hardware Config (ESP32-S3)
*   **Mic SCK:** GPIO 4
*   **Mic WS:** GPIO 5
*   **Mic SD:** GPIO 6
*   **LED Pin:** GPIO 5 (as listed, this collides with Mic WS on GPIO 5; verify the actual pin against the firmware)
*   **VAD Threshold:** `1000` (Adjust based on noise floor)
*   **Soft Gain:** `x6` (multiplies each raw sample by 6 to raise the input level before it reaches Whisper)
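
To make the two tuning values concrete, here is a small Python sketch of the arithmetic. It is not the firmware code (which is Arduino C++). It assumes signed 16-bit samples, a peak-amplitude comparison for the VAD, and clamping after the gain is applied; the firmware may compute these differently.

```python
VAD_THRESHOLD = 1000   # from the hardware config above
SOFT_GAIN = 6          # from the hardware config above
INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples, gain=SOFT_GAIN):
    """Multiply each sample by `gain`, clamping to the signed 16-bit range."""
    return [max(INT16_MIN, min(INT16_MAX, s * gain)) for s in samples]


def is_voice(samples, threshold=VAD_THRESHOLD):
    """Assumed VAD rule: the peak absolute amplitude exceeds the threshold."""
    return max((abs(s) for s in samples), default=0) > threshold


quiet = [12, -40, 35]
loud = [300, -1500, 900]
boosted = apply_gain([100, -6000, 6000])  # the last two values clip at the int16 limits
```

Clamping matters because a gain of 6 pushes any sample beyond roughly ±5461 outside the 16-bit range, and without it those samples would wrap around and distort the audio.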

### Docker Environment
*   **Network:** Internal bridge network between Bridge and Whisper.
*   **Models:** 
    *   Wake Word: `hey_jarvis` (OpenWakeWord)
    *   STT: `small.en` (Faster-Whisper, int8 quantization)
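
For reference, the STT settings above map onto the `faster-whisper` constructor roughly as shown. This is a sketch, not the contents of `app.py`: the `device="cpu"` value and the `load_model` helper are assumptions, and the import is deferred so the snippet can be read without the library installed.

```python
# Model name and quantization come from the list above; the device is an assumption.
STT_CONFIG = {
    "model_size_or_path": "small.en",
    "device": "cpu",
    "compute_type": "int8",
}


def load_model(config=STT_CONFIG):
    """Hypothetical helper: build a faster-whisper model from the config dict."""
    from faster_whisper import WhisperModel  # imported lazily; requires faster-whisper

    return WhisperModel(**config)
```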

## 🚀 Setup & Commands

**1. Flash ESP32**
```bash
arduino-cli upload -v -p /dev/ttyACM0 --fqbn esp32:esp32:esp32s3:CDCOnBoot=cdc,USBMode=hwcdc,PSRAM=disabled,FlashMode=dio,DebugLevel=info --input-dir ./build
```

**2. Start Backend**
```bash
cd ~/voice_bridge
docker compose up -d
```

**3. Monitor Logs**
```bash
docker compose logs -f voice_bridge
```

**4. Monitor Audio Stream (Debug)**
```bash
mosquitto_sub -h 192.168.20.13 -u "mqtt-user" -P "pass" -t "voice/audio_stream" | awk '{printf "."}'
```