# AI Voice Assistant Gateway (ESP32-S3 + Docker)
## 🎯 Project Objective
A private, local voice assistant system that separates high-bandwidth audio streaming from smart home command logic.
- **Frontend:** ESP32-S3 acts as a dumb "microphone-to-MQTT" bridge.
- **Backend:** Docker container performs Wake Word detection (OpenWakeWord) and Speech-to-Text (Faster-Whisper).
- **Controller:** Home Assistant receives clean text commands for execution.
## 🏗️ Architecture
The system uses a **Dual-Broker** strategy to prevent flooding Home Assistant with raw audio data.
```mermaid
graph LR
Mic[INMP441] -->|I2S| ESP32
ESP32 -->|"Raw Audio (Voice Broker)"| MQTT_Voice[Mosquitto .13]
subgraph Docker["Docker Server (.13)"]
MQTT_Voice -->|Sub Audio| Bridge[Python Bridge]
Bridge -->|HTTP| Whisper[Whisper API]
Whisper -->|Text| Bridge
end
Bridge -->|"Text Command (HA Broker)"| MQTT_HA[Mosquitto .30]
MQTT_HA -->|Trigger| HA[Home Assistant]
HA -->|Action| Actions["Lights / Scripts"]
```
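
The dual-broker split can be sketched without any network code. This is a minimal illustration, not the project's `mqtt_audio_bridge.py`: `DualBrokerBridge`, the `transcribe` callable, and the `voice/command` topic are hypothetical names, and the publish callable stands in for a real MQTT client connected to the HA broker. Only the `voice/audio_stream` topic is taken from this README.

```python
from typing import Callable, List, Tuple

AUDIO_TOPIC = "voice/audio_stream"   # topic used by the debug command in this README
COMMAND_TOPIC = "voice/command"      # hypothetical topic on the HA broker


class DualBrokerBridge:
    """Consumes raw audio from the voice broker, emits text to the HA broker."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 publish_to_ha: Callable[[str, str], None]) -> None:
        self._transcribe = transcribe
        self._publish_to_ha = publish_to_ha

    def on_voice_message(self, topic: str, payload: bytes) -> None:
        # Raw audio stays on the voice broker side; only text is forwarded.
        if topic != AUDIO_TOPIC or not payload:
            return
        text = self._transcribe(payload).strip()
        if text:
            self._publish_to_ha(COMMAND_TOPIC, text)


# Stand-ins for the Whisper call and the HA-side MQTT client.
sent: List[Tuple[str, str]] = []
bridge = DualBrokerBridge(
    transcribe=lambda audio: " turn on the lights ",
    publish_to_ha=lambda topic, text: sent.append((topic, text)),
)
bridge.on_voice_message(AUDIO_TOPIC, b"\x00\x01" * 160)
bridge.on_voice_message("other/topic", b"ignored")
print(sent)
```

Because the bridge only ever publishes short text payloads to the second broker, Home Assistant never sees the audio traffic, which is the point of the split.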
## 📂 File Structure
### 1. Firmware (Client)
Location: `~/Documents/Arduino/voice_assistant/`
* **`voice_assistant.ino`**: Main ESP32 firmware. Handles I2S reading, VAD (Voice Activity Detection), and MQTT streaming.
* **Dependencies**: `PubSubClient`, `Freenove_WS2812_Lib_for_ESP32`.
### 2. Backend (Server)
Location: `~/voice_bridge/` (On Server .13)
* **`docker-compose.yml`**: Orchestrates the Whisper and Bridge containers.
* **`mqtt_audio_bridge.py`**: The "Brain".
    * Listens for audio on the voice broker (`.13`).
    * Buffers audio and checks for the wake word ("Hey Jarvis").
    * Sends valid audio to Whisper.
    * Publishes the resulting text to the Home Assistant broker (`.30`).
* **`app.py`**: A lightweight Flask wrapper for `faster-whisper` (running in separate container).
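
The buffer-then-gate behaviour of the bridge could look roughly like the sketch below. It is an assumption-heavy illustration: `WakeGatedBuffer`, the `detect` callable (a stand-in for the OpenWakeWord model that returns a score between 0 and 1), the `0.5` threshold, and the choice to keep only the audio that follows the wake word are all hypothetical and may differ from the real script.

```python
from collections import deque
from typing import Callable, Deque, Optional


class WakeGatedBuffer:
    """Keeps a short rolling buffer until the wake word fires, then records."""

    def __init__(self, detect: Callable[[bytes], float],
                 threshold: float = 0.5, pre_roll_chunks: int = 10) -> None:
        self._detect = detect
        self._threshold = threshold
        self._pre_roll: Deque[bytes] = deque(maxlen=pre_roll_chunks)
        self._recording: Optional[bytearray] = None

    def feed(self, chunk: bytes) -> None:
        if self._recording is not None:
            self._recording.extend(chunk)
            return
        self._pre_roll.append(chunk)
        if self._detect(b"".join(self._pre_roll)) >= self._threshold:
            # Wake word heard: start collecting the command that follows.
            self._recording = bytearray()
            self._pre_roll.clear()

    def end_of_stream(self) -> Optional[bytes]:
        """Returns the command audio, or None if nothing was recorded after a wake word."""
        audio = bytes(self._recording) if self._recording else None
        self._recording = None
        self._pre_roll.clear()
        return audio


# Toy detector: "hears" the wake word when the marker bytes appear.
gate = WakeGatedBuffer(detect=lambda audio: 1.0 if b"WAKE" in audio else 0.0)
for chunk in (b"noise", b"WA", b"KE", b"lights", b" on"):
    gate.feed(chunk)
command_audio = gate.end_of_stream()
print(command_audio)
```

Only a non-`None` result would be forwarded to Whisper, so background noise that never contains the wake word costs no transcription time.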
## 🚦 LED Status Indicators (ESP32)
The LED Ring provides real-time feedback on the device state.
| Color | State | Trigger |
| :--- | :--- | :--- |
| **Off** | Idle | Listening for sound (VAD). |
| **Green** | Recording | Sound detected above threshold. Streaming audio. |
| **Blue** | Processing | Silence detected. End of stream sent to server. |
| **Rainbow** | Acknowledged | "Wake Word" detected by server OR Command executed. |
| **Red** | Error | Wi-Fi or MQTT connection loss. |
## ⚙️ Configuration Details
### Hardware Config (ESP32-S3)
* **Mic SCK:** GPIO 4
* **Mic WS:** GPIO 5
* **Mic SD:** GPIO 6
* **LED Pin:** GPIO 5
* **VAD Threshold:** `1000` (Adjust based on noise floor)
* **Soft Gain:** `x6` (Multiplies raw input to match Whisper requirements)
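
To make the two numeric settings concrete, here is a small Python model of the arithmetic. It is not the firmware code: it assumes 16-bit signed samples, a peak-amplitude threshold check, and gain applied before that check, none of which this README states. `apply_gain` and `is_voice` are hypothetical helper names.

```python
import array

VAD_THRESHOLD = 1000   # from the hardware config above
SOFT_GAIN = 6          # from the hardware config above
INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples: array.array, gain: int = SOFT_GAIN) -> array.array:
    """Multiplies each sample and clamps to the int16 range to avoid wrap-around."""
    return array.array(
        "h", (max(INT16_MIN, min(INT16_MAX, s * gain)) for s in samples)
    )


def is_voice(samples: array.array, threshold: int = VAD_THRESHOLD) -> bool:
    """Peak-amplitude check: True if any sample exceeds the threshold."""
    return any(abs(s) > threshold for s in samples)


quiet = array.array("h", [100, -150, 120])
loud = array.array("h", [200, -6000, 300])
print(list(apply_gain(quiet)), is_voice(apply_gain(quiet)))
print(list(apply_gain(loud)), is_voice(apply_gain(loud)))
```

The clamp matters: without it, a sample such as `-6000 * 6` would overflow the 16-bit range and wrap around, which is audible as harsh distortion.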
### Docker Environment
* **Network:** Internal bridge network between Bridge and Whisper.
* **Models:**
    * Wake Word: `hey_jarvis` (OpenWakeWord)
    * STT: `small.en` (Faster-Whisper, int8 quantization)
## 🚀 Setup & Commands
**1. Flash ESP32**
```bash
arduino-cli upload -v -p /dev/ttyACM0 --fqbn esp32:esp32:esp32s3:CDCOnBoot=cdc,USBMode=hwcdc,PSRAM=disabled,FlashMode=dio,DebugLevel=info --input-dir ./build
```
**2. Start Backend**
```bash
cd ~/voice_bridge
docker compose up -d
```
**3. Monitor Logs**
```bash
docker compose logs -f voice_bridge
```
**4. Monitor Audio Stream (Debug)**
```bash
mosquitto_sub -h 192.168.20.13 -u "mqtt-user" -P "pass" -t "voice/audio_stream" | awk '{printf "."}'
```