- Frontend: Vite + React + TypeScript chat interface
- Backend: FastAPI gateway with LangGraph routing
- Knowledge Service: ChromaDB RAG with Gitea scraper
- LangGraph Service: Multi-agent orchestration
- Airflow: Scheduled Gitea ingestion DAG
- Documentation: Complete plan and implementation guides

Architecture:
- Modular Docker Compose per service
- External ai-mesh network for communication
- Fast rebuilds with /app/packages pattern
- Intelligent agent routing (no hardcoded keywords)

Services:
- Frontend (5173): React chat UI
- Chat Gateway (8000): FastAPI entry point
- LangGraph (8090): Agent orchestration
- Knowledge (8080): ChromaDB RAG
- Airflow (8081): Scheduled ingestion
- PostgreSQL (5432): Chat history

Excludes: node_modules, .venv, chroma_db, logs, .env files
Includes: All source code, configs, docs, docker files
#+TITLE: Phase 3: Knowledge Engine & Agent Orchestration
#+AUTHOR: Giordano (via opencode)
#+OPTIONS: toc:2

* GOAL
Build a "Deep Knowledge Agent" (DKA) that acts as a secure, quarantined bridge between the Chat Gateway and private data sources.
* ARCHITECTURE OVERVIEW
** Layers
1. Public Gateway: FastAPI (The "Voice").
2. Orchestration Layer: LangGraph Supervisor (The "Router").
3. Quarantined Agent: DKA / Librarian (The "Keeper of Secrets").
   - Strictly Read-Only.
   - Accesses ChromaDB and Media stores.
4. Specialist Agent: Opencode (The "Engineer").
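
The layering above can be sketched in plain Python, without FastAPI or LangGraph, to show who talks to whom. This is only an illustration under assumptions: the names ~gateway~, ~supervisor~, ~classify~ and the agent labels are hypothetical, and the routing decision is passed in as a callable (presumably an LLM call in practice) rather than implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical labels; the plan names the roles, not the identifiers.
LIBRARIAN = "librarian"  # quarantined, read-only DKA
ENGINEER = "opencode"    # specialist agent


@dataclass(frozen=True)
class Agent:
    name: str
    handle: Callable[[str], str]


def librarian_handle(query: str) -> str:
    # Placeholder for semantic retrieval against the vector store.
    return f"[librarian] retrieved context for: {query}"


def engineer_handle(query: str) -> str:
    # Placeholder for handing the task to the specialist agent.
    return f"[opencode] working on: {query}"


AGENTS: Dict[str, Agent] = {
    LIBRARIAN: Agent(LIBRARIAN, librarian_handle),
    ENGINEER: Agent(ENGINEER, engineer_handle),
}


def supervisor(query: str, classify: Callable[[str], str]) -> str:
    """Layer 2: choose an agent from the classifier's label and delegate."""
    label = classify(query)
    # Unknown labels fall back to the read-only agent (an assumed policy).
    agent = AGENTS.get(label, AGENTS[LIBRARIAN])
    return agent.handle(query)


def gateway(query: str, classify: Callable[[str], str]) -> str:
    """Layer 1: the public entry point only ever talks to the supervisor."""
    return supervisor(query.strip(), classify)
```

Calling ~gateway("fix the build", lambda q: "opencode")~ delegates to the specialist, while any label the supervisor does not recognise lands on the Librarian, which keeps the default path read-only.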
** Data Sources (The "Knowledge Mesh")
- [ ] *Code*: Gitea (Repos, Markdown docs).
- [ ] *Notes*: Trilium Next, Obsidian, Flatnotes, HedgeDoc.
- [ ] *Wiki*: DokuWiki.
- [ ] *Inventory*: HomeBox (Physical gear, photos).
- [ ] *Tasks*: Vikunja.
- [ ] *Media*: Immich (Photos/Videos metadata via Gemini Vision).
** Agent Tooling & Orchestration
- [ ] *Orchestrators*: CAO CLI, Agent Pipe.
- [ ] *External Agents*: Goose, Aider, Opencode (Specialist).
* COMPONENT DETAILS
** The Librarian (DKA - LangGraph)
- Purpose: Semantic retrieval and data synthesis from vectors.
- Tools:
  - ~query_chroma~: Search the vector database.
  - ~fetch_media_link~: Returns a signed URL/path for Immich/HomeBox images.
- Constraints:
  - NO ~bash~ or ~write~ tools.
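
One possible shape for the tool constraints and for ~fetch_media_link~ is sketched below. The plan does not say how links are signed, so the HMAC-SHA256 scheme, the ~expires~ and ~sig~ query parameters, the ~SECRET_KEY~ value, and the helper names ~check_tool~ and ~verify_media_link~ are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

# Assumption: a secret shared by the DKA and whatever serves the media.
SECRET_KEY = b"replace-me"

# The Librarian's whole tool surface: nothing that writes or shells out.
ALLOWED_TOOLS = frozenset({"query_chroma", "fetch_media_link"})


def check_tool(name: str) -> None:
    """Reject any tool outside the read-only allow-list."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not available to the Librarian")


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def fetch_media_link(path: str, ttl_seconds: int = 300, now: Optional[int] = None) -> str:
    """Return the path with an expiry timestamp and a signature attached."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_media_link(path: str, expires: int, sig: str, now: Optional[int] = None) -> bool:
    """Check expiry and signature; this would run on the serving side."""
    current = int(time.time()) if now is None else now
    return current <= expires and hmac.compare_digest(sig, _signature(path, expires))
```

Because the signature covers both the path and the expiry, a link cannot be reused for a different image or extended past its lifetime without knowing the secret.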
** The Ingestion Pipeline (Airflow/Custom Python)
- [ ] *Multi-Source Scrapers*: API-based (Gitea, Immich) and File-based (Obsidian).
- [ ] *Vision Integration*: Gemini analyzes Immich photos to create searchable text descriptions.
- [ ] *Storage*: ChromaDB (Vectors) + PostgreSQL (Metadata/Hashes).
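
If the hashes kept next to the metadata are content hashes, one use for them is skipping documents that have not changed since the last scheduled run. The sketch below assumes that reading. It uses ~sqlite3~ as a stand-in for PostgreSQL so it runs without a server, and the ~documents~ table and ~needs_embedding~ function are hypothetical names.

```python
import hashlib
import sqlite3

# sqlite3 stands in for PostgreSQL; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, sha256 TEXT NOT NULL)")


def content_hash(text: str) -> str:
    """SHA-256 of the document text, used as a cheap change detector."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(doc_id: str, text: str) -> bool:
    """Return True if the document is new or changed, and record its hash."""
    digest = content_hash(text)
    row = conn.execute(
        "SELECT sha256 FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged since the last run; skip re-embedding
    conn.execute(
        "INSERT OR REPLACE INTO documents (doc_id, sha256) VALUES (?, ?)",
        (doc_id, digest),
    )
    conn.commit()
    return True
```

A scraper would call ~needs_embedding~ for each fetched document and send only the ones that return True on to the embedding and vector-store step.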
* TODO LIST [0/4]
- [ ] Create 'knowledge_service' directory.
- [ ] Implement ~test_rag.py~ (Hello World retrieval).
- [ ] Build basic scraper for ~hobbies.org~.
- [ ] Integrate DKA logic into the FastAPI Gateway.
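
As a starting point for the "Hello World retrieval" item, the sketch below shows the retrieval loop in miniature. It deliberately swaps ChromaDB and real embeddings for a toy bag-of-words index with cosine similarity, so it runs on the standard library alone. The sample documents and the ~retrieve~ function are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents; a real run would index scraped content.
DOCS = {
    "camera": "The tripod and camera lenses are stored in the hallway closet.",
    "network": "The home server runs Docker Compose on the ai-mesh network.",
    "garden": "Tomato seedlings need watering every second day in summer.",
}


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, top_k: int = 1) -> list:
    """Return the ids of the top_k documents most similar to the query."""
    q = tokens(query)
    ranked = sorted(
        DOCS, key=lambda doc_id: cosine(q, tokens(DOCS[doc_id])), reverse=True
    )
    return ranked[:top_k]
```

Once a vector store is in place, only ~tokens~ and ~cosine~ would need to be replaced by the store's embedding and query calls. The shape of ~retrieve~ can stay the same.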