aboutme_chat/knowledge_service/knowledge_agent_plan.md
Sam Rolfe 628ba96998 Initial commit: Multi-service AI agent system
- Frontend: Vite + React + TypeScript chat interface
- Backend: FastAPI gateway with LangGraph routing
- Knowledge Service: ChromaDB RAG with Gitea scraper
- LangGraph Service: Multi-agent orchestration
- Airflow: Scheduled Gitea ingestion DAG
- Documentation: Complete plan and implementation guides

Architecture:
- Modular Docker Compose per service
- External ai-mesh network for communication
- Fast rebuilds with /app/packages pattern
- Intelligent agent routing (no hardcoded keywords)

Services:
- Frontend (5173): React chat UI
- Chat Gateway (8000): FastAPI entry point
- LangGraph (8090): Agent orchestration
- Knowledge (8080): ChromaDB RAG
- Airflow (8081): Scheduled ingestion
- PostgreSQL (5432): Chat history

Excludes: node_modules, .venv, chroma_db, logs, .env files
Includes: All source code, configs, docs, docker files
2026-02-27 19:51:06 +11:00


GOAL

Build a "Deep Knowledge Agent" (DKA) that acts as a secure, quarantined bridge between the Chat Gateway and private data sources.

ARCHITECTURE OVERVIEW

Layers

  1. Public Gateway: FastAPI (The "Voice").
  2. Orchestration Layer: LangGraph Supervisor (The "Router").
  3. Quarantined Agent: DKA / Librarian (The "Keeper of Secrets").
    • Strictly Read-Only.
    • Accesses ChromaDB and Media stores.
  4. Specialist Agent: Opencode (The "Engineer").
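The layering above can be sketched as a supervisor that hands each query to exactly one agent. This is a minimal illustration only, not the LangGraph implementation: the `classify` callable is a hypothetical stand-in for the model-driven routing decision, and the agent functions are placeholders.

```python
from typing import Callable, Dict


def librarian(query: str) -> str:
    # Placeholder for the quarantined, read-only DKA.
    return f"librarian: read-only lookup for {query!r}"


def opencode(query: str) -> str:
    # Placeholder for the specialist engineering agent.
    return f"opencode: engineering task for {query!r}"


# Hypothetical registry of agents the supervisor may dispatch to.
AGENTS: Dict[str, Callable[[str], str]] = {
    "librarian": librarian,
    "opencode": opencode,
}


def supervise(query: str, classify: Callable[[str], str]) -> str:
    """Route a query to one agent, using the name returned by `classify`.

    `classify` is an assumption: in the planned system this decision would
    come from the LangGraph supervisor, not from a plain function.
    """
    choice = classify(query)
    if choice not in AGENTS:
        raise ValueError(f"unknown agent: {choice!r}")
    return AGENTS[choice](query)
```

Failing on an unknown agent name, rather than falling back to a default, keeps a misrouted request from silently reaching the quarantined layer.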

Data Sources (The "Knowledge Mesh")

  • Code: Gitea (Repos, Markdown docs).
  • Notes: Trilium Next, Obsidian, Flatnotes, HedgeDoc.
  • Wiki: DokuWiki.
  • Inventory: HomeBox (Physical gear, photos).
  • Tasks: Vikunja.
  • Media: Immich (Photos/Videos metadata via Gemini Vision).

Agent Tooling & Orchestration

  • Orchestrators: CAO CLI, Agent Pipe.
  • External Agents: Goose, Aider, Opencode (Specialist).

COMPONENT DETAILS

The Librarian (DKA - LangGraph)

  • Purpose: Semantic retrieval over the vector store and synthesis of answers from the retrieved results.
  • Tools:
    • query_chroma: Search the vector database.
    • fetch_media_link: Return a signed URL or path for Immich/HomeBox images.
  • Constraints:
    • NO bash or write tools.
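One way the tool constraints and the signed media link could look, as a sketch under assumptions: the plan does not say how links are signed, so the HMAC-SHA256 scheme, the `expires`/`sig` query parameters, and all function names other than the two tool names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Allow-list taken from the plan: the Librarian gets these tools and nothing else.
READ_ONLY_TOOLS = frozenset({"query_chroma", "fetch_media_link"})


def check_tool_allowed(name: str) -> None:
    """Reject any tool that is not on the read-only allow-list."""
    if name not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {name!r} is not available to the Librarian")


def _signature(asset_id: str, expires: int, secret: bytes) -> str:
    # Assumed scheme: HMAC-SHA256 over "<asset_id>:<expiry timestamp>".
    payload = f"{asset_id}:{expires}".encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_media_link(base_url, asset_id, secret, ttl_seconds=300, now=None):
    """Build a time-limited URL for one media asset."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(asset_id, expires, secret)})
    return f"{base_url}/{asset_id}?{query}"


def verify_media_url(url, secret, now=None):
    """Return True only if the link is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time() if now is None else now)
    if current > expires:
        return False
    asset_id = parsed.path.rsplit("/", 1)[-1]
    # compare_digest avoids leaking the signature through timing differences.
    return hmac.compare_digest(_signature(asset_id, expires, secret), sig)
```

Here the server that hosts the images would run `verify_media_url`, so the Librarian never needs write access or long-lived credentials for Immich or HomeBox.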

The Ingestion Pipeline (Airflow/Custom Python)

  • Multi-Source Scrapers: API-based (Gitea, Immich) and File-based (Obsidian).
  • Vision Integration: Gemini analyzes Immich photos to create searchable text descriptions.
  • Storage: ChromaDB (Vectors) + PostgreSQL (Metadata/Hashes).
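The plan stores hashes alongside metadata but does not say what they are for. Assuming they are meant for change detection, a scraper could skip documents whose content has not changed since the last run. The sketch below uses an in-memory SQLite database purely as a stand-in for PostgreSQL; the table and function names are hypothetical.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_hash_store() -> sqlite3.Connection:
    # SQLite stands in for the PostgreSQL metadata store named in the plan.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_hashes ("
        " source TEXT NOT NULL,"
        " doc_id TEXT NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " PRIMARY KEY (source, doc_id))"
    )
    return conn


def needs_reindex(conn: sqlite3.Connection, source: str, doc_id: str, text: str) -> bool:
    """Return True if the document is new or changed, and record its new hash.

    A True result means the caller should re-embed the document and
    update its vectors; False means the stored vectors are still current.
    """
    digest = content_hash(text)
    row = conn.execute(
        "SELECT sha256 FROM doc_hashes WHERE source = ? AND doc_id = ?",
        (source, doc_id),
    ).fetchone()
    if row is not None and row[0] == digest:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO doc_hashes (source, doc_id, sha256) VALUES (?, ?, ?)",
        (source, doc_id, digest),
    )
    conn.commit()
    return True
```

A scheduled ingestion run would call `needs_reindex` once per scraped document, which keeps repeated runs cheap when most sources are unchanged.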

TODO LIST [0/4]

  • Create 'knowledge_service' directory.
  • Implement test_rag.py (Hello World retrieval).
  • Build basic scraper for hobbies.org.
  • Integrate DKA logic into the FastAPI Gateway.