Initial commit: Multi-service AI agent system
- Frontend: Vite + React + TypeScript chat interface
- Backend: FastAPI gateway with LangGraph routing
- Knowledge Service: ChromaDB RAG with Gitea scraper
- LangGraph Service: Multi-agent orchestration
- Airflow: Scheduled Gitea ingestion DAG
- Documentation: Complete plan and implementation guides

Architecture:
- Modular Docker Compose per service
- External ai-mesh network for communication
- Fast rebuilds with /app/packages pattern
- Intelligent agent routing (no hardcoded keywords)

Services:
- Frontend (5173): React chat UI
- Chat Gateway (8000): FastAPI entry point
- LangGraph (8090): Agent orchestration
- Knowledge (8080): ChromaDB RAG
- Airflow (8081): Scheduled ingestion
- PostgreSQL (5432): Chat history

Excludes: node_modules, .venv, chroma_db, logs, .env files
Includes: all source code, configs, docs, Docker files
.gitignore (vendored, new file, 52 lines)
@@ -0,0 +1,52 @@

```
# Dependencies
node_modules/
.venv/
__pycache__/
*.pyc

# Build outputs
dist/
dist-ssr/

# Databases and vector stores
chroma_db/
*.sqlite3
*.db

# Logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
pnpm-debug.log*

# Environment variables (secrets!)
.env
.env.local
.env.*.local

# IDE
.vscode/*
!.vscode/extensions.json
.idea/
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

# Airflow runtime
airflow/logs/
airflow/config/
airflow/plugins/

# Testing
.coverage
htmlcov/
.pytest_cache/

# Project management files (not code)
action.md
ideas.org
project_journal.org
```
README.md (new file, 107 lines)
@@ -0,0 +1,107 @@

# AboutMe AI Chat Demo

A comprehensive AI agent system with a multi-service architecture for personal knowledge management and intelligent query responses.

## Architecture Overview

```
User Query → Chat Gateway → LangGraph Supervisor → [Librarian | Opencode | Brain]
                                                         ↓
                                 Knowledge Service (ChromaDB) ← Airflow ← Gitea API
```

## Services

| Service | Port | Technology | Purpose |
|---------|------|------------|---------|
| Frontend | 5173 | Vite + React + TS | Chat UI |
| Chat Gateway | 8000 | FastAPI | API entry point |
| LangGraph | 8090 | FastAPI + LangGraph | Agent orchestration |
| Knowledge | 8080 | FastAPI + ChromaDB | RAG / vector search |
| Airflow | 8081 | Apache Airflow | Scheduled ingestion |
| PostgreSQL | 5432 | Postgres 15 | Chat history |

## Quick Start

```bash
# 1. Ensure the shared Docker network exists
docker network create ai-mesh

# 2. Start the Knowledge Service
cd knowledge_service && docker-compose up -d

# 3. Start the LangGraph Service
cd ../langgraph_service && docker-compose up -d

# 4. Start the Chat Demo
cd ../aboutme_chat_demo && docker-compose up -d

# 5. Start Airflow (optional)
cd ../airflow && docker-compose up -d
```

## Environment Variables

Create `.env` files in each service directory:

**knowledge_service/.env:**
```
OPENROUTER_API_KEY=your_key_here
GITEA_URL=https://gitea.lab.audasmedia.com.au
GITEA_TOKEN=your_token
GITEA_USERNAME=sam
```

**langgraph_service/.env:**
```
OPENCODE_PASSWORD=sam4jo
```

**airflow/.env:**
```
AIRFLOW_UID=1000
GITEA_TOKEN=your_token
```

## Project Structure

```
aboutme_chat_demo/
├── frontend/              # React chat interface
├── backend/               # FastAPI gateway (routes to LangGraph)
├── plan.md                # Full project roadmap
└── code_1.md              # Implementation guide

knowledge_service/
├── main.py                # FastAPI + ChromaDB
├── gitea_scraper.py       # Gitea API integration
└── docker-compose.yml

langgraph_service/
├── main.py                # FastAPI entry point
├── supervisor_agent.py    # LangGraph orchestration
└── docker-compose.yml

airflow/
├── dags/                  # Workflow definitions
│   └── gitea_ingestion_dag.py
└── docker-compose.yml
```

## Technologies

- **Frontend:** Vite, React 19, TypeScript, Tailwind CSS, TanStack Query
- **Backend:** FastAPI, Python 3.11, httpx
- **AI/ML:** LangGraph, LangChain, ChromaDB, OpenRouter API
- **Orchestration:** Apache Airflow (CeleryExecutor)
- **Infrastructure:** Docker, Docker Compose

## Documentation

- `plan.md` - Complete project roadmap (7 phases)
- `code_1.md` - Modular implementation guide
- `code.md` - Legacy implementation reference

## License

MIT
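Once the stack is up, the gateway can be smoke-tested from the host; a sketch assuming the default ports above (the question text is just an example):

```shell
curl -s -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What has Sam been working on lately?"}'
```

The gateway responds with a JSON object containing a `response` field.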
airflow/dags/gitea_ingestion_dag.py (new file, 144 lines)
@@ -0,0 +1,144 @@

```python
"""
Airflow DAG for scheduled Gitea repository ingestion.
Runs daily to fetch new/updated repos and ingest them into ChromaDB.
"""
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
import os
import sys

# Add knowledge_service to path for imports
sys.path.insert(0, '/opt/airflow/dags/repo')

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


def fetch_gitea_repos(**context):
    """Task: Fetch all repositories from Gitea."""
    from gitea_scraper import GiteaScraper

    scraper = GiteaScraper(
        base_url=os.getenv("GITEA_URL", "https://gitea.lab.audasmedia.com.au"),
        token=os.getenv("GITEA_TOKEN", ""),
        username=os.getenv("GITEA_USERNAME", "sam")
    )

    repos = scraper.get_user_repos()

    # Push to XCom for downstream tasks
    context['ti'].xcom_push(key='repo_count', value=len(repos))
    context['ti'].xcom_push(key='repos', value=[
        {
            'name': r.name,
            'description': r.description,
            'url': r.url,
            'updated_at': r.updated_at
        }
        for r in repos
    ])

    return f"Fetched {len(repos)} repositories"


def fetch_readmes(**context):
    """Task: Fetch READMEs for all repositories."""
    from gitea_scraper import GiteaScraper

    ti = context['ti']
    repos = ti.xcom_pull(task_ids='fetch_repos', key='repos')

    scraper = GiteaScraper(
        base_url=os.getenv("GITEA_URL", "https://gitea.lab.audasmedia.com.au"),
        token=os.getenv("GITEA_TOKEN", ""),
        username=os.getenv("GITEA_USERNAME", "sam")
    )

    readme_data = []
    for repo in repos[:10]:  # Limit to 10 repos per run for testing
        readme = scraper.get_readme(repo['name'])
        if readme:
            readme_data.append({
                'repo': repo['name'],
                'content': readme[:5000],  # First 5000 chars
                'url': repo['url']
            })

    ti.xcom_push(key='readme_data', value=readme_data)

    return f"Fetched {len(readme_data)} READMEs"


def ingest_to_chroma(**context):
    """Task: Ingest fetched data into ChromaDB via the knowledge service."""
    import httpx

    ti = context['ti']
    readme_data = ti.xcom_pull(task_ids='fetch_readmes', key='readme_data')

    knowledge_service_url = os.getenv("KNOWLEDGE_SERVICE_URL", "http://knowledge-service:8080")

    documents_ingested = 0
    for item in readme_data:
        try:
            # Call the knowledge service ingest endpoint
            response = httpx.post(
                f"{knowledge_service_url}/ingest",
                json={
                    'source': f"gitea:{item['repo']}",
                    'content': item['content'],
                    'metadata': {
                        'repo': item['repo'],
                        'url': item['url'],
                        'type': 'readme'
                    }
                },
                timeout=30.0
            )

            if response.status_code == 200:
                documents_ingested += 1

        except Exception as e:
            print(f"Error ingesting {item['repo']}: {e}")

    return f"Ingested {documents_ingested} documents into ChromaDB"


# Define the DAG
with DAG(
    'gitea_daily_ingestion',
    default_args=default_args,
    description='Daily ingestion of Gitea repositories into knowledge base',
    schedule_interval=timedelta(days=1),  # Run daily
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=['gitea', 'ingestion', 'knowledge'],
) as dag:

    # Task 1: Fetch repository list
    fetch_repos_task = PythonOperator(
        task_id='fetch_repos',
        python_callable=fetch_gitea_repos,
    )

    # Task 2: Fetch README content
    fetch_readmes_task = PythonOperator(
        task_id='fetch_readmes',
        python_callable=fetch_readmes,
    )

    # Task 3: Ingest into ChromaDB
    ingest_task = PythonOperator(
        task_id='ingest_to_chroma',
        python_callable=ingest_to_chroma,
    )

    # Define task dependencies
    fetch_repos_task >> fetch_readmes_task >> ingest_task
```
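The DAG's `xcom_push` calls hand plain dicts to downstream tasks rather than `RepoMetadata` instances, because default XCom values must survive JSON serialization. A minimal sketch of that conversion (repo names and URLs are made up, and `asdict` stands in for the DAG's explicit dict comprehension):

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class RepoMetadata:
    """Illustrative subset of the scraper's repo record."""
    name: str
    description: str
    url: str
    updated_at: str


repos = [
    RepoMetadata("aboutme_chat_demo", "AI chat demo", "https://example.invalid/r1", "2024-01-01T00:00:00Z"),
    RepoMetadata("knowledge_service", "RAG service", "https://example.invalid/r2", "2024-01-02T00:00:00Z"),
]

# Dataclass instances are not JSON-serializable as-is; dicts round-trip cleanly,
# which is why the DAG pushes dicts through XCom.
payload = [asdict(r) for r in repos]
roundtrip = json.loads(json.dumps(payload))
```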
airflow/dags/gitea_scraper.py (new file, 121 lines)
@@ -0,0 +1,121 @@

```python
import os
import httpx
import logging
from typing import List, Dict, Optional
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class RepoMetadata:
    name: str
    description: str
    url: str
    default_branch: str
    updated_at: str
    language: Optional[str]


class GiteaScraper:
    def __init__(self, base_url: str, token: str, username: str = "sam"):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.username = username
        self.headers = {"Authorization": f"token {token}"}

    def get_user_repos(self) -> List[RepoMetadata]:
        """Fetch all repositories for the user."""
        repos = []
        page = 1

        while True:
            url = f"{self.base_url}/api/v1/users/{self.username}/repos?page={page}&limit=50"

            try:
                response = httpx.get(url, headers=self.headers, timeout=30.0)
                response.raise_for_status()

                data = response.json()
                if not data:
                    break

                for repo in data:
                    repos.append(RepoMetadata(
                        name=repo["name"],
                        description=repo.get("description", ""),
                        url=repo["html_url"],
                        default_branch=repo["default_branch"],
                        updated_at=repo["updated_at"],
                        language=repo.get("language")
                    ))

                logger.info(f"Fetched page {page}, got {len(data)} repos")
                page += 1

            except Exception as e:
                logger.error(f"Error fetching repos: {e}")
                break

        return repos

    def get_readme(self, repo_name: str) -> str:
        """Fetch README content for a repository."""
        # Try common README filenames
        readme_names = ["README.md", "readme.md", "Readme.md", "README.rst"]

        for readme_name in readme_names:
            url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/raw/{readme_name}"

            try:
                response = httpx.get(url, headers=self.headers, timeout=10.0)
                if response.status_code == 200:
                    return response.text
            except Exception as e:
                logger.warning(f"Failed to fetch {readme_name}: {e}")
                continue

        return ""

    def get_repo_files(self, repo_name: str, path: str = "") -> List[Dict]:
        """List files in a repository directory."""
        url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/contents/{path}"

        try:
            response = httpx.get(url, headers=self.headers, timeout=10.0)
            response.raise_for_status()
            return response.json()
        except Exception as e:
            logger.error(f"Error listing files in {repo_name}/{path}: {e}")
            return []

    def get_file_content(self, repo_name: str, filepath: str) -> str:
        """Fetch content of a specific file."""
        url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/raw/{filepath}"

        try:
            response = httpx.get(url, headers=self.headers, timeout=10.0)
            if response.status_code == 200:
                return response.text
        except Exception as e:
            logger.error(f"Error fetching file {filepath}: {e}")

        return ""


# Manual smoke test
if __name__ == "__main__":
    scraper = GiteaScraper(
        base_url=os.getenv("GITEA_URL", "https://gitea.lab.audasmedia.com.au"),
        token=os.getenv("GITEA_TOKEN", ""),
        username=os.getenv("GITEA_USERNAME", "sam")
    )

    repos = scraper.get_user_repos()
    print(f"Found {len(repos)} repositories")

    for repo in repos[:3]:  # Test with first 3
        print(f"\nRepo: {repo.name}")
        readme = scraper.get_readme(repo.name)
        if readme:
            print(f"README preview: {readme[:200]}...")
```
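`get_user_repos` walks pages until the Gitea API returns an empty list. The same loop can be factored into a reusable helper; this is a sketch with a simulated page fetcher, not part of the scraper itself:

```python
from typing import Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[[int], List[Dict]]) -> List[Dict]:
    """Collect results across numbered pages until an empty page is returned."""
    items: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items


# Simulated two-page API: 50 items, then 10, then empty (mirrors limit=50).
pages = {
    1: [{"name": f"repo{i}"} for i in range(50)],
    2: [{"name": f"repo{i}"} for i in range(50, 60)],
}
repos = fetch_all_pages(lambda p: pages.get(p, []))
```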
airflow/docker-compose.yml (new file, 181 lines)
@@ -0,0 +1,181 @@

```yaml
version: '3.8'

x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.8.1}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always
    networks:
      - ai-mesh

  redis:
    image: redis:latest
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 30s
      retries: 50
      start_period: 30s
    restart: always
    networks:
      - ai-mesh

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8081:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
    networks:
      - ai-mesh

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
    networks:
      - ai-mesh

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}" || celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
    networks:
      - ai-mesh

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
    networks:
      - ai-mesh

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo "WARNING!!!: AIRFLOW_UID not set!"
          echo "Using default UID: 50000"
          export AIRFLOW_UID=50000
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:0"
    volumes:
      - ${AIRFLOW_PROJ_DIR:-.}:/sources
    networks:
      - ai-mesh

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    command:
      - bash
      - -c
      - airflow
    networks:
      - ai-mesh

volumes:
  postgres-db-volume:

networks:
  ai-mesh:
    external: true
```
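The `x-airflow-common` block relies on YAML anchors (`&name`) and merge keys (`<<: *name`) so that every Airflow service shares one image, environment, and volume set. A minimal illustrative fragment of how the merge behaves (the `base`/`service` names are made up for the example):

```yaml
base: &base
  image: apache/airflow:2.8.1
  restart: always

service:
  <<: *base           # copies image and restart from the anchored mapping
  command: webserver  # keys declared here are added alongside the merged ones
```

Keys defined directly on the merging mapping take precedence over merged keys, which is why each service can override `command`, `depends_on`, or `environment` while inheriting everything else.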
backend/Dockerfile (new file, 8 lines)
@@ -0,0 +1,8 @@

```dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y libpq-dev gcc
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
```
backend/main.py (new file, 58 lines)
@@ -0,0 +1,58 @@

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import httpx
import logging
import sys
import traceback

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


class MessageRequest(BaseModel):
    message: str


BRAIN_URL = "http://opencode-brain:5000"
KNOWLEDGE_URL = "http://knowledge-service:8080/query"
AUTH = httpx.BasicAuth("opencode", "sam4jo")


@app.post("/chat")
async def chat(request: MessageRequest):
    user_msg = request.message.lower()
    timeout_long = httpx.Timeout(180.0, connect=10.0)
    timeout_short = httpx.Timeout(5.0, connect=2.0)

    context = ""
    # Check for keywords to trigger Librarian (DB) lookup
    if any(kw in user_msg for kw in ["sam", "hobby", "music", "guitar", "skiing", "experience"]):
        logger.info("Gateway: Consulting Librarian (DB)...")
        async with httpx.AsyncClient(timeout=timeout_short) as client:
            try:
                k_res = await client.post(KNOWLEDGE_URL, json={"question": request.message})
                if k_res.status_code == 200:
                    context = k_res.json().get("context", "")
            except Exception as e:
                logger.warning(f"Gateway: Librarian offline/slow: {str(e)}")

    # Forward to Brain (LLM)
    async with httpx.AsyncClient(auth=AUTH, timeout=timeout_long) as brain_client:
        try:
            session_res = await brain_client.post(f"{BRAIN_URL}/session", json={"title": "Demo"})
            session_id = session_res.json()["id"]
            final_prompt = f"CONTEXT:\n{context}\n\nUSER: {request.message}" if context else request.message
            response = await brain_client.post(
                f"{BRAIN_URL}/session/{session_id}/message",
                json={"parts": [{"type": "text", "text": final_prompt}]},
            )

            # FIX: Iterate through parts array to find text response
            data = response.json()
            if "parts" in data:
                for part in data["parts"]:
                    if part.get("type") == "text" and "text" in part:
                        return {"response": part["text"]}

            return {"response": "AI responded but no text found in expected format."}
        except Exception:
            logger.error(f"Gateway: Brain failure: {traceback.format_exc()}")
            return {"response": "Error: The Brain is taking too long or is disconnected."}
```
backend/main.py.new (new file, 49 lines)
@@ -0,0 +1,49 @@

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import httpx
import logging
import sys
import traceback
import os

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


class MessageRequest(BaseModel):
    message: str


LANGGRAPH_URL = os.getenv("LANGGRAPH_URL", "http://langgraph-service:8090")


@app.post("/chat")
async def chat(request: MessageRequest):
    """Updated chat endpoint that routes through the LangGraph Supervisor."""
    logger.info(f"Gateway: Received message: {request.message}")

    try:
        # Call LangGraph Supervisor instead of direct brain
        async with httpx.AsyncClient(timeout=httpx.Timeout(60.0, connect=10.0)) as client:
            response = await client.post(
                f"{LANGGRAPH_URL}/query",
                json={"query": request.message}
            )

            if response.status_code == 200:
                result = response.json()
                logger.info(f"Gateway: Response from {result.get('agent_used', 'unknown')} agent")
                return {"response": result["response"]}
            else:
                logger.error(f"Gateway: LangGraph error {response.status_code}")
                return {"response": "Error: Orchestration service unavailable"}

    except Exception:
        logger.error(f"Gateway: Error routing through LangGraph: {traceback.format_exc()}")
        return {"response": "Error: Unable to process your request at this time."}


@app.get("/health")
async def health():
    return {"status": "healthy", "service": "chat-gateway"}
```
backend/requirements.txt (new file, 8 lines)
@@ -0,0 +1,8 @@

```
fastapi
uvicorn
sqlalchemy
psycopg2-binary
pydantic
httpx
pytest
pytest-asyncio
```
backend/tests/test_gateway.py (new file, 79 lines)
@@ -0,0 +1,79 @@

```python
import pytest
from fastapi.testclient import TestClient
from main import app
import httpx
from unittest.mock import AsyncMock, patch

client = TestClient(app)


@pytest.mark.asyncio
async def test_chat_general_query():
    """Test that a general query (no personal keywords) skips the Librarian."""
    with patch("httpx.AsyncClient.post", new_callable=AsyncMock) as mock_post:
        # Mock Brain response
        mock_response = AsyncMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "info": {"id": "msg_123"},
            "parts": [{"type": "text", "text": "I am a general AI."}]
        }

        # First call is for session creation, second for the message
        mock_post.side_effect = [AsyncMock(status_code=200, json=lambda: {"id": "ses_123"}), mock_response]

        response = client.post("/chat", json={"message": "What is 2+2?"})

        assert response.status_code == 200
        assert response.json()["response"] == "I am a general AI."
        # Verify Librarian (knowledge-service) was NOT called
        # The knowledge service URL is http://knowledge-service:8080/query
        calls = [call.args[0] for call in mock_post.call_args_list]
        assert not any("knowledge-service" in url for url in calls)


@pytest.mark.asyncio
async def test_chat_personal_query_success():
    """Test that a personal query calls the Librarian and injects context."""
    with patch("httpx.AsyncClient.post", new_callable=AsyncMock) as mock_post:
        # 1. Mock Librarian response
        mock_k_res = AsyncMock()
        mock_k_res.status_code = 200
        mock_k_res.json.return_value = {"context": "Sam likes red guitars."}

        # 2. Mock Brain session response
        mock_s_res = AsyncMock()
        mock_s_res.status_code = 200
        mock_s_res.json.return_value = {"id": "ses_123"}

        # 3. Mock Brain message response
        mock_b_res = AsyncMock()
        mock_b_res.status_code = 200
        mock_b_res.json.return_value = {
            "parts": [{"type": "text", "text": "I see Sam likes red guitars."}]
        }

        mock_post.side_effect = [mock_k_res, mock_s_res, mock_b_res]

        response = client.post("/chat", json={"message": "Tell me about Sam's music"})

        assert response.status_code == 200
        assert "red guitars" in response.json()["response"]

        # Verify Librarian was called
        calls = [call.args[0] for call in mock_post.call_args_list]
        assert any("knowledge-service" in url for url in calls)


@pytest.mark.asyncio
async def test_chat_librarian_timeout_failover():
    """Test that the gateway fails over quickly (5s timeout) if the Librarian is slow."""
    with patch("httpx.AsyncClient.post", new_callable=AsyncMock) as mock_post:
        # Mock Librarian timeout
        mock_post.side_effect = [
            httpx.TimeoutException("Timeout"),  # Librarian call
            AsyncMock(status_code=200, json=lambda: {"id": "ses_123"}),  # Brain session
            AsyncMock(status_code=200, json=lambda: {"parts": [{"type": "text", "text": "Direct Brain Response"}]})  # Brain message
        ]

        response = client.post("/chat", json={"message": "Sam's hobbies?"})

        assert response.status_code == 200
        assert response.json()["response"] == "Direct Brain Response"
```
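These tests lean on `side_effect` handing out one mocked response per call, in order, which is how a single patched `post` can play the Librarian, the session endpoint, and the message endpoint in sequence. A minimal self-contained sketch of that behavior:

```python
import asyncio
from unittest.mock import AsyncMock

# Each await of the mock consumes the next item from side_effect, in order.
mock_post = AsyncMock(side_effect=[{"id": "ses_123"}, {"parts": []}])


async def run():
    first = await mock_post("http://session-endpoint")
    second = await mock_post("http://message-endpoint")
    return first, second


first, second = asyncio.run(run())
```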
docker-compose.yml (new file, 41 lines)
@@ -0,0 +1,41 @@

```yaml
services:
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: sam
      POSTGRES_PASSWORD: sam4jo
      POSTGRES_DB: chat_demo
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - ai-mesh

  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://sam:sam4jo@db:5432/chat_demo
    volumes:
      - ./backend:/app
    depends_on:
      - db
    networks:
      - ai-mesh

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    volumes:
      - ./frontend:/app
      - /app/node_modules
    environment:
      - CHOKIDAR_USEPOLLING=true
    networks:
      - ai-mesh

volumes:
  postgres_data:

networks:
  ai-mesh:
    external: true
```
frontend/.gitignore (vendored, new file, 24 lines)
@@ -0,0 +1,24 @@

```
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
```
frontend/Dockerfile (new file, 7 lines)
@@ -0,0 +1,7 @@

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install
COPY . .
CMD ["pnpm", "run", "dev", "--host", "0.0.0.0"]
```
frontend/README.md (new file, 73 lines)
@@ -0,0 +1,73 @@

# React + TypeScript + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Babel](https://babeljs.io/) (or [oxc](https://oxc.rs) when used in [rolldown-vite](https://vite.dev/guide/rolldown)) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh

## React Compiler

The React Compiler is not enabled on this template because of its impact on dev & build performances. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).

## Expanding the ESLint configuration

If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:

```js
export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      // Other configs...

      // Remove tseslint.configs.recommended and replace with this
      tseslint.configs.recommendedTypeChecked,
      // Alternatively, use this for stricter rules
      tseslint.configs.strictTypeChecked,
      // Optionally, add this for stylistic rules
      tseslint.configs.stylisticTypeChecked,

      // Other configs...
    ],
    languageOptions: {
      parserOptions: {
        project: ['./tsconfig.node.json', './tsconfig.app.json'],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
])
```

You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:

```js
// eslint.config.js
import reactX from 'eslint-plugin-react-x'
import reactDom from 'eslint-plugin-react-dom'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      // Other configs...
      // Enable lint rules for React
      reactX.configs['recommended-typescript'],
      // Enable lint rules for React DOM
      reactDom.configs.recommended,
    ],
    languageOptions: {
      parserOptions: {
        project: ['./tsconfig.node.json', './tsconfig.app.json'],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
])
```
23  frontend/eslint.config.js  Normal file
@@ -0,0 +1,23 @@
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
import { defineConfig, globalIgnores } from 'eslint/config'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      js.configs.recommended,
      tseslint.configs.recommended,
      reactHooks.configs.flat.recommended,
      reactRefresh.configs.vite,
    ],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
    },
  },
])
13  frontend/index.html  Normal file
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>frontend</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
36  frontend/package.json  Normal file
@@ -0,0 +1,36 @@
{
  "name": "frontend",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc -b && vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "@tanstack/react-query": "^5.90.21",
    "axios": "^1.13.5",
    "react": "^19.2.0",
    "react-dom": "^19.2.0"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.1",
    "@tailwindcss/vite": "^4.2.0",
    "@types/node": "^24.10.1",
    "@types/react": "^19.2.7",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^5.1.1",
    "autoprefixer": "^10.4.24",
    "eslint": "^9.39.1",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.4.24",
    "globals": "^16.5.0",
    "postcss": "^8.5.6",
    "tailwindcss": "^4.2.0",
    "typescript": "~5.9.3",
    "typescript-eslint": "^8.48.0",
    "vite": "^7.3.1"
  }
}
2634  frontend/pnpm-lock.yaml  generated  Normal file
File diff suppressed because it is too large. Load Diff
1  frontend/public/vite.svg  Normal file
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="iconify iconify--logos" width="31.88" height="32" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 257"><defs><linearGradient id="IconifyId1813088fe1fbc01fb466" x1="-.828%" x2="57.636%" y1="7.652%" y2="78.411%"><stop offset="0%" stop-color="#41D1FF"></stop><stop offset="100%" stop-color="#BD34FE"></stop></linearGradient><linearGradient id="IconifyId1813088fe1fbc01fb467" x1="43.376%" x2="50.316%" y1="2.242%" y2="89.03%"><stop offset="0%" stop-color="#FFEA83"></stop><stop offset="8.333%" stop-color="#FFDD35"></stop><stop offset="100%" stop-color="#FFA800"></stop></linearGradient></defs><path fill="url(#IconifyId1813088fe1fbc01fb466)" d="M255.153 37.938L134.897 252.976c-2.483 4.44-8.862 4.466-11.382.048L.875 37.958c-2.746-4.814 1.371-10.646 6.827-9.67l120.385 21.517a6.537 6.537 0 0 0 2.322-.004l117.867-21.483c5.438-.991 9.574 4.796 6.877 9.62Z"></path><path fill="url(#IconifyId1813088fe1fbc01fb467)" d="M185.432.063L96.44 17.501a3.268 3.268 0 0 0-2.634 3.014l-5.474 92.456a3.268 3.268 0 0 0 3.997 3.378l24.777-5.718c2.318-.535 4.413 1.507 3.936 3.838l-7.361 36.047c-.495 2.426 1.782 4.5 4.151 3.78l15.304-4.649c2.372-.72 4.652 1.36 4.15 3.788l-11.698 56.621c-.732 3.542 3.979 5.473 5.943 2.437l1.313-2.028l72.516-144.72c1.215-2.423-.88-5.186-3.54-4.672l-25.505 4.922c-2.396.462-4.435-1.77-3.759-4.114l16.646-57.705c.677-2.35-1.37-4.583-3.769-4.113Z"></path></svg>
After Width: | Height: | Size: 1.5 KiB |
42  frontend/src/App.css  Normal file
@@ -0,0 +1,42 @@
#root {
  max-width: 1280px;
  margin: 0 auto;
  padding: 2rem;
  text-align: center;
}

.logo {
  height: 6em;
  padding: 1.5em;
  will-change: filter;
  transition: filter 300ms;
}
.logo:hover {
  filter: drop-shadow(0 0 2em #646cffaa);
}
.logo.react:hover {
  filter: drop-shadow(0 0 2em #61dafbaa);
}

@keyframes logo-spin {
  from {
    transform: rotate(0deg);
  }
  to {
    transform: rotate(360deg);
  }
}

@media (prefers-reduced-motion: no-preference) {
  a:nth-of-type(2) .logo {
    animation: logo-spin infinite 20s linear;
  }
}

.card {
  padding: 2em;
}

.read-the-docs {
  color: #888;
}
17  frontend/src/App.tsx  Normal file
@@ -0,0 +1,17 @@
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'
import ChatInterface from './components/ChatInterface'

const queryClient = new QueryClient()

function App() {
  return (
    <QueryClientProvider client={queryClient}>
      <div className="min-h-screen bg-gray-950 p-4 md:p-8">
        <ChatInterface />
      </div>
    </QueryClientProvider>
  )
}

export default App
1  frontend/src/assets/react.svg  Normal file
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="iconify iconify--logos" width="35.93" height="32" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 228"><path fill="#00D8FF" d="M210.483 73.824a171.49 171.49 0 0 0-8.24-2.597c.465-1.9.893-3.777 1.273-5.621c6.238-30.281 2.16-54.676-11.769-62.708c-13.355-7.7-35.196.329-57.254 19.526a171.23 171.23 0 0 0-6.375 5.848a155.866 155.866 0 0 0-4.241-3.917C100.759 3.829 77.587-4.822 63.673 3.233C50.33 10.957 46.379 33.89 51.995 62.588a170.974 170.974 0 0 0 1.892 8.48c-3.28.932-6.445 1.924-9.474 2.98C17.309 83.498 0 98.307 0 113.668c0 15.865 18.582 31.778 46.812 41.427a145.52 145.52 0 0 0 6.921 2.165a167.467 167.467 0 0 0-2.01 9.138c-5.354 28.2-1.173 50.591 12.134 58.266c13.744 7.926 36.812-.22 59.273-19.855a145.567 145.567 0 0 0 5.342-4.923a168.064 168.064 0 0 0 6.92 6.314c21.758 18.722 43.246 26.282 56.54 18.586c13.731-7.949 18.194-32.003 12.4-61.268a145.016 145.016 0 0 0-1.535-6.842c1.62-.48 3.21-.974 4.76-1.488c29.348-9.723 48.443-25.443 48.443-41.52c0-15.417-17.868-30.326-45.517-39.844Zm-6.365 70.984c-1.4.463-2.836.91-4.3 1.345c-3.24-10.257-7.612-21.163-12.963-32.432c5.106-11 9.31-21.767 12.459-31.957c2.619.758 5.16 1.557 7.61 2.4c23.69 8.156 38.14 20.213 38.14 29.504c0 9.896-15.606 22.743-40.946 31.14Zm-10.514 20.834c2.562 12.94 2.927 24.64 1.23 33.787c-1.524 8.219-4.59 13.698-8.382 15.893c-8.067 4.67-25.32-1.4-43.927-17.412a156.726 156.726 0 0 1-6.437-5.87c7.214-7.889 14.423-17.06 21.459-27.246c12.376-1.098 24.068-2.894 34.671-5.345a134.17 134.17 0 0 1 1.386 6.193ZM87.276 214.515c-7.882 2.783-14.16 2.863-17.955.675c-8.075-4.657-11.432-22.636-6.853-46.752a156.923 156.923 0 0 1 1.869-8.499c10.486 2.32 22.093 3.988 34.498 4.994c7.084 9.967 14.501 19.128 21.976 27.15a134.668 134.668 0 0 1-4.877 4.492c-9.933 8.682-19.886 14.842-28.658 17.94ZM50.35 144.747c-12.483-4.267-22.792-9.812-29.858-15.863c-6.35-5.437-9.555-10.836-9.555-15.216c0-9.322 
13.897-21.212 37.076-29.293c2.813-.98 5.757-1.905 8.812-2.773c3.204 10.42 7.406 21.315 12.477 32.332c-5.137 11.18-9.399 22.249-12.634 32.792a134.718 134.718 0 0 1-6.318-1.979Zm12.378-84.26c-4.811-24.587-1.616-43.134 6.425-47.789c8.564-4.958 27.502 2.111 47.463 19.835a144.318 144.318 0 0 1 3.841 3.545c-7.438 7.987-14.787 17.08-21.808 26.988c-12.04 1.116-23.565 2.908-34.161 5.309a160.342 160.342 0 0 1-1.76-7.887Zm110.427 27.268a347.8 347.8 0 0 0-7.785-12.803c8.168 1.033 15.994 2.404 23.343 4.08c-2.206 7.072-4.956 14.465-8.193 22.045a381.151 381.151 0 0 0-7.365-13.322Zm-45.032-43.861c5.044 5.465 10.096 11.566 15.065 18.186a322.04 322.04 0 0 0-30.257-.006c4.974-6.559 10.069-12.652 15.192-18.18ZM82.802 87.83a323.167 323.167 0 0 0-7.227 13.238c-3.184-7.553-5.909-14.98-8.134-22.152c7.304-1.634 15.093-2.97 23.209-3.984a321.524 321.524 0 0 0-7.848 12.897Zm8.081 65.352c-8.385-.936-16.291-2.203-23.593-3.793c2.26-7.3 5.045-14.885 8.298-22.6a321.187 321.187 0 0 0 7.257 13.246c2.594 4.48 5.28 8.868 8.038 13.147Zm37.542 31.03c-5.184-5.592-10.354-11.779-15.403-18.433c4.902.192 9.899.29 14.978.29c5.218 0 10.376-.117 15.453-.343c-4.985 6.774-10.018 12.97-15.028 18.486Zm52.198-57.817c3.422 7.8 6.306 15.345 8.596 22.52c-7.422 1.694-15.436 3.058-23.88 4.071a382.417 382.417 0 0 0 7.859-13.026a347.403 347.403 0 0 0 7.425-13.565Zm-16.898 8.101a358.557 358.557 0 0 1-12.281 19.815a329.4 329.4 0 0 1-23.444.823c-7.967 0-15.716-.248-23.178-.732a310.202 310.202 0 0 1-12.513-19.846h.001a307.41 307.41 0 0 1-10.923-20.627a310.278 310.278 0 0 1 10.89-20.637l-.001.001a307.318 307.318 0 0 1 12.413-19.761c7.613-.576 15.42-.876 23.31-.876H128c7.926 0 15.743.303 23.354.883a329.357 329.357 0 0 1 12.335 19.695a358.489 358.489 0 0 1 11.036 20.54a329.472 329.472 0 0 1-11 20.722Zm22.56-122.124c8.572 4.944 11.906 24.881 6.52 51.026c-.344 1.668-.73 3.367-1.15 5.09c-10.622-2.452-22.155-4.275-34.23-5.408c-7.034-10.017-14.323-19.124-21.64-27.008a160.789 160.789 0 0 1 5.888-5.4c18.9-16.447 36.564-22.941 
44.612-18.3ZM128 90.808c12.625 0 22.86 10.235 22.86 22.86s-10.235 22.86-22.86 22.86s-22.86-10.235-22.86-22.86s10.235-22.86 22.86-22.86Z"></path></svg>
After Width: | Height: | Size: 4.0 KiB |
125  frontend/src/components/ChatInterface.tsx  Normal file
@@ -0,0 +1,125 @@
import { useState, useRef, useEffect } from 'react';
import { useMutation } from '@tanstack/react-query';
import axios from 'axios';

type Message = {
  id: string;
  text: string;
  sender: 'user' | 'ai';
};

export default function ChatInterface() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const messagesEndRef = useRef<HTMLDivElement>(null);

  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };

  useEffect(() => {
    scrollToBottom();
  }, [messages]);

  const chatMutation = useMutation({
    mutationFn: async (messageText: string) => {
      const response = await axios.post('http://localhost:8000/chat', {
        message: messageText,
      });
      return response.data;
    },
    onSuccess: (data) => {
      setMessages((prev) => [
        ...prev,
        { id: Date.now().toString(), text: data.response, sender: 'ai' },
      ]);
    },
    onError: () => {
      setMessages((prev) => [
        ...prev,
        { id: Date.now().toString(), text: "Error: Could not connect to the backend.", sender: 'ai' },
      ]);
    },
  });

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || chatMutation.isPending) return;

    const userMessage = input.trim();
    setInput('');
    setMessages((prev) => [
      ...prev,
      { id: Date.now().toString(), text: userMessage, sender: 'user' },
    ]);

    chatMutation.mutate(userMessage);
  };

  return (
    <div className="flex flex-col h-full max-w-2xl mx-auto border border-gray-700 rounded-lg overflow-hidden bg-gray-800 shadow-xl mt-8">
      {/* Header */}
      <div className="bg-gray-900 p-4 border-b border-gray-700">
        <h2 className="text-xl font-semibold text-white">Sam Rolfe - AI</h2>
        <p className="text-sm text-gray-400">Ask about skills, experience, hobbies</p>
      </div>

      {/* Message Area */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4 min-h-[400px]">
        {messages.length === 0 && (
          <div className="text-center text-gray-500 mt-10">
            Send a message to start the conversation!
          </div>
        )}
        {messages.map((msg) => (
          <div
            key={msg.id}
            className={`flex ${msg.sender === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-2xl px-4 py-2 ${
                msg.sender === 'user'
                  ? 'bg-blue-600 text-white rounded-tr-none'
                  : 'bg-gray-700 text-gray-100 rounded-tl-none'
              }`}
            >
              {msg.text}
            </div>
          </div>
        ))}
        {chatMutation.isPending && (
          <div className="flex justify-start">
            <div className="bg-gray-700 text-gray-400 rounded-2xl rounded-tl-none px-4 py-2 flex space-x-2 items-center">
              <div className="w-2 h-2 bg-gray-500 rounded-full animate-bounce" />
              <div className="w-2 h-2 bg-gray-500 rounded-full animate-bounce [animation-delay:0.2s]" />
              <div className="w-2 h-2 bg-gray-500 rounded-full animate-bounce [animation-delay:0.4s]" />
            </div>
          </div>
        )}
        <div ref={messagesEndRef} />
      </div>

      {/* Input Area */}
      <form onSubmit={handleSubmit} className="p-4 bg-gray-900 border-t border-gray-700">
        <div className="flex space-x-2">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            placeholder="Type your message..."
            className="flex-1 bg-gray-800 text-white border border-gray-700 rounded-lg px-4 py-2 focus:outline-none focus:border-blue-500 transition-colors"
            disabled={chatMutation.isPending}
          />
          <button
            type="submit"
            disabled={!input.trim() || chatMutation.isPending}
            className="bg-blue-600 hover:bg-blue-700 disabled:bg-blue-800 disabled:text-gray-400 text-white px-6 py-2 rounded-lg font-medium transition-colors"
          >
            Send
          </button>
        </div>
      </form>
    </div>
  );
}
5  frontend/src/index.css  Normal file
@@ -0,0 +1,5 @@
@import "tailwindcss";

@theme {
  --font-sans: system-ui, Avenir, Helvetica, Arial, sans-serif;
}
10  frontend/src/main.tsx  Normal file
@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.tsx'

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
28  frontend/tsconfig.app.json  Normal file
@@ -0,0 +1,28 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
    "target": "ES2022",
    "useDefineForClassFields": true,
    "lib": ["ES2022", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "types": ["vite/client"],
    "skipLibCheck": true,

    /* Bundler mode */
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "verbatimModuleSyntax": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",

    /* Linting */
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "erasableSyntaxOnly": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true
  },
  "include": ["src"]
}
7  frontend/tsconfig.json  Normal file
@@ -0,0 +1,7 @@
{
  "files": [],
  "references": [
    { "path": "./tsconfig.app.json" },
    { "path": "./tsconfig.node.json" }
  ]
}
26  frontend/tsconfig.node.json  Normal file
@@ -0,0 +1,26 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
    "target": "ES2023",
    "lib": ["ES2023"],
    "module": "ESNext",
    "types": ["node"],
    "skipLibCheck": true,

    /* Bundler mode */
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "verbatimModuleSyntax": true,
    "moduleDetection": "force",
    "noEmit": true,

    /* Linting */
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "erasableSyntaxOnly": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true
  },
  "include": ["vite.config.ts"]
}
11  frontend/vite.config.ts  Normal file
@@ -0,0 +1,11 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'

// https://vite.dev/config/
export default defineConfig({
  plugins: [
    tailwindcss(),
    react()
  ],
})
29  knowledge_service/Dockerfile  Normal file
@@ -0,0 +1,29 @@
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libstdc++6 \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Create directories
RUN mkdir -p /app/packages /app/code

# Install Python packages to a specific location
WORKDIR /app
COPY requirements.txt .
RUN pip install --target=/app/packages -r requirements.txt

# Copy initial code (will be overridden by volume mount in dev)
COPY . /app/code/

# Set Python to find packages in /app/packages
ENV PYTHONPATH=/app/packages
ENV PYTHONUNBUFFERED=1

WORKDIR /app/code
EXPOSE 8080

CMD ["python3", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
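The `pip install --target=/app/packages` plus `ENV PYTHONPATH=/app/packages` pair above is what makes rebuilds fast: dependencies sit in a stable, cacheable layer while `/app/code` can change (or be volume-mounted) freely. The mechanism is just Python's module search path. A minimal stdlib sketch of that mechanism, where a temporary `packages` directory stands in for `/app/packages` and `greeter` is a made-up module:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for "pip install --target=/app/packages": place a module in a
# directory that is NOT on Python's default search path.
pkgs = Path(tempfile.mkdtemp()) / "packages"
pkgs.mkdir()
(pkgs / "greeter.py").write_text("def hello():\n    return 'hi from packages'\n")

# Stand-in for ENV PYTHONPATH=/app/packages: put that directory on sys.path.
sys.path.insert(0, str(pkgs))

# Modules installed into the target directory now resolve normally.
greeter = importlib.import_module("greeter")
print(greeter.hello())  # -> hi from packages
```

Because the dependency layer only depends on `requirements.txt`, Docker reuses it on every code-only rebuild.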
15  knowledge_service/data/hobbies.md  Normal file
@@ -0,0 +1,15 @@
# Sam's Hobbies

## Music
- Enjoys playing guitar and synthesizers.
- Collects vintage vinyl.

## Gardening
- Maintains a local vegetable patch.
- Focuses on organic heirloom tomatoes.

## Skiing
- Advanced skier, prefers off-piste and backcountry in the Alps.

## Art
- Digital illustration and oil painting.
24  knowledge_service/docker-compose.yml  Normal file
@@ -0,0 +1,24 @@
services:
  knowledge-service:
    build: .
    image: sam/knowledge-service:latest
    container_name: knowledge-service
    ports:
      - "8080:8080"
    volumes:
      # Only mount the code directory, not packages
      - ./data:/app/code/data
      - ./chroma_db:/app/code/chroma_db
      - ./main.py:/app/code/main.py:ro  # Read-only mount for safety
    environment:
      - PYTHONUNBUFFERED=1
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - PYTHONPATH=/app/packages
    networks:
      - ai-mesh
    restart: unless-stopped

networks:
  ai-mesh:
    external: true
121  knowledge_service/gitea_scraper.py  Normal file
@@ -0,0 +1,121 @@
import os
import httpx
import logging
from typing import List, Dict, Optional
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class RepoMetadata:
    name: str
    description: str
    url: str
    default_branch: str
    updated_at: str
    language: Optional[str]

class GiteaScraper:
    def __init__(self, base_url: str, token: str, username: str = "sam"):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.username = username
        self.headers = {"Authorization": f"token {token}"}

    def get_user_repos(self) -> List[RepoMetadata]:
        """Fetch all repositories for the user."""
        repos = []
        page = 1

        while True:
            url = f"{self.base_url}/api/v1/users/{self.username}/repos?page={page}&limit=50"

            try:
                response = httpx.get(url, headers=self.headers, timeout=30.0)
                response.raise_for_status()

                data = response.json()
                if not data:
                    break

                for repo in data:
                    repos.append(RepoMetadata(
                        name=repo["name"],
                        description=repo.get("description") or "",  # description may be null
                        url=repo["html_url"],
                        default_branch=repo["default_branch"],
                        updated_at=repo["updated_at"],
                        language=repo.get("language")
                    ))

                logger.info(f"Fetched page {page}, got {len(data)} repos")
                page += 1

            except Exception as e:
                logger.error(f"Error fetching repos: {e}")
                break

        return repos

    def get_readme(self, repo_name: str) -> str:
        """Fetch README content for a repository."""
        # Try common README filenames
        readme_names = ["README.md", "readme.md", "Readme.md", "README.rst"]

        for readme_name in readme_names:
            url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/raw/{readme_name}"

            try:
                response = httpx.get(url, headers=self.headers, timeout=10.0)
                if response.status_code == 200:
                    return response.text
            except Exception as e:
                logger.warning(f"Failed to fetch {readme_name}: {e}")
                continue

        return ""

    def get_repo_files(self, repo_name: str, path: str = "") -> List[Dict]:
        """List files in a repository directory."""
        url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/contents/{path}"

        try:
            response = httpx.get(url, headers=self.headers, timeout=10.0)
            response.raise_for_status()
            return response.json()
        except Exception as e:
            logger.error(f"Error listing files in {repo_name}/{path}: {e}")
            return []

    def get_file_content(self, repo_name: str, filepath: str) -> str:
        """Fetch content of a specific file."""
        url = f"{self.base_url}/api/v1/repos/{self.username}/{repo_name}/raw/{filepath}"

        try:
            response = httpx.get(url, headers=self.headers, timeout=10.0)
            if response.status_code == 200:
                return response.text
        except Exception as e:
            logger.error(f"Error fetching file {filepath}: {e}")

        return ""

# Test function
if __name__ == "__main__":
    scraper = GiteaScraper(
        base_url=os.getenv("GITEA_URL", "https://gitea.lab.audasmedia.com.au"),
        token=os.getenv("GITEA_TOKEN", ""),
        username=os.getenv("GITEA_USERNAME", "sam")
    )

    repos = scraper.get_user_repos()
    print(f"Found {len(repos)} repositories")

    for repo in repos[:3]:  # Test with first 3
        print(f"\nRepo: {repo.name}")
        readme = scraper.get_readme(repo.name)
        if readme:
            print(f"README preview: {readme[:200]}...")
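`get_user_repos` pages through the Gitea API 50 repos at a time until an empty page signals the end. The loop's shape, separated from HTTP so it is testable in isolation (here `fetch_page` is a stand-in for the Gitea call):

```python
from typing import Callable, Dict, List

def fetch_all(fetch_page: Callable[[int], List[Dict]]) -> List[Dict]:
    """Accumulate results page by page until the API returns an empty page."""
    items: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:  # an empty page means there is no more data
            break
        items.extend(batch)
        page += 1
    return items

# Fake backend: 120 repos served in pages of 50 (two full pages, one short, then empty).
data = [{"name": f"repo-{i}"} for i in range(120)]
result = fetch_all(lambda page: data[(page - 1) * 50 : page * 50])
print(len(result))  # -> 120
```

The same termination condition (`if not data: break`) is what the scraper relies on; an API that never returns an empty page would need an explicit page cap.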
56  knowledge_service/knowledge_agent_plan.md  Normal file
@@ -0,0 +1,56 @@
# GOAL

Build a "Deep Knowledge Agent" (DKA) that acts as a secure,
quarantined bridge between the Chat Gateway and private data sources.

# ARCHITECTURE OVERVIEW

## Layers

1. Public Gateway: FastAPI (The "Voice").
2. Orchestration Layer: LangGraph Supervisor (The "Router").
3. Quarantined Agent: DKA / Librarian (The "Keeper of Secrets").
   - Strictly Read-Only.
   - Accesses ChromaDB and Media stores.
4. Specialist Agent: Opencode (The "Engineer").

## Data Sources (The "Knowledge Mesh")

- [ ] **Code**: Gitea (Repos, Markdown docs).
- [ ] **Notes**: Trilium Next, Obsidian, Flatnotes, HedgeDoc.
- [ ] **Wiki**: DokuWiki.
- [ ] **Inventory**: HomeBox (Physical gear, photos).
- [ ] **Tasks**: Vikunja.
- [ ] **Media**: Immich (Photos/Videos metadata via Gemini Vision).

## Agent Tooling & Orchestration

- [ ] **Orchestrators**: CAO CLI, Agent Pipe.
- [ ] **External Agents**: Goose, Aider, Opencode (Specialist).

# COMPONENT DETAILS

## The Librarian (DKA - LangGraph)

- Purpose: Semantic retrieval and data synthesis from vectors.
- Tools:
  - `query_chroma`: Search the vector database.
  - `fetch_media_link`: Returns a signed URL/path for Immich/HomeBox images.
- Constraints:
  - NO `bash` or `write` tools.

## The Ingestion Pipeline (Airflow/Custom Python)

- [ ] **Multi-Source Scrapers**: API-based (Gitea, Immich) and File-based (Obsidian).
- [ ] **Vision Integration**: Gemini analyzes Immich photos to create searchable text descriptions.
- [ ] **Storage**: ChromaDB (Vectors) + PostgreSQL (Metadata/Hashes).

# TODO LIST [0/4]

- [ ] Create `knowledge_service` directory.
- [ ] Implement `test_rag.py` (Hello World retrieval).
- [ ] Build basic scraper for `hobbies.org`.
- [ ] Integrate DKA logic into the FastAPI Gateway.
47  knowledge_service/knowledge_agent_plan.org  Normal file
@@ -0,0 +1,47 @@
#+TITLE: Phase 3: Knowledge Engine & Agent Orchestration
#+AUTHOR: Giordano (via opencode)
#+OPTIONS: toc:2

* GOAL
Build a "Deep Knowledge Agent" (DKA) that acts as a secure, quarantined bridge between the Chat Gateway and private data sources.

* ARCHITECTURE OVERVIEW
** Layers
1. Public Gateway: FastAPI (The "Voice").
2. Orchestration Layer: LangGraph Supervisor (The "Router").
3. Quarantined Agent: DKA / Librarian (The "Keeper of Secrets").
   - Strictly Read-Only.
   - Accesses ChromaDB and Media stores.
4. Specialist Agent: Opencode (The "Engineer").

** Data Sources (The "Knowledge Mesh")
- [ ] *Code*: Gitea (Repos, Markdown docs).
- [ ] *Notes*: Trilium Next, Obsidian, Flatnotes, HedgeDoc.
- [ ] *Wiki*: DokuWiki.
- [ ] *Inventory*: HomeBox (Physical gear, photos).
- [ ] *Tasks*: Vikunja.
- [ ] *Media*: Immich (Photos/Videos metadata via Gemini Vision).

** Agent Tooling & Orchestration
- [ ] *Orchestrators*: CAO CLI, Agent Pipe.
- [ ] *External Agents*: Goose, Aider, Opencode (Specialist).

* COMPONENT DETAILS
** The Librarian (DKA - LangGraph)
- Purpose: Semantic retrieval and data synthesis from vectors.
- Tools:
  - ~query_chroma~: Search the vector database.
  - ~fetch_media_link~: Returns a signed URL/path for Immich/HomeBox images.
- Constraints:
  - NO ~bash~ or ~write~ tools.

** The Ingestion Pipeline (Airflow/Custom Python)
- [ ] *Multi-Source Scrapers*: API-based (Gitea, Immich) and File-based (Obsidian).
- [ ] *Vision Integration*: Gemini analyzes Immich photos to create searchable text descriptions.
- [ ] *Storage*: ChromaDB (Vectors) + PostgreSQL (Metadata/Hashes).

* TODO LIST [0/4]
- [ ] Create 'knowledge_service' directory.
- [ ] Implement ~test_rag.py~ (Hello World retrieval).
- [ ] Build basic scraper for ~hobbies.org~.
- [ ] Integrate DKA logic into the FastAPI Gateway.
52  knowledge_service/main.py  Normal file
@@ -0,0 +1,52 @@
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger(__name__)

app = FastAPI()
vector_db = None

# OpenAI text-embedding-3-small embeddings, routed through the OpenRouter API
embeddings = OpenAIEmbeddings(
    model="openai/text-embedding-3-small",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key=os.getenv("OPENROUTER_API_KEY")
)

@app.on_event("startup")
async def startup_event():
    global vector_db
    data_path = "./data/hobbies.md"
    if os.path.exists(data_path):
        try:
            loader = TextLoader(data_path)
            documents = loader.load()
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
            chunks = text_splitter.split_documents(documents)
            vector_db = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory="./chroma_db")
            logger.info("Librarian: ChromaDB is loaded with OpenAI embeddings.")
        except Exception as e:
            logger.error(f"Librarian: DB error: {str(e)}")
    else:
        logger.warning(f"Librarian: Missing data file at {data_path}")

@app.get("/health")
async def health():
    return {"status": "ready", "vectors_loaded": vector_db is not None}

class QueryRequest(BaseModel):
    question: str

@app.post("/query")
async def query_knowledge(request: QueryRequest):
    if not vector_db:
        return {"context": ""}
    results = vector_db.similarity_search(request.question, k=2)
    return {"context": "\n".join([res.page_content for res in results])}
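The startup/query flow above boils down to: split the document into overlapping chunks, embed each chunk, then return the `k` chunks nearest to the question. A dependency-free sketch of that pipeline, with a toy word-count "embedding" standing in for the real embedding model and `split_text` mirroring the splitter's fixed-window behavior (`chunk_size=500`, `chunk_overlap=50` as above):

```python
import math
from collections import Counter
from typing import List

def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Fixed-size character windows that overlap by chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words vector (the real service calls an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(docs: List[str], question: str, k: int = 2) -> List[str]:
    """Rank chunks by cosine similarity to the question; keep the top k."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(embed(d), q), reverse=True)[:k]

chunks = split_text("x" * 1200)  # 1200 chars -> 3 overlapping 500-char windows
docs = [
    "Enjoys playing guitar and synthesizers.",
    "Focuses on organic heirloom tomatoes.",
    "Advanced skier, prefers off-piste terrain.",
]
hits = similarity_search(docs, "does sam play guitar", k=2)
```

The real `/query` endpoint does the same ranking in embedding space, where "similar meaning" rather than shared words drives the score.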
7  knowledge_service/requirements.txt  Normal file
@@ -0,0 +1,7 @@
fastapi
uvicorn
langchain
langchain-community
langchain-openai
langchain-text-splitters
chromadb
22  langgraph_service/Dockerfile  Normal file
@@ -0,0 +1,22 @@
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Create app directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy code
COPY . .

EXPOSE 8090

CMD ["python3", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8090"]
80 langgraph_service/main.py Normal file
@@ -0,0 +1,80 @@
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from supervisor_agent import process_query
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger(__name__)

app = FastAPI(title="LangGraph Supervisor Service")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class QueryRequest(BaseModel):
    query: str

class QueryResponse(BaseModel):
    response: str
    agent_used: str
    context: dict

@app.get("/health")
async def health():
    return {"status": "healthy", "service": "langgraph-supervisor"}

@app.post("/query", response_model=QueryResponse)
async def query_supervisor(request: QueryRequest):
    """Main entry point for agent orchestration."""
    logger.info(f"Received query: {request.query}")

    try:
        result = await process_query(request.query)

        return QueryResponse(
            response=result["response"],
            agent_used=result["context"].get("source", "unknown"),
            context=result["context"]
        )
    except Exception as e:
        logger.error(f"Error processing query: {e}")
        return QueryResponse(
            response="Error processing your request",
            agent_used="error",
            context={"error": str(e)}
        )

@app.get("/agents")
async def list_agents():
    """List available specialist agents."""
    return {
        "agents": [
            {
                "name": "librarian",
                "description": "Queries the knowledge base for semantic information",
                "triggers": ["repo", "code", "git", "hobby", "about", "skill"]
            },
            {
                "name": "opencode",
                "description": "Handles coding tasks and file modifications",
                "triggers": ["write", "edit", "create", "fix", "implement"]
            },
            {
                "name": "brain",
                "description": "General LLM for reasoning and generation",
                "triggers": ["default", "general questions"]
            }
        ]
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8090)
9 langgraph_service/requirements.txt Normal file
@@ -0,0 +1,9 @@
fastapi
uvicorn
langgraph
langchain
langchain-community
langchain-openai
httpx
pydantic
153 langgraph_service/supervisor_agent.py Normal file
@@ -0,0 +1,153 @@
from typing import TypedDict, Annotated, Sequence
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import operator
import httpx
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# State definition
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    next_agent: str
    context: dict

# Agent routing logic
def supervisor_node(state: AgentState):
    """Supervisor decides which specialist agent to call."""
    last_message = state["messages"][-1].content.lower()

    # Simple routing logic based on keywords
    if any(kw in last_message for kw in ["repo", "code", "git", "github", "gitea", "project", "development"]):
        return {"next_agent": "librarian"}
    elif any(kw in last_message for kw in ["write", "edit", "create", "fix", "bug", "implement", "code change"]):
        return {"next_agent": "opencode"}
    elif any(kw in last_message for kw in ["sam", "hobby", "music", "experience", "skill", "about"]):
        return {"next_agent": "librarian"}
    else:
        return {"next_agent": "brain"}  # Default to general LLM

def librarian_agent(state: AgentState):
    """Librarian agent - queries knowledge base (ChromaDB)."""
    last_message = state["messages"][-1].content
    error_msg = "no relevant results"

    try:
        # Call knowledge service
        response = httpx.post(
            "http://knowledge-service:8080/query",
            json={"question": last_message},
            timeout=10.0
        )

        if response.status_code == 200:
            context = response.json().get("context", "")
            return {
                "messages": [AIMessage(content=f"Based on my knowledge base:\n\n{context}")],
                "context": {"source": "librarian", "context": context}
            }
        error_msg = f"knowledge service returned HTTP {response.status_code}"
    except Exception as e:
        logger.error(f"Librarian error: {e}")
        error_msg = str(e)

    # Reached on non-200 responses or exceptions; error_msg carries the cause
    # (the exception variable itself is out of scope here in Python 3).
    return {
        "messages": [AIMessage(content="I couldn't find relevant information in the knowledge base.")],
        "context": {"source": "librarian", "error": error_msg}
    }

def opencode_agent(state: AgentState):
    """Opencode agent - handles coding tasks via MCP."""
    last_message = state["messages"][-1].content

    # Placeholder - would integrate with opencode-brain
    return {
        "messages": [AIMessage(content=f"I'm the coding agent. I would help you with: {last_message}")],
        "context": {"source": "opencode", "action": "coding_task"}
    }

def brain_agent(state: AgentState):
    """Brain agent - general LLM fallback."""
    last_message = state["messages"][-1].content

    try:
        # Call opencode-brain service (synchronous client: this node is a
        # plain function, so httpx.Client is used rather than AsyncClient)
        auth = httpx.BasicAuth("opencode", os.getenv("OPENCODE_PASSWORD", "sam4jo"))
        timeout_long = httpx.Timeout(180.0, connect=10.0)

        with httpx.Client(auth=auth, timeout=timeout_long) as client:
            # Create session
            session_res = client.post("http://opencode-brain:5000/session", json={"title": "Supervisor Query"})
            session_id = session_res.json()["id"]

            # Send message
            response = client.post(
                f"http://opencode-brain:5000/session/{session_id}/message",
                json={"parts": [{"type": "text", "text": last_message}]}
            )

            data = response.json()
            if "parts" in data:
                for part in data["parts"]:
                    if part.get("type") == "text":
                        return {
                            "messages": [AIMessage(content=part["text"])],
                            "context": {"source": "brain"}
                        }
    except Exception as e:
        logger.error(f"Brain error: {e}")

    return {
        "messages": [AIMessage(content="I'm thinking about this...")],
        "context": {"source": "brain"}
    }

def route_decision(state: AgentState):
    """Routing function based on supervisor decision."""
    return state["next_agent"]

# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("librarian", librarian_agent)
workflow.add_node("opencode", opencode_agent)
workflow.add_node("brain", brain_agent)

# Add edges
workflow.set_entry_point("supervisor")

# Conditional routing from supervisor
workflow.add_conditional_edges(
    "supervisor",
    route_decision,
    {
        "librarian": "librarian",
        "opencode": "opencode",
        "brain": "brain"
    }
)

# All specialist agents end
workflow.add_edge("librarian", END)
workflow.add_edge("opencode", END)
workflow.add_edge("brain", END)

# Compile the graph
supervisor_graph = workflow.compile()

# Main entry point for queries
async def process_query(query: str) -> dict:
    """Process a query through the supervisor graph."""
    result = await supervisor_graph.ainvoke({
        "messages": [HumanMessage(content=query)],
        "next_agent": "",
        "context": {}
    })

    return {
        "response": result["messages"][-1].content,
        "context": result.get("context", {})
    }
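The supervisor's keyword routing can be exercised without LangGraph at all; a standalone sketch of the same decision table (keyword lists copied from `supervisor_node`, checked in the same top-to-bottom priority):

```python
# Keyword lists mirror supervisor_node; order matters, since the first match wins.
REPO_KWS = ["repo", "code", "git", "github", "gitea", "project", "development"]
CODE_KWS = ["write", "edit", "create", "fix", "bug", "implement", "code change"]
ABOUT_KWS = ["sam", "hobby", "music", "experience", "skill", "about"]

def route(query: str) -> str:
    """Return the agent name the supervisor would pick for this query."""
    q = query.lower()
    if any(kw in q for kw in REPO_KWS):
        return "librarian"
    if any(kw in q for kw in CODE_KWS):
        return "opencode"
    if any(kw in q for kw in ABOUT_KWS):
        return "librarian"
    return "brain"

print(route("What repos do I have?"))        # librarian
print(route("Fix this bug"))                 # opencode
print(route("Tell me about Sam's hobbies"))  # librarian
print(route("What's the weather?"))          # brain
```

Note the substring matching is deliberately naive (e.g. "git" matches inside "digital"); the commit message's goal of "no hardcoded keywords" would replace this table with an LLM-based router.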
396 plan.md Normal file
@@ -0,0 +1,396 @@
# Project Plan: aboutme_chat_demo

## Goal
Build a comprehensive AI agent system that ingests data from self-hosted services (Gitea, notes, wikis), stores it in a vector database, and provides intelligent responses through a multi-agent orchestration layer. The system emphasizes modular containerized architecture, industry-standard tools, and employment-relevant skills.

---

## Phase 1: Foundation & Core Infrastructure (COMPLETED)

### Phase 1.1: Frontend Application
**Location:** `/home/sam/development/aboutme_chat_demo/frontend/`

**Stack & Tools:**
- **Framework:** Vite 6.2.0 + React 19.0.0 + TypeScript
- **Styling:** Tailwind CSS 4.0.0
- **State Management:** TanStack Query (React Query) 5.67.0
- **Build Tool:** Vite with React plugin
- **Linting:** ESLint 9.21.0 + typescript-eslint 8.24.0

**Components Implemented:**
- `ChatInterface.tsx` - Auto-expanding text input with scrolling message list
- `App.tsx` - Main application container
- Real-time chat UI with message history
- HTTP client integration to backend gateway

**Docker Configuration:**
- Hot-reload development setup
- Volume mounting for instant code changes
- Node modules isolation (`/app/node_modules`)

### Phase 1.2: Chat Gateway (Orchestration Entry Point)
**Location:** `/home/sam/development/aboutme_chat_demo/backend/`

**Stack & Tools:**
- **Framework:** FastAPI (Python 3.11)
- **HTTP Client:** httpx 0.28.1
- **CORS:** Configured for all origins (development)

**Architecture Changes:**
- **OLD:** Hardcoded keyword matching (`["sam", "hobby", "music", "guitar", "skiing", "experience"]`) to trigger knowledge lookup
- **NEW:** Thin routing layer - all queries passed to LangGraph Supervisor for intelligent agent selection
- Removed direct Brain (LLM) integration
- Removed direct Knowledge Service calls
- Now acts as stateless entry point to LangGraph orchestration layer

**Endpoints:**
- `POST /chat` - Routes queries to LangGraph Supervisor
- `GET /health` - Service health check
- `GET /agents` - Lists available agents from LangGraph

### Phase 1.3: Knowledge Service (Librarian Agent)
**Location:** `/home/sam/development/knowledge_service/`

**Stack & Tools:**
- **Framework:** FastAPI + Uvicorn
- **Vector Database:** ChromaDB 1.5.1
- **Embeddings:** OpenAI via OpenRouter API (text-embedding-3-small)
- **LLM Framework:** LangChain ecosystem
  - langchain 1.2.10
  - langchain-community 0.4.1
  - langchain-core 1.2.15
  - langchain-text-splitters 1.1.1
  - langchain-openai
- **Document Processing:** RecursiveCharacterTextSplitter

**Key Files:**
- `main.py` - FastAPI endpoints for /query and /health
- `gitea_scraper.py` - Gitea API integration module (NEW)
- `data/hobbies.md` - Sample knowledge base content
- `chroma_db/` - Persistent vector storage

**Docker Architecture (Optimized):**
- **Pattern:** Separate `/app/packages` (cached) from `/app/code` (volume-mounted)
- **Benefits:**
  - Code changes apply instantly without rebuild
  - Package installation happens once during image build
  - PYTHONPATH=/app/packages ensures imports work
- **Volumes:**
  - `./data:/app/code/data` - Knowledge documents
  - `./chroma_db:/app/code/chroma_db` - Vector database persistence
  - `./main.py:/app/code/main.py:ro` - Read-only code mount

### Phase 1.4: LangGraph Supervisor Service (NEW)
**Location:** `/home/sam/development/langgraph_service/`

**Stack & Tools:**
- **Framework:** FastAPI + Uvicorn
- **Orchestration:** LangGraph 1.0.9
  - langgraph-checkpoint 4.0.0
  - langgraph-prebuilt 1.0.8
  - langgraph-sdk 0.3.9
- **State Management:** TypedDict with Annotated operators
- **Message Types:** LangChain Core Messages (HumanMessage, AIMessage)

**Architecture:**
- **Supervisor Node:** Analyzes queries and routes to specialist agents
- **Agent Graph:** StateGraph with conditional edges
- **Three Specialist Agents:**
  1. **Librarian Agent** - Queries ChromaDB via knowledge-service:8080
  2. **Opencode Agent** - Placeholder for coding tasks (MCP integration ready)
  3. **Brain Agent** - Fallback to OpenCode Brain LLM (opencode-brain:5000)

**Routing Logic:**
```
Query → Supervisor → [Librarian | Opencode | Brain]
  - "repo/code/git/project" → Librarian (RAG)
  - "write/edit/create/fix" → Opencode (Coding)
  - "sam/hobby/music/about" → Librarian (RAG)
  - Default → Brain (General LLM)
```

**Docker Configuration:**
- Self-contained with own `/app/packages`
- No package sharing with other services (modular)
- Port 8090 exposed

### Phase 1.5: Apache Airflow (Scheduled Ingestion)
**Location:** `/home/sam/development/airflow/`

**Stack & Tools:**
- **Orchestration:** Apache Airflow 2.8.1
- **Executor:** CeleryExecutor (distributed task processing)
- **Database:** PostgreSQL 13 (metadata)
- **Message Queue:** Redis (Celery broker)
- **Services:**
  - airflow-webserver (UI + API)
  - airflow-scheduler (DAG scheduling)
  - airflow-worker (task execution)
  - airflow-triggerer (deferrable operators)

**DAG: gitea_daily_ingestion**
- **Schedule:** Daily
- **Tasks:**
  1. `fetch_repos` - Get all user repos from Gitea API
  2. `fetch_readmes` - Download README files
  3. `ingest_to_chroma` - Store in Knowledge Service
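The three tasks above form a linear fetch → extract → ingest chain; a dependency-free sketch of that data flow with stubbed I/O (repo names and return shapes are illustrative, not the real Gitea payloads or Airflow task wrappers):

```python
def fetch_repos():
    # Stub: the real task calls the Gitea API with an auth token.
    return [{"name": "aboutme_chat_demo"}, {"name": "knowledge_service"}]

def fetch_readmes(repos):
    # Stub: the real task downloads README.md (with fallback names) per repo.
    return {r["name"]: f"# {r['name']}\n..." for r in repos}

def ingest_to_chroma(readmes):
    # Stub: the real task POSTs each document to the knowledge service.
    return len(readmes)

# The DAG wires these as task1 >> task2 >> task3; run directly they compose:
ingested = ingest_to_chroma(fetch_readmes(fetch_repos()))
print(ingested)  # 2
```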
**Integration:**
- Mounts `knowledge_service/gitea_scraper.py` into DAGs folder
- Environment variables for Gitea API token
- Network: ai-mesh (communicates with knowledge-service)

### Phase 1.6: Gitea Scraper Module
**Location:** `/home/sam/development/knowledge_service/gitea_scraper.py`

**Functionality:**
- **API Integration:** Gitea REST API v1
- **Authentication:** Token-based (Authorization header)
- **Methods:**
  - `get_user_repos()` - Paginated repo listing
  - `get_readme(repo_name)` - README content with fallback names
  - `get_repo_files(repo_name, path)` - Directory listing
  - `get_file_content(repo_name, filepath)` - File download

**Data Model:**
- `RepoMetadata` dataclass (name, description, url, branch, updated_at, language)
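A minimal sketch of that dataclass (field names from the list above; the string types and example values are assumptions, and the real module may differ):

```python
from dataclasses import dataclass

@dataclass
class RepoMetadata:
    """Metadata for one Gitea repository, as listed in the plan."""
    name: str
    description: str
    url: str
    branch: str
    updated_at: str
    language: str

repo = RepoMetadata(
    name="aboutme_chat_demo",
    description="AI agent system",
    url="http://gitea.local/sam/aboutme_chat_demo",  # hypothetical URL
    branch="main",
    updated_at="2024-01-01T00:00:00Z",
    language="Python",
)
print(repo.name)
```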
### Phase 1.7: Docker Infrastructure

**Network:**
- `ai-mesh` (external) - Shared bridge network for all services

**Services Overview:**

| Service | Port | Purpose | Dependencies |
|---------|------|---------|--------------|
| frontend | 5173 | React UI | backend |
| backend | 8000 | Chat Gateway | langgraph-service, db |
| db | 5432 | PostgreSQL (chat history) | - |
| knowledge-service | 8080 | RAG / Vector DB | - |
| langgraph-service | 8090 | Agent Orchestration | knowledge-service |
| airflow-webserver | 8081 | Workflow UI | postgres, redis |
| airflow-scheduler | - | DAG scheduling | postgres, redis |
| airflow-worker | - | Task execution | postgres, redis |
| redis | 6379 | Message broker | - |
| postgres (airflow) | - | Airflow metadata | - |

**Container Patterns:**
- All Python services use `/app/packages` + `/app/code` separation
- Node.js services use volume mounting for hot reload
- PostgreSQL uses named volumes for persistence
- External network (`ai-mesh`) for cross-service communication

---

## Phase 2: Multi-Source Knowledge Ingestion (IN PROGRESS)

### Goal
Expand beyond Gitea to ingest data from all self-hosted knowledge sources.

### Data Sources to Integrate:
1. **Notes & Documentation**
   - **Trilium Next** - Hierarchical note-taking (tree structure)
   - **Obsidian** - Markdown vault with backlinks
   - **Flatnotes** - Flat file markdown notes
   - **HedgeDoc** - Collaborative markdown editor

2. **Wiki**
   - **DokuWiki** - Structured wiki content

3. **Project Management**
   - **Vikunja** - Task lists and project tracking

4. **Media & Assets**
   - **Immich** - Photo/video metadata + Gemini Vision API for content description
   - **HomeBox** - Physical inventory with images

### Technical Approach:
- **Crawling:** Selenium/Playwright for JavaScript-heavy UIs
- **Extraction:** Firecrawl or LangChain loaders for structured content
- **Vision:** Gemini Vision API for image-to-text conversion
- **Storage:** ChromaDB (vectors) + PostgreSQL (metadata, hashes for deduplication)
- **Scheduling:** Additional Airflow DAGs per source
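The hash-based deduplication mentioned in the storage item can be as simple as a set of SHA-256 digests checked before ingestion; a minimal in-memory sketch of the idea (the plan would store these hashes in PostgreSQL rather than in a process-local set):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

seen = set()

def should_ingest(text: str) -> bool:
    """Ingest only documents whose content hash we haven't seen before."""
    h = content_hash(text)
    if h in seen:
        return False
    seen.add(h)
    return True

print(should_ingest("# README v1"))  # True - first time seen
print(should_ingest("# README v1"))  # False - unchanged, skip re-embedding
```

Skipping unchanged documents avoids paying for re-embedding on every scheduled run.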
---

## Phase 3: Advanced Agent Capabilities

### Goal
Integrate external AI tools and expand agent capabilities.

### Agent Tooling:
1. **MCP (Model Context Protocol) Servers**
   - Git MCP - Local repository operations
   - Filesystem MCP - Secure file access
   - Memory MCP - Knowledge graph persistence
   - Custom Gitea MCP (if/when available)

2. **External Agents**
   - **Goose** - CLI-based agent for local task execution
   - **Aider** - AI pair programming
   - **Opencode** - Already integrated (Brain Agent)
   - **Automaker** - Workflow automation
   - **Autocoder** - Code generation

3. **Orchestration Tools**
   - **CAO CLI** - Agent orchestrator
   - **Agent Pipe** - Pipeline management

### Integration Pattern:
- Each external tool wrapped as LangGraph node
- Supervisor routes to appropriate specialist
- State management for multi-turn interactions

---

## Phase 4: Production Hardening

### Goal
Prepare system for production deployment.

### Authentication & Security:
- **Laravel** - User authentication service (Phase 4 original plan)
- **JWT tokens** - Session management
- **API key management** - Secure credential storage
- **Network policies** - Inter-service communication restrictions

### Monitoring & Observability:
- **LangSmith** - LLM tracing and debugging
- **Langfuse** - LLM observability (note: currently in per-project install list)
- **Prometheus/Grafana** - Metrics and dashboards
- **Airflow monitoring** - DAG success/failure alerting

### Scaling:
- **ChromaDB** - Migration to server mode for concurrent access
- **Airflow** - Multiple Celery workers
- **Load balancing** - Nginx reverse proxy
- **Backup strategies** - Vector DB snapshots, PostgreSQL dumps

---

## Phase 5: Workflow Automation & Visual Tools

### Goal
Add visual prototyping and automation capabilities.

### Tools to Integrate:
1. **Flowise** - Visual LangChain builder
   - Prototype agent flows without coding
   - Export to Python code
   - Debug RAG pipelines visually

2. **Windmill** - Turn scripts into workflows
   - Schedule Python/LangChain scripts
   - Reactive triggers (e.g., on-commit)
   - Low-code workflow builder

3. **Activepieces** - Event-driven automation
   - Webhook triggers from Gitea
   - Integration with external APIs
   - Visual workflow designer

4. **N8N** - Alternative workflow automation
   - Consider if Activepieces doesn't meet needs

### Use Cases:
- **On-commit triggers:** Gitea push → immediate re-scan → notification
- **Scheduled reports:** Weekly summary of new/updated projects
- **Reactive workflows:** New photo uploaded → Gemini Vision → update knowledge base

---

## Phase 6: Knowledge Library Options & RAG Enhancement

### Goal
Advanced retrieval and knowledge organization.

### RAG Pipeline Improvements:
1. **Hybrid Search**
   - Semantic search (ChromaDB) + Keyword search (PostgreSQL)
   - Re-ranking with cross-encoders
   - Query expansion and decomposition

2. **Multi-Modal RAG**
   - Image retrieval (Immich + CLIP embeddings)
   - Document parsing (PDFs, code files)
   - Structured data (tables, lists)

3. **Knowledge Organization**
   - Entity extraction and linking
   - Knowledge graph construction
   - Hierarchical chunking strategies
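One common way to fuse the semantic and keyword result lists from a hybrid search is reciprocal rank fusion; a minimal sketch (document IDs are illustrative; `k=60` is the conventional RRF smoothing constant, and this is one option among several fusion strategies):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1/(k + rank) per document,
    so documents ranked well by multiple retrievers float to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]   # from ChromaDB
keyword  = ["doc_b", "doc_d", "doc_a"]   # from PostgreSQL full-text search
fused = rrf([semantic, keyword])
print(fused)
```

Because RRF works on ranks rather than raw scores, it needs no calibration between the two retrievers' incompatible scoring scales.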
### Alternative Vector Stores (Evaluation):
- **pgvector** - PostgreSQL native (if ChromaDB hits its limits)
- **Weaviate** - GraphQL interface, hybrid search
- **Qdrant** - Rust-based, high performance
- **Milvus** - Enterprise-grade, distributed

---

## Phase 7: User Experience & Interface

### Goal
Enhanced frontend and interaction patterns.

### Frontend Enhancements:
1. **Chat Interface Improvements**
   - Streaming responses (Server-Sent Events)
   - Message threading and context
   - File upload for document ingestion
   - Image display (for Immich integration)

2. **Knowledge Browser**
   - View ingested documents
   - Search knowledge base directly
   - See confidence scores and sources
   - Manual document upload/ingestion trigger

3. **Agent Management**
   - View active agents
   - Configure agent behavior
   - Monitor agent performance
   - Override routing decisions
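The streaming-responses item above relies on Server-Sent Events, which boil down to a simple text framing; a minimal sketch of formatting one SSE frame (generic SSE wire format, not tied to any particular backend framework):

```python
def sse_event(data: str, event: str = "message") -> str:
    """Format one Server-Sent Events frame: an event line, one data line per
    newline in the payload, and a blank line terminating the frame."""
    lines = [f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    lines += ["", ""]  # blank line ends the frame
    return "\n".join(lines)

print(repr(sse_event("Hello")))  # 'event: message\ndata: Hello\n\n'
```

On the FastAPI side this would typically be yielded from a generator behind a `StreamingResponse` with `media_type="text/event-stream"`.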
### Mobile & Accessibility:
- Responsive design improvements
- Mobile app (React Native or PWA)
- Accessibility compliance (WCAG)

---

## Technology Stack Summary

### Core Frameworks:
- **Backend:** FastAPI (Python 3.11)
- **Frontend:** Vite + React 19 + TypeScript
- **Styling:** Tailwind CSS
- **Database:** PostgreSQL 15
- **Vector DB:** ChromaDB 1.5.1

### AI/ML Stack:
- **LLM Orchestration:** LangGraph 1.0.9 + LangChain
- **Embeddings:** OpenAI via OpenRouter (text-embedding-3-small)
- **LLM:** OpenCode Brain (opencode-brain:5000)
- **Vision:** Gemini Vision API (Phase 2)
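Under the hood, the vector store ranks chunks by embedding similarity; a minimal cosine-similarity sketch over plain Python lists (toy 2-D vectors for illustration; text-embedding-3-small actually produces 1536-dimensional vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0 - identical direction
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0 - unrelated
```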
### Workflow & Scheduling:
- **Orchestration:** Apache Airflow 2.8.1 (CeleryExecutor)
- **Message Queue:** Redis
- **External Tools:** Flowise, Windmill, Activepieces

### Development Tools:
- **Containers:** Docker + Docker Compose
- **Networking:** Bridge network (ai-mesh)
- **Testing:** curl/httpx for API testing
- **Version Control:** Gitea (self-hosted)

### Skills Demonstrated:
- Containerized microservices architecture
- Multi-agent AI orchestration (LangGraph)
- Vector database implementation (RAG)
- ETL pipeline development (Airflow)
- API integration and web scraping
- Modular, maintainable code organization
- Industry-standard AI tooling (LangChain ecosystem)
- Workflow automation and scheduling