Local Hybrid Vector + Graph RAG Setup via Caddy & Docker

This document outlines the architecture and configuration files required to run a single, unified local RAG system (vector search over static files plus graph search over Obsidian notes). The system is served at ai.local and embedded in an iframe on three context-specific showcase websites: devops.local, coding.local, and ai-site.local.

1. Network Routing (`Caddyfile`)

This Caddyfile proxies the backend container at ai.local and sets a Content-Security-Policy `frame-ancestors` directive, so that only the three showcase sites (and ai.local itself) may embed the app in an iframe.

# Core AI RAG Application Backend
ai.local {
    reverse_proxy localhost:8000

    header {
        # Restrict iframe rendering specifically to your 3 interest domains
        Content-Security-Policy "frame-ancestors 'self' https://devops.local https://coding.local https://ai-site.local"
       
        # Standard security hardening
        X-Content-Type-Options "nosniff"
        Referrer-Policy "strict-origin-when-cross-origin"
    }
}

# Example Configuration Blocks for Frontend Sites
devops.local {
    root * /var/www/devops_site
    file_server
}

coding.local {
    root * /var/www/coding_site
    file_server
}

ai-site.local {
    root * /var/www/ai_site
    file_server
}
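
The `frame-ancestors` rule above can be sanity-checked offline with a small Python sketch. It parses a Content-Security-Policy string and tests whether a parent origin is listed. The helper names `frame_ancestors` and `may_embed` are hypothetical, and the matching is deliberately simplified to exact string comparison: it does not model `'self'`, wildcards, or scheme-only sources the way a browser would.

```python
def frame_ancestors(csp_header: str) -> list[str]:
    """Return the sources listed in the frame-ancestors directive, or [] if absent."""
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            return parts[1:]
    return []


def may_embed(csp_header: str, parent_origin: str) -> bool:
    """Simplified check: exact match of the parent origin against the listed sources."""
    return parent_origin in frame_ancestors(csp_header)


# Header value copied from the ai.local block above
CSP = "frame-ancestors 'self' https://devops.local https://coding.local https://ai-site.local"
SHOWCASE_ORIGINS = ["https://devops.local", "https://coding.local", "https://ai-site.local"]

if __name__ == "__main__":
    for origin in SHOWCASE_ORIGINS:
        print(origin, may_embed(CSP, origin))
```

A check like this is useful mainly to catch a hostname typo in the header before a browser silently refuses to render the frame.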

2. Infrastructure Layer (`docker-compose.yml`)

The app runs in a python:3.11-slim container. The vector store and the graph file are mounted read-only (:ro), so nothing running inside the container, including code paths influenced by a manipulated prompt, can modify them.

version: '3.8'

services:
  unified-ai-rag:
    image: python:3.11-slim
    container_name: local_ai_rag
    working_dir: /app
    volumes:
      # Mount application scripts
      - ./app:/app
      # Mount databases and notes securely as READ-ONLY
      - ./db/semantic_rag:/app/db/semantic_rag:ro
      - ./db/obsidian_graph.gpickle:/app/db/obsidian_graph.gpickle:ro
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=your_openai_api_key_here
      - ANTHROPIC_API_KEY=your_anthropic_api_key_here
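    # `&&` is a shell operator, so both steps run through `sh -c`.
    # `--host 0.0.0.0` binds to all interfaces so the published port is reachable from the host.
    command: ["sh", "-c", "pip install -r requirements.txt && chainlit run app.py --host 0.0.0.0 --port 8000"]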
    restart: unless-stopped
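
To guard against a data mount losing its `:ro` suffix during later edits, the volume list can be checked with a short Python sketch. It assumes the short `HOST:CONTAINER[:MODE]` volume syntax used above (no Windows drive letters) and that everything under `/app/db/` counts as data. `Mount`, `parse_volume`, and `writable_data_mounts` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Mount(NamedTuple):
    host_path: str
    container_path: str
    read_only: bool


def parse_volume(spec: str) -> Mount:
    """Parse a short-syntax volume string 'HOST:CONTAINER[:MODE]'."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported volume spec: {spec!r}")
    host, container = parts[0], parts[1]
    # The optional third field is a comma-separated option list such as "ro"
    options = parts[2].split(",") if len(parts) == 3 else []
    return Mount(host, container, "ro" in options)


def writable_data_mounts(specs, data_prefix="/app/db/"):
    """Return the mounts under data_prefix that are NOT read-only."""
    mounts = [parse_volume(s) for s in specs]
    return [m for m in mounts if m.container_path.startswith(data_prefix) and not m.read_only]


# Volume entries copied from the compose file above
VOLUMES = [
    "./app:/app",
    "./db/semantic_rag:/app/db/semantic_rag:ro",
    "./db/obsidian_graph.gpickle:/app/db/obsidian_graph.gpickle:ro",
]

if __name__ == "__main__":
    print(writable_data_mounts(VOLUMES))
```

An empty result means every data mount is read-only. The `./app:/app` mount is intentionally excluded because the application code sits outside the data prefix.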

3. Web UI Application & Context Controller (`app.py`)

This Chainlit script inspects the Referer header of the incoming session to work out which showcase site embedded it, then selects a matching system prompt and welcome message. If no known site is detected, it falls back to a generic persona.
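
Before the full listing, the referrer-to-persona lookup can be sketched in isolation as a pure function. This version parses the hostname out of the Referer URL and matches it exactly, which avoids false positives from substring checks (for example, a hostname that merely contains `devops.local`). The persona keys and the names `PERSONA_BY_HOST` and `persona_for_referer` are hypothetical; the listing that follows uses simpler substring checks.

```python
from urllib.parse import urlparse

DEFAULT_PERSONA = "default"

# Hostname -> persona key (keys are illustrative labels, not part of app.py)
PERSONA_BY_HOST = {
    "devops.local": "devops",
    "coding.local": "software",
    "ai-site.local": "ai",
}


def persona_for_referer(referer: str) -> str:
    """Pick a persona key from a Referer URL, falling back to the default."""
    try:
        host = urlparse(referer).hostname or ""
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) get the default persona
        return DEFAULT_PERSONA
    return PERSONA_BY_HOST.get(host.lower(), DEFAULT_PERSONA)


if __name__ == "__main__":
    print(persona_for_referer("https://devops.local/projects"))
    print(persona_for_referer(""))
```

Keeping the lookup separate from the Chainlit handler makes it easy to unit-test without starting the server.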

import pickle

import chainlit as cl
import networkx as nx
from chromadb import PersistentClient

# 1. Initialization hooks for Vector and Graph layers (Read-Only)
def load_databases():
    vector_client = PersistentClient(path="/app/db/semantic_rag")
    vector_layer = vector_client.get_collection(name="resume_and_docs")
    # nx.read_gpickle was removed in NetworkX 3.0; a .gpickle file is a plain pickle.
    # Only unpickle files you created yourself.
    with open("/app/db/obsidian_graph.gpickle", "rb") as f:
        graph_layer = pickle.load(f)
    return vector_layer, graph_layer

vector_db, graph_db = load_databases()

@cl.on_chat_start
async def start():
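    # Detect the site referring the frame to adjust persona.
    # Assumption: the session exposes the request headers under "http_headers".
    # If your Chainlit version does not, the default persona below applies.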
    http_headers = cl.user_session.get("http_headers", {})
    referer = http_headers.get("referer", "")
   
    system_prompt = "You are a helpful AI assistant reviewing my portfolio data."
    welcome_msg = "Hello! Ask me any questions about my profile or experience."
   
    if "devops.local" in referer:
        system_prompt = "Persona: DevOps Engineer. Focus heavily on infrastructure, IoT architecture, CI/CD, and server logs."
        welcome_msg = "Welcome Recruiter! Ask me anything about my DevOps automation and IoT infrastructure."
    elif "coding.local" in referer:
        system_prompt = "Persona: Software Engineer. Emphasize backend code, software design patterns, and clean programming methodologies."
        welcome_msg = "Hello! Let's talk about my development portfolio and technical code paradigms."
    elif "ai-site.local" in referer:
        system_prompt = "Persona: AI/RAG Specialist. Discuss custom embedding techniques, semantic lookups, and graph networks."
        welcome_msg = "Greetings! Feel free to pick my brain about graph-based indexing and large language models."
       
    cl.user_session.set("system_prompt", system_prompt)
    await cl.Message(content=welcome_msg).send()

@cl.on_message
async def main(message: cl.Message):
    query = message.content
    sys_prompt = cl.user_session.get("system_prompt")
   
    # Dual-pass strategy (not implemented in this skeleton):
    #   1. extract semantic chunks from vector_db
    #   2. pull Obsidian link neighbors from graph_db
    #   3. pass sys_prompt, the retrieved context, and query to the LLM

    # Placeholder reply until the retrieval and LLM calls are wired in
    response_text = "Processed query using profile persona contextualized framework rules."
    await cl.Message(content=response_text).send()
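
The dual-pass step that the handler leaves as a placeholder could look roughly like the sketch below. It assumes `collection` behaves like a Chroma collection (its `query()` returns a dict whose `"documents"` entry holds one list of texts per query) and `graph` like a NetworkX graph (`.nodes` and `.neighbors()`). The seed-matching rule, where a note is relevant if its title appears in the query, is an assumption made for illustration, as are the names `retrieve_context` and `build_prompt`. The LLM call itself is left out.

```python
def retrieve_context(query, collection, graph, n_results=3):
    """Dual pass: semantic chunks from the collection, link neighbors from the graph."""
    # Pass 1: vector search over the static documents
    result = collection.query(query_texts=[query], n_results=n_results)
    chunks = result["documents"][0]

    # Pass 2: graph search. Assumption: node names are note titles, and a note
    # is a seed when its title appears in the query (case-insensitive).
    lowered = query.lower()
    seeds = [node for node in graph.nodes if str(node).lower() in lowered]
    neighbors = []
    for seed in seeds:
        for neighbor in graph.neighbors(seed):
            if neighbor not in seeds and neighbor not in neighbors:
                neighbors.append(neighbor)
    return chunks, seeds, neighbors


def build_prompt(system_prompt, query, chunks, seeds, neighbors):
    """Assemble one prompt string for whichever LLM client is used."""
    lines = [system_prompt, "", "Relevant excerpts:"]
    lines += [f"- {chunk}" for chunk in chunks]
    related = ", ".join(str(note) for note in seeds + neighbors)
    lines += ["", f"Related notes: {related}", "", f"Question: {query}"]
    return "\n".join(lines)
```

Inside `main`, this would be called with `vector_db` and `graph_db`, and the resulting prompt passed to the chosen LLM client in place of the placeholder `response_text`.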

4. Frontend Integration (`iframe`)

Embed this snippet directly in the HTML of each showcase site:

<iframe
  src="https://ai.local"
  style="width: 100%; height: 650px; border: 1px solid #ccc; border-radius: 8px;"
  allow="clipboard-read; clipboard-write">
</iframe>
