obsidian-vault/300 areas/350 AI/Local Hybrid Vector + Graph RAG Setup.md

---
created: 2026-05-16 17:02
modified: 2026-05-16 17:02
type: note
tags:
  - ai
  - dev-ops
  - website
  - iframe
aliases: []
id: 1778914902-WMFA
---
# [[Local Hybrid Vector + Graph RAG Setup]]

# Local Hybrid Vector + Graph RAG Setup via Caddy & Docker

Taken from a Google Gemini AI chat.

This document outlines the architecture and configuration files required to run a single, unified local RAG system (vector search for static files + graph search for Obsidian notes). The system is served at `ai.local` and embedded in an iframe on three separate context-specific showcase websites (`devops.local`, `coding.local`, `ai-site.local`).

---

## 1. Network Routing (`Caddyfile`)

This `Caddyfile` proxies the backend container and sets a `Content-Security-Policy` header whose `frame-ancestors` directive limits which sites may embed the app in an iframe.

```caddy
# Core AI RAG Application Backend
ai.local {
    reverse_proxy localhost:8000

    header {
        # Restrict iframe rendering specifically to your 3 interest domains
        Content-Security-Policy "frame-ancestors 'self' https://devops.local https://coding.local https://ai-site.local"

        # Standard security hardening
        X-Content-Type-Options "nosniff"
        Referrer-Policy "strict-origin-when-cross-origin"
    }
}

# Example Configuration Blocks for Frontend Sites
devops.local {
    root * /var/www/devops_site
    file_server
}

coding.local {
    root * /var/www/coding_site
    file_server
}

ai-site.local {
    root * /var/www/ai_site
    file_server
}
```
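
The `frame-ancestors` list has to stay in sync with the frontend site blocks, otherwise the browser refuses to render the iframe on the missing site. A minimal sketch of how that header value could be generated and inspected; the helper names `build_frame_ancestors` and `allowed_ancestors` are hypothetical and not part of Caddy:

```python
def build_frame_ancestors(origins):
    """Return a CSP value that lets only `origins` (plus 'self') embed the page."""
    return "frame-ancestors 'self' " + " ".join(origins)


def allowed_ancestors(csp_value):
    """Extract the source list of the frame-ancestors directive from a CSP value."""
    for directive in csp_value.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            return parts[1:]
    return []  # directive absent: this header puts no restriction on embedding


# The three frontend origins from the Caddyfile above
FRONTENDS = ["https://devops.local", "https://coding.local", "https://ai-site.local"]
csp = build_frame_ancestors(FRONTENDS)
```

Comparing `allowed_ancestors(csp)` against the list of frontend hosts is a quick way to catch a site that was added to the Caddyfile but not to the header.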

---

## 2. Infrastructure Layer (`docker-compose.yml`)

The app runs in a `python:3.11-slim` container. The vector store and the graph file are mounted **read-only** (`:ro`), so nothing running inside the container, including code reacting to a manipulated prompt, can modify them.

```yaml
version: '3.8'

services:
  unified-ai-rag:
    image: python:3.11-slim
    container_name: local_ai_rag
    working_dir: /app
    volumes:
      # Mount application scripts
      - ./app:/app
      # Mount databases and notes securely as READ-ONLY
      - ./db/semantic_rag:/app/db/semantic_rag:ro
      - ./db/obsidian_graph.gpickle:/app/db/obsidian_graph.gpickle:ro
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=your_openai_api_key_here
      - ANTHROPIC_API_KEY=your_anthropic_api_key_here
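    # Run both steps through a shell; an exec-form list would hand "&&" to pip as a literal argument.
    # --host 0.0.0.0 makes the server reachable through the published port.
    command: ["sh", "-c", "pip install -r requirements.txt && chainlit run app.py --host 0.0.0.0 --port 8000"]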
    restart: unless-stopped
```
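
Whether the `:ro` mounts actually took effect can be checked from inside the container at startup. A minimal sketch, assuming a hypothetical helper `audit_mounts` that you would call with the two mounted paths; it only reports what the operating system says about write access and changes nothing:

```python
import os


def audit_mounts(paths):
    """Classify each path as 'missing', 'writable', or 'read-only'."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
        elif os.access(path, os.W_OK):
            report[path] = "writable"
        else:
            report[path] = "read-only"
    return report


if __name__ == "__main__":
    # Paths as mounted in the compose file above
    for path, state in audit_mounts(
        ["/app/db/semantic_rag", "/app/db/obsidian_graph.gpickle"]
    ).items():
        print(f"{path}: {state}")
```

Inside the container both paths should come back as `read-only`; `writable` suggests the `:ro` suffix was dropped, and `missing` suggests the host path does not exist.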

---

## 3. Web UI Application & Context Controller (`app.py`)

This Chainlit script reads the `Referer` header when a chat starts and picks a persona (system prompt and welcome message) based on which website the recruiter came from.

```python
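import pickle

import chainlit as cl
import networkx as nx  # the pickled graph is a networkx object, so the package must be installed
from chromadb import PersistentClient

# 1. Initialization hooks for Vector and Graph layers (Read-Only)
def load_databases():
    vector_client = PersistentClient(path="/app/db/semantic_rag")
    vector_layer = vector_client.get_collection(name="resume_and_docs")
    # nx.read_gpickle was removed in NetworkX 3.0; load the pickled graph directly.
    # Only unpickle files you created yourself.
    with open("/app/db/obsidian_graph.gpickle", "rb") as f:
        graph_layer = pickle.load(f)
    return vector_layer, graph_layer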

vector_db, graph_db = load_databases()

@cl.on_chat_start
async def start():
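    # Detect the site referring the frame to adjust persona.
    # Assumption: request headers are stored in the session under "http_headers";
    # the key that is actually available depends on your Chainlit version.
    http_headers = cl.user_session.get("http_headers", {})
    referer = http_headers.get("referer", "")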

    system_prompt = "You are a helpful AI assistant reviewing my portfolio data."
    welcome_msg = "Hello! Ask me any questions about my profile or experience."

    if "devops.local" in referer:
        system_prompt = "Persona: DevOps Engineer. Focus heavily on infrastructure, IoT architecture, CI/CD, and server logs."
        welcome_msg = "Welcome Recruiter! Ask me anything about my DevOps automation and IoT infrastructure."
    elif "coding.local" in referer:
        system_prompt = "Persona: Software Engineer. Emphasize backend code, software design patterns, and clean programming methodologies."
        welcome_msg = "Hello! Let's talk about my development portfolio and technical code paradigms."
    elif "ai-site.local" in referer:
        system_prompt = "Persona: AI/RAG Specialist. Discuss custom embedding techniques, semantic lookups, and graph networks."
        welcome_msg = "Greetings! Feel free to pick my brain about graph-based indexing and large language models."

    cl.user_session.set("system_prompt", system_prompt)
    await cl.Message(content=welcome_msg).send()

@cl.on_message
async def main(message: cl.Message):
    query = message.content
    sys_prompt = cl.user_session.get("system_prompt")

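    # Pass 1: semantic chunks from the vector store.
    # Assumption: the collection was built with Chroma's default embedding function.
    hits = vector_db.query(query_texts=[query], n_results=3)
    chunks = hits["documents"][0]

    # Pass 2: Obsidian link neighbors from the graph.
    # Assumption: graph nodes are note titles; notes named in the query act as seeds.
    seeds = [n for n in graph_db.nodes if str(n).lower() in query.lower()]
    linked = sorted({str(nb) for s in seeds for nb in graph_db.neighbors(s)})

    # Placeholder: send sys_prompt, chunks, and linked notes to your LLM here.
    response_text = (
        f"Retrieved {len(chunks)} chunk(s) and {len(linked)} linked note(s); "
        "LLM synthesis is not wired up yet."
    )
    await cl.Message(content=response_text).send()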
```
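
The substring checks in `start()` also match look-alike hosts such as `devops.local.evil.example`. A stricter variant parses the referer and compares the exact hostname. This is a sketch under assumptions: `PERSONAS`, `DEFAULT_PERSONA`, and `persona_for` are hypothetical names, and the prompts are copied from the script above.

```python
from urllib.parse import urlparse

DEFAULT_PERSONA = "You are a helpful AI assistant reviewing my portfolio data."

# Exact hostname -> system prompt (hosts taken from the Caddyfile)
PERSONAS = {
    "devops.local": "Persona: DevOps Engineer. Focus heavily on infrastructure, "
                    "IoT architecture, CI/CD, and server logs.",
    "coding.local": "Persona: Software Engineer. Emphasize backend code, software "
                    "design patterns, and clean programming methodologies.",
    "ai-site.local": "Persona: AI/RAG Specialist. Discuss custom embedding "
                     "techniques, semantic lookups, and graph networks.",
}


def persona_for(referer: str) -> str:
    """Return the system prompt for the referring site, or the default one."""
    host = (urlparse(referer).hostname or "").lower()
    return PERSONAS.get(host, DEFAULT_PERSONA)
```

Inside `start()`, `system_prompt = persona_for(referer)` would replace the `if`/`elif` chain; the welcome messages could be kept in a second dictionary with the same keys.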

---

## 4. Frontend Integration (`iframe`)

Add this snippet to the HTML of each frontend site:

```html
<iframe
  src="https://ai.local"
  style="width: 100%; height: 650px; border: 1px solid #ccc; border-radius: 8px;"
  allow="clipboard-read; clipboard-write">
</iframe>
```