docs: update smart-router section (thinking levels, headroom integration), add full Headroom documentation

2026-06-10 21:00:56 +10:00
parent 3367e32c27
commit 0e5bb0719d
1 changed files with 100 additions and 21 deletions
--- a/Skills.md
+++ b/Skills.md
@@ -37,7 +37,7 @@ aliases: []
 | **plannotator** | `~/.agents` | Interactive plan review with browser UI, annotations, code review |
 | **caveman** | `~/.agents` | Ultra-compressed communication mode |
 | **markitdown** | `~/.agents` | Convert files (PDF, Word, Excel, PPTX, images, HTML, etc.) to Markdown. Image analysis via Qwen 2.5 VL 72B on OpenRouter. |
-| **smart-router** | `~/.agents` | Dynamic prompt routing — analyzes intent and routes to optimal model. `/lock-model` and `/unlock-model` for manual override. |
+| **smart-router** | `~/.agents` | Dynamic prompt routing + thinking levels — analyzes intent, routes to optimal model, sets thinking level per task complexity. `/lock-model` and `/unlock-model` for manual override. Triggers Headroom compression for heavy analysis/code/devops contexts. |
 ---
@@ -161,30 +161,41 @@ The **npm-security** skill instructs the Pi agent to follow this workflow before
 ## Smart Router
-The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It replaces the old `/do-*` prompt templates with automatic, invisible routing.
+The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It also sets thinking levels and triggers Headroom context compression for heavy workloads.
 ### How it works
 1. Every prompt is intercepted before the agent loop starts
-2. A free model (`openrouter/owl-alpha`) analyzes the intent
+2. A fast model (`openrouter/free`) analyzes the intent
-3. The prompt is classified into one of 10 tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high)
+3. The prompt is classified into one of nine tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high)
-4. The router selects the optimal model based on tag + language
+4. The router selects the optimal model + thinking level based on tag
-5. The selected model is set via `pi.setModel()`
+5. For analysis/code/devops tags, large contexts (>5K tokens) are compressed via Headroom before reaching the LLM
-6. Routing decisions appear in the **footer status bar** (e.g. `🎯 devops-low → qwen/qwen3.6-flash`)
+6. Routing + compression status appears in the **footer status bar** (e.g. `🎯 devops-low → opencode-go/deepseek-v4-flash`, `📦 73%`)
 ### Routing table
-| Tag | Model | Use case |
+| Tag | Route Key | Model | Thinking | Use case |
-|-----|-------|----------|
+|-----|-----------|-------|----------|----------|
-| `read`, `discuss`, `search` | `openrouter/owl-alpha` | Reading docs, general chat, web search |
+| `read`, `discuss`, `search` | `free-core` | `openrouter/free` | — | Reading docs, chat, web search |
-| `devops-low` | `qwen/qwen3.6-flash` | Simple YAML, Docker, bash |
+| `devops-low` | `economy-devops` | `opencode-go/deepseek-v4-flash` | — | Simple YAML, Docker, bash |
-| `devops-high` | `qwen/qwen-2.5-72b-instruct` | Complex multi-container, server crashes |
+| `devops-high` | `precision-devops` | `openrouter/deepseek-v4-pro` | `medium` | Complex multi-container, server crashes |
-| `code-analysis-low` | `openrouter/owl-alpha` | Finding bugs in short files |
+| `code-analysis-low` | `free-core` | `openrouter/free` | — | Finding bugs in short files |
-| `code-analysis-high` | `moonshotai/kimi-k2.6` | Refactoring large codebases (262K context) |
+| `code-analysis-high` | `context-heavy` | `openrouter/moonshotai/kimi-k2.6` | — | Refactoring large codebases (262K context) |
-| `codewrite-low` | `deepseek/deepseek-v4-flash` | Boilerplate, simple functions |
+| `codewrite-low` | `economy-code` | `opencode-go/deepseek-v4-pro` | `low` | Boilerplate, simple functions |
-| `codewrite-high` (React) | `qwen/qwen3-coder-plus` | Complex React/JS |
+| `codewrite-high` (React) | `precision-react` | `openrouter/deepseek-v4-pro` | `high` | Complex React/JS |
-| `codewrite-high` (other) | `deepseek/deepseek-v4-pro` | Complex PHP, dense logic |
+| `codewrite-high` (other) | `precision-code-high` | `openrouter/deepseek-v4-pro` | `high` | Complex code, dense logic |
-| Short prompts (<15 chars) | `openrouter/owl-alpha` | Quick responses, greetings |
+| Short prompts (<15 chars) | `router-eval` | `openrouter/free` | — | Quick responses |
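
The routing table is a two-step lookup: tag → route key → model. The shell sketch below illustrates that lookup only. The real logic is TypeScript in `index.ts`, and the function names `route_key` and `model_for_key`, the optional `react` language argument, and checking prompt length before the tag lookup are all assumptions. Route keys and model IDs are copied from the table.

```bash
# Hypothetical sketch mirroring the routing table; not the extension's code.
# route_key TAG PROMPT [LANG]  -> prints the route key
route_key() {
  local tag="$1" prompt="$2" lang="${3:-}"
  # Assumption: short prompts (<15 chars) are routed before the tag lookup.
  if [ "${#prompt}" -lt 15 ]; then
    echo router-eval
    return 0
  fi
  case "$tag" in
    read|discuss|search|code-analysis-low) echo free-core ;;
    devops-low)         echo economy-devops ;;
    devops-high)        echo precision-devops ;;
    code-analysis-high) echo context-heavy ;;
    codewrite-low)      echo economy-code ;;
    codewrite-high)
      # Hypothetical: LANG=react selects the React-specific route.
      if [ "$lang" = react ]; then echo precision-react; else echo precision-code-high; fi ;;
    *) echo "unknown tag: $tag" >&2; return 1 ;;
  esac
}
# model_for_key KEY  -> prints the provider/model ID for a route key
model_for_key() {
  case "$1" in
    free-core|router-eval) echo openrouter/free ;;
    economy-devops)        echo opencode-go/deepseek-v4-flash ;;
    economy-code)          echo opencode-go/deepseek-v4-pro ;;
    precision-devops|precision-react|precision-code-high) echo openrouter/deepseek-v4-pro ;;
    context-heavy)         echo openrouter/moonshotai/kimi-k2.6 ;;
    *) echo "unknown route key: $1" >&2; return 1 ;;
  esac
}
```

For example, `model_for_key "$(route_key devops-low 'write a compose file for nginx')"` prints `opencode-go/deepseek-v4-flash`. Keeping route keys separate from model IDs lets three precision keys share one model while still carrying different thinking levels.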
 ### Thinking levels
 Set by the smart-router based on task complexity:
 | Thinking Level | Tags | Effect |
 |---|---|---|
 | (unchanged) | `read`, `discuss`, `search`, `devops-low`, `code-analysis-low`, `code-analysis-high` | Fast responses, no extended reasoning |
 | `low` | `codewrite-low` | Brief reasoning to avoid silly mistakes in boilerplate |
 | `medium` | `devops-high` | Balanced reasoning for complex infrastructure |
 | `high` | `codewrite-high` (React and other) | Full reasoning for complex code, worth the latency |
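
The thinking table also reduces to a lookup keyed by tag. In the sketch below, an empty result stands for "leave the current level unchanged". That convention and the name `thinking_for_tag` are assumptions for illustration, not taken from `index.ts`.

```bash
# Hypothetical sketch of the thinking-level table; not the extension's code.
# Prints the level to set, or nothing when the current level is left unchanged.
thinking_for_tag() {
  case "$1" in
    codewrite-low)  echo low ;;
    devops-high)    echo medium ;;
    codewrite-high) echo high ;;   # both the React and non-React routes
    *)              ;;             # read, discuss, search, devops-low, code-analysis-*
  esac
}
```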
 ### Manual override
@@ -196,7 +207,7 @@ The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt
 /unlock-model
 ```
-When locked, all prompts go directly to the specified model until `/unlock-model` is called.
+**Note on model ID format:** `/lock-model` takes `provider/model-id`. Some OpenRouter models include the provider prefix in their ID (e.g. `openrouter/owl-alpha` has `id = "openrouter/owl-alpha"`). The handler tries both `id` and `provider/id` to find the model.
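
The `id` versus `provider/id` matching can be sketched as follows. The registry format (one `provider id` pair per line on stdin) and the name `resolve_model` are assumptions. The real handler lives in `index.ts`.

```bash
# Hypothetical sketch of the /lock-model lookup; not the extension's code.
# Reads "provider id" pairs (one per line) on stdin and prints the first pair
# whose bare id, or provider/id, equals the requested string.
resolve_model() {
  local want="$1" provider id
  while read -r provider id; do
    # openrouter/owl-alpha has id "openrouter/owl-alpha", so it matches on id alone
    if [ "$want" = "$id" ] || [ "$want" = "$provider/$id" ]; then
      echo "$provider $id"
      return 0
    fi
  done
  return 1   # no matching model
}
```

Piping `printf 'openrouter openrouter/owl-alpha\nopencode-go deepseek-v4-flash\n'` into `resolve_model opencode-go/deepseek-v4-flash` matches via the `provider/id` form. `resolve_model openrouter/owl-alpha` matches via the bare `id`.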
 ### What about `/do-*` commands?
@@ -204,14 +215,81 @@ The `/do-*` prompt templates are **no longer needed** for model selection. The s
 ### Configuration
-Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change which model a tag routes to, edit the `MODELS` table and run `/reload`.
+Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change routing, thinking, or compression behavior, edit the `MODELS`, `THINKING`, or `COMPRESS_TAGS` tables and run `/reload`.
 ### Files
 | File | Purpose |
 |------|---------|
 | `~/.agents/extensions/smart-router/index.ts` | Extension source |
-| `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy (Gitea backup) |
+| `~/.agents/extensions/smart-router/package.json` | npm deps (headroom-ai) |
 | `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy |
 ---
 ## Headroom
 Headroom is a **context compression layer** that reduces prompt token usage by 60-95% for heavy analysis/code/devops workloads. It runs as a Docker container on the server (192.168.20.13) and is selectively triggered by the smart-router.
 ### How it works
 1. Smart-router analyzes the prompt → determines if compression is needed
 2. Tags `read`, `discuss`, `search` **never** trigger compression (these are fast paths)
 3. For all other tags, if accumulated context exceeds ~5K tokens, the smart-router calls `compress()`
 4. Messages are sent to the Headroom proxy at `192.168.20.13:8787`
 5. Headroom compresses the context: SmartCrusher for JSON, CodeCompressor (AST-based) for code, and the Kompress-base ML model for plain text
 6. Compressed messages are returned and forwarded to the LLM
 7. If the proxy is down, messages pass through unchanged (graceful fallback)
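
Step 7's fallback follows a try-then-pass-through pattern. The shell sketch below models it with an arbitrary command standing in for the Headroom call. `compress_or_passthrough` is a hypothetical name. In the extension, the equivalent step would be the `compress()` call inside the `context` handler.

```bash
# Hypothetical sketch of the graceful fallback; not the extension's code.
# Usage: compress_or_passthrough CMD [ARGS...] < messages
# CMD stands in for the Headroom call; any non-zero exit means "proxy unavailable".
compress_or_passthrough() {
  local input out
  input=$(cat)   # note: command substitution drops trailing newlines
  if out=$(printf '%s' "$input" | "$@" 2>/dev/null); then
    printf '%s' "$out"     # compressed messages go on to the LLM
  else
    printf '%s' "$input"   # compressor failed: forward the original unchanged
  fi
}
```

A real client also needs a short timeout on the call. Without one, an unreachable host can stall the prompt instead of failing fast into the pass-through branch.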
 ### Architecture
 ```
 Desktop (.27)                         Server (.13)
 ─────────────                         ────────────
 smart-router analyzes prompt          headroom proxy (Docker)
    │                                     │
    │  if compress needed:                │
    │  compress(messages) ──────────────► │
    │  HTTP POST 192.168.20.13:8787       │
    │  ◄────────── compressed messages    │
    │                                     │
    │  send to LLM                        │
    │                                     │
    │  if proxy down: pass through        │
 ```
 ### Compression thresholds
 | Condition | Action |
 |---|---|
 | Tag is `read`/`discuss`/`search` | Skip — no compression |
 | Context < 5K tokens | Skip — too small to benefit |
 | Context ≥ 5K tokens + analysis/code/devops tag | Compress |
 | Proxy unreachable | Pass through unchanged |
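
The first three rows of the table can be expressed as a single predicate. In this sketch, the caller supplies a token estimate; how the extension actually counts tokens is not covered here. `should_compress` and `COMPRESS_MIN_TOKENS` are hypothetical stand-ins for whatever `index.ts` does with `COMPRESS_TAGS` and its threshold.

```bash
# Hypothetical sketch of the compression decision; not the extension's code.
# should_compress TAG TOKEN_ESTIMATE  -> exit status 0 means "compress"
COMPRESS_MIN_TOKENS=5000   # mirrors the ~5K-token threshold in the table
should_compress() {
  local tag="$1" tokens="$2"
  case "$tag" in
    read|discuss|search) return 1 ;;   # fast paths never compress
  esac
  [ "$tokens" -ge "$COMPRESS_MIN_TOKENS" ]
}
```

The "proxy unreachable" row is not part of this decision. It applies only after compression has been attempted, when messages fall back to passing through unchanged.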
 ### Management
 ```bash
 # Check status
 ssh 192.168.20.13 "docker ps --filter name=headroom"
 # View logs
 ssh 192.168.20.13 "docker logs --tail 20 headroom"
 # Restart
 ssh 192.168.20.13 "docker restart headroom"
 # Update image
 ssh 192.168.20.13 "cd /home/sam/Docker/Containers/headroom && docker compose pull && docker compose up -d"
 ```
 ### Files
 | File | Purpose |
 |------|---------|
 | `~/.agents/extensions/smart-router/index.ts` | Compression logic (`compress()` import + `context` event handler) |
 | `/home/sam/Docker/Containers/headroom/docker-compose.yml` | Docker service definition (on .13) |
 | `/home/sam/Docker/Containers/headroom/.env` | Environment file (on .13) |
 ---
@@ -307,4 +385,5 @@ pi-mcp-adapter connects Pi to external services via the Model Context Protocol.
 - [ ] Verify video-extract works with Gemini
 - [x] Add markitdown skill to Obsidian skills page
 - [x] Add smart-router extension and update Obsidian docs
 - [x] Deploy Headroom Docker compression on .13, integrate with smart-router
 - [ ] Clean up workspace-map.json entries for any stale memory packs