docs: update smart-router section (thinking levels, headroom integration), add full Headroom documentation

2026-06-10 21:00:56 +10:00
parent 3367e32c27
commit 0e5bb0719d


@@ -37,7 +37,7 @@ aliases: []
| **plannotator** | `~/.agents` | Interactive plan review with browser UI, annotations, code review | | **plannotator** | `~/.agents` | Interactive plan review with browser UI, annotations, code review |
| **caveman** | `~/.agents` | Ultra-compressed communication mode | | **caveman** | `~/.agents` | Ultra-compressed communication mode |
| **markitdown** | `~/.agents` | Convert files (PDF, Word, Excel, PPTX, images, HTML, etc.) to Markdown. Image analysis via Qwen 2.5 VL 72B on OpenRouter. | | **markitdown** | `~/.agents` | Convert files (PDF, Word, Excel, PPTX, images, HTML, etc.) to Markdown. Image analysis via Qwen 2.5 VL 72B on OpenRouter. |
| **smart-router** | `~/.agents` | Dynamic prompt routing — analyzes intent and routes to optimal model. `/lock-model` and `/unlock-model` for manual override. | | **smart-router** | `~/.agents` | Dynamic prompt routing + thinking levels — analyzes intent, routes to optimal model, sets thinking level per task complexity. `/lock-model` and `/unlock-model` for manual override. Triggers Headroom compression for heavy analysis/code/devops contexts. |
--- ---
@@ -161,30 +161,41 @@ The **npm-security** skill instructs the Pi agent to follow this workflow before
## Smart Router ## Smart Router
The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It replaces the old `/do-*` prompt templates with automatic, invisible routing. The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It also sets thinking levels and triggers Headroom context compression for heavy workloads.
### How it works ### How it works
1. Every prompt is intercepted before the agent loop starts 1. Every prompt is intercepted before the agent loop starts
2. A free model (`openrouter/owl-alpha`) analyzes the intent 2. A fast model (`openrouter/free`) analyzes the intent
3. The prompt is classified into one of 10 tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high) 3. The prompt is classified into tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high)
4. The router selects the optimal model based on tag + language 4. The router selects the optimal model + thinking level based on tag
5. The selected model is set via `pi.setModel()` 5. For analysis/code/devops tags, large contexts (>5K tokens) are compressed via Headroom before reaching the LLM
6. Routing decisions appear in the **footer status bar** (e.g. `🎯 devops-low → qwen/qwen3.6-flash`) 6. Routing + compression status appears in the **footer status bar** (e.g. `🎯 devops-low → opencode-go/deepseek-v4-flash`, `📦 73%`)
### Routing table ### Routing table
| Tag | Model | Use case | | Tag | Route Key | Model | Thinking | Use case |
|-----|-------|----------| |-----|-----------|-------|----------|----------|
| `read`, `discuss`, `search` | `openrouter/owl-alpha` | Reading docs, general chat, web search | | `read`, `discuss`, `search` | `free-core` | `openrouter/free` | — | Reading docs, chat, web search |
| `devops-low` | `qwen/qwen3.6-flash` | Simple YAML, Docker, bash | | `devops-low` | `economy-devops` | `opencode-go/deepseek-v4-flash` | — | Simple YAML, Docker, bash |
| `devops-high` | `qwen/qwen-2.5-72b-instruct` | Complex multi-container, server crashes | | `devops-high` | `precision-devops` | `openrouter/deepseek-v4-pro` | `medium` | Complex multi-container, server crashes |
| `code-analysis-low` | `openrouter/owl-alpha` | Finding bugs in short files | | `code-analysis-low` | `free-core` | `openrouter/free` | — | Finding bugs in short files |
| `code-analysis-high` | `moonshotai/kimi-k2.6` | Refactoring large codebases (262K context) | | `code-analysis-high` | `context-heavy` | `openrouter/moonshotai/kimi-k2.6` | — | Refactoring large codebases (262K context) |
| `codewrite-low` | `deepseek/deepseek-v4-flash` | Boilerplate, simple functions | | `codewrite-low` | `economy-code` | `opencode-go/deepseek-v4-pro` | `low` | Boilerplate, simple functions |
| `codewrite-high` (React) | `qwen/qwen3-coder-plus` | Complex React/JS | | `codewrite-high` (React) | `precision-react` | `openrouter/deepseek-v4-pro` | `high` | Complex React/JS |
| `codewrite-high` (other) | `deepseek/deepseek-v4-pro` | Complex PHP, dense logic | | `codewrite-high` (other) | `precision-code-high` | `openrouter/deepseek-v4-pro` | `high` | Complex code, dense logic |
| Short prompts (<15 chars) | `openrouter/owl-alpha` | Quick responses, greetings | | Short prompts (<15 chars) | `router-eval` | `openrouter/free` | — | Quick responses |
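The routing table above can be read as a plain lookup from tag to route. The following is a minimal sketch of that idea, not the extension's actual code: the `Route` shape, the `ROUTES` constant, the `resolveRoute` function, and the `isReact` flag are hypothetical names introduced for illustration, and the fallback for unknown tags is an assumption. How the real extension detects React prompts is not described here, so the caller supplies that flag.

```typescript
// Hypothetical shapes: the real tables in index.ts may be organized differently.
type Thinking = "low" | "medium" | "high";

interface Route {
  routeKey: string;
  model: string;
  thinking?: Thinking; // undefined means the thinking level is left unchanged
}

const FREE_CORE: Route = { routeKey: "free-core", model: "openrouter/free" };

// One entry per tag, copied from the new side of the routing table.
const ROUTES: Record<string, Route> = {
  "read": FREE_CORE,
  "discuss": FREE_CORE,
  "search": FREE_CORE,
  "devops-low": { routeKey: "economy-devops", model: "opencode-go/deepseek-v4-flash" },
  "devops-high": { routeKey: "precision-devops", model: "openrouter/deepseek-v4-pro", thinking: "medium" },
  "code-analysis-low": FREE_CORE,
  "code-analysis-high": { routeKey: "context-heavy", model: "openrouter/moonshotai/kimi-k2.6" },
  "codewrite-low": { routeKey: "economy-code", model: "opencode-go/deepseek-v4-pro", thinking: "low" },
  "codewrite-high": { routeKey: "precision-code-high", model: "openrouter/deepseek-v4-pro", thinking: "high" },
};

const REACT_ROUTE: Route = { routeKey: "precision-react", model: "openrouter/deepseek-v4-pro", thinking: "high" };
const SHORT_PROMPT_ROUTE: Route = { routeKey: "router-eval", model: "openrouter/free" };

function resolveRoute(prompt: string, tag: string, isReact = false): Route {
  // Short prompts (<15 chars) take the quick-response route.
  if (prompt.length < 15) return SHORT_PROMPT_ROUTE;
  // codewrite-high splits into a React route and a general route.
  if (tag === "codewrite-high" && isReact) return REACT_ROUTE;
  // Assumption: unknown tags fall back to the free route.
  return ROUTES[tag] ?? FREE_CORE;
}
```

Keeping the model and the thinking level in one record means a tag cannot end up with a model from one row and a thinking level from another.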
### Thinking levels
The smart-router sets the thinking level based on task complexity:
| Thinking Level | Tags | Effect |
|---|---|---|
| (unchanged) | `read`, `discuss`, `search`, `devops-low`, `code-analysis-low`, `code-analysis-high` | Fast responses, no extended reasoning |
| `low` | `codewrite-low` | Brief reasoning to avoid silly mistakes in boilerplate |
| `medium` | `devops-high` | Balanced reasoning for complex infrastructure |
| `high` | `codewrite-high` (routes `precision-react` and `precision-code-high`) | Full reasoning for complex code, worth the latency |
### Manual override ### Manual override
@@ -196,7 +207,7 @@ The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt
/unlock-model /unlock-model
``` ```
When locked, all prompts go directly to the specified model until `/unlock-model` is called. **Note on model ID format:** `/lock-model` takes `provider/model-id`. Some OpenRouter models include the provider prefix in their ID (e.g. `openrouter/owl-alpha` has `id = "openrouter/owl-alpha"`). The handler tries both `id` and `provider/id` to find the model.
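The two-way match described in the model ID note can be sketched as follows. This is an illustration only: `ModelEntry` and `findModel` are hypothetical names, and the registry contents are made up from model IDs that appear elsewhere on this page. The real handler may look models up differently.

```typescript
// Hypothetical registry entry: a provider name plus the model's own id.
interface ModelEntry {
  provider: string;
  id: string;
}

// Match the requested string against the bare id first, then against provider/id.
function findModel(models: ModelEntry[], requested: string): ModelEntry | undefined {
  return models.find(
    (m) => m.id === requested || `${m.provider}/${m.id}` === requested,
  );
}

// Illustrative registry: one id already contains the provider prefix, one does not.
const registry: ModelEntry[] = [
  { provider: "openrouter", id: "openrouter/owl-alpha" },
  { provider: "opencode-go", id: "deepseek-v4-flash" },
];

const prefixedId = findModel(registry, "openrouter/owl-alpha");       // matches on id
const composedId = findModel(registry, "opencode-go/deepseek-v4-flash"); // matches on provider/id
```

Checking both forms lets `/lock-model` accept the same `provider/model-id` spelling whether or not the stored ID repeats the provider prefix.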
### What about `/do-*` commands? ### What about `/do-*` commands?
@@ -204,14 +215,81 @@ The `/do-*` prompt templates are **no longer needed** for model selection. The s
### Configuration ### Configuration
Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change which model a tag routes to, edit the `MODELS` table and run `/reload`. Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change routing, thinking, or compression behavior, edit the `MODELS`, `THINKING`, or `COMPRESS_TAGS` tables and run `/reload`.
### Files ### Files
| File | Purpose | | File | Purpose |
|------|---------| |------|---------|
| `~/.agents/extensions/smart-router/index.ts` | Extension source | | `~/.agents/extensions/smart-router/index.ts` | Extension source |
| `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy (Gitea backup) | | `~/.agents/extensions/smart-router/package.json` | npm deps (headroom-ai) |
| `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy |
---
## Headroom
Headroom is a **context compression layer** that reduces prompt token usage by 60-95% for heavy analysis/code/devops workloads. It runs as a Docker container on the server (192.168.20.13) and is selectively triggered by the smart-router.
### How it works
1. Smart-router analyzes the prompt → determines if compression is needed
2. Tags `read`, `discuss`, `search` **never** trigger compression (these are fast paths)
3. For all other tags, if accumulated context exceeds ~5K tokens, the smart-router calls `compress()`
4. Messages are sent to the Headroom proxy at `192.168.20.13:8787`
5. Headroom compresses the context (using SmartCrusher for JSON, CodeCompressor for AST-aware code compression, Kompress-base ML for text)
6. Compressed messages are returned and forwarded to the LLM
7. If the proxy is down, messages pass through unchanged (graceful fallback)
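The graceful fallback in the last step can be sketched as a thin wrapper. This is a sketch under assumptions rather than the extension's code: `Message`, `Compressor`, and `compressOrPassThrough` are hypothetical names, and the compressor is passed in as a parameter because the actual signature and error behavior of the `compress()` call are not documented here. The sketch assumes an unreachable proxy surfaces as a rejected promise.

```typescript
// Hypothetical message and compressor shapes, for illustration only.
type Message = { role: string; content: string };
type Compressor = (messages: Message[]) => Promise<Message[]>;

async function compressOrPassThrough(
  messages: Message[],
  compressFn: Compressor,
): Promise<Message[]> {
  try {
    // Happy path: the proxy answers with compressed messages.
    return await compressFn(messages);
  } catch {
    // Proxy down or request failed: forward the original messages unchanged.
    return messages;
  }
}
```

With this shape, a proxy outage costs extra tokens but never blocks a prompt from reaching the LLM.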
### Architecture
```
Desktop (.27)                           Server (.13)
─────────────                           ────────────
smart-router analyzes prompt            headroom proxy (Docker)
       │                                      │
       │  if compress needed:                 │
       │  compress(messages) ──────────────►  │
       │  HTTP POST 192.168.20.13:8787        │
       │  ◄────────── compressed messages     │
       │                                      │
       │  send to LLM                         │
       │                                      │
       │  if proxy down: pass through         │
```
### Compression thresholds
| Condition | Action |
|---|---|
| Tag is `read`/`discuss`/`search` | Skip — no compression |
| Context < 5K tokens | Skip — too small to benefit |
| Context ≥ 5K tokens + analysis/code/devops tag | Compress |
| Proxy unreachable | Pass through unchanged |
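The first three rows of the threshold table reduce to a small predicate. The following is a rough sketch, with several assumptions: the contents of `COMPRESS_TAGS` are inferred from "all other tags" in the steps above, `MIN_TOKENS` and `shouldCompress` are hypothetical names, and `estimateTokens` uses a common 4-characters-per-token heuristic that the real extension may not use.

```typescript
// Assumed contents: every tag except read, discuss, and search.
const COMPRESS_TAGS = new Set<string>([
  "devops-low",
  "devops-high",
  "code-analysis-low",
  "code-analysis-high",
  "codewrite-low",
  "codewrite-high",
]);

// Threshold from the table: contexts below about 5K tokens are skipped.
const MIN_TOKENS = 5000;

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(texts: string[]): number {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return Math.ceil(chars / 4);
}

function shouldCompress(tag: string, contextTokens: number): boolean {
  // Fast-path tags never compress, regardless of context size.
  if (!COMPRESS_TAGS.has(tag)) return false;
  // Small contexts are not worth the round trip to the proxy.
  return contextTokens >= MIN_TOKENS;
}
```

The proxy-unreachable row is deliberately left out of the predicate, since that case is only known after the request is attempted.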
### Management
```bash
# Check status
ssh 192.168.20.13 "docker ps --filter name=headroom"
# View logs
ssh 192.168.20.13 "docker logs --tail 20 headroom"
# Restart
ssh 192.168.20.13 "docker restart headroom"
# Update image
ssh 192.168.20.13 "cd /home/sam/Docker/Containers/headroom && docker compose pull && docker compose up -d"
```
### Files
| File | Purpose |
|------|---------|
| `~/.agents/extensions/smart-router/index.ts` | Compression logic (`compress()` import + `context` event handler) |
| `/home/sam/Docker/Containers/headroom/docker-compose.yml` | Docker service definition (on .13) |
| `/home/sam/Docker/Containers/headroom/.env` | Environment file (on .13) |
--- ---
@@ -307,4 +385,5 @@ pi-mcp-adapter connects Pi to external services via the Model Context Protocol.
- [ ] Verify video-extract works with Gemini - [ ] Verify video-extract works with Gemini
- [x] Add markitdown skill to Obsidian skills page - [x] Add markitdown skill to Obsidian skills page
- [x] Add smart-router extension and update Obsidian docs - [x] Add smart-router extension and update Obsidian docs
- [x] Deploy Headroom Docker compression on .13, integrate with smart-router
- [ ] Clean up workspace-map.json entries for any stale memory packs - [ ] Clean up workspace-map.json entries for any stale memory packs