docs: update smart-router section (thinking levels, headroom integration), add full Headroom documentation
@@ -37,7 +37,7 @@ aliases: []
 | **plannotator** | `~/.agents` | Interactive plan review with browser UI, annotations, code review |
 | **caveman** | `~/.agents` | Ultra-compressed communication mode |
 | **markitdown** | `~/.agents` | Convert files (PDF, Word, Excel, PPTX, images, HTML, etc.) to Markdown. Image analysis via Qwen 2.5 VL 72B on OpenRouter. |
-| **smart-router** | `~/.agents` | Dynamic prompt routing — analyzes intent and routes to optimal model. `/lock-model` and `/unlock-model` for manual override. |
+| **smart-router** | `~/.agents` | Dynamic prompt routing + thinking levels — analyzes intent, routes to optimal model, sets thinking level per task complexity. `/lock-model` and `/unlock-model` for manual override. Triggers Headroom compression for heavy analysis/code/devops contexts. |
 
 ---
 
@@ -161,30 +161,41 @@ The **npm-security** skill instructs the Pi agent to follow this workflow before
 
 ## Smart Router
 
-The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It replaces the old `/do-*` prompt templates with automatic, invisible routing.
+The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt interceptor** that analyzes every incoming prompt and dynamically routes it to the most appropriate model based on intent. It also sets thinking levels and triggers Headroom context compression for heavy workloads.
 
 ### How it works
 
 1. Every prompt is intercepted before the agent loop starts
-2. A free model (`openrouter/owl-alpha`) analyzes the intent
-3. The prompt is classified into one of 10 tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high)
-4. The router selects the optimal model based on tag + language
-5. The selected model is set via `pi.setModel()`
-6. Routing decisions appear in the **footer status bar** (e.g. `🎯 devops-low → qwen/qwen3.6-flash`)
+2. A fast model (`openrouter/free`) analyzes the intent
+3. The prompt is classified into tags (read, discuss, search, devops-low, devops-high, code-analysis-low, code-analysis-high, codewrite-low, codewrite-high)
+4. The router selects the optimal model + thinking level based on tag
+5. For analysis/code/devops tags, large contexts (>5K tokens) are compressed via Headroom before reaching the LLM
+6. Routing + compression status appears in the **footer status bar** (e.g. `🎯 devops-low → opencode-go/deepseek-v4-flash`, `📦 73%`)
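
The six steps in the new "How it works" list can be sketched as one decision function. This is a minimal sketch, not the extension's code: `decide`, `Route`, `Decision`, and the injected `classify` and `lookup` functions are hypothetical names, and the 5K cutoff is the figure quoted in step 5. Step 1 (interception) and actually applying the chosen model are left to the caller.

```typescript
// Sketch of steps 2-6 with the analyzer and routing table injected as plain functions.
// Route, Decision, and decide() are hypothetical names, not the extension's real API.
interface Route {
  model: string;
  thinking?: "low" | "medium" | "high";
}

interface Decision {
  tag: string;
  route: Route;
  compress: boolean;
  status: string; // footer text
}

// Fast-path tags that never trigger compression (step 5 applies to all other tags).
const FAST_TAGS = new Set<string>(["read", "discuss", "search"]);

async function decide(
  prompt: string,
  contextTokens: number,
  classify: (prompt: string) => Promise<string>, // stand-in for the call to the fast analyzer model
  lookup: (tag: string) => Route, // stand-in for the routing table
): Promise<Decision> {
  const tag = await classify(prompt); // steps 2-3: analyze intent, get a tag
  const route = lookup(tag); // step 4: model + thinking level for that tag
  const compress = !FAST_TAGS.has(tag) && contextTokens > 5_000; // step 5
  const status = `🎯 ${tag} → ${route.model}`; // step 6: footer status
  return { tag, route, compress, status };
}
```

Passing the analyzer and the table in as functions keeps the logic testable offline, for example with `async () => "devops-low"` as a fake classifier.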
 
 ### Routing table
 
-| Tag | Model | Use case |
-|-----|-------|----------|
-| `read`, `discuss`, `search` | `openrouter/owl-alpha` | Reading docs, general chat, web search |
-| `devops-low` | `qwen/qwen3.6-flash` | Simple YAML, Docker, bash |
-| `devops-high` | `qwen/qwen-2.5-72b-instruct` | Complex multi-container, server crashes |
-| `code-analysis-low` | `openrouter/owl-alpha` | Finding bugs in short files |
-| `code-analysis-high` | `moonshotai/kimi-k2.6` | Refactoring large codebases (262K context) |
-| `codewrite-low` | `deepseek/deepseek-v4-flash` | Boilerplate, simple functions |
-| `codewrite-high` (React) | `qwen/qwen3-coder-plus` | Complex React/JS |
-| `codewrite-high` (other) | `deepseek/deepseek-v4-pro` | Complex PHP, dense logic |
-| Short prompts (<15 chars) | `openrouter/owl-alpha` | Quick responses, greetings |
+| Tag | Route Key | Model | Thinking | Use case |
+|-----|-----------|-------|----------|----------|
+| `read`, `discuss`, `search` | `free-core` | `openrouter/free` | — | Reading docs, chat, web search |
+| `devops-low` | `economy-devops` | `opencode-go/deepseek-v4-flash` | — | Simple YAML, Docker, bash |
+| `devops-high` | `precision-devops` | `openrouter/deepseek-v4-pro` | `medium` | Complex multi-container, server crashes |
+| `code-analysis-low` | `free-core` | `openrouter/free` | — | Finding bugs in short files |
+| `code-analysis-high` | `context-heavy` | `openrouter/moonshotai/kimi-k2.6` | — | Refactoring large codebases (262K context) |
+| `codewrite-low` | `economy-code` | `opencode-go/deepseek-v4-pro` | `low` | Boilerplate, simple functions |
+| `codewrite-high` (React) | `precision-react` | `openrouter/deepseek-v4-pro` | `high` | Complex React/JS |
+| `codewrite-high` (other) | `precision-code-high` | `openrouter/deepseek-v4-pro` | `high` | Complex code, dense logic |
+| Short prompts (<15 chars) | `router-eval` | `openrouter/free` | — | Quick responses |
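
The new routing table maps naturally onto a lookup keyed by tag. The sketch below copies route keys, model IDs, and thinking levels from the `+` rows; everything else is assumed for illustration: the `Route` shape, the `selectRoute` helper, detecting React from a `language` hint, and falling back to `free-core` for an unrecognized tag.

```typescript
// Routing table expressed as data. Keys, model IDs, and thinking levels come from the
// table; the Route shape, selectRoute(), React detection, and the fallback are assumptions.
type Thinking = "low" | "medium" | "high";

interface Route {
  key: string;
  model: string;
  thinking?: Thinking; // undefined corresponds to "—" in the table
}

const FREE_CORE: Route = { key: "free-core", model: "openrouter/free" };

const ROUTES: Partial<Record<string, Route>> = {
  read: FREE_CORE,
  discuss: FREE_CORE,
  search: FREE_CORE,
  "devops-low": { key: "economy-devops", model: "opencode-go/deepseek-v4-flash" },
  "devops-high": { key: "precision-devops", model: "openrouter/deepseek-v4-pro", thinking: "medium" },
  "code-analysis-low": FREE_CORE,
  "code-analysis-high": { key: "context-heavy", model: "openrouter/moonshotai/kimi-k2.6" },
  "codewrite-low": { key: "economy-code", model: "opencode-go/deepseek-v4-pro", thinking: "low" },
};

const PRECISION_REACT: Route = { key: "precision-react", model: "openrouter/deepseek-v4-pro", thinking: "high" };
const PRECISION_CODE_HIGH: Route = { key: "precision-code-high", model: "openrouter/deepseek-v4-pro", thinking: "high" };
const ROUTER_EVAL: Route = { key: "router-eval", model: "openrouter/free" };

function selectRoute(prompt: string, tag: string, language?: string): Route {
  // Short prompts (<15 chars) go straight to router-eval.
  if (prompt.length < 15) return ROUTER_EVAL;
  if (tag === "codewrite-high") {
    // Assumption: React work is recognized from a language hint such as "react", "jsx" or "tsx".
    const isReact = language !== undefined && /react|jsx|tsx/i.test(language);
    return isReact ? PRECISION_REACT : PRECISION_CODE_HIGH;
  }
  // Assumption: an unrecognized tag falls back to free-core; the doc does not say.
  return ROUTES[tag] ?? FREE_CORE;
}
```

With the table held as data, retargeting a tag is a one-line change; only the short-prompt and `codewrite-high` cases need logic beyond a plain lookup.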
 
+### Thinking levels
+
+Set by the smart-router based on task complexity:
+
+| Thinking Level | Tags | Effect |
+|---|---|---|
+| (unchanged) | `read`, `discuss`, `search`, `devops-low`, `code-analysis-low`, `code-analysis-high` | Fast responses, no extended reasoning |
+| `low` | `codewrite-low` | Brief reasoning to avoid silly mistakes in boilerplate |
+| `medium` | `devops-high` | Balanced reasoning for complex infrastructure |
+| `high` | `codewrite-high`, `precision-react` | Full reasoning for complex code — worth the latency |
+
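
One reading of the thinking-level table, sketched under assumptions: the table lists only the tags that change the level, and "(unchanged)" means the router leaves the current level as it is. `THINKING` is the table name given in the Configuration section, but its shape here, and the `thinkingFor` helper, are hypothetical. `precision-react` is a route key reached through `codewrite-high`, so a single `codewrite-high` entry covers it.

```typescript
// Hypothetical per-tag thinking table; only the tags the router changes are listed.
type Thinking = "low" | "medium" | "high";

const THINKING: Partial<Record<string, Thinking>> = {
  "codewrite-low": "low",
  "devops-high": "medium",
  "codewrite-high": "high", // covers both precision-react and precision-code-high
};

// Returns the level to use for this prompt. Tags missing from the table are
// "(unchanged)": the router leaves the current level alone.
function thinkingFor(tag: string, current: string): string {
  return THINKING[tag] ?? current;
}
```

If the extension instead resets these tags to a default level (the Effect column says "no extended reasoning"), return that default in place of `current`.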
 ### Manual override
 
@@ -196,7 +207,7 @@ The smart-router extension (`~/.agents/extensions/smart-router/`) is a **prompt
 /unlock-model
 ```
 
-When locked, all prompts go directly to the specified model until `/unlock-model` is called.
+**Note on model ID format:** `/lock-model` takes `provider/model-id`. Some OpenRouter models include the provider prefix in their ID (e.g. `openrouter/owl-alpha` has `id = "openrouter/owl-alpha"`). The handler tries both `id` and `provider/id` to find the model.
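
The lookup described in the note amounts to two passes over the model list. A minimal sketch, assuming each model record exposes `provider` and `id` fields; `ModelInfo` and `findModel` are hypothetical names, not Pi's registry API.

```typescript
// Hypothetical model record; Pi's real model registry type is not documented here.
interface ModelInfo {
  provider: string;
  id: string;
}

// Resolve a /lock-model argument: try the raw id first, then "provider/id".
function findModel(models: ModelInfo[], ref: string): ModelInfo | undefined {
  const byId = models.find((m) => m.id === ref);
  if (byId !== undefined) return byId; // e.g. an id that already reads "openrouter/owl-alpha"
  return models.find((m) => `${m.provider}/${m.id}` === ref);
}
```

Order matters: an entry with `provider = "openrouter"` and `id = "openrouter/owl-alpha"` would produce `openrouter/openrouter/owl-alpha` in the second pass, so only the raw-`id` pass can match it.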
 
 ### What about `/do-*` commands?
 
@@ -204,14 +215,81 @@ The `/do-*` prompt templates are **no longer needed** for model selection. The s
 
 ### Configuration
 
-Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change which model a tag routes to, edit the `MODELS` table and run `/reload`.
+Model mappings are defined in `~/.agents/extensions/smart-router/index.ts`. To change routing, thinking, or compression behavior, edit the `MODELS`, `THINKING`, or `COMPRESS_TAGS` tables and run `/reload`.
 
 ### Files
 
 | File | Purpose |
 |------|---------|
 | `~/.agents/extensions/smart-router/index.ts` | Extension source |
-| `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy (Gitea backup) |
+| `~/.agents/extensions/smart-router/package.json` | npm deps (headroom-ai) |
+| `~/.pi/agent/extensions/smart-router/index.ts` | Synced copy |
+
+---
+
+## Headroom
+
+Headroom is a **context compression layer** that reduces prompt token usage by 60-95% for heavy analysis/code/devops workloads. It runs as a Docker container on the server (192.168.20.13) and is selectively triggered by the smart-router.
+
+### How it works
+
+1. Smart-router analyzes the prompt → determines if compression is needed
+2. Tags `read`, `discuss`, `search` **never** trigger compression (these are fast paths)
+3. For all other tags, if accumulated context exceeds ~5K tokens, the smart-router calls `compress()`
+4. Messages are sent to the Headroom proxy at `192.168.20.13:8787`
+5. Headroom compresses the context (using SmartCrusher for JSON, CodeCompressor for AST, Kompress-base ML for text)
+6. Compressed messages are returned and forwarded to the LLM
+7. If the proxy is down, messages pass through unchanged (graceful fallback)
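
Step 7's graceful fallback is the part most worth getting right, since a dead proxy must never block a prompt. A minimal sketch, assuming the compression call can be wrapped as an async function (`Compressor`); it does not reproduce headroom-ai's `compress()` signature, which this doc does not show. `Message`, `withTimeout`, `compressOrPassThrough`, and the 2-second budget are illustrative assumptions; the timeout covers a proxy that hangs instead of refusing the connection.

```typescript
// Hypothetical message shape; the real Pi and headroom-ai types are not shown in this doc.
interface Message {
  role: string;
  content: string;
}

// Any async function that asks the proxy to compress; a stand-in, not headroom-ai's API.
type Compressor = (messages: Message[]) => Promise<Message[]>;

// Reject if `p` does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// If the proxy is down, slow, or errors, forward the original messages unchanged.
async function compressOrPassThrough(
  messages: Message[],
  compressor: Compressor,
  timeoutMs = 2_000, // assumed budget; the doc gives no timeout
): Promise<Message[]> {
  try {
    return await withTimeout(compressor(messages), timeoutMs);
  } catch {
    return messages;
  }
}
```

Returning the original array on any failure keeps the LLM call on its normal path; a proxy outage only costs the missed token savings.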
+
+### Architecture
+
+```
+Desktop (.27)                           Server (.13)
+─────────────                           ────────────
+smart-router analyzes prompt            headroom proxy (Docker)
+     │                                        │
+     │ if compress needed:                    │
+     │ compress(messages) ──────────────►     │
+     │        HTTP POST 192.168.20.13:8787    │
+     │ ◄────────── compressed messages        │
+     │                                        │
+     │ send to LLM                            │
+     │                                        │
+     │ if proxy down: pass through            │
+```
+
+### Compression thresholds
+
+| Condition | Action |
+|---|---|
+| Tag is `read`/`discuss`/`search` | Skip — no compression |
+| Context < 5K tokens | Skip — too small to benefit |
+| Context ≥ 5K tokens + analysis/code/devops tag | Compress |
+| Proxy unreachable | Pass through unchanged |
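
The threshold table reduces to a pure function of tag and context size. A sketch under stated assumptions: `COMPRESS_TAGS` is the table name from the Configuration section, but its contents here are inferred from the "never trigger compression" rule; `estimateTokens` uses a rough 4-characters-per-token heuristic, so 5K tokens is about 20,000 characters; and the cutoff follows this table's ≥ 5K rather than the smart-router's step 5 (>5K). The "proxy unreachable" row is a call-time concern and is not part of this decision.

```typescript
// Tags allowed to trigger compression: every routed tag except the read/discuss/search fast paths.
// The name mirrors COMPRESS_TAGS in index.ts; its shape and contents here are assumed.
const COMPRESS_TAGS = new Set<string>([
  "devops-low",
  "devops-high",
  "code-analysis-low",
  "code-analysis-high",
  "codewrite-low",
  "codewrite-high",
]);

const MIN_TOKENS = 5_000;

// Assumption: roughly 4 characters per token; the extension may count tokens differently.
function estimateTokens(texts: string[]): number {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return Math.ceil(chars / 4);
}

// The threshold table as a pure decision.
function shouldCompress(tag: string, texts: string[]): boolean {
  if (!COMPRESS_TAGS.has(tag)) return false; // read/discuss/search: skip
  return estimateTokens(texts) >= MIN_TOKENS; // below ~5K tokens: too small to benefit
}
```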
+
+### Management
+
+```bash
+# Check status
+ssh 192.168.20.13 "docker ps --filter name=headroom"
+
+# View logs
+ssh 192.168.20.13 "docker logs --tail 20 headroom"
+
+# Restart
+ssh 192.168.20.13 "docker restart headroom"
+
+# Update image
+ssh 192.168.20.13 "cd /home/sam/Docker/Containers/headroom && docker compose pull && docker compose up -d"
+```
+
+### Files
+
+| File | Purpose |
+|------|---------|
+| `~/.agents/extensions/smart-router/index.ts` | Compression logic (`compress()` import + `context` event handler) |
+| `/home/sam/Docker/Containers/headroom/docker-compose.yml` | Docker service definition (on .13) |
+| `/home/sam/Docker/Containers/headroom/.env` | Environment file (on .13) |
 
 ---
 
@@ -307,4 +385,5 @@ pi-mcp-adapter connects Pi to external services via the Model Context Protocol.
 - [ ] Verify video-extract works with Gemini
 - [x] Add markitdown skill to Obsidian skills page
 - [x] Add smart-router extension and update Obsidian docs
+- [x] Deploy Headroom Docker compression on .13, integrate with smart-router
 - [ ] Clean up workspace-map.json entries for any stale memory packs