Add 5 pi extensions: pi-subagents, pi-crew, rpiv-pi, pi-interactive-shell, pi-intercom

This commit is contained in:
2026-05-08 15:59:25 +10:00
parent d0d1d9b045
commit 31b4110c87
457 changed files with 85157 additions and 0 deletions

View File

@@ -0,0 +1,548 @@
# Plan: pi-crew Optimization Opportunities
> Ngày: 2026-04-29 | Revised: 2026-04-29 (after design review)
> Based on: research-pi-coding-agent.md, research-extension-system.md, research-extension-examples.md
## Overview
Sau khi đọc sâu extension system của pi-mono và toàn bộ 60+ example extensions, dưới đây là
danh sách cơ hội tối ưu cho pi-crew, được phân loại theo effort và impact.
**Revision notes (2026-04-29):**
- Re-order Phase 1 để compliance-required task (permission gate) đi trước optimization task.
- Tách `terminate: true` thành 2 sub-task vì rủi ro UX khác nhau.
- Hạ "custom compaction model" từ Phase 2 xuống Phase 3 (risk vs ROI).
- Đổi cancel-compaction thành **defer + retry** (tránh context overflow).
- Threshold compaction động theo `contextWindow` thay vì hardcode 150k.
- Thêm rollback strategy ở cấp roadmap + gap research bổ sung.
## Priority Matrix
```
Impact
│ HIGH │ HIGH │
│ Effort │ Effort │
│ LOW │ MEDIUM │
│ ───────┼─────────│
│ MEDIUM │ LOW │
│ Effort │ Effort │
│ LOW │ MEDIUM │
└──────────────────→ Effort
```
## Implementation Status (2026-04-29)
Implemented in code:
- Phase 1.4 permission gate for destructive `team` tool calls.
- Phase 1.6 telemetry baseline fields for subagent completion (`turnCount`, `terminated`, `durationMs`).
- Phase 1.2 compaction guard as defer + retry, moved into `src/extension/registration/compaction-guard.ts`.
- Phase 1.1a `terminate: true` for background/queued subagent launches.
- Phase 1.3 public event bus events (`crew.subagent.completed`, `crew.run.completed`, `crew.run.failed`, `crew.run.cancelled`).
- Phase 1.5 auto session naming for new team runs when no custom session name exists.
- Phase 2.1 proactive compaction with dynamic context-window threshold.
- Phase 2.3 Pi session entries for run start/completion (`crew:run-started`, `crew:run-completed`).
- Phase 2.4 config-driven subagent tool aliases via `config.tools`.
- Phase 2.5 foreground working indicator, using optional API compatibility shim because older `pi-coding-agent` type surfaces may not expose `ctx.ui.setWorkingIndicator`.
- Phase 3.3 safe mailbox event bus publication (`crew.mailbox.message`, `crew.mailbox.acknowledged`).
Deferred by design:
- Phase 1.1b foreground `terminate: true` is implemented as opt-in via `config.tools.terminateOnForeground=true`; default remains safe/off pending telemetry.
- Phase 3.4 structured artifact index is implemented for pi-crew-triggered compactions via `crew:artifact-index` session entries plus compaction custom instructions. Direct `CompactionEntry.details` augmentation is not available through the current upstream extension API without replacing default compaction.
- Phase 3.1, 3.3b, 3.5, and 4.2 are now marked won't-do/research-only after deeper risk/ROI analysis.
- Phase 3.2 remains conditional on agent-level opt-in design. Phase 4.1 remains deferred pending format-compat research.
Validation:
- `npm run typecheck` passes.
- `npm test` passes: 283 unit tests + 26 integration tests.
## Roadmap-level Rollback Strategy
- **1 sub-task = 1 commit** có thể revert độc lập. KHÔNG gộp toàn bộ Phase 1 vào 1 commit.
- Mỗi commit phải có test riêng. Nếu fail trong production, `git revert <sha>` không kéo theo task khác.
- Phase 1.6 (telemetry) làm trước Phase 1.1 để có baseline đo lường.
---
## Phase 1: Quick Wins & Compliance (HIGH impact, LOW effort)
Thời gian ước tính: 2-3 sessions. **Thứ tự đã re-order so với research gốc.**
### 1.4 (FIRST) Permission gate cho destructive team actions
**Lý do làm trước:** AGENTS.md quy định *"Management deletes must require confirm: true; referenced
resources blocked unless force: true"* — đây là **rule bắt buộc**, không phải optimization.
**Files cần sửa:** `src/extension/registration/team-tool.ts` (hoặc file mới)
**Hiện tại:** Có check trong handler nhưng không có `tool_call` hook → message lỗi không nhất quán.
**Tối ưu:**
```typescript
pi.on("tool_call", async (event, ctx) => {
if (event.toolName !== "team") return;
const input = event.input as Record<string, unknown>;
const destructiveActions = ["delete", "forget", "prune", "cleanup"];
if (destructiveActions.includes(input.action as string)) {
if (!input.confirm && !input.force) {
return {
block: true,
reason: `Destructive action '${input.action}' requires confirm=true (or force=true to bypass)`,
};
}
}
});
```
**Note về precedence:** Nếu schema validate đã check `confirm`, **CHỌN 1 chỗ duy nhất**:
- Option A: Để schema validate → bỏ hook (đơn giản hơn).
- Option B: Để hook validate → gỡ check trong handler (consistent error message).
→ Đề nghị Option B vì hook gate tất cả entry points (kể cả nếu sau này có entry point bypass schema).
**Expected benefit:** Compliance với AGENTS.md, safety net production.
---
### 1.6 (NEW) Telemetry baseline cho terminate impact
**Lý do làm trước 1.1:** Plan gốc claim "giảm 30-50% LLM turns" — chỉ là phỏng đoán. Cần baseline đo lường thực tế.
**Files cần sửa:** `src/runtime/subagent-manager.ts`, `src/extension/register.ts`
**Tối ưu:** Log `turnCount` + `terminated: boolean` vào event `crew.subagent.completed`:
```typescript
pi.events.emit("crew.subagent.completed", {
id: record.id,
runId: record.runId,
type: record.type,
status: record.status,
usage: record.usage,
turnCount: record.turnCount, // ← NEW
terminated: record.terminated, // ← NEW (false trước Phase 1.1)
durationMs: record.durationMs, // ← NEW
});
```
**Expected benefit:** Đo trước/sau Phase 1.1 để xác định ROI thực tế. Nếu < 10% turn saving, có thể quyết định không deploy 1.1b.
---
### 1.2 `session_before_compact` guard cho foreground runs (DEFER, không CANCEL)
**Files cần sửa:** `src/extension/register.ts`
**Hiện tại:** Không hook compaction → có thể compact giữa chừng foreground run.
**Tối ưu (revised):** Defer + retry thay vì cancel cứng (tránh context overflow):
```typescript
let pendingCompactReason: string | null = null;
pi.on("session_before_compact", async (event, ctx) => {
if (foregroundControllers.size > 0) {
pendingCompactReason = "deferred-during-foreground-run";
ctx.ui.notify("Compaction deferred until foreground run completes", "info");
return { cancel: true };
}
});
// Retry sau khi run xong:
pi.on("turn_end", (_event, ctx) => {
if (foregroundControllers.size === 0 && pendingCompactReason) {
pendingCompactReason = null;
ctx.compact({
onComplete: () => ctx.ui.notify("Deferred compaction completed", "info"),
});
}
});
```
**Expected benefit:** Ngăn lỗi context mất mát trong foreground run, vẫn đảm bảo compact eventually chạy.
**Risk:** Nếu run cực dài + foregroundControllers chưa bao giờ về 0 → vẫn overflow. Mitigation: hard threshold (vd 95% context window) bypass deferral, force compact.
---
### 1.1a `terminate: true` cho **background queued** results (SAFE)
**Lý do tách:** Background queue không có UX risk, foreground completed có risk (xem 1.1b).
**Files cần sửa:** `src/extension/registration/subagent-tools.ts`
**Tối ưu:**
```typescript
// Agent tool — khi background: terminate ngay sau khi đã queued
if (params.run_in_background) {
return {
...subagentToolResult(...),
terminate: true, // ← Tiết kiệm 1 LLM turn, không có rủi ro UX
};
}
```
**Expected benefit:** Giảm LLM turn cho mọi background spawn. Verify bằng telemetry từ 1.6.
---
### 1.3 Public events qua `pi.events`
**Files cần sửa:** `src/extension/register.ts`
**Hiện tại:** Event bus chỉ dùng cho internal `subagent.stuck-blocked`.
**Naming convention (revised):** Thống nhất với upstream pattern `dot.kebab` (đã dùng cho `subagent.stuck-blocked`):
```typescript
// Document trong README là PUBLIC API:
pi.events.emit("crew.subagent.completed", { ... });
pi.events.emit("crew.run.completed", { runId, team, workflow, status, taskCount, totalUsage });
pi.events.emit("crew.run.failed", { runId, team, workflow, error, failedTaskId });
pi.events.emit("crew.run.cancelled", { runId, team, workflow, status, taskCount });
```
**Versioning:** Note trong README rằng event payload là semver-stable từ pi-crew 0.2.0.
**Expected benefit:** Extension khác (logging, notification, metrics) có thể subscribe.
---
### 1.5 Auto session name từ team run context
**Files cần sửa:** `src/extension/registration/team-tool.ts`
**Tối ưu:**
```typescript
// Trong team tool execute, trước khi start run:
pi.setSessionName(`pi-crew: ${team}/${workflow}${goal.slice(0, 60)}`);
```
**Expected benefit:** Better session organization khi xem session list.
---
### 1.1b (OPT-IN DONE, DEFAULT OFF) `terminate: true` cho **foreground completed** results
**Lý do default off:** UX risk — nếu LLM không có turn để summarize result, user có thể không hiểu output.
**Implementation:** opt-in flag, default safe:
```json
{
"tools": {
"terminateOnForeground": true
}
}
```
When enabled, foreground `Agent`/`crew_agent` completed results set `terminate: true` and persist `record.terminated=true` for telemetry. Decision to make this default-on still requires telemetry evidence:
- Average turn count sau Agent foreground completion ≥ 2.
- Output đã đủ self-explanatory (đo qua user feedback hoặc retry rate).
---
## Phase 2: Medium Effort Optimizations
Thời gian ước tính: 2-3 sessions. (Đã giảm 1 task so với plan gốc.)
### 2.1 Proactive compaction monitoring (DYNAMIC threshold)
**Files cần sửa:** File mới `src/extension/registration/compaction-guard.ts`
**Hiện tại:** Chỉ dựa vào built-in auto-compaction (có thể chậm).
**Tối ưu (revised):** Threshold động theo `contextWindow`:
```typescript
export function registerCompactionGuard(pi: ExtensionAPI) {
const TRIGGER_RATIO = 0.75; // 75% context window → trigger
pi.on("turn_end", (_event, ctx) => {
const usage = ctx.getContextUsage();
const ctxWindow = ctx.model?.contextWindow ?? 200_000;
const threshold = ctxWindow * TRIGGER_RATIO;
if (usage?.tokens && usage.tokens > threshold) {
// Foreground guard từ Phase 1.2 sẽ defer nếu cần
ctx.compact({
customInstructions: "Prioritize keeping team run state, task results, and artifact references. Keep the conversation context brief.",
onComplete: () => ctx.ui.notify("Auto-compacted context during team run", "info"),
onError: (err) => ctx.ui.notify(`Compaction failed: ${err.message}`, "error"),
});
}
});
}
```
**Lý do dùng ratio thay vì hardcode:** Claude Haiku 200k, Gemini Pro 2M, GPT-4o 128k, model nhỏ 32k. Hardcode 150k sai cho 90% trường hợp.
**Expected benefit:** Tránh context overflow error khi foreground run quá dài.
---
### 2.3 `pi.appendEntry` cho cross-session run awareness
**Files cần sửa:** `src/extension/register.ts`
**Tối ưu:**
```typescript
// Khi bắt đầu run:
pi.appendEntry("crew:run-started", {
runId, team, workflow, goal, timestamp: Date.now(),
});
// Khi hoàn thành run:
pi.appendEntry("crew:run-completed", {
runId, status, taskCount, totalUsage, timestamp: Date.now(),
});
```
**Expected benefit:**
- Khi reload session, biết được các run liên quan.
- Session export bao gồm run context.
- Dễ dàng track history.
---
### 2.4 Config-driven tool registration
**Files cần sửa:** `src/extension/registration/subagent-tools.ts`
**Hiện tại:** Luôn register 6 tool variants (Agent, crew_agent, + result + steer).
**Tối ưu:**
```typescript
export function registerSubagentTools(pi: ExtensionAPI, subagentManager: SubagentManager) {
const cfg = loadConfig(pi.getFlag("cwd") as string || process.cwd());
// Conflict-safe tools (luôn register)
pi.registerTool(crewAgentTool);
pi.registerTool(crewAgentResultTool);
// Claude-style aliases: only if not disabled
if (cfg.config.tools?.enableClaudeStyleAliases !== false) {
try { pi.registerTool(agentTool); } catch {}
try { pi.registerTool(getSubagentResultTool); } catch {}
}
// Steer: only if supported
if (cfg.config.tools?.enableSteer !== false) {
try { pi.registerTool(crewAgentSteerTool); } catch {}
try { pi.registerTool(steerSubagentTool); } catch {}
}
}
```
**Expected benefit:** Tránh pollute tool namespace, fine-grained control cho user.
---
### 2.5 Custom working indicator trong foreground runs
**Files cần sửa:** `src/extension/register.ts`
**Tối ưu:**
```typescript
// Khi foreground run active:
ctx.ui.setWorkingIndicator({
frames: ["⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"],
intervalMs: 80,
});
ctx.ui.setWorkingMessage(
`Team run: ${completedTasks}/${totalTasks} tasks done...`
);
// Khi kết thúc:
ctx.ui.setWorkingIndicator(); // Restore default
ctx.ui.setWorkingMessage(); // Clear
```
**Compat shim note:** Implementation dùng optional API compatibility shim:
```typescript
(ctx.ui as { setWorkingIndicator?: (...) => void }).setWorkingIndicator?.(...)
```
Lý do: một số version/type surface của `@mariozechner/pi-coding-agent` chưa expose
`setWorkingIndicator` trên `ExtensionUIContext`. Optional shim giữ backward compatibility và
tránh crash/runtime type mismatch; nếu API không tồn tại thì chỉ bỏ qua custom spinner và vẫn dùng
`setWorkingMessage()`.
**Expected benefit:** Better UX, cho user biết team run đang chạy.
---
## Phase 3: Future Considerations (HIGH effort hoặc Risky)
### 3.1 (WON'T DO unless concrete pain point appears) Branch-level task isolation
Dùng `ctx.fork()` để tạo branch mới cho mỗi task trong team run.
**Decision:** không triển khai mặc định. Worktree isolation đã giải quyết phần quan trọng nhất (file-system/task isolation). Branch-level isolation tạo branch explosion, navigation UX phức tạp, và state-sync risk giữa flat run manifest/tasks/events với Pi session tree. Chỉ reconsider nếu có user complaint cụ thể về context contamination không giải quyết được bằng worktree/dependency-context controls.
### 3.2 Session handoff cho long-running tasks
Khi 1 task quá dài, handoff sang session mới (pattern từ `handoff.ts`), isolate context.
**Conditional trigger:** chỉ enable cho agent/task opt-in, ví dụ agent frontmatter `handoff: true`, hoặc heuristic token estimate > 30% context window.
**Result transport:** child session trả về artifact reference hoặc mailbox message để parent session vẫn aggregate được kết quả mà không cần import toàn bộ transcript.
### 3.3 Mailbox qua `pi.events`
#### 3.3a (DONE) Publish mailbox lifecycle events while preserving file-backed mailbox
Implementation publishes safe public events without changing the durable mailbox source of truth:
```typescript
pi.events.emit("crew.mailbox.message", { runId, id, direction, from, to, taskId, source });
pi.events.emit("crew.mailbox.acknowledged", { runId, messageId, delivery });
```
This keeps file-backed mailbox semantics intact while enabling observers/notification extensions.
#### 3.3b (WON'T DO) Replace file-backed mailbox with pure event-bus mailbox
Thay vì file-based mailbox, dùng event bus làm transport chính cho real-time communication giữa tasks.
**Decision:** won't do. Latency gain is marginal; durability/restart/replay loss is catastrophic for long-running pi-crew runs. 3.3a gives best-of-both-worlds: durable file-backed mailbox remains source of truth, event bus is an observer/notification layer.
### 3.4 (PROMOTED + DONE) Compaction với structured artifact index
Preserve pi-crew artifact references across compaction.
**Implementation:** `compaction-guard.ts` collects recent run artifacts and:
- appends a structured `crew:artifact-index` session entry for machine-readable continuity;
- adds a markdown artifact index to pi-crew-triggered compaction `customInstructions` so the compaction summary preserves run IDs and artifact paths.
**Note:** Directly augmenting `CompactionEntry.details` is not supported by the current upstream `session_before_compact` result contract unless pi-crew replaces default compaction entirely. We intentionally avoid full custom compaction because summary quality/regression risk is higher.
### 3.5 (WON'T DO unless cost telemetry shows pain) Custom compaction với model nhẹ
**Decision:** won't do by default.
- Phụ thuộc vào auth setup của user cho Gemini Flash / Haiku — pi-crew không kiểm soát được.
- Bad summary làm mất context → ảnh hưởng cả run.
- ROI không rõ: compaction chạy không thường xuyên.
Reconsider only if telemetry/user feedback shows compaction cost is a real pain point. Reference remains `examples/extensions/custom-compaction.ts` upstream.
---
## Phase 4 (NEW): Research bổ sung
Hai pattern upstream chưa được khai thác trong plan gốc:
### 4.1 (DEFER — research format compat first) `resources_discover` event integration
Pi-crew có thể inject builtin agents/teams như Pi resources native (skills/prompts):
```typescript
pi.on("resources_discover", () => ({
skillPaths: [path.join(__dirname, "..", "agents")],
promptPaths: [path.join(__dirname, "..", "workflows")],
}));
```
**Decision:** defer. Cần research format compat giữa pi-crew agent markdown vs Pi skill/prompt format trước khi implement. Key risk: dual exposure UX confusion (same capability reachable via `Agent` tool and native skill/prompt) plus loss of pi-crew durable run semantics if exposed as stateless skills.
### 4.2 (RESEARCH-ONLY) `pi.registerProvider` cho virtual "team" model
Đăng ký team như virtual provider để user gọi:
```bash
pi --model crew/researcher
```
Thay vì dùng tool `Agent`.
**Decision:** research-only / not an implementation target. Provider API semantics (single LLM stream, context window, thinking levels, token pricing) do not map cleanly to orchestrator semantics (multi-agent task events, aggregate usage/cost, per-worker contexts). Likely requires upstream provider API changes.
---
## Implementation Order (REVISED)
```
Phase 1 (Quick Wins & Compliance):
[x] 1.4 permission gate destructive team actions ← FIRST (compliance)
[x] 1.6 telemetry baseline ← SECOND (measure first)
[x] 1.2 session_before_compact defer (not cancel)
[x] 1.1a terminate: true on background queued (safe)
[x] 1.3 public crew.* events
[x] 1.5 auto session name
[x] 1.1b terminate: true on foreground (OPT-IN, default off; default-on conditional on telemetry)
Phase 2 (Medium):
[x] 2.1 proactive compaction (dynamic threshold)
[x] 2.3 pi.appendEntry cross-session awareness
[x] 2.4 config-driven tool registration
[x] 2.5 custom working indicator
Phase 3 (Future / Risky):
[-] 3.1 branch-level task isolation (WON'T DO unless concrete pain point appears)
[ ] 3.2 session handoff for long tasks (CONDITIONAL on agent opt-in)
[x] 3.3a publish mailbox lifecycle events (safe subset)
[-] 3.3b replace file-backed mailbox with pure event bus (WON'T DO)
[x] 3.4 structured artifact index in compaction (promoted/done)
[-] 3.5 custom compaction with cheap model (WON'T DO unless cost telemetry shows pain)
Phase 4 (Research):
[ ] 4.1 resources_discover integration (DEFER; format compat research first)
[-] 4.2 virtual team provider (RESEARCH-ONLY)
```
## Files affected
```
PHASE 1:
src/extension/registration/team-tool.ts ← 1.4 permission gate
src/extension/registration/subagent-tools.ts ← 1.1a terminate + 1.1b opt-in terminate
src/extension/register.ts ← 1.2 defer guard, 1.3 events, 1.5 session name
src/runtime/subagent-manager.ts ← 1.6 telemetry fields
PHASE 2:
src/extension/registration/compaction-guard.ts ← NEW: 1.2 defer guard + 2.1 proactive + 3.4 artifact index
src/extension/register.ts ← 2.3 appendEntry, 2.5 working indicator
src/extension/registration/subagent-tools.ts ← 2.4 config-driven
PHASE 3:
src/extension/team-tool/api.ts ← 3.3a mailbox lifecycle events
```
## Risk Assessment (REVISED)
| Change | Risk | Mitigation |
|---|---|---|
| Permission gate (1.4) | Block legitimate use | Allow `force=true` bypass, document trong README |
| Telemetry (1.6) | Privacy / log size | No PII in subagent telemetry payload; opt-out applied via `config.telemetry.enabled=false`; no sampling currently because payload is small/local event-bus data |
| Defer compaction (1.2) | Run dài infinite → overflow | Hard threshold 95% bypass deferral |
| `terminate: true` background (1.1a) | None significant | Background không cần LLM follow-up by design |
| Public events (1.3) | Event storm, breaking change | Rate limit, semver document |
| Auto session name (1.5) | Override user-set name | Applied: chỉ set nếu chưa có name custom (`!pi.getSessionName()`) |
| `terminate: true` foreground (1.1b) | LLM không summarize khi enabled | OPT-IN flag (`config.tools.terminateOnForeground`, default off); default-on requires telemetry evidence |
| Dynamic threshold (2.1) | contextWindow undefined | Default 200_000 fallback |
| Artifact index in compaction (3.4) | Index size bloat / format drift | Cap recent index (10 runs / 80 artifacts), structured `crew:artifact-index` session entry, non-replacing default compaction |
| appendEntry (2.3) | Session bloat | TTL/cleanup strategy |
| Config-driven tools (2.4) | User confused | Default = current behavior, opt-in change |
| Working indicator (2.5) | Conflict với extension khác / older Pi UI type surface | Applied: restore default on finally; compat shim makes `setWorkingIndicator` optional |
| Custom compaction model (3.5) | Bad summary, auth missing | Fall back to default, multi-model retry |
## Testing Strategy
- **Unit tests:**
- `terminate: true` flag in tool results (1.1a/b).
- Permission gate blocks/allows correctly với confirm/force matrix (1.4).
- Threshold calculation từ contextWindow (2.1).
- Telemetry payload schema (1.6).
- Artifact index payload structure + cap behavior (3.4).
- **Integration tests:**
- Foreground run + compaction interaction (1.2 defer + 2.1 trigger).
- Multiple concurrent runs + permission gate (1.4).
- Event publish/subscribe round-trip (1.3).
- Compaction with N artifacts includes artifact index in custom instructions (3.4).
- **Manual:**
- UI behavior với working indicator + session name (1.5, 2.5).
- Real LLM turn count trước/sau 1.1b với telemetry data (1.6 → 1.1b decision).
- **Regression:**
- Run full suite (`npm test`) sau mỗi commit, không gộp Phase.
- Doctor tests phải dùng `--test-timeout=90000` trên Windows.