Sam Rolfe 31b4110c87 Add 5 pi extensions: pi-subagents, pi-crew, rpiv-pi, pi-interactive-shell, pi-intercom

2026-05-08 15:59:25 +10:00

Plan: pi-crew Optimization Opportunities

Date: 2026-04-29 | Revised: 2026-04-29 (after design review) | Based on: research-pi-coding-agent.md, research-extension-system.md, research-extension-examples.md

Overview

After a deep read of the pi-mono extension system and all 60+ example extensions, this plan lists the optimization opportunities for pi-crew, classified by effort and impact.

Revision notes (2026-04-29):

Re-ordered Phase 1 so the compliance-required task (permission gate) comes before the optimization tasks.
Split terminate: true into 2 sub-tasks because their UX risks differ.
Demoted "custom compaction model" from Phase 2 to Phase 3 (risk vs ROI).
Changed cancel-compaction to defer + retry (avoids context overflow).
Made the compaction threshold dynamic, derived from contextWindow instead of a hardcoded 150k.
Added a roadmap-level rollback strategy and supplementary research gaps.

Priority Matrix

Impact
  ↑
  │  HIGH impact    │  HIGH impact    │
  │  LOW effort     │  MEDIUM effort  │
  │  ───────────────┼─────────────────│
  │  MEDIUM impact  │  LOW impact     │
  │  LOW effort     │  MEDIUM effort  │
  └──────────────────────────────────→ Effort

Implementation Status (2026-04-29)

Implemented in code:

Phase 1.4 permission gate for destructive team tool calls.
Phase 1.6 telemetry baseline fields for subagent completion (turnCount, terminated, durationMs).
Phase 1.2 compaction guard as defer + retry, moved into src/extension/registration/compaction-guard.ts.
Phase 1.1a terminate: true for background/queued subagent launches.
Phase 1.3 public event bus events (crew.subagent.completed, crew.run.completed, crew.run.failed, crew.run.cancelled).
Phase 1.5 auto session naming for new team runs when no custom session name exists.
Phase 2.1 proactive compaction with dynamic context-window threshold.
Phase 2.3 Pi session entries for run start/completion (crew:run-started, crew:run-completed).
Phase 2.4 config-driven subagent tool aliases via config.tools.
Phase 2.5 foreground working indicator, using optional API compatibility shim because older pi-coding-agent type surfaces may not expose ctx.ui.setWorkingIndicator.
Phase 3.3 safe mailbox event bus publication (crew.mailbox.message, crew.mailbox.acknowledged).

Deferred by design:

Phase 1.1b foreground terminate: true is implemented as opt-in via config.tools.terminateOnForeground=true; default remains safe/off pending telemetry.
Phase 3.4 structured artifact index is implemented for pi-crew-triggered compactions via crew:artifact-index session entries plus compaction custom instructions. Direct CompactionEntry.details augmentation is not available through the current upstream extension API without replacing default compaction.
Phase 3.1, 3.3b, 3.5, and 4.2 are now marked won't-do/research-only after deeper risk/ROI analysis.
Phase 3.2 remains conditional on agent-level opt-in design. Phase 4.1 remains deferred pending format-compat research.

Validation:

npm run typecheck passes.
npm test passes: 283 unit tests + 26 integration tests.

Roadmap-level Rollback Strategy

1 sub-task = 1 commit that can be reverted independently. Do NOT squash all of Phase 1 into a single commit.
Each commit must ship with its own tests. If it fails in production, git revert <sha> does not drag other tasks along.
Phase 1.6 (telemetry) lands before Phase 1.1 so that a measurement baseline exists.

Phase 1: Quick Wins & Compliance (HIGH impact, LOW effort)

Estimated time: 2-3 sessions. The order has been changed from the original research.

1.4 (FIRST) Permission gate for destructive team actions

Why first: AGENTS.md mandates "Management deletes must require confirm: true; referenced resources blocked unless force: true". This is a mandatory rule, not an optimization.

Files to change: src/extension/registration/team-tool.ts (or a new file)

Current state: The handler has a check, but there is no tool_call hook, so error messages are inconsistent.

Optimization:

pi.on("tool_call", async (event, ctx) => {
  if (event.toolName !== "team") return;
  const input = event.input as Record<string, unknown>;
  const destructiveActions = ["delete", "forget", "prune", "cleanup"];

  if (destructiveActions.includes(input.action as string)) {
    if (!input.confirm && !input.force) {
      return {
        block: true,
        reason: `Destructive action '${input.action}' requires confirm=true (or force=true to bypass)`,
      };
    }
  }
});

Note on precedence: If schema validation already checks confirm, pick exactly ONE place for the check:

Option A: Let the schema validate and drop the hook (simpler).
Option B: Let the hook validate and remove the check from the handler (consistent error message).

→ Option B is recommended because the hook gates every entry point (including any future entry point that bypasses the schema).

Expected benefit: Compliance with AGENTS.md, plus a production safety net.
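To make the confirm/force matrix easy to unit-test, the hook's decision could be pulled into a pure function. The sketch below is illustrative only: `gateTeamAction` and `GateDecision` are hypothetical names, and it is slightly stricter than the hook above in that only a literal `true` counts as confirmation.

```typescript
// Hypothetical helper: names and shapes are illustrative, not pi-crew's actual code.
const DESTRUCTIVE_ACTIONS: readonly string[] = ["delete", "forget", "prune", "cleanup"];

interface GateDecision {
  block: boolean;
  reason?: string;
}

// Pure decision function, so the confirm/force matrix can be tested
// without a running Pi session.
function gateTeamAction(input: Record<string, unknown>): GateDecision {
  const action = typeof input.action === "string" ? input.action : "";
  if (DESTRUCTIVE_ACTIONS.indexOf(action) === -1) return { block: false };

  // Assumption: only a literal `true` counts; a truthy string such as "false" does not.
  const confirmed = input.confirm === true || input.force === true;
  if (confirmed) return { block: false };

  return {
    block: true,
    reason: `Destructive action '${action}' requires confirm=true (or force=true to bypass)`,
  };
}
```

The tool_call handler would then simply return the decision when `block` is true, which keeps the error message in one place as Option B intends.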

1.6 (NEW) Telemetry baseline for terminate impact

Why before 1.1: The original plan claimed a "30-50% reduction in LLM turns", which was only a guess. A real measured baseline is needed.

Files to change: src/runtime/subagent-manager.ts, src/extension/register.ts

Optimization: Log turnCount, terminated: boolean, and durationMs in the crew.subagent.completed event:

pi.events.emit("crew.subagent.completed", {
  id: record.id,
  runId: record.runId,
  type: record.type,
  status: record.status,
  usage: record.usage,
  turnCount: record.turnCount,      // ← NEW
  terminated: record.terminated,    // ← NEW (false before Phase 1.1)
  durationMs: record.durationMs,    // ← NEW
});

Expected benefit: Measure before/after Phase 1.1 to determine the real ROI. If the turn saving is below 10%, the decision may be not to deploy 1.1b.
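A minimal sketch of how the before/after comparison might be computed from collected completion events. It assumes `turnCount` is the figure being compared and that non-terminated runs form the baseline; `CompletionSample`, `turnSavingRatio`, and `MIN_SAVING` are hypothetical names.

```typescript
// Hypothetical record shape, mirroring two of the telemetry fields in this section.
interface CompletionSample {
  turnCount: number;
  terminated: boolean;
}

function meanTurns(group: CompletionSample[]): number {
  if (group.length === 0) return 0;
  const total = group.reduce((sum, s) => sum + s.turnCount, 0);
  return total / group.length;
}

// Fraction of turns saved by terminated runs relative to the baseline.
// Returns 0 when either group is empty, so no decision rests on missing data.
function turnSavingRatio(all: CompletionSample[]): number {
  const baseline = meanTurns(all.filter((s) => !s.terminated));
  const terminated = all.filter((s) => s.terminated);
  if (baseline === 0 || terminated.length === 0) return 0;
  return (baseline - meanTurns(terminated)) / baseline;
}

const MIN_SAVING = 0.1; // the 10% cut-off mentioned in this section

const samples: CompletionSample[] = [
  { turnCount: 4, terminated: false },
  { turnCount: 6, terminated: false },
  { turnCount: 4, terminated: true },
];
const saving = turnSavingRatio(samples); // baseline mean 5, terminated mean 4
const worthDeploying = saving >= MIN_SAVING;
```

With real data the sample sizes would need to be large enough for the comparison to mean anything; the three records here only exercise the arithmetic.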

1.2 `session_before_compact` guard for foreground runs (DEFER, not CANCEL)

Files to change: src/extension/register.ts

Current state: Compaction is not hooked, so a foreground run can be compacted midway.

Optimization (revised): Defer + retry instead of a hard cancel (avoids context overflow):

let pendingCompactReason: string | null = null;

pi.on("session_before_compact", async (event, ctx) => {
  if (foregroundControllers.size > 0) {
    pendingCompactReason = "deferred-during-foreground-run";
    ctx.ui.notify("Compaction deferred until foreground run completes", "info");
    return { cancel: true };
  }
});

// Retry after the run finishes:
pi.on("turn_end", (_event, ctx) => {
  if (foregroundControllers.size === 0 && pendingCompactReason) {
    pendingCompactReason = null;
    ctx.compact({
      onComplete: () => ctx.ui.notify("Deferred compaction completed", "info"),
    });
  }
});

Expected benefit: Prevents context loss during a foreground run while still ensuring that compaction eventually runs.

Risk: If a run is extremely long and foregroundControllers never drops to 0, the context can still overflow. Mitigation: a hard threshold (e.g. 95% of the context window) bypasses deferral and forces compaction.
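The hard-threshold mitigation can be sketched as a small predicate that the guard consults before deferring. This is an assumption about how it might be wired, not the shipped code; `shouldDeferCompaction` and `HARD_LIMIT_RATIO` are hypothetical names.

```typescript
const HARD_LIMIT_RATIO = 0.95; // the "e.g. 95%" figure from the mitigation

// Decide whether a compaction request may be deferred.
function shouldDeferCompaction(
  activeForegroundRuns: number,
  usedTokens: number,
  contextWindow: number,
): boolean {
  if (activeForegroundRuns === 0) return false; // nothing to protect
  if (contextWindow <= 0) return false;         // unknown window: do not risk deferring
  // Defer only while usage is still below the hard limit.
  return usedTokens < contextWindow * HARD_LIMIT_RATIO;
}
```

The session_before_compact handler would return `{ cancel: true }` only when this predicate is true, so a run that never ends still gets compacted once usage crosses the limit.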

1.1a `terminate: true` for background queued results (SAFE)

Why split: Background queueing carries no UX risk, while foreground completion does (see 1.1b).

Files to change: src/extension/registration/subagent-tools.ts

Optimization:

// Agent tool: when running in the background, terminate right after the run is queued
if (params.run_in_background) {
  return {
    ...subagentToolResult(...),
    terminate: true,  // ← Saves 1 LLM turn with no UX risk
  };
}

Expected benefit: One fewer LLM turn for every background spawn. Verify with telemetry from 1.6.

1.3 Public events via `pi.events`

Files to change: src/extension/register.ts

Current state: The event bus is only used for the internal subagent.stuck-blocked event.

Naming convention (revised): Align with the upstream dot.kebab pattern (already used for subagent.stuck-blocked):

// Document in the README as PUBLIC API:
pi.events.emit("crew.subagent.completed", { ... });
pi.events.emit("crew.run.completed", { runId, team, workflow, status, taskCount, totalUsage });
pi.events.emit("crew.run.failed", { runId, team, workflow, error, failedTaskId });
pi.events.emit("crew.run.cancelled", { runId, team, workflow, status, taskCount });

Versioning: Note in the README that event payloads are semver-stable from pi-crew 0.2.0.

Expected benefit: Other extensions (logging, notification, metrics) can subscribe.
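A sketch of what a metrics-style subscriber might look like. The plan only shows `pi.events.emit`, so the subscription side here is an assumption: `TinyBus` is a hypothetical stand-in bus that exists only to make the example runnable, and the field types in `RunCompletedPayload` are guesses based on the emitted field names.

```typescript
// Payload fields copied from the crew.run.completed emit; the types are assumptions.
interface RunCompletedPayload {
  runId: string;
  team: string;
  workflow: string;
  status: string;
  taskCount: number;
  totalUsage: unknown;
}

type Handler = (payload: unknown) => void;

// Minimal stand-in for an event bus (hypothetical, not the Pi API).
class TinyBus {
  private handlers = new Map<string, Handler[]>();

  on(eventName: string, handler: Handler): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  emit(eventName: string, payload: unknown): void {
    for (const handler of this.handlers.get(eventName) ?? []) handler(payload);
  }
}

const bus = new TinyBus();
const completedRuns: string[] = [];

// Subscriber: record one line per completed run.
bus.on("crew.run.completed", (payload) => {
  const run = payload as RunCompletedPayload;
  completedRuns.push(`${run.team}/${run.workflow}:${run.status}`);
});

bus.emit("crew.run.completed", {
  runId: "r1", team: "research", workflow: "default",
  status: "completed", taskCount: 3, totalUsage: null,
});
```

Because subscribers depend on the payload fields, any rename would be a breaking change, which is why the plan ties the payload to semver.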

1.5 Auto session name from team run context

Files to change: src/extension/registration/team-tool.ts

Optimization:

// In the team tool's execute, before starting the run:
pi.setSessionName(`pi-crew: ${team}/${workflow} — ${goal.slice(0, 60)}`);

Expected benefit: Better session organization when browsing the session list.
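The risk table later in this plan states that the name is only set when no custom name exists. A minimal sketch of that rule as a pure helper, assuming the existing name is passed in by the caller; `buildSessionName` is a hypothetical name, and the separator is a plain hyphen here for simplicity.

```typescript
const MAX_GOAL_CHARS = 60; // same cut-off as goal.slice(0, 60) in the snippet

// Returns the name to set, or null when an existing custom name must be kept.
function buildSessionName(
  team: string,
  workflow: string,
  goal: string,
  existingName: string | undefined,
): string | null {
  if (existingName !== undefined && existingName.trim() !== "") return null;
  // Collapse whitespace so multi-line goals produce a single-line name.
  const shortGoal = goal.replace(/\s+/g, " ").trim().slice(0, MAX_GOAL_CHARS);
  return `pi-crew: ${team}/${workflow} - ${shortGoal}`;
}
```

The caller would invoke `pi.setSessionName(...)` only when the helper returns a string.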

1.1b (OPT-IN DONE, DEFAULT OFF) `terminate: true` for foreground completed results

Why default off: UX risk. If the LLM gets no turn to summarize the result, the user may not understand the output.

Implementation: opt-in flag, default safe:

{
  "tools": {
    "terminateOnForeground": true
  }
}

When enabled, foreground Agent/crew_agent completed results set terminate: true and persist record.terminated=true for telemetry. Decision to make this default-on still requires telemetry evidence:

Average turn count after Agent foreground completion ≥ 2.
Output is already self-explanatory (measured via user feedback or retry rate).
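A sketch of how the opt-in flag might be applied to a tool result, combining the 1.1a rule (background always terminates) with the 1.1b rule (foreground only when opted in). `ToolResult` and `withTerminate` are hypothetical; only the `tools.terminateOnForeground` key comes from the config shown in this section.

```typescript
// Hypothetical result shape; the real tool result has more fields.
interface ToolResult {
  content: string;
  terminate?: boolean;
}

interface TerminateConfig {
  tools?: { terminateOnForeground?: boolean };
}

// Background results always terminate (1.1a); foreground only when opted in (1.1b).
function withTerminate(
  result: ToolResult,
  runInBackground: boolean,
  config: TerminateConfig,
): ToolResult {
  const optIn = config.tools?.terminateOnForeground === true;
  return runInBackground || optIn ? { ...result, terminate: true } : result;
}
```

Leaving the flag unset keeps today's behavior, so the default stays safe until telemetry supports flipping it.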

Phase 2: Medium Effort Optimizations

Estimated time: 2-3 sessions. (One task fewer than the original plan.)

2.1 Proactive compaction monitoring (DYNAMIC threshold)

Files to change: New file src/extension/registration/compaction-guard.ts

Current state: Relies only on built-in auto-compaction (which may be slow).

Optimization (revised): Dynamic threshold based on contextWindow:

export function registerCompactionGuard(pi: ExtensionAPI) {
  const TRIGGER_RATIO = 0.75;  // 75% context window → trigger

  pi.on("turn_end", (_event, ctx) => {
    const usage = ctx.getContextUsage();
    const ctxWindow = ctx.model?.contextWindow ?? 200_000;
    const threshold = ctxWindow * TRIGGER_RATIO;

    if (usage?.tokens && usage.tokens > threshold) {
      // The foreground guard from Phase 1.2 will defer if needed
      ctx.compact({
        customInstructions: "Prioritize keeping team run state, task results, and artifact references. Keep the conversation context brief.",
        onComplete: () => ctx.ui.notify("Auto-compacted context during team run", "info"),
        onError: (err) => ctx.ui.notify(`Compaction failed: ${err.message}`, "error"),
      });
    }
  });
}

Why a ratio instead of a hardcoded value: Claude Haiku 200k, Gemini Pro 2M, GPT-4o 128k, small models 32k. A hardcoded 150k is wrong for 90% of cases.

Expected benefit: Avoids context overflow errors when a foreground run gets too long.
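To make the ratio argument concrete, the threshold can be worked out for the window sizes quoted in this section. This is a small sketch of the arithmetic only; `compactionThreshold` and `FALLBACK_WINDOW` are hypothetical names, and the window sizes are the ones cited in the plan, not verified model limits.

```typescript
const TRIGGER_RATIO = 0.75;
const FALLBACK_WINDOW = 200_000; // used when the model does not report a window

function compactionThreshold(contextWindow?: number): number {
  const size = contextWindow !== undefined && contextWindow > 0 ? contextWindow : FALLBACK_WINDOW;
  return Math.floor(size * TRIGGER_RATIO);
}

const thresholds = {
  small32k: compactionThreshold(32_000),     // 24_000
  gpt4o128k: compactionThreshold(128_000),   // 96_000
  haiku200k: compactionThreshold(200_000),   // 150_000
  gemini2m: compactionThreshold(2_000_000),  // 1_500_000
  unknown: compactionThreshold(undefined),   // 150_000 via the fallback
};
```

A fixed 150k threshold only matches the 200k case: it would never fire on a 128k or 32k window, and it would fire far too early on a 2M window.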

2.3 `pi.appendEntry` for cross-session run awareness

Files to change: src/extension/register.ts

Optimization:

// When a run starts:
pi.appendEntry("crew:run-started", {
  runId, team, workflow, goal, timestamp: Date.now(),
});

// When a run completes:
pi.appendEntry("crew:run-completed", {
  runId, status, taskCount, totalUsage, timestamp: Date.now(),
});

Expected benefit:

On session reload, the related runs are known.
Session export includes run context.
History is easy to track.
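As an illustration of the reload benefit, the two entry types could be folded back into a per-run summary. The entry shape below (`customType` plus `data`) is an assumption about how appended entries are read back, and `summarizeRuns` is a hypothetical helper.

```typescript
// Hypothetical shape for entries read back from a session.
interface CrewEntry {
  customType: string;
  data: { runId: string; status?: string; team?: string; workflow?: string };
}

interface RunSummary {
  runId: string;
  team?: string;
  workflow?: string;
  state: string; // "running" until a completion entry is seen
}

// Fold start/completion entries into one summary per run, in first-seen order.
function summarizeRuns(entries: CrewEntry[]): RunSummary[] {
  const byId = new Map<string, RunSummary>();
  for (const entry of entries) {
    const { runId } = entry.data;
    if (entry.customType === "crew:run-started") {
      byId.set(runId, { runId, team: entry.data.team, workflow: entry.data.workflow, state: "running" });
    } else if (entry.customType === "crew:run-completed") {
      const known = byId.get(runId) ?? { runId, state: "running" };
      byId.set(runId, { ...known, state: entry.data.status ?? "completed" });
    }
  }
  return Array.from(byId.values());
}
```

A run with a start entry but no completion entry stays in the "running" state, which is a cheap way to spot runs interrupted by a restart.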

2.4 Config-driven tool registration

Files to change: src/extension/registration/subagent-tools.ts

Current state: Always registers 6 tool variants (Agent, crew_agent, + result + steer).

Optimization:

export function registerSubagentTools(pi: ExtensionAPI, subagentManager: SubagentManager) {
  const cfg = loadConfig(pi.getFlag("cwd") as string || process.cwd());

  // Conflict-safe tools (always registered)
  pi.registerTool(crewAgentTool);
  pi.registerTool(crewAgentResultTool);

  // Claude-style aliases: only if not disabled
  if (cfg.config.tools?.enableClaudeStyleAliases !== false) {
    try { pi.registerTool(agentTool); } catch {}
    try { pi.registerTool(getSubagentResultTool); } catch {}
  }

  // Steer: only if supported
  if (cfg.config.tools?.enableSteer !== false) {
    try { pi.registerTool(crewAgentSteerTool); } catch {}
    try { pi.registerTool(steerSubagentTool); } catch {}
  }
}

Expected benefit: Avoids polluting the tool namespace and gives users fine-grained control.

2.5 Custom working indicator in foreground runs

Files to change: src/extension/register.ts

Optimization:

// While a foreground run is active:
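The selection logic in 2.4 can be expressed as a pure function, which makes the "default = current behavior" guarantee easy to test. The tool name strings below are illustrative labels rather than confirmed registered names, and `selectToolNames` is a hypothetical helper.

```typescript
// Flags as used in the 2.4 registration snippet; both default to enabled when omitted.
interface ToolsConfig {
  enableClaudeStyleAliases?: boolean;
  enableSteer?: boolean;
}

// Returns the tool names that would be registered for a given config.
function selectToolNames(tools: ToolsConfig = {}): string[] {
  const names = ["crew_agent", "crew_agent_result"]; // conflict-safe, always on
  if (tools.enableClaudeStyleAliases !== false) names.push("Agent", "get_subagent_result");
  if (tools.enableSteer !== false) names.push("crew_agent_steer", "steer_subagent");
  return names;
}
```

With an empty config all six variants are selected, so existing users see no change until they explicitly set a flag to false.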
ctx.ui.setWorkingIndicator({
  frames: ["⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"],
  intervalMs: 80,
});
ctx.ui.setWorkingMessage(
  `Team run: ${completedTasks}/${totalTasks} tasks done...`
);

// When the run ends:
ctx.ui.setWorkingIndicator();   // Restore default
ctx.ui.setWorkingMessage();     // Clear

Compat shim note: The implementation uses an optional API compatibility shim:

(ctx.ui as { setWorkingIndicator?: (...) => void }).setWorkingIndicator?.(...)

Reason: some versions/type surfaces of @mariozechner/pi-coding-agent do not yet expose setWorkingIndicator on ExtensionUIContext. The optional shim preserves backward compatibility and avoids crashes or runtime type mismatches; if the API is missing, the custom spinner is simply skipped and setWorkingMessage() is still used.

Expected benefit: Better UX; the user can see that a team run is in progress.
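A fuller, runnable version of the shim idea might look like the sketch below. The interfaces are local stand-ins, not the real ExtensionUIContext type, and `showRunProgress` is a hypothetical helper name.

```typescript
interface IndicatorOptions {
  frames: string[];
  intervalMs: number;
}

// Local stand-in: the one method every supported version is assumed to have.
interface BaseUi {
  setWorkingMessage(message?: string): void;
}

// Narrowed view: the indicator method may or may not exist at runtime.
type MaybeIndicatorUi = BaseUi & {
  setWorkingIndicator?: (options?: IndicatorOptions) => void;
};

// Returns true when the custom spinner was applied, false when it was skipped.
function showRunProgress(ui: BaseUi, message: string, options: IndicatorOptions): boolean {
  const maybe = ui as MaybeIndicatorUi;
  ui.setWorkingMessage(message); // always available
  if (typeof maybe.setWorkingIndicator !== "function") return false;
  maybe.setWorkingIndicator(options);
  return true;
}
```

Checking with `typeof` at runtime rather than trusting the type surface is what keeps older hosts from crashing.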

Phase 3: Future Considerations (HIGH effort or Risky)

3.1 (WON'T DO unless concrete pain point appears) Branch-level task isolation

Use ctx.fork() to create a new branch for each task in a team run.

Decision: not implemented by default. Worktree isolation already solves the most important part (file-system/task isolation). Branch-level isolation creates branch explosion, complex navigation UX, and state-sync risk between the flat run manifest/tasks/events and the Pi session tree. Reconsider only if there is a specific user complaint about context contamination that worktree/dependency-context controls cannot resolve.

3.2 Session handoff for long-running tasks

When a task runs too long, hand off to a new session (the pattern from handoff.ts) to isolate context.

Conditional trigger: enable only for opted-in agents/tasks, e.g. agent frontmatter handoff: true, or a heuristic token estimate > 30% of the context window.

Result transport: the child session returns an artifact reference or mailbox message so the parent session can still aggregate results without importing the full transcript.
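Since 3.2 is still at the design stage, the trigger can only be sketched. The predicate below assumes the two conditions are alternatives (explicit opt-in, or the 30% heuristic); `HandoffInput` and `shouldHandoff` are hypothetical, and how the token estimate would be produced is left open.

```typescript
const HANDOFF_RATIO = 0.3; // the 30% heuristic from the conditional trigger

interface HandoffInput {
  agentOptIn: boolean;     // e.g. agent frontmatter `handoff: true`
  estimatedTokens: number; // hypothetical pre-run estimate
  contextWindow: number;
}

// True when the task should run in a fresh session.
function shouldHandoff(input: HandoffInput): boolean {
  if (input.agentOptIn) return true;
  if (input.contextWindow <= 0) return false; // unknown window: no heuristic handoff
  return input.estimatedTokens > input.contextWindow * HANDOFF_RATIO;
}
```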

3.3 Mailbox via `pi.events`

3.3a (DONE) Publish mailbox lifecycle events while preserving file-backed mailbox

Implementation publishes safe public events without changing the durable mailbox source of truth:

pi.events.emit("crew.mailbox.message", { runId, id, direction, from, to, taskId, source });
pi.events.emit("crew.mailbox.acknowledged", { runId, messageId, delivery });

This keeps file-backed mailbox semantics intact while enabling observers/notification extensions.

3.3b (WON'T DO) Replace file-backed mailbox with pure event-bus mailbox

Instead of the file-based mailbox, use the event bus as the primary transport for real-time communication between tasks.

Decision: won't do. Latency gain is marginal; durability/restart/replay loss is catastrophic for long-running pi-crew runs. 3.3a gives best-of-both-worlds: durable file-backed mailbox remains source of truth, event bus is an observer/notification layer.

3.4 (PROMOTED + DONE) Compaction with structured artifact index

Preserve pi-crew artifact references across compaction.

Implementation: compaction-guard.ts collects recent run artifacts and:

appends a structured crew:artifact-index session entry for machine-readable continuity;
adds a markdown artifact index to pi-crew-triggered compaction customInstructions so the compaction summary preserves run IDs and artifact paths.

Note: Directly augmenting CompactionEntry.details is not supported by the current upstream session_before_compact result contract unless pi-crew replaces default compaction entirely. We intentionally avoid full custom compaction because summary quality/regression risk is higher.
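The risk table in this plan mentions a cap of 10 runs / 80 artifacts for the index. A sketch of how such a cap and the markdown rendering might work, assuming runs arrive newest-first; `RunArtifacts`, `capArtifactIndex`, and `renderArtifactIndex` are hypothetical names, and the markdown layout is illustrative.

```typescript
const MAX_RUNS = 10;      // run cap from the risk table
const MAX_ARTIFACTS = 80; // global artifact cap from the risk table

interface RunArtifacts {
  runId: string;
  artifactPaths: string[];
}

// Keep the most recent runs (input assumed newest-first) and stop
// adding artifact paths once the global cap is reached.
function capArtifactIndex(runs: RunArtifacts[]): RunArtifacts[] {
  const capped: RunArtifacts[] = [];
  let remaining = MAX_ARTIFACTS;
  for (const run of runs.slice(0, MAX_RUNS)) {
    if (remaining <= 0) break;
    const paths = run.artifactPaths.slice(0, remaining);
    remaining -= paths.length;
    capped.push({ runId: run.runId, artifactPaths: paths });
  }
  return capped;
}

// Render the capped index as a markdown list for compaction instructions.
function renderArtifactIndex(runs: RunArtifacts[]): string {
  return capArtifactIndex(runs)
    .map((run) => [`- run ${run.runId}`, ...run.artifactPaths.map((p) => `  - ${p}`)].join("\n"))
    .join("\n");
}
```

Capping before rendering bounds the size of the custom instructions regardless of how many runs a long session accumulates.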

3.5 (WON'T DO unless cost telemetry shows pain) Custom compaction with a lightweight model

Decision: won't do by default.

It depends on the user's auth setup for Gemini Flash / Haiku, which pi-crew cannot control.
A bad summary loses context, which affects the whole run.
ROI is unclear: compaction does not run often.

Reconsider only if telemetry/user feedback shows compaction cost is a real pain point. Reference remains examples/extensions/custom-compaction.ts upstream.

Phase 4 (NEW): Supplementary research

Two upstream patterns were not explored in the original plan:

4.1 (DEFER — research format compat first) `resources_discover` event integration

Pi-crew could inject built-in agents/teams as native Pi resources (skills/prompts):

pi.on("resources_discover", () => ({
  skillPaths: [path.join(__dirname, "..", "agents")],
  promptPaths: [path.join(__dirname, "..", "workflows")],
}));

Decision: defer. Format compatibility between pi-crew agent markdown and the Pi skill/prompt format must be researched before implementing. Key risk: dual exposure UX confusion (same capability reachable via Agent tool and native skill/prompt) plus loss of pi-crew durable run semantics if exposed as stateless skills.

4.2 (RESEARCH-ONLY) `pi.registerProvider` for a virtual "team" model

Register a team as a virtual provider so the user can call:

pi --model crew/researcher

instead of using the Agent tool.

Decision: research-only / not an implementation target. Provider API semantics (single LLM stream, context window, thinking levels, token pricing) do not map cleanly to orchestrator semantics (multi-agent task events, aggregate usage/cost, per-worker contexts). Likely requires upstream provider API changes.

Implementation Order (REVISED)

Phase 1 (Quick Wins & Compliance):
  [x] 1.4 permission gate destructive team actions  ← FIRST (compliance)
  [x] 1.6 telemetry baseline                        ← SECOND (measure first)
  [x] 1.2 session_before_compact defer (not cancel)
  [x] 1.1a terminate: true on background queued (safe)
  [x] 1.3 public crew.* events
  [x] 1.5 auto session name
  [x] 1.1b terminate: true on foreground (OPT-IN, default off; default-on conditional on telemetry)

Phase 2 (Medium):
  [x] 2.1 proactive compaction (dynamic threshold)
  [x] 2.3 pi.appendEntry cross-session awareness
  [x] 2.4 config-driven tool registration
  [x] 2.5 custom working indicator

Phase 3 (Future / Risky):
  [-] 3.1 branch-level task isolation (WON'T DO unless concrete pain point appears)
  [ ] 3.2 session handoff for long tasks (CONDITIONAL on agent opt-in)
  [x] 3.3a publish mailbox lifecycle events (safe subset)
  [-] 3.3b replace file-backed mailbox with pure event bus (WON'T DO)
  [x] 3.4 structured artifact index in compaction (promoted/done)
  [-] 3.5 custom compaction with cheap model (WON'T DO unless cost telemetry shows pain)

Phase 4 (Research):
  [ ] 4.1 resources_discover integration (DEFER; format compat research first)
  [-] 4.2 virtual team provider (RESEARCH-ONLY)

Files affected

PHASE 1:
  src/extension/registration/team-tool.ts         ← 1.4 permission gate
  src/extension/registration/subagent-tools.ts    ← 1.1a terminate + 1.1b opt-in terminate
  src/extension/register.ts                       ← 1.2 defer guard, 1.3 events, 1.5 session name
  src/runtime/subagent-manager.ts                 ← 1.6 telemetry fields

PHASE 2:
  src/extension/registration/compaction-guard.ts  ← NEW: 1.2 defer guard + 2.1 proactive + 3.4 artifact index
  src/extension/register.ts                       ← 2.3 appendEntry, 2.5 working indicator
  src/extension/registration/subagent-tools.ts    ← 2.4 config-driven

PHASE 3:
  src/extension/team-tool/api.ts                  ← 3.3a mailbox lifecycle events

Risk Assessment (REVISED)

Change	Risk	Mitigation
Permission gate (1.4)	Block legitimate use	Allow `force=true` bypass, document in README
Telemetry (1.6)	Privacy / log size	No PII in subagent telemetry payload; opt-out applied via `config.telemetry.enabled=false`; no sampling currently because payload is small/local event-bus data
Defer compaction (1.2)	Endlessly long run → overflow	Hard threshold at 95% bypasses deferral
`terminate: true` background (1.1a)	None significant	Background runs need no LLM follow-up by design
Public events (1.3)	Event storm, breaking change	Rate limit, semver document
Auto session name (1.5)	Override user-set name	Applied: only set when no custom name exists (`!pi.getSessionName()`)
`terminate: true` foreground (1.1b)	LLM does not summarize when enabled	OPT-IN flag (`config.tools.terminateOnForeground`, default off); default-on requires telemetry evidence
Dynamic threshold (2.1)	contextWindow undefined	Default 200_000 fallback
Artifact index in compaction (3.4)	Index size bloat / format drift	Cap recent index (10 runs / 80 artifacts), structured `crew:artifact-index` session entry, non-replacing default compaction
appendEntry (2.3)	Session bloat	TTL/cleanup strategy
Config-driven tools (2.4)	User confused	Default = current behavior, opt-in change
Working indicator (2.5)	Conflict with other extensions / older Pi UI type surface	Applied: restore default on finally; compat shim makes `setWorkingIndicator` optional
Custom compaction model (3.5)	Bad summary, auth missing	Fall back to default, multi-model retry

Testing Strategy

Unit tests:
- terminate: true flag in tool results (1.1a/b).
- Permission gate blocks/allows correctly across the confirm/force matrix (1.4).
- Threshold calculation from contextWindow (2.1).
- Telemetry payload schema (1.6).
- Artifact index payload structure + cap behavior (3.4).
Integration tests:
- Foreground run + compaction interaction (1.2 defer + 2.1 trigger).
- Multiple concurrent runs + permission gate (1.4).
- Event publish/subscribe round-trip (1.3).
- Compaction with N artifacts includes artifact index in custom instructions (3.4).
Manual:
- UI behavior with working indicator + session name (1.5, 2.5).
- Real LLM turn count before/after 1.1b using telemetry data (1.6 → 1.1b decision).
Regression:
- Run the full suite (npm test) after every commit; do not batch a whole Phase.
- Doctor tests must use --test-timeout=90000 on Windows.
