Add 5 pi extensions: pi-subagents, pi-crew, rpiv-pi, pi-interactive-shell, pi-intercom

2026-05-08 15:59:25 +10:00
parent d0d1d9b045
commit 31b4110c87
457 changed files with 85157 additions and 0 deletions


@@ -0,0 +1,180 @@
# pi-crew Architecture
`pi-crew` is a Pi package for coordinated multi-agent work. It is intentionally durable-first: every run is represented on disk, every task has a state record, and child workers stream progress into JSONL/status files so foreground sessions, background jobs, dashboards, and later restarts all read the same source of truth.
## Layers
```text
Pi extension layer
  register tools, slash commands, widget/dashboard, notifier, lifecycle cleanup
Runtime layer
  team runner, task graph scheduler, child Pi process runner, async runner,
  model fallback, policy engine, worktree manager, live-session experimental path
State layer (project root resolves to <crewRoot>:
    - .crew/ when no .pi/ exists in the repo (default)
    - .pi/teams/ when the repo already has .pi/ (legacy reuse))
  <crewRoot>/state/runs/{runId}/manifest.json
  <crewRoot>/state/runs/{runId}/tasks.json
  <crewRoot>/state/runs/{runId}/events.jsonl
  <crewRoot>/state/runs/{runId}/agents/{taskId}/status.json
  <crewRoot>/artifacts/{runId}/...
```
## Run flow
```text
user/team tool
handleTeamTool(action=run)
  ├─ discover agents/teams/workflows
  ├─ validate team/workflow refs
  ├─ create run manifest + task graph
  ├─ write goal artifact
  └─ choose foreground/session-bound or async/background mode
      ├─ foreground: startForegroundRun() schedules executeTeamRun()
      └─ async: spawnBackgroundTeamRun()
          ├─ node --import jiti-register.mjs background-runner.ts
          ├─ background-runner writes async.started + async.pid marker
          └─ executeTeamRun()
              ├─ resolve ready task batch
              ├─ resolveBatchConcurrency() with hard cap
              ├─ runTeamTask() per task
              │   ├─ build prompt + dependency context
              │   ├─ choose configured Pi model candidates
              │   ├─ spawn child `pi` worker
              │   ├─ observe JSONL/stdout progress
              │   ├─ persist agent status/events/output
              │   └─ write result/log/transcript artifacts
              ├─ merge task updates monotonically
              ├─ write progress artifacts
              └─ synthesize policy closeout
```
## Extension layer
`src/extension/register.ts` wires the package into Pi:
- `team` tool and management actions.
- Conflict-safe subagent tools: `crew_agent`, `crew_agent_result`, `crew_agent_steer`.
- Claude-style aliases: `Agent`, `get_subagent_result`, `steer_subagent` when available.
- Slash commands including `/team-run`, `/team-status`, `/team-dashboard`, `/team-doctor`, `/team-config`, `/team-summary`.
- Active-only widget and optional dashboard/sidebar UI.
- Foreground run scheduling and shutdown cleanup.
- Async completion notifier and session-start active-run summary.
The extension layer should remain thin: user input is normalized into tool parameters, then delegated to runtime/state modules.
## Runtime layer
### Team runner
`src/runtime/team-runner.ts` drives workflow execution. It reads queued tasks, computes the ready set from the task graph, applies concurrency limits, runs a batch, then merges results back into the latest task state. Terminal task states are monotonic: stale parallel snapshots must not regress completed/failed/cancelled/skipped tasks back to queued/running.
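The monotonic merge rule can be illustrated with a minimal sketch. The `TaskRecord` shape and the `mergeTaskUpdates` name are assumptions made for this example, not the actual types or functions in `team-runner.ts`.

```typescript
// Hypothetical task shape; the real record in pi-crew carries more fields.
type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled" | "skipped";

interface TaskRecord {
  id: string;
  status: TaskStatus;
}

const TERMINAL: ReadonlySet<TaskStatus> = new Set<TaskStatus>([
  "completed",
  "failed",
  "cancelled",
  "skipped",
]);

// Merge batch results into the latest persisted state.
// A task that is already terminal is never overwritten by a non-terminal (stale) snapshot.
function mergeTaskUpdates(latest: TaskRecord[], updates: TaskRecord[]): TaskRecord[] {
  const byId = new Map<string, TaskRecord>();
  for (const task of updates) byId.set(task.id, task);
  return latest.map((current) => {
    const incoming = byId.get(current.id);
    if (!incoming) return current;
    const wouldRegress = TERMINAL.has(current.status) && !TERMINAL.has(incoming.status);
    return wouldRegress ? current : incoming;
  });
}
```

In this sketch a stale `running` snapshot for an already `completed` task is dropped, while updates to non-terminal tasks pass through unchanged.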
### Task runner
`src/runtime/task-runner.ts` executes one task. It prepares workspace/worktree context, renders a task prompt, chooses model candidates from Pi configuration, launches a child Pi process by default, and writes result artifacts. Scaffold mode is explicit dry-run only.
### Child Pi runtime
`src/runtime/child-pi.ts` is the default worker runtime. It:
- launches real `pi` child processes,
- hides Windows console windows with `windowsHide: true`,
- streams JSONL output into transcripts,
- compacts noisy message updates,
- isolates observer callback failures so progress persistence cannot kill orchestration,
- applies post-exit stdio guards for late output.
### Async background runner
`src/runtime/async-runner.ts` spawns detached background runs. Installed packages use an absolute `jiti-register.mjs` loader path because Node strip-types refuses TypeScript under `node_modules`. The runner fails fast if jiti is missing, and writes `async.pid` once startup begins so the parent can distinguish a healthy start from an early import crash.
### Concurrency and policy
`src/runtime/concurrency.ts` picks batch size from explicit limits, team settings, workflow settings, or built-in defaults. User-provided `limits.maxConcurrentWorkers` is hard-capped by default to prevent local DoS; `limits.allowUnboundedConcurrency=true` is an explicit opt-out and emits an observability event.
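A rough sketch of the capping behavior follows. The cap and default values, the result shape, and the `pickBatchSize` name are assumptions for illustration; this document does not state the real numbers or signature.

```typescript
// Hypothetical shapes; only the two limit names come from the text above.
interface ConcurrencyLimits {
  maxConcurrentWorkers?: number;
  allowUnboundedConcurrency?: boolean;
}

interface BatchSize {
  workers: number;
  capped: boolean; // true when the requested value was reduced to the hard cap
  unbounded: boolean; // true when the explicit opt-out was used (would emit an event)
}

const ASSUMED_HARD_CAP = 8; // illustrative value only
const ASSUMED_DEFAULT = 2; // illustrative value only

function pickBatchSize(limits: ConcurrencyLimits, readyTasks: number): BatchSize {
  const requested = Math.max(1, Math.floor(limits.maxConcurrentWorkers ?? ASSUMED_DEFAULT));
  const unbounded = limits.allowUnboundedConcurrency === true;
  const ceiling = unbounded ? requested : Math.min(requested, ASSUMED_HARD_CAP);
  return {
    // Never schedule more workers than there are ready tasks, and never fewer than one.
    workers: Math.max(1, Math.min(ceiling, readyTasks)),
    capped: !unbounded && requested > ASSUMED_HARD_CAP,
    unbounded,
  };
}
```

The `unbounded` flag is returned so a caller could emit the observability event mentioned above.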
`src/runtime/policy-engine.ts` applies closeout and safety policy decisions such as limit exceeded, failed task blocking, stale workers, and green-contract failures.
### Model routing
Model choice is based on Pi's current configuration/model registry, not hardcoded providers. Task and agent records persist model attempts and routing metadata so dashboards/status can show requested model, selected model, fallback chain, and fallback reason.
## State layer
Run state is under `<crewRoot>` (`.crew/` for new projects, or `.pi/teams/` when the repo already has `.pi/`):
```text
<crewRoot>/state/runs/{runId}/
  manifest.json            run metadata/status/artifacts/async pid
  tasks.json               task graph and per-task status
  events.jsonl             append-only run events
  events.jsonl.seq         event sequence cache
  agents.json              aggregate agent cache
  async.pid                background startup marker
  agents/{taskId}/
    status.json            per-agent status source
    events.jsonl           per-agent event stream
    output.log             compact worker output
    sidechain.output.jsonl
    live-control.jsonl
```
Artifacts are under:
```text
<crewRoot>/artifacts/{runId}/
  goal.md
  prompts/{taskId}.md
  results/{taskId}.txt
  logs/{taskId}.log
  transcripts/{taskId}.jsonl
  metadata/*.json
  progress.md
  summary.md
```
`<crewRoot>` resolution is centralised in `src/utils/paths.ts#projectCrewRoot()`:
- if `<repoRoot>/.pi/` already exists, return `<repoRoot>/.pi/teams/` (legacy reuse, no parallel `.crew/`)
- otherwise return `<repoRoot>/.crew/` (default for fresh projects)
User-global fallback (when no project root is detected) lives under `~/.pi/agent/extensions/pi-crew/`.
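The resolution rules above can be sketched as follows. `resolveCrewRoot` and its injectable `dirExists` parameter are hypothetical and exist only to make the example testable; the real `projectCrewRoot()` may differ in signature and edge-case handling.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Sketch only: mirrors the three documented cases (legacy .pi/, fresh .crew/, user-global).
function resolveCrewRoot(
  repoRoot: string | undefined,
  dirExists: (path: string) => boolean = existsSync,
): string {
  if (repoRoot === undefined) {
    // No project root detected: fall back to the user-global location.
    return join(homedir(), ".pi", "agent", "extensions", "pi-crew");
  }
  const legacyPiDir = join(repoRoot, ".pi");
  // Reuse .pi/teams/ when .pi/ already exists; otherwise default to .crew/.
  return dirExists(legacyPiDir) ? join(legacyPiDir, "teams") : join(repoRoot, ".crew");
}
```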
Atomic writes use temp-file replace with retry for transient Windows `EPERM`/`EBUSY`/`EACCES`. JSONL append paths are best-effort where used for observers/progress; write failures must not crash child output parsing.
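A minimal sketch of temp-file replace with retry, assuming synchronous writes and no backoff between attempts. The helper names are hypothetical, and a real implementation would likely wait briefly before retrying.

```typescript
import { renameSync, rmSync, writeFileSync } from "node:fs";

// Error codes treated as transient, taken from the paragraph above.
const TRANSIENT_CODES = new Set(["EPERM", "EBUSY", "EACCES"]);

function isTransient(error: unknown): boolean {
  const code = (error as { code?: unknown } | null)?.code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

// Retry an operation while it fails with a transient code, up to maxAttempts tries.
function retryTransient<T>(operation: () => T, maxAttempts = 5): T {
  for (let attempt = 1; ; attempt++) {
    try {
      return operation();
    } catch (error) {
      if (!isTransient(error) || attempt >= maxAttempts) throw error;
    }
  }
}

// Write to a temp file next to the target, then rename it over the target.
function atomicWriteFileSync(target: string, data: string): void {
  const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmp, data, "utf8");
  try {
    retryTransient(() => renameSync(tmp, target));
  } catch (error) {
    rmSync(tmp, { force: true }); // do not leave the temp file behind
    throw error;
  }
}
```

Keeping the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the replace atomic.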
## UI and observability
- The persistent widget shows active runs only.
- Stale async runs with dead background pids are hidden from the active widget.
- `/team-status` is the canonical detailed state view and can mark stale active async runs failed.
- `/team-dashboard` provides live history/details from `RunSnapshotCache`, with panes for agents, progress/events, mailbox attention, recent output, health, and metrics.
- Phase 9 observability uses a per-session `MetricRegistry` (`Counter`, `Gauge`, `Histogram`) wired to `crew.*` events via unsubscribe-returning `events.on()` handlers. The registry is disposed on session shutdown/reload; no global metric singleton is used.
- Metrics can be inspected with `/team-metrics` or `team api metrics-snapshot`, exported as redacted daily JSONL under `<crewRoot>/state/metrics/` when telemetry is enabled, formatted for Prometheus, or pushed to an opt-in OTLP HTTP endpoint.
- Heartbeat observability is split between dashboard summaries and a background `HeartbeatWatcher`: healthy/warn/stale/dead gradient metrics are emitted, first-dead detections notify operators, and consecutive dead ticks can append deadletter entries.
- Powerbar publishing is optional and event-compatible: pi-crew emits `powerbar:register-segment` for `pi-crew-active` / `pi-crew-progress`, emits `powerbar:update` payloads (`id`, `text`, optional `suffix`, `bar`, `color`), and mirrors status through `ctx.ui.setStatus("pi-crew", ...)` when no powerbar listener is detected.
- Transcript viewer is file-backed so it works for foreground and async runs; it defaults to bounded tail reads and can load full content on demand.
## Lifecycle and cleanup
Foreground runs are session-bound and should be interrupted on session shutdown or session switch. Only explicit `async: true` runs are allowed to survive the Pi session. Runtime cleanup is registered through Pi lifecycle hooks and a global reload cleanup guard.
## Configuration
Key config sections:
- `runtime`: `auto`, `child-process`, `scaffold`, experimental `live-session`.
- `limits`: concurrency/task/depth safety controls.
- `ui`: widget/dashboard/powerbar/model-token display settings.
- `observability`: in-memory metrics, heartbeat watcher interval, metric file retention.
- `telemetry`: opt-out switch for local telemetry sinks.
- `reliability`: opt-in auto-retry/auto-recover defaults and deadletter threshold.
- `otlp`: opt-in OTLP HTTP metric export.
- `agents`: builtin overrides for models/fallbacks/tools.
- `autonomous`: policy injection/profile for proactive team delegation.
See `usage.md`, `resource-formats.md`, `runtime-flow.md`, and `live-mailbox-runtime.md` for operational details.


@@ -0,0 +1,36 @@
# Live Mailbox Runtime Direction
`pi-crew` currently uses workflow child-process orchestration: a run materializes tasks, executes them through the scheduler, writes artifacts/events, and optionally launches child Pi workers.
A full live mailbox runtime is intentionally out of scope for the current stable surface. Current foundational mailbox files are intentionally simple and local:
```text
{stateRoot}/mailbox/inbox.jsonl
{stateRoot}/mailbox/outbox.jsonl
{stateRoot}/mailbox/delivery.json
{stateRoot}/mailbox/tasks/{taskId}/inbox.jsonl
{stateRoot}/mailbox/tasks/{taskId}/outbox.jsonl
```
They are exposed through safe API operations (`read-mailbox`, `send-message`, `ack-message`, `read-delivery`, `validate-mailbox`) but do not yet imply always-on long-lived workers. If a full runtime is added later, it should build on the foundations already present:
- `src/state/contracts.ts` for status/event contracts
- `src/state/task-claims.ts` for claim/lease safety
- `src/runtime/worker-heartbeat.ts` for liveness
- `src/state/locks.ts` for run-level mutation safety
- `action: "api"` for safe interop boundaries
## Proposed phases
1. **Read-only interop** — already started with `api` operations.
2. **Heartbeat writers** — allow workers to update heartbeat/progress safely.
3. **Claim-safe task lifecycle** — expose claim/release/transition operations with tokens.
4. **Mailbox** — add worker inbox/leader inbox files and delivery state.
5. **Live workers** — only after the above contracts are stable.
## Non-goals for now
- No always-on background worker pool.
- No automatic destructive cleanup of dirty worktrees.
- No recursive team spawning by workers.
- No mailbox mutation without locks and schema validation.


@@ -0,0 +1,733 @@
# pi-crew Next Upgrade Roadmap
Date: 2026-05-05
Source inputs:
- `docs/research-oh-my-pi-distillation.md`
- `docs/source-runtime-refactor-map.md`
- Recent runtime hardening commits through `f5d47aa feat: surface run effectiveness evidence`
This document tracks the next practical upgrades after the current scaffold/no-op subagent fix, runtime safety classification, cancellation provenance, intent audit trail, prompt pipeline artifacts, capability inventory artifacts, and run effectiveness reporting.
## Current Baseline
Already implemented and pushed:
- Real child worker execution is the default.
- Implicit scaffold/no-op runs are blocked when worker execution is disabled by config/env.
- Explicit `runtime.mode=scaffold` remains available for dry-run prompt/artifact generation.
- Run `summary.md`, `progress.md`, and `status` now expose effectiveness evidence.
- Structured cancellation reasons flow through retry/cancel/team-runner/run events/metrics/UI snapshot.
- `cancel`, `cleanup`, `forget`, and `prune` accept audit intent metadata.
- Live-agent control distinguishes `steer` from `follow-up` at live-control/API level.
- Retry attempts have `attemptId`; max-retry deadletters link to the final `attemptId`.
- Worker prompt pipeline and capability inventory metadata artifacts are written per task.
## Priority Legend
- **P0**: correctness/safety issue; should be addressed before next release if feasible.
- **P1**: high user-visible value or reliability gain; good patch-release candidates.
- **P2**: larger subsystem work; should be planned and sequenced.
- **P3**: polish/UX/longer-term architecture.
## P0 — Prevent Ineffective Completed Runs
### P0.1 Enforce effectiveness policy for non-scaffold workers
**Problem**
`summary/status` now surface effectiveness evidence, but non-scaffold `child-process`/`live-session` runs can still end `completed` when task evidence is weak unless the existing mutation guard fires.
**Target behavior**
- For real workers, a run with completed tasks but no observable worker activity should be `blocked` or `failed`, not silently `completed`.
- Keep explicit scaffold dry-runs allowed, but label them as dry-runs.
- Policy should be configurable:
  - `runtime.effectivenessGuard = "off" | "warn" | "block" | "fail"`
  - default candidate: `warn` for read-only roles, `block` for mutating roles.
**Suggested files**
- `src/runtime/team-runner.ts`
- `src/runtime/completion-guard.ts`
- `src/state/types.ts` if storing guard result on manifest/tasks
- `src/schema/config-schema.ts`
- `src/config/config.ts`
- `test/unit/summary.test.ts`
- `test/unit/team-runner-merge.test.ts` or new `test/unit/effectiveness-guard.test.ts`
**Implementation sketch**
1. Extract run effectiveness calculation into a reusable exported helper, e.g.:
```ts
export interface RunEffectivenessSummary {
  completed: number;
  observable: number;
  noObservedWorkTaskIds: string[];
  needsAttentionTaskIds: string[];
  workerExecution: "enabled" | "disabled/scaffold";
  severity: "ok" | "warning" | "blocked" | "failed";
}
```
2. Use this helper for:
   - `progress.md`
   - `summary.md`
   - `status`
   - policy enforcement before `run.completed`.
3. For non-scaffold runs, if mutating tasks have no mutation/tool/model/transcript evidence:
   - append `policy.action` with `reason: "ineffective_worker"`;
   - set run `blocked` or `failed` depending on config;
   - include task IDs in `data`.
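Step 3 could look roughly like the sketch below. `applyEffectivenessGuard`, `GuardInput`, and `GuardDecision` are hypothetical names, and the sketch ignores the per-role default (read-only versus mutating) proposed above.

```typescript
type EffectivenessGuard = "off" | "warn" | "block" | "fail";
type RunStatus = "completed" | "blocked" | "failed";

// Hypothetical subset of the run effectiveness summary.
interface GuardInput {
  workerExecution: "enabled" | "disabled/scaffold";
  noObservedWorkTaskIds: string[];
}

interface GuardDecision {
  status: RunStatus;
  warning: boolean;
  policyAction?: { reason: "ineffective_worker"; data: { taskIds: string[] } };
}

function applyEffectivenessGuard(input: GuardInput, guard: EffectivenessGuard): GuardDecision {
  // Explicit scaffold dry-runs are always allowed to complete.
  const ineffective =
    input.workerExecution === "enabled" && input.noObservedWorkTaskIds.length > 0;
  if (!ineffective || guard === "off") return { status: "completed", warning: false };

  const policyAction = {
    reason: "ineffective_worker" as const,
    data: { taskIds: [...input.noObservedWorkTaskIds] },
  };
  if (guard === "warn") return { status: "completed", warning: true, policyAction };
  return { status: guard === "block" ? "blocked" : "failed", warning: true, policyAction };
}
```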
**Acceptance criteria**
- A mocked child-process run with no tool/model/transcript evidence does not report clean `completed` by default.
- Scaffold run still completes as explicit dry-run and displays `Worker execution: disabled/scaffold`.
- `status` clearly lists `noObservedWork` and `needsAttention` task IDs.
- Unit tests cover warn/block/fail modes.
**Verification**
```bash
npx tsc --noEmit
node --experimental-strip-types --test --test-concurrency=1 --test-timeout=30000 test/unit/effectiveness-guard.test.ts test/unit/summary.test.ts
npm run test:unit
```
### P0.2 Make runtime safety visible in manifest and run events
**Problem**
`runtime.safety` exists in runtime resolution, but it is not persisted as first-class run metadata. Debugging currently requires reading events or inferred artifacts.
**Target behavior**
- Manifest records resolved runtime:
```json
{
  "runtimeResolution": {
    "kind": "child-process",
    "requestedMode": "auto",
    "safety": "trusted",
    "fallback": "child-process",
    "reason": "..."
  }
}
```
- `run.running` or `run.blocked` event includes the same resolution.
**Suggested files**
- `src/state/types.ts`
- `src/extension/team-tool/run.ts`
- `src/runtime/background-runner.ts`
- `src/extension/team-tool/status.ts`
- `test/unit/team-run.test.ts`
- `test/unit/runtime-resolver.test.ts`
**Acceptance criteria**
- `status` shows `Runtime safety: trusted|explicit_dry_run|blocked`.
- Blocked disabled-worker runs persist enough evidence to explain why no subagents spawned.
- Existing manifest schema remains backward compatible.
## P1 — Steering/Follow-up Semantics Beyond Live Control
### P1.1 Persist separate steering and follow-up queues in mailbox state
**Current state**
`follow-up-agent` exists in live-control, but durable mailbox is still generic inbox/outbox and `respond` still has waiting-task semantics.
**Target behavior**
- Mailbox messages can carry semantic kind:
```ts
kind?: "message" | "steer" | "follow-up" | "response" | "group_join";
priority?: "urgent" | "normal" | "low";
deliveryMode?: "interrupt" | "next_turn";
```
- `steer-agent` appends durable steering queue entry when no live session is present.
- `follow-up-agent` appends durable follow-up queue entry, deliverable after task stop/resume.
- UI/status separates urgent steering from follow-up backlog.
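As a sketch of how status could separate the two backlogs, assuming messages are already parsed from JSONL: `MailboxMessage`, `QueueView`, and `splitQueues` are hypothetical names, and only the three optional fields come from the proposal above.

```typescript
type MessageKind = "message" | "steer" | "follow-up" | "response" | "group_join";

// Hypothetical message shape; `id` and `text` are placeholders for the real payload.
interface MailboxMessage {
  id: string;
  text: string;
  kind?: MessageKind; // absent in existing inbox/outbox entries
  priority?: "urgent" | "normal" | "low";
  deliveryMode?: "interrupt" | "next_turn";
}

interface QueueView {
  steering: MailboxMessage[];
  followUps: MailboxMessage[];
  other: MailboxMessage[];
}

function splitQueues(messages: MailboxMessage[]): QueueView {
  const view: QueueView = { steering: [], followUps: [], other: [] };
  for (const message of messages) {
    const kind = message.kind ?? "message"; // legacy entries stay readable
    if (kind === "steer") view.steering.push(message);
    else if (kind === "follow-up") view.followUps.push(message);
    else view.other.push(message);
  }
  return view;
}
```

Treating a missing `kind` as `"message"` is one way to keep existing JSONL readable without a migration.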
**Suggested files**
- `src/state/mailbox.ts`
- `src/runtime/live-agent-control.ts`
- `src/runtime/live-agent-manager.ts`
- `src/extension/team-tool/api.ts`
- `src/extension/team-tool/respond.ts`
- `src/ui/dashboard-panes/mailbox-pane.ts`
- `test/unit/mailbox-api.test.ts`
- `test/unit/live-agent-control.test.ts`
- `test/unit/respond-tool.test.ts`
**Acceptance criteria**
- Steering and follow-up can be inspected separately.
- Existing inbox/outbox JSONL remains readable.
- Durable queue survives process/session switch.
- Realtime live delivery dedupes against durable replay.
### P1.2 Clarify `respond` vs `follow-up` UX
**Problem**
`respond` is currently a waiting-task resume primitive. Users may expect it to send a general follow-up.
**Target behavior**
- `/team-respond` remains only for `waiting` tasks.
- `/team-follow-up` or `api operation=follow-up-agent` is documented as continuation prompt.
- Error messages recommend the correct command.
**Suggested files**
- `src/extension/registration/commands.ts`
- `src/extension/help.ts`
- `docs/usage.md`
- `test/unit/registration-commands-coverage.test.ts`
- `test/unit/respond-tool.test.ts`
## P1 — Worker Lifecycle and Process Reliability
### P1.3 Two-phase child process teardown
**Current state**
Child workers have improved post-exit stdio guards and bounded drains, but cancellation semantics can be made more deterministic.
**Target behavior**
Worker process cancellation returns structured status:
```ts
interface WorkerExitStatus {
  exitCode: number | null;
  cancelled: boolean;
  timedOut: boolean;
  killed: boolean;
  signal?: string;
  cleanupErrors: string[];
  finalDrainMs: number;
}
```
Process lifecycle:
1. graceful cancel/TERM;
2. wait grace window;
3. hard kill process tree;
4. bounded stdout/stderr drain;
5. mark session non-reusable.
**Suggested files**
- `src/runtime/child-pi.ts`
- `src/runtime/pi-spawn.ts`
- `src/runtime/post-exit-stdio-guard.ts`
- `src/runtime/task-runner.ts`
- `src/runtime/cancellation.ts`
- `test/unit/child-pi*.test.ts`
- `test/integration/mock-child-run.test.ts`
**Acceptance criteria**
- Cancelled worker always produces terminal task event.
- Output drains are bounded.
- Status includes `cancelled/timedOut/killed`.
- No zombie/stale running task after cancellation.
### P1.4 Reserve worker control channel before spawn
**Problem**
There can be a short window where a task is logically starting but cancel/steer cannot target a controller yet.
**Target behavior**
- Synchronously create a `WorkerRunCore`/controller before async spawn.
- Persist controller metadata in agent status.
- Cancel/steer requests can be queued immediately while startup is in progress.
- Controller is cleared in `finally`.
**Suggested files**
- `src/runtime/task-runner.ts`
- `src/runtime/agent-control.ts`
- `src/runtime/live-agent-control.ts`
- `src/runtime/crew-agent-records.ts`
- `src/extension/team-tool/api.ts`
**Acceptance criteria**
- Starting worker can be cancelled immediately.
- Durable control request written during startup is applied or recorded as terminal no-op with reason.
- Tests simulate control request before child process emits first output.
## P1 — Cancellation and Attempt History
### P1.5 Add event-tree provenance: `parentEventId`, `attemptId`, `branchId`
**Current state**
Retry attempts have `attemptId`, and deadletters link to final attempt. Event log has sequence and terminal fingerprints but no general event tree.
**Target behavior**
- `TeamEvent.metadata` supports:
```ts
parentEventId?: string;
attemptId?: string;
branchId?: string;
causationId?: string;
correlationId?: string;
```
- Retry events, task started/completed/failed, deadletter, recovery events link by `attemptId`.
- UI/status can show attempt timeline.
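An attempt timeline could be derived from the event log along these lines. `TeamEventLike` and `groupByAttempt` are hypothetical; the real `TeamEvent` type has more fields than shown.

```typescript
// Hypothetical minimal event shape with the proposed provenance metadata.
interface TeamEventLike {
  seq: number;
  type: string;
  metadata?: { attemptId?: string; parentEventId?: string };
}

// Group events by attemptId in sequence order.
// Events written before provenance existed are skipped rather than rejected.
function groupByAttempt(events: TeamEventLike[]): Map<string, TeamEventLike[]> {
  const timeline = new Map<string, TeamEventLike[]>();
  const ordered = [...events].sort((a, b) => a.seq - b.seq);
  for (const event of ordered) {
    const attemptId = event.metadata?.attemptId;
    if (attemptId === undefined) continue;
    const bucket = timeline.get(attemptId) ?? [];
    bucket.push(event);
    timeline.set(attemptId, bucket);
  }
  return timeline;
}
```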
**Suggested files**
- `src/state/event-log.ts`
- `src/state/types.ts`
- `src/runtime/team-runner.ts`
- `src/runtime/retry-executor.ts`
- `src/runtime/recovery-recipes.ts`
- `src/extension/team-tool/status.ts`
- `test/unit/event-metadata.test.ts`
- `test/unit/retry-executor.test.ts`
**Acceptance criteria**
- Retry attempt events and terminal task events share attempt provenance.
- Deadletter records can be traced back to event sequence.
- Existing JSONL readers ignore missing provenance fields.
### P1.6 Synthetic terminal results for cancelled in-flight operations
**Problem**
Run/task cancellation events are now structured, but worker/tool sub-operations can still lack synthetic terminal records if cancelled mid-operation.
**Target behavior**
- If a task started a worker/tool/model call and cancellation occurs, append a synthetic terminal record:
  - `tool.cancelled` or `worker.cancelled`
  - reason code/message
  - startedAt/finishedAt
  - attemptId if available
**Suggested files**
- `src/runtime/task-runner.ts`
- `src/runtime/task-runner/progress.ts`
- `src/runtime/child-pi.ts`
- `src/runtime/cancellation.ts`
- `src/state/contracts.ts`
- `test/unit/cancellation.test.ts`
**Acceptance criteria**
- No started tool/model operation is left without terminal evidence after cancellation.
- Status/diagnostics can distinguish user cancel vs timeout vs shutdown.
## P1 — Capability Inventory and Control Center
### P1.7 Build run/project capability inventory view
**Current state**
Per-task capability artifacts exist. There is no unified project/run inventory UI/API yet.
**Target behavior**
`/team-settings` or new `/team-control` shows normalized inventory:
```ts
interface CapabilityItem {
  id: string;
  kind: "team" | "workflow" | "agent" | "skill" | "tool" | "hook" | "runtime" | "provider";
  name: string;
  source: "builtin" | "project" | "user" | "runtime";
  path?: string;
  state: "active" | "disabled" | "shadowed" | "missing";
  disabledReason?: string;
  shadowedBy?: string;
}
```
**Suggested files**
- `src/extension/team-tool/handle-settings.ts`
- `src/extension/management.ts`
- `src/agents/discover-agents.ts`
- `src/teams/discover-teams.ts`
- `src/workflows/discover-workflows.ts`
- `src/runtime/skill-instructions.ts`
- `docs/resource-formats.md`
- `test/unit/management.test.ts`
**Acceptance criteria**
- Inventory is stable and sorted.
- Shadowed project/user/builtin resources are visible.
- Skill disabled/budget state is visible.
- No file path is used as the only stable ID.
### P1.8 Persist capability disables by stable ID
**Target behavior**
- Operator can disable a skill/agent/team by capability ID.
- Disable config survives path relocation when resource identity remains stable.
- Status explains disabled reason.
**Suggested files**
- `src/config/config.ts`
- `src/schema/config-schema.ts`
- discovery modules
- `test/unit/config-schema-validation.test.ts`
## P2 — Typed Hook Lifecycle
### P2.1 Introduce typed hook contract
**Target behavior**
Define typed lifecycle gates:
- `before_run_start`
- `before_task_start`
- `task_result`
- `before_cancel`
- `before_forget`
- `before_cleanup`
- `before_publish`
- `session_before_switch`
- `run_recovery`
Each hook declares:
```ts
type HookMode = "blocking" | "non_blocking";
type HookOutcome = "allow" | "block" | "modify" | "diagnostic";
```
Errors are recorded in diagnostics/events, not uncontrolled exceptions.
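A minimal synchronous sketch of such a gate follows. `Hook`, `GateResult`, and `runGate` are hypothetical names. The sketch ignores the `modify` outcome and treats a throwing blocking hook like a non-blocking failure, which a real implementation might instead handle by failing closed.

```typescript
type HookMode = "blocking" | "non_blocking";
type HookOutcome = "allow" | "block" | "modify" | "diagnostic";

// Hypothetical hook shape for illustration.
interface Hook<C> {
  name: string;
  mode: HookMode;
  run(context: C): HookOutcome;
}

interface GateResult {
  allowed: boolean;
  blockedBy?: string;
  diagnostics: string[];
}

function runGate<C>(hooks: Hook<C>[], context: C): GateResult {
  const diagnostics: string[] = [];
  for (const hook of hooks) {
    try {
      const outcome = hook.run(context);
      if (outcome === "block" && hook.mode === "blocking") {
        return { allowed: false, blockedBy: hook.name, diagnostics };
      }
      if (outcome === "diagnostic") diagnostics.push(`${hook.name}: diagnostic`);
    } catch (error) {
      // Hook errors become diagnostics instead of escaping into the run.
      const message = error instanceof Error ? error.message : String(error);
      diagnostics.push(`${hook.name}: ${message}`);
    }
  }
  return { allowed: true, diagnostics };
}
```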
**Suggested files**
- new `src/hooks/*`
- `src/extension/register.ts`
- `src/runtime/team-runner.ts`
- `src/extension/team-tool/cancel.ts`
- `src/extension/team-tool/lifecycle-actions.ts`
- `docs/resource-formats.md`
- `test/unit/hooks*.test.ts`
**Acceptance criteria**
- Blocking hook can stop a run before worker start with clear event and status.
- Non-blocking hook failure records diagnostic and does not crash run.
- Hook context is redacted and bounded.
### P2.2 Require intent via policy/hook for destructive actions
**Current state**
Intent is optional for cancel/cleanup/forget/prune.
**Target behavior**
- Optional config:
```json
{
  "policy": {
    "requireIntentForDestructiveActions": true
  }
}
```
- Actions requiring intent:
  - cancel
  - forget
  - prune
  - cleanup with force
  - publish/release helpers if added
  - worktree removal
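The check itself could be as small as the sketch below. `ActionRequest`, `IntentCheck`, and `checkIntent` are hypothetical names, and the sketch covers only `cancel`, `forget`, `prune`, and forced `cleanup`; worktree removal and publish helpers are left out.

```typescript
interface PolicyConfig {
  requireIntentForDestructiveActions?: boolean;
}

// Hypothetical request shape for illustration.
interface ActionRequest {
  action: string;
  force?: boolean;
  intent?: string;
}

type IntentCheck = { ok: true } | { ok: false; error: string };

const ALWAYS_DESTRUCTIVE = new Set(["cancel", "forget", "prune"]);

function checkIntent(request: ActionRequest, policy: PolicyConfig): IntentCheck {
  if (policy.requireIntentForDestructiveActions !== true) return { ok: true };
  const destructive =
    ALWAYS_DESTRUCTIVE.has(request.action) ||
    (request.action === "cleanup" && request.force === true);
  if (!destructive) return { ok: true };
  if (request.intent !== undefined && request.intent.trim() !== "") return { ok: true };
  return {
    ok: false,
    error: `Action "${request.action}" requires an intent; pass one and retry.`,
  };
}
```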
**Acceptance criteria**
- Missing intent blocks action with actionable error.
- Existing tests can opt out or provide intent.
- Audit trail includes intent after approval.
## P2 — Durable History vs Prompt Projection
### P2.3 Separate durable run history projection from worker prompt text
**Current state**
Prompt pipeline artifacts exist, but context projection logic is still coupled to prompt construction in multiple places.
**Target behavior**
Introduce explicit projection functions:
```ts
transformRunContextBeforeWorkerStart(...)
convertRunHistoryToWorkerPrompt(...)
```
Rules:
- Durable history retains events, mailbox, artifacts, UI/runtime metadata.
- Worker prompt gets a bounded projection.
- UI/runtime events are not prompt text unless explicitly selected.
**Suggested files**
- `src/runtime/task-runner/prompt-pipeline.ts`
- `src/runtime/task-runner/prompt-builder.ts`
- `src/runtime/task-output-context.ts`
- `src/runtime/task-runner.ts`
- `test/unit/task-runner-prompt-pipeline.test.ts`
**Acceptance criteria**
- Prompt pipeline artifact identifies every projection source.
- Large event/mailbox history is summarized or referenced, not blindly embedded.
- Tests verify UI/runtime events are not injected as instructions.
## P2 — Cooperative Cancellation for Internal Scans
### P2.4 Add internal `CancellationToken`
**Target behavior**
A utility for long internal loops:
```ts
interface CancellationToken {
  readonly aborted: boolean;
  readonly reason?: CancellationReason;
  heartbeat(stage?: string): void;
  throwIfCancelled(): void;
  wait(ms: number): Promise<void>;
}
```
Use it in:
- run index scans
- artifact cleanup
- mailbox validation/replay
- worktree cleanup
- diagnostic export
- large transcript/event reads
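One possible shape for the token and a cooperative loop is sketched below. The `CancellationReason` fields, `CancelledError`, `createCancellationSource`, and `scanRuns` are assumptions for illustration. In this sketch `wait` only observes cancellation when its timer fires.

```typescript
interface CancellationReason {
  code: string; // hypothetical field names
  message?: string;
}

interface CancellationToken {
  readonly aborted: boolean;
  readonly reason?: CancellationReason;
  heartbeat(stage?: string): void;
  throwIfCancelled(): void;
  wait(ms: number): Promise<void>;
}

class CancelledError extends Error {
  readonly reason: CancellationReason;
  constructor(reason: CancellationReason) {
    super(reason.message ?? reason.code);
    this.name = "CancelledError";
    this.reason = reason;
  }
}

function createCancellationSource(onHeartbeat: (stage: string) => void = () => {}) {
  let reason: CancellationReason | undefined;
  const token: CancellationToken = {
    get aborted() {
      return reason !== undefined;
    },
    get reason() {
      return reason;
    },
    heartbeat(stage = "unknown") {
      onHeartbeat(stage);
    },
    throwIfCancelled() {
      if (reason) throw new CancelledError(reason);
    },
    wait(ms) {
      return new Promise<void>((resolve, reject) => {
        if (reason) {
          reject(new CancelledError(reason));
          return;
        }
        setTimeout(() => (reason ? reject(new CancelledError(reason)) : resolve()), ms);
      });
    },
  };
  // The first cancellation reason wins; later calls are ignored.
  return { token, cancel: (next: CancellationReason) => void (reason ??= next) };
}

// A long internal loop that checks the token once per item; the token is optional.
function scanRuns(runIds: string[], token?: CancellationToken): string[] {
  const scanned: string[] = [];
  for (const runId of runIds) {
    token?.throwIfCancelled();
    token?.heartbeat(`scan:${runId}`);
    scanned.push(runId);
  }
  return scanned;
}
```

Callers that pass no token keep their current behavior, which matches the compatibility goal for existing APIs.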
**Suggested files**
- new `src/runtime/cancellation-token.ts`
- `src/extension/run-index.ts`
- `src/extension/registration/artifact-cleanup.ts`
- `src/state/mailbox.ts`
- `src/ui/run-snapshot-cache.ts`
- `test/unit/cancellation-token.test.ts`
**Acceptance criteria**
- Long scan can abort within bounded cadence.
- Heartbeat stage appears in diagnostics/logs.
- Existing APIs can pass no token and keep current behavior.
## P2 — Artifact Store Improvements
### P2.5 Content-addressed blob artifacts
**Target behavior**
Large logs/transcripts/results are stored as blobs:
```text
artifacts/blobs/sha256/<hash>
artifacts/blob-metadata/<hash>.json
```
Metadata includes:
- runId/taskId
- MIME/type
- producer
- original path/name
- size/hash
- redaction status
- retention policy
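Deriving the blob and metadata locations from content could look like this sketch. `BlobRef` and `describeBlob` are hypothetical names; the two path prefixes are taken from the layout above, and nothing is written to disk here.

```typescript
import { createHash } from "node:crypto";

interface BlobRef {
  hash: string;
  size: number;
  blobPath: string;
  metadataPath: string;
}

// Compute the content address for an artifact without touching the filesystem.
function describeBlob(content: string | Uint8Array): BlobRef {
  const bytes = typeof content === "string" ? Buffer.from(content, "utf8") : content;
  const hash = createHash("sha256").update(bytes).digest("hex");
  return {
    hash,
    size: bytes.byteLength,
    blobPath: `artifacts/blobs/sha256/${hash}`,
    metadataPath: `artifacts/blob-metadata/${hash}.json`,
  };
}
```

Because the path depends only on the bytes, identical logs or transcripts from different tasks map to the same blob, and per-task details belong in the metadata file.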
**Suggested files**
- `src/state/artifact-store.ts`
- `src/runtime/task-runner.ts`
- `src/ui/transcript-viewer.ts`
- `src/extension/run-export.ts`
- `src/extension/run-import.ts`
- `test/unit/artifact-store*.test.ts`
**Acceptance criteria**
- Artifacts above threshold are blob-referenced.
- Run export/import preserves blobs.
- GC removes unreferenced blobs after retention.
- Path traversal protections remain intact.
## P2 — UI and Dashboard Upgrades
### P2.6 Show capability/effectiveness/cancellation panels in dashboard
**Target behavior**
Dashboard panes expose:
- run effectiveness score and no-observed-work tasks;
- cancellation reason and intent;
- capability inventory for selected task;
- attempt/deadletter timeline.
**Suggested files**
- `src/ui/run-dashboard.ts`
- `src/ui/dashboard-panes/*`
- `src/ui/snapshot-types.ts`
- `src/ui/run-snapshot-cache.ts`
- `test/unit/run-dashboard.test.ts`
- new pane tests
**Acceptance criteria**
- No heavy synchronous scans in render path.
- Pane output is width-safe.
- Snapshot cache provides precomputed compact data.
### P2.7 Event-first UI stream
**Target behavior**
Move more live UI updates from file polling to semantic events:
- `task_started`
- `task_completed`
- `worker_status`
- `mailbox_updated`
- `effectiveness_changed`
**Acceptance criteria**
- Render scheduler remains coalesced and overlap-safe.
- UI still recovers from durable files after restart.
- File polling is fallback, not the hot path.
## P2 — Raw Scan Entry Cache
### P2.8 Cache raw entries, not final semantic query results
**Target behavior**
Shared raw scan cache for:
- runs
- artifacts
- mailbox files
- transcript chunks
- worktree roots
Then apply filters/sorts after retrieval.
**Suggested files**
- `src/runtime/manifest-cache.ts`
- `src/ui/run-snapshot-cache.ts`
- `src/extension/run-index.ts`
- `src/utils/file-coalescer.ts`
**Acceptance criteria**
- Deterministic sort order.
- State mutation invalidates relevant raw entries.
- Large workspaces do not trigger full rescans on every render/status.
## P3 — Release/Install Hardening
### P3.1 Tarball install smoke before publish
**Target behavior**
Release workflow requires:
```bash
npm run ci
npm pack --dry-run
npm pack
# install tarball in temp project
# verify pi extension load smoke
# verify npm package files and version/tag consistency
```
**Suggested files**
- `docs/publishing.md`
- `package.json` scripts
- `.github/workflows/*` if CI is added
- optional `scripts/release-smoke.mjs`
**Acceptance criteria**
- Packed tarball loads extension in temp Pi home.
- Version in package, changelog, tag, npm view are consistent.
- Release instructions include rollback notes.
## Suggested Implementation Order
1. **P0.1 Effectiveness policy enforcement** — prevents misleading completed runs.
2. **P0.2 Persist runtime safety** — improves debugging for worker spawn issues.
3. **P1.3 Two-phase worker teardown** — reduces stale/zombie worker risk.
4. **P1.1 Durable steering/follow-up queues** — completes semantic split started at live-control level.
5. **P1.5 Event-tree provenance** — builds on current `attemptId` work.
6. **P1.7 Capability inventory view** — turns existing per-task artifacts into operator UX.
7. **P2.3 Durable history projection** — reduces prompt/context risks.
8. **P2.4 CancellationToken** — improves responsiveness of internal scans.
9. **P2.5 Blob artifacts** — prevents log/transcript bloat.
10. **P2.6 Dashboard panels** — surface all new evidence in UI.
## Release Guidance
Before publishing a patch with these upgrades:
```bash
npx tsc --noEmit
npm run test:unit
npm run test:integration
npm pack --dry-run
```
For runtime/process changes also run targeted child-worker integration tests:
```bash
node --experimental-strip-types --test --test-concurrency=1 --test-timeout=60000 \
test/integration/mock-child-run.test.ts \
test/integration/mock-child-json-run.test.ts \
test/integration/phase6-runtime-hardening.test.ts
```
Do not publish without explicit user confirmation and a green verification pass.


@@ -0,0 +1,65 @@
# Publishing pi-crew
This package is published as the public npm package:
```text
pi-crew
```
Before publishing to npm:
1. Confirm package metadata in `package.json`:
   - `author`
   - `repository`
   - `homepage`
   - `bugs`
   - `publishConfig.access = public`
2. Confirm license and notices:
   - keep `LICENSE`
   - keep `NOTICE.md`
   - document copied/adapted MIT source if any substantial code is ported
3. Run checks:
```bash
npm run check
```
4. Verify package contents:
```bash
npm pack --dry-run
```
5. Verify local install in Pi:
```bash
pi install ./pi-crew
/team-doctor
/team-validate
```
6. Publish when ready:
```bash
npm publish --access public
```
Users can install the published package with:
```bash
pi install npm:pi-crew
```
## Config schema
The package exports:
```text
./schema.json
```
Use this for editor validation of:
```text
~/.pi/agent/extensions/pi-crew/config.json
```

View File

@@ -0,0 +1,394 @@
# Phase 3 Refactor Plan — Port utilities & patterns from `source/`
> Origin: in-depth review of `source/pi-subagents` and `source/pi-mono/packages/coding-agent` (28/04/2026).
> Goal: port the utilities/patterns that are missing or weak in pi-crew to improve stability, observability, and maintainability.
> Phase 2 (#17–#25) is complete. Baseline: tsc 0 errors, 176 unit + 21 integration tests passing.
## General conventions
- Do not break the current public API. All changes are internal.
- After each task: `npx tsc --noEmit` + `npm run test:unit` (+ `test:integration` if the change touches the watcher or IO).
- Do not add new runtime dependencies unless the task says so explicitly.
- Each task = 1 independent, revertible commit. Name tests after the behavior they cover.
## Status update
- [x] Task #26 — `completion-dedupe` (done)
- [x] Task #27 — `jsonl-writer` (done)
- [x] Task #28 — `post-exit-stdio-guard` (done)
- [x] Task #29 — `sleep` (done)
- [x] Task #30 — `timings` (done)
- [x] Task #31 — `fs-watch` (done)
- [x] Task #32 — `result-watcher` (done)
- [x] Task #33 — `parallel-utils` (done)
- [x] Task #34 — `artifact-cleanup` (done)
- [x] Task #35 — `team-doctor` (done)
- [x] Task #37 — `hosted-git-info` for team config git URLs (done)
- [ ] Task #36 — `proper-lockfile` (deferred; keeping the in-house `locks.ts`)
---
## Batch A — Low-risk utility ports (high priority)
Goal: 6 new files + 2 modified files. Low risk, cleanly separated, easy to test in isolation. Estimate: 12h.
### Task #26 — Port `completion-dedupe.ts`
**Source**: `source/pi-subagents/completion-dedupe.ts`
**Target**: `pi-crew/src/utils/completion-dedupe.ts`
**Rationale**: pi-crew has no TTL seen-map yet. When `result-watcher`/mailbox restarts, or `primeExistingResults` runs concurrently with a new event, a completion can be emitted twice. A TTL map keyed on `(sessionId, agent, timestamp, taskIndex, totalTasks, success)` keeps emission idempotent within the TTL window.
**API export**:
```typescript
export function buildCompletionKey(data: CompletionDataLike, fallback: string): string;
export function pruneSeenMap(seen: Map<string, number>, now: number, ttlMs: number): void;
export function markSeenWithTtl(seen: Map<string, number>, key: string, now: number, ttlMs: number): boolean;
export function getGlobalSeenMap(storeKey: string): Map<string, number>;
```
**Acceptance**:
- File copied verbatim (adjust import paths only if needed).
- Unit test `test/unit/completion-dedupe.test.ts` covering 4 cases:
  - `buildCompletionKey` gives `id` the highest priority
  - `buildCompletionKey` falls back to metadata (no id)
  - `markSeenWithTtl` returns `true` on the second call within the TTL
  - `pruneSeenMap` removes expired entries
- Integration: the new call site is added in Task #27.
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep completion-dedupe`
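The TTL seen-map idea can be sketched as follows. This is a minimal illustration that assumes `markSeenWithTtl` returns `true` for a duplicate inside the TTL window (as the acceptance cases suggest); the function bodies are assumptions and not the code in `source/pi-subagents/completion-dedupe.ts`.

```typescript
// Illustrative sketch only; the real pi-subagents implementation may differ.

/** Remove entries whose last-seen timestamp is older than the TTL. */
function pruneSeenMap(seen: Map<string, number>, now: number, ttlMs: number): void {
  seen.forEach((seenAt, key) => {
    if (now - seenAt > ttlMs) seen.delete(key);
  });
}

/**
 * Record `key` as seen at `now`.
 * Returns true when the key was already seen within the TTL (a duplicate),
 * false when this is the first sighting or the earlier one has expired.
 */
function markSeenWithTtl(seen: Map<string, number>, key: string, now: number, ttlMs: number): boolean {
  pruneSeenMap(seen, now, ttlMs);
  if (seen.has(key)) return true;
  seen.set(key, now);
  return false;
}
```

A caller would skip emitting a completion whenever `markSeenWithTtl` returns `true`. Pruning on every call keeps the map bounded without a separate timer.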
---
### Task #27 — Port `jsonl-writer.ts` + event-log integration
**Source**: `source/pi-subagents/jsonl-writer.ts`
**Target**: `pi-crew/src/state/jsonl-writer.ts`
**Rationale**: pi-crew's `events.jsonl` has no size cap; a long run can grow it without bound. The pi-subagents JSONL writer provides:
- Backpressure (`source.pause()`/`resume()` when `stream.write()` returns false)
- A hard byte cap (default 50MB); lines past the threshold are dropped silently
- Best-effort error handling (try/catch around `createWriteStream`)
**Integration**:
1. `event-log.ts` currently appends synchronously via `fs.appendFileSync`. Switching to `createJsonlWriter` would make writes asynchronous, so the impact on `appendEvent` call sites needs to be assessed.
2. Lower-risk option: do NOT change the synchronous hot path in `event-log.ts`. Instead:
   - Add a size check in `appendEvent`: before appending, call `fs.statSync(eventsFile)`; if the size exceeds `MAX_EVENTS_BYTES` (default 50MB), log a warning and drop the event.
   - Or rotate: rename `events.jsonl` → `events.jsonl.1` once the threshold is exceeded.
**API export**:
```typescript
export function createJsonlWriter(filePath: string | undefined, source: DrainableSource, deps?: JsonlWriterDeps): JsonlWriter;
```
**Acceptance**:
- File copied, with import paths adjusted.
- Unit test `test/unit/jsonl-writer.test.ts` covering 4 cases:
  - Writes line + newline
  - Drops lines once `maxBytes` is exceeded
  - Pauses/resumes the source under backpressure
  - `close()` flushes the stream
- `event-log.ts` integration: add a size guard (do NOT switch sync→async). If `events.jsonl` exceeds `MAX_EVENTS_BYTES`, log an internal error and skip the append (runtime behavior otherwise unchanged).
**Risk**: `event-log.ts` is a hot path. Run the `live-mailbox-flow` integration test to make sure nothing regresses.
**Verification**: `npx tsc --noEmit` + `npm run test:unit` + `npm run test:integration`
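A minimal sketch of the synchronous size guard described above. The names `EventFileOps` and `appendEventGuarded` are hypothetical, and file access is injected so the logic can be exercised without touching disk; in `event-log.ts` the two operations would presumably map to `fs.statSync` and `fs.appendFileSync`.

```typescript
// Hypothetical helper names; only MAX_EVENTS_BYTES and its 50MB default come from the plan.
interface EventFileOps {
  /** Current size of the events file in bytes (0 when it does not exist yet). */
  sizeBytes(path: string): number;
  /** Synchronously append one line to the events file. */
  appendLine(path: string, line: string): void;
  /** Optional hook for reporting a dropped event. */
  warn?(message: string): void;
}

const MAX_EVENTS_BYTES = 50 * 1024 * 1024;

/** Append one event as a JSONL line unless the file is already over the cap. */
function appendEventGuarded(
  ops: EventFileOps,
  eventsFile: string,
  event: unknown,
  maxBytes: number = MAX_EVENTS_BYTES,
): boolean {
  if (ops.sizeBytes(eventsFile) > maxBytes) {
    if (ops.warn) ops.warn("events file over " + maxBytes + " bytes; dropping event");
    return false; // dropped, the caller keeps running
  }
  ops.appendLine(eventsFile, JSON.stringify(event) + "\n");
  return true;
}
```

Keeping the check inside the synchronous path means existing `appendEvent` call sites do not need to become async.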
---
### Task #28 — Extract `post-exit-stdio-guard` into its own module
**Source**: `source/pi-subagents/post-exit-stdio-guard.ts`
**Target**: `pi-crew/src/runtime/post-exit-stdio-guard.ts`
**Rationale**: `child-pi.ts` currently inlines 60+ lines of post-exit timer management. Extracting a module allows reuse by subagents and workers and makes it easy to unit test.
**API export**:
```typescript
export function attachPostExitStdioGuard(
  child: ChildWithPipedStdio,
  options: { idleMs: number; hardMs: number },
): () => void;
export function trySignalChild(child: ChildWithKill, signal: NodeJS.Signals): boolean;
```
**Integration**:
- In `child-pi.ts`:
  - Replace the `postExitGuard = setTimeout(...)` + `child.stdout?.destroy()` block with `attachPostExitStdioGuard(child, { idleMs: POST_EXIT_STDIO_GUARD_MS, hardMs: HARD_KILL_MS })`.
  - Call the returned cleanup function in `settle()`.
  - Keep the `noResponseTimer` + `finalDrainTimer` logic separate (their semantics differ: they are pre-exit, not post-exit).
**Acceptance**:
- Existing `runChildPi` tests still pass.
- Add unit test `test/unit/post-exit-stdio-guard.test.ts`: simulate a child exit with a dangling stdout and verify that destroy is called after idleMs.
- Behavior: when the child has not exited but stdio is idle, do NOT destroy (destroy only after exit).
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep child-pi` + `npm run test:unit -- --grep post-exit`
---
### Task #29 — Port `utils/sleep.ts`
**Source**: `source/pi-mono/packages/coding-agent/src/utils/sleep.ts`
**Target**: `pi-crew/src/utils/sleep.ts`
**Rationale**: Abortable sleep helper. Useful for retry/backoff in `model-fallback.ts`, `task-runner.ts`, and `subagent-manager.ts` (`scheduleStuckBlockedNotify`).
**API export**:
```typescript
export function sleep(ms: number, signal?: AbortSignal): Promise<void>;
```
**Integration** (not required in the first pass; port the file only):
- Scan for `setTimeout(...{}, ms)` patterns in `model-fallback.ts` to assess whether to replace them. By default, do NOT change call sites in this task; the utility file stands alone.
**Acceptance**:
- File copied verbatim.
- Unit test `test/unit/sleep.test.ts` with 3 cases:
  - Resolves after ms
  - Rejects immediately if the signal is already aborted
  - Rejects on abort while waiting and clears the timeout
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep sleep`
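For orientation, an abortable sleep matching the exported signature and the three acceptance cases might look like the sketch below. It is an assumption about the helper's shape, not a copy of the pi-mono file; in particular the `"Aborted"` error message is illustrative.

```typescript
// Sketch under stated assumptions; the ported file is the source of truth.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Case: the signal was aborted before we started waiting.
    if (signal?.aborted) {
      reject(new Error("Aborted"));
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer); // do not leave a dangling timer behind
      reject(new Error("Aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

Clearing the timer on abort matters for retry loops: a cancelled backoff should not keep the process alive until the original delay elapses.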
---
### Task #30 — Port `core/timings.ts` (PI_TIMING profiler)
**Source**: `source/pi-mono/packages/coding-agent/src/core/timings.ts`
**Target**: `pi-crew/src/utils/timings.ts`
**Rationale**: pi-crew registers many slash commands, widgets, and extension hooks. When a user reports slow startup, there is currently no quick way to measure it. Setting the `PI_TIMING=1` env var prints a per-stage breakdown.
**API export**:
```typescript
export function resetTimings(): void;
export function time(label: string): void;
export function printTimings(): void;
```
**Integration**:
- In `index.ts` / `src/extension/register.ts`:
  - At the top of the file: `import { time, printTimings, resetTimings } from "./utils/timings.js"`.
  - After each major registration step (load config, register tools, register slash commands, register widgets, init runtime resolver): `time("step-name")`.
  - At the end: call `printTimings()` (a no-op when the env var is not set).
**Acceptance**:
- File copied verbatim.
- Minimal unit test: calling `time` + `printTimings` does not throw.
- Smoke: `PI_TIMING=1 node --experimental-strip-types -e "import('./pi-crew/index.ts')"` prints `--- Startup Timings ---`.
**Verification**: `npx tsc --noEmit` + manual smoke with `PI_TIMING=1`.
---
### Task #31 — Port `utils/fs-watch.ts`
**Source**: `source/pi-mono/packages/coding-agent/src/utils/fs-watch.ts`
**Target**: `pi-crew/src/utils/fs-watch.ts`
**Rationale**: Safe wrapper around `fs.watch` with:
- `closeWatcher(watcher)`: swallows errors on close
- `watchWithErrorHandler(path, listener, onError)`: try/catch around `watch()`, calls `onError` itself if it throws, and attaches an `error` listener
**API export**:
```typescript
export const FS_WATCH_RETRY_DELAY_MS: number;
export function closeWatcher(watcher: FSWatcher | null | undefined): void;
export function watchWithErrorHandler(path: string, listener: WatchListener<string>, onError: () => void): FSWatcher | null;
```
**Integration** (not required in the first pass; port the file only):
- When writing `result-watcher` (Task #32 Tier 2), use this wrapper.
**Acceptance**:
- File copied.
- Unit test `test/unit/fs-watch.test.ts` with 2 cases:
  - `closeWatcher(null)` does not throw
  - `watchWithErrorHandler` calls `onError` when `watch()` throws (mock fs)
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep fs-watch`
---
## Batch B — Larger patterns that need design
Goal: 3 tasks that require design. Medium risk. Estimate: 34h.
### Task #32 — Result watcher auto-restart pattern
**Source**: `source/pi-subagents/result-watcher.ts`
**Target**: `pi-crew/src/runtime/result-watcher.ts` (new) OR integrate into mailbox/event-log if that fits better.
**Rationale**: When `fs.watch` reports an error (filesystem unmounted, network drive disconnected), pi-crew currently does not recover on its own. Pattern: catch the error → setTimeout 3s → mkdir + restart the watcher.
**Dependencies**: Task #31 (fs-watch), Task #26 (completion-dedupe).
**API export**:
```typescript
export function createResultWatcher(input: {
  resultsDir: string;
  onResult: (file: string) => Promise<void>;
  state: ResultWatcherState;
  completionTtlMs: number;
}): {
  start: () => void;
  primeExisting: () => void;
  stop: () => void;
};
```
**Acceptance**:
- Unit test:
  - Watcher emits a scheduled file → `onResult` is called.
  - Watcher error → restarts automatically after 3s (using fake timers).
  - Dedupe: 2 events for the same file within the TTL → `onResult` is called only once.
- Integration test with fixture `tmp/results/`: write file → onResult runs → file is unlinked.
**Risk**: pi-crew may not have a "result file producer" pattern yet (results currently go through the in-process mailbox). Assessment: if there is NO async result-file pattern, **skip this task**.
**Verification**: `npm run test:unit` + `npm run test:integration`
---
### Task #33 — Port `parallel-utils` (mapConcurrent + aggregateParallelOutputs)
**Source**: `source/pi-subagents/parallel-utils.ts`
**Đích**: `pi-crew/src/runtime/parallel-utils.ts`
**Rationale**:
- `concurrency.ts` only computes the concurrency count; it has no map helper.
- `parallel-research.ts` currently hand-rolls its own worker pool, which could be simplified.
- `aggregateParallelOutputs` standardizes the result format (FAILED/SKIPPED/EMPTY OUTPUT), which pi-crew can reuse for task summaries.
**API export**:
```typescript
export async function mapConcurrent<T, R>(items: T[], limit: number, fn: (item: T, i: number) => Promise<R>): Promise<R[]>;
export interface ParallelTaskResult { agent: string; taskIndex?: number; output: string; exitCode: number | null; error?: string; ... }
export function aggregateParallelOutputs(results: ParallelTaskResult[], headerFormat?: ...): string;
export const MAX_PARALLEL_CONCURRENCY: number;
```
**Integration**:
- Refactor `parallel-research.ts` to use `mapConcurrent` (behavior unchanged).
- Consider using it in `task-graph-scheduler.ts` for batches of ready tasks.
**Acceptance**:
- Unit test `test/unit/parallel-utils.test.ts`:
  - `mapConcurrent` respects the limit (track the maximum pending count).
  - `mapConcurrent([], 4, fn)` returns `[]` without calling fn.
  - `mapConcurrent` propagates exceptions.
  - `aggregateParallelOutputs` formats all 4 cases correctly (success/failed/skipped/empty).
**Verification**: `npm run test:unit -- --grep parallel-utils`
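A bounded-concurrency map with the exported `mapConcurrent` signature can be sketched as a small worker pool. This is an illustrative implementation under the assumption that results keep input order and the first rejection propagates; the pi-subagents version may differ in details.

```typescript
// Sketch only: worker-pool style mapConcurrent that preserves input order.
async function mapConcurrent<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, i: number) => Promise<R>,
): Promise<R[]> {
  if (items.length === 0) return [];
  const safeLimit = Number.isFinite(limit) ? Math.floor(limit) : 1;
  const workerCount = Math.max(1, Math.min(safeLimit, items.length));
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim synchronously, so no two workers take the same item
      results[i] = await fn(items[i] as T, i);
    }
  }

  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers); // rejects with the first error thrown by fn
  return results;
}
```

Because each worker pulls the next index only after finishing its current item, at most `limit` calls to `fn` are in flight at any time.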
---
### Task #34 — Artifact cleanup with a daily marker
**Source**: `source/pi-subagents/artifacts.ts` (hàm `cleanupOldArtifacts`)
**Target**: added to `pi-crew/src/state/artifact-store.ts`
**Rationale**: pi-crew's `<crewRoot>/state/artifacts/` (`<crewRoot>` = the new `.crew/` or the legacy `.pi/teams/`) has no TTL, so old runs accumulate forever. The subagents pattern:
- A `.last-cleanup` file holds a timestamp.
- If the marker is less than 24h old → skip (avoids scanning a large directory on every extension load).
- If a scan is needed: delete files whose age (by mtime) exceeds `maxAgeDays * 24h`.
**New API in artifact-store.ts**:
```typescript
export function cleanupOldArtifacts(artifactsRoot: string, maxAgeDays: number): void;
```
**Integration**:
- Called once when the extension activates, after `artifactsRoot` is resolved.
- Default: `maxAgeDays = 7` (configurable via `defaults.ts`).
- Consider cleaning up old `events.jsonl` files the same way (see the rotation pattern in Task #27).
**Acceptance**:
- Unit test `test/unit/artifact-cleanup.test.ts`:
  - Create files with old + new mtimes → cleanup deletes only the old ones.
  - Fresh marker (< 24h) → cleanup skipped.
  - Stale marker (> 24h) → scan + update marker.
  - Directory does not exist → no-op.
- Integration test (optional): activate the extension twice in a row → the second activation does not scan.
**Verification**: `npm run test:unit -- --grep artifact-cleanup`
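The two decisions behind the daily-marker pattern (is a scan due, and which files are expired) can be isolated as pure functions. The helper names `isCleanupDue` and `selectExpired` are hypothetical; the real `cleanupOldArtifacts` would wrap them with the actual filesystem calls. Treating a marker that is exactly 24h old as due is an assumption.

```typescript
// Hypothetical helpers illustrating the marker and TTL checks; not the ported code.
const DAY_MS = 24 * 60 * 60 * 1000;

/** A scan is due when no marker exists or the marker is at least 24h old. */
function isCleanupDue(markerMtimeMs: number | undefined, now: number): boolean {
  return markerMtimeMs === undefined || now - markerMtimeMs >= DAY_MS;
}

/** Return the paths of files last modified more than maxAgeDays ago. */
function selectExpired(
  files: { path: string; mtimeMs: number }[],
  now: number,
  maxAgeDays: number,
): string[] {
  const cutoff = now - maxAgeDays * DAY_MS;
  return files.filter((file) => file.mtimeMs < cutoff).map((file) => file.path);
}
```

Separating the decisions from the I/O keeps the unit tests for the marker and mtime cases free of real timestamps on disk.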
---
### Task #35 — Build `team doctor` action
**Source**: `source/pi-subagents/doctor.ts`
**Target**: `pi-crew/src/extension/team-tool/doctor.ts` (new) + registration in team-tool.
**Rationale**: pi-crew lacks a single all-in-one diagnostic command. The subagents report format is structured as:
- Runtime (cwd, async, session)
- Filesystem (state/artifacts/runs dirs)
- Discovery (counts of agents, teams, and workflows per source)
- Configuration validation status
- Optional: intercom/extension status
**API**:
```typescript
export function buildTeamDoctorReport(input: {
  cwd: string;
  config: ResolvedConfig;
  ...
}): string;
```
**Integration**:
- Add a `doctor` action to the `team-tool` action handler.
- Slash command `/team-doctor` (if it fits the UX).
**Acceptance**:
- Unit test:
  - Report has the correct headings.
  - Filesystem section shows "ok" for directories that exist and "missing" for those that do not.
  - Discovery counts match the builtin/user/project fixture.
  - An exception inside a section prints `failed — <error>` instead of throwing.
- Manual: run the `team` action `doctor` → verify the text output.
**Verification**: `npm run test:unit -- --grep doctor`
---
## Tier 3 — Library swaps (under consideration; not required for Phase 3)
### Task #36 (optional) — Evaluate `proper-lockfile`
**Context**: `source/pi-mono/packages/coding-agent/package.json` already uses `proper-lockfile`. pi-crew hand-rolls `locks.ts` with O_EXCL + retry.
**Decision**:
- If flakes/races show up in `npm run test:integration` (especially `locks-race.test.ts`) → adopt.
- If it currently passes reliably → keep `locks.ts` to stay zero-dep.
**Action if adopted**:
1. `npm install proper-lockfile @types/proper-lockfile`.
2. Replace `locks.ts` `acquireLock`/`releaseLock` with `lockfile.lock(filePath, { retries: ..., stale: ... })`.
3. Re-run `locks-race.test.ts` for 100 iterations to confirm there is no regression.
**Verification**: full CI.
---
### Task #37 (optional) — `hosted-git-info` for team config git URLs
**Context**: When pi-crew supports `team: git+https://github.com/org/teams-repo` → use coding-agent's `parseGitUrl`.
**Status**: Implemented for runtime discover/validate: `ResourceSource` is extended with `git`, `TeamConfig.sourceUrl` is recorded, and the `parseGitUrl` parser normalizes the `git+` prefix and supports `#` refs.
---
## Tracking template (copy into the commit message)
```
Phase 3 #NN — <short title>
Source: source/pi-subagents/<file>.ts (or pi-mono/...)
Target: pi-crew/src/<dir>/<file>.ts
Risk: low | medium | high
Tests added: test/unit/<file>.test.ts
Verification: tsc --noEmit OK; test:unit OK; test:integration <OK|N/A>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
```
---
## Suggested order of execution
1. **Week 1 — Batch A (low-risk)**: #29 → #30 → #31 → #26 → #28 → #27
   - Start with `sleep`/`timings`/`fs-watch` (standalone, no call-site changes).
   - Then `completion-dedupe` (standalone file).
   - Finish with `post-exit-stdio-guard` (touches `child-pi.ts`) and `jsonl-writer` (touches `event-log.ts`).
2. **Week 2 — Batch B (mid-risk)**: #33 → #34 → #35 → (#32 if applicable).
3. **Week 3 — Tier 3 if needed**: #36/#37 only on demand.
Phase 3 as a whole is estimated at 46h of focused work, with no new runtime deps other than the optional `proper-lockfile`.

View File

@@ -0,0 +1,564 @@
# Phase 4 Refactor Plan — UI/Theme/Performance from the pi-mono coding-agent
> Origin: in-depth review of `source/pi-mono/packages/coding-agent` + `source/pi-mono/packages/tui` (28/04/2026), compared against the current `pi-crew/src/ui/`.
> Goal: improve render performance, remove duplicate code, integrate the theme in a type-safe way, and port the missing UI components (diff/loader/visual-truncate/syntax highlight).
> Phase 3 (#26–#37) is complete. Baseline: tsc 0 errors, 213 unit + 21 integration tests passing, commit `6f64c31`.
## General conventions
- Do not break the public API (slash commands, tool actions, config schema). All changes are internal.
- After each task: `npx tsc --noEmit` + `npm run test:unit` (+ `test:integration` if the change touches render/layout).
- Do not add new runtime dependencies unless the task says so explicitly (the `diff` package is acceptable for Task #45 if it is not already present).
- Each task = 1 independent, revertible commit. Name tests after the behavior they cover.
- The `theme` parameter is currently `unknown`; do not break the `ctx.ui.custom((tui, theme, ...) => Component)` signature, which is dictated by pi-coding-agent.
## Status update
- [x] Task #38 — `utils/visual.ts` dedupe truncate/visibleWidth
- [x] Task #39 — Render cache for widget/sidebar
- [x] Task #40 — Apply the file coalescer to UI readers
- [x] Task #41 — Manifest cache with mtime invalidation
- [x] Task #42 — Type-safe theme adapter
- [x] Task #43 — Status palette helpers
- [x] Task #44 — Refactor widgets to pi-tui Container/Box/Text
- [x] Task #45 — Port `renderDiff` (word-level intra-line)
- [x] Task #46 — Port `BorderedLoader` + `CountdownTimer`
- [x] Task #47 — Port `truncateToVisualLines` for the transcript
- [x] Task #48 — Syntax highlight for transcript JSONL
- [x] Task #49 (optional) — Animated mascot easter egg
---
## Tier 1 — Performance (high ROI, low risk)
Goal: 4 tasks covering dedupe + cache + I/O coalescing. Low risk, no API changes. Estimate: 12 days.
### Task #38 — Dedupe truncate/visibleWidth → `src/utils/visual.ts`
**Source**: `@mariozechner/pi-tui` (already ships `visibleWidth`, `truncateToWidth`); pi-mono `components/visual-truncate.ts`
**Target**: `pi-crew/src/utils/visual.ts`
**Rationale**: 4 UI files (`run-dashboard.ts`, `crew-widget.ts`, `live-run-sidebar.ts`, `transcript-viewer.ts`) each carry their own copy of:
- `ANSI_PATTERN = /\u001b\[[0-?]*[ -/]*[@-~]/g`
- `visibleWidth(value)` / `visibleLength(value)`
- `truncate(value, width)` (the logic is not fully consistent across the copies)
- `pad(value, width)` / `padVisible`
→ About 80 lines repeated × 4 files, which invites drift bugs.
**API export**:
```typescript
export const ANSI_PATTERN: RegExp;
export function visibleWidth(value: string): number;
export function truncate(value: string, width: number, ellipsis?: string): string;
export function pad(value: string, width: number): string;
export function wrapHard(value: string, width: number): string[];
export function boxLine(text: string, innerWidth: number): string; // "│ {pad/truncate} │"
```
**Integration**:
- Re-export `visibleWidth` + `truncateToWidth` from `@mariozechner/pi-tui` if available (check `tui/utils.ts`).
- The 4 UI files switch their `import { ... }` from the local helpers to `from "../utils/visual.ts"`.
- Delete the local helpers that were moved.
**Acceptance**:
- New file + ~80 LOC removed × 4 files (~320 LOC less).
- Unit test `test/unit/visual.test.ts` with 6 cases:
  - `visibleWidth("\u001b[31mhello\u001b[0m")` = 5
  - `truncate("hello world", 5)` = "hell…"
  - `truncate(value, 0)` = ""
  - `truncate(value, 1)` = "…"
  - `pad("ab", 5)` = "ab   "
  - `wrapHard("abcdefgh", 3)` = ["abc","def","gh"]
- Snapshot test (optional): `crew-widget` renders identically before/after, bit for bit.
**Risk**: Low. Behavior is equivalent; only the module boundary changes.
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep visual` + `npm run test:unit -- --grep widget` (smoke).
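A minimal sketch of helpers that satisfy the six acceptance cases above. It makes two simplifying assumptions that the real module may not share: every non-ANSI character counts as width 1 (no wide or combining characters), and `truncate` drops ANSI styling from strings it has to cut.

```typescript
// Illustrative only; the real visual.ts may re-export pi-tui helpers instead.
const ANSI_PATTERN = /\u001b\[[0-?]*[ -/]*[@-~]/g;

/** Width of the text with ANSI escape sequences removed. */
function visibleWidth(value: string): number {
  return value.replace(ANSI_PATTERN, "").length;
}

/** Cut the text to `width` columns, ending in an ellipsis when it was shortened. */
function truncate(value: string, width: number, ellipsis = "…"): string {
  if (width <= 0) return "";
  if (visibleWidth(value) <= width) return value;
  if (width <= ellipsis.length) return ellipsis.slice(0, width);
  const plain = value.replace(ANSI_PATTERN, ""); // styling is dropped when cutting
  return plain.slice(0, width - ellipsis.length) + ellipsis;
}

/** Right-pad with spaces up to `width` visible columns. */
function pad(value: string, width: number): string {
  const missing = width - visibleWidth(value);
  return missing > 0 ? value + " ".repeat(missing) : value;
}

/** Split plain text into chunks of at most `width` characters. */
function wrapHard(value: string, width: number): string[] {
  if (width <= 0) return [value];
  const lines: string[] = [];
  for (let i = 0; i < value.length; i += width) lines.push(value.slice(i, i + width));
  return lines;
}
```

Having one definition of `truncate` removes the inconsistency between the four local copies, which is the main point of the task.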
---
### Task #39 — Render cache for widget/sidebar (cachedWidth + version)
**Source pattern**: `pi-mono/packages/coding-agent/src/modes/interactive/components/armin.ts` (cachedWidth + cachedVersion + invalidate)
**Target**: `crew-widget.ts`, `live-run-sidebar.ts`, `run-dashboard.ts`
**Rationale**: On every tick (`widgetDefaultFrameMs`, `dashboardLiveRefreshMs` = 100ms) the entire box is rebuilt even when neither the data nor the terminal width has changed. With many agents (>10), the render cost is non-trivial.
**API pattern (per component)**:
```typescript
class CrewWidgetComponent {
  private cachedWidth = 0;
  private cachedVersion = -1;
  private currentVersion = 0;
  private cachedLines: string[] = [];
  invalidate(): void {
    this.cachedWidth = 0; // forces rerender on next render() call
  }
  private dataSignature(): number {
    // Hash from runs.length + agents counts + max updatedAt + statuses
    // Bump currentVersion when signature differs from last computed
  }
  render(width: number): string[] {
    const sig = this.dataSignature();
    if (width === this.cachedWidth && this.cachedVersion === sig) return this.cachedLines;
    // ... build lines ...
    this.cachedWidth = width;
    this.cachedVersion = sig;
    return this.cachedLines;
  }
}
```
**Integration**:
- `CrewWidgetComponent.render()`: dataSignature from `frame % spinnerLength` + run/agent hash.
  - Note that the spinner changes every tick, so the header containing the spinner is still re-rendered. Split `staticBody` (cached) from `spinnerLine` (live).
- `LiveRunSidebar.render()`: dataSignature from manifest.updatedAt + agents.length + tasks.length + active counts.
- `RunDashboard.render()`: dataSignature from runs.length + selected index + showFullProgress flag.
**Acceptance**:
- Unit test `test/unit/render-cache.test.ts`:
  - `render(80)` twice in a row with unchanged data → same array reference (cached result reused).
  - `render(80)` after `invalidate()` → new array.
  - `render(120)` after `render(80)` → new array (width changed).
  - Manifest mtime changes → signature changes → new array.
- Microbenchmark (new `scripts/bench-render.ts`):
  - Before: `LiveRunSidebar.render(80) × 1000` ≥ 150ms
  - After: `≤ 50ms` (cache hit ratio > 90%)
**Risk**: Medium. If dataSignature misses a mutation, the UI goes stale. Mitigation: include `Date.now() / 1000 | 0` in the signature for live components so they re-render at least at 1Hz.
**Verification**: `npx tsc --noEmit` + `npm run test:unit` + bench.
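The width-plus-signature cache can be exercised in isolation with a small generic wrapper. `CachedLines` is a hypothetical name used only for this sketch; the plan itself puts the cache fields directly on each component, and the signature here is a string rather than a numeric hash.

```typescript
// Hypothetical wrapper illustrating the cache-hit rule: same width and same signature.
class CachedLines {
  private cachedWidth = -1;
  private cachedSignature: string | undefined = undefined;
  private cachedLines: string[] = [];
  private readonly signature: () => string;
  private readonly build: (width: number) => string[];

  constructor(signature: () => string, build: (width: number) => string[]) {
    this.signature = signature;
    this.build = build;
  }

  /** Force the next render() to rebuild. */
  invalidate(): void {
    this.cachedSignature = undefined;
  }

  render(width: number): string[] {
    const sig = this.signature();
    if (width === this.cachedWidth && sig === this.cachedSignature) return this.cachedLines;
    this.cachedLines = this.build(width);
    this.cachedWidth = width;
    this.cachedSignature = sig;
    return this.cachedLines;
  }
}
```

Returning the same array reference on a hit is what lets the unit test assert cache reuse with a plain identity comparison.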
---
### Task #40 — Apply the file coalescer to UI readers
**Source pattern**: `pi-crew/src/utils/file-coalescer.ts` (present since Phase 2)
**Target**: `crew-widget.ts`, `live-run-sidebar.ts`, `run-dashboard.ts`, `powerbar-publisher.ts`
**Rationale**: Each render tick calls:
- `readCrewAgents(manifest)` → `fs.readFileSync(agents.json)` + JSON parse
- `readTasks(tasksPath)` → `fs.readFileSync(tasks.json)` + JSON parse
When 4 widgets tick together (widget + sidebar + powerbar + dashboard if open), the same file is read 4 times within < 10ms.
**Integration**:
- Wrap `readCrewAgents` + `readTasks` in a `coalesceReads(filePath, ttlMs=200)` cache.
- Avoid staleness: invalidate when pi-crew itself writes (set a marker timestamp).
- Pattern:
```typescript
// crew-agent-records.ts
import { coalesceReads } from "../utils/file-coalescer.ts";
const COALESCE_TTL = 200;
export function readCrewAgents(manifest: TeamRunManifest): CrewAgentRecord[] {
  return coalesceReads(manifest.agentsPath, COALESCE_TTL, () => parseAgentsFile(manifest.agentsPath));
}
```
**Acceptance**:
- Unit test `test/unit/agents-coalesce.test.ts`:
  - Spy on `fs.readFileSync` → 5 calls within 100ms for the same path → only 1 read.
  - After the TTL → reads again.
- Integration test: tick the widget 10 times in 500ms → agents.json is read at most 3 times.
**Risk**: Low. The short TTL (200ms) keeps the data fresh.
**Verification**: `npm run test:unit -- --grep coalesce`.
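To make the TTL behavior concrete, here is a small read coalescer with an injectable clock. `createCoalescer` is a hypothetical name for this sketch and is not the existing `file-coalescer.ts` API, whose exact signature is not shown in this plan.

```typescript
// Hypothetical sketch of TTL-based read coalescing; not the Phase 2 implementation.
function createCoalescer<T>(now: () => number = Date.now) {
  const cache = new Map<string, { value: T; readAt: number }>();
  return {
    /** Return the cached value while it is younger than ttlMs, otherwise reload. */
    read(key: string, ttlMs: number, load: () => T): T {
      const hit = cache.get(key);
      const t = now();
      if (hit !== undefined && t - hit.readAt < ttlMs) return hit.value;
      const value = load();
      cache.set(key, { value, readAt: t });
      return value;
    },
    /** Drop a cached entry, e.g. right after pi-crew itself writes the file. */
    invalidate(key: string): void {
      cache.delete(key);
    },
  };
}
```

Calling `invalidate` from the write path covers the staleness concern noted above without shortening the TTL.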
---
### Task #41 — Manifest cache with mtime invalidation
**Source pattern**: `pi-mono/packages/coding-agent/src/core/footer-data-provider.ts` (cached branch + watch + debounce 500ms)
**Target**: `pi-crew/src/runtime/manifest-cache.ts` (new)
**Rationale**: `loadRunManifestById` reads and parses `manifest.json`, and `LiveRunSidebar` calls it on every tick (10Hz). Likewise, `listRecentRuns` scans the whole `runs/` directory.
**API export**:
```typescript
export interface ManifestCache {
  get(runId: string): TeamRunManifest | undefined;
  list(limit: number): TeamRunManifest[];
  invalidate(runId?: string): void;
  dispose(): void;
}
export function createManifestCache(cwd: string, options?: { debounceMs?: number; watch?: boolean }): ManifestCache;
```
**Implementation**:
- Cache: `Map<runId, { manifest, mtimeMs }>`.
- `get(runId)`: stat the manifest path; if the mtime matches the cache → return the cached value.
- `list(limit)`: scan the directory and return the top N by mtime; cache the whole list for 500ms.
- Watcher (optional): `watchWithErrorHandler(runsDir)` + debounce 500ms → invalidate.
**Integration**:
- `register.ts` creates one ManifestCache instance on `session_start` and disposes it on `session_shutdown`.
- `LiveRunSidebar`, `RunDashboard`, `crew-widget`, and `powerbar-publisher` receive the cache (via a context closure).
**Acceptance**:
- Unit test:
  - 5 `get(runId)` calls within 100ms with an unchanged mtime → 1 stat + 1 read.
  - After a manifest write (mtime changes) → cache invalidated, manifest re-read.
  - `list(10)` is cached for 500ms.
  - `dispose()` closes watchers.
- Integration test: simulate a 1Hz manifest update + 10Hz render → render uses the cached value and does not re-read unless the manifest actually changed.
**Risk**: Medium. Watching on Windows has quirks (mitigated by the Phase 3 fs-watch wrapper).
**Verification**: `npm run test:unit -- --grep manifest-cache` + `npm run test:integration`.
---
## Tier 2 — Theme Integration (clean API, type-safe)
Goal: 3 tasks covering a type-safe theme + reuse of pi-tui layout primitives. Medium risk. Estimate: 12 days.
### Task #42 — Type-safe theme adapter `src/ui/theme-adapter.ts`
**Source pattern**: `pi-mono/packages/coding-agent/src/modes/interactive/theme/theme.ts` (Theme class with fg/bg/bold/italic)
**Target**: `pi-crew/src/ui/theme-adapter.ts`
**Rationale**: 5 UI files currently cast `theme as unknown as { fg?: ... }`. The IDE cannot suggest color names, and typos slip through (`accenT` is not a compile error).
**API export**:
```typescript
export type CrewThemeColor =
  | "accent" | "border" | "borderAccent" | "borderMuted"
  | "success" | "error" | "warning"
  | "muted" | "dim" | "text"
  | "toolDiffAdded" | "toolDiffRemoved" | "toolDiffContext"
  | "syntaxKeyword" | "syntaxString" | "syntaxNumber" | "syntaxComment" | "syntaxFunction" | "syntaxVariable" | "syntaxType";
export type CrewThemeBg = "selectedBg" | "userMessageBg" | "toolPendingBg" | "toolSuccessBg" | "toolErrorBg";
export interface CrewTheme {
  fg(color: CrewThemeColor, text: string): string;
  bg?(color: CrewThemeBg, text: string): string;
  bold(text: string): string;
  italic?(text: string): string;
  underline?(text: string): string;
  inverse?(text: string): string;
}
export function asCrewTheme(raw: unknown): CrewTheme;
```
**Implementation**:
- `asCrewTheme`: validate that raw has the `fg`/`bold` methods. If they are missing → fall back to the no-op `(c, t) => t`.
- A subset of the pi-coding-agent Theme class: `CrewThemeColor` is a separate namespace, but its values are aligned with the host's.
**Integration**:
- `crew-widget.ts`, `live-run-sidebar.ts`, `run-dashboard.ts`, `transcript-viewer.ts`:
  - Replace `theme.fg?.bind(theme) ?? ((_color, text) => text)` with `const t = asCrewTheme(rawTheme); t.fg("accent", x)`.
  - Param signature: `(theme: unknown)` becomes `(theme: CrewTheme | unknown)`.
**Acceptance**:
- Unit test `test/unit/theme-adapter.test.ts`:
- `asCrewTheme(undefined)` → no-op fallback.
- `asCrewTheme({})` → no-op.
- `asCrewTheme({ fg: ..., bold: ... })` → uses provided methods.
- Type test (compile-only): `t.fg("nonExistent", "x")` produces TS error.
- Lint passes; tsc 0 errors after updating the 5 files.
**Risk**: Low. The fallback is safe for hosts that do not provide every method.
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep theme-adapter`.
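One way `asCrewTheme` could implement the validate-or-fallback rule is sketched below. The color union is abbreviated and only `fg`/`bold` are adapted, so this is an illustration of the approach under those assumptions rather than the full adapter.

```typescript
// Abbreviated sketch: the real CrewThemeColor union and CrewTheme interface are larger.
type CrewThemeColor = "accent" | "border" | "success" | "error" | "warning" | "muted" | "dim" | "text";

interface CrewTheme {
  fg(color: CrewThemeColor, text: string): string;
  bold(text: string): string;
}

/** Wrap an untyped host theme; any missing method falls back to returning the text unchanged. */
function asCrewTheme(raw: unknown): CrewTheme {
  const candidate = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const fg = candidate["fg"];
  const bold = candidate["bold"];
  return {
    // .call(raw, ...) preserves `this` for class-based host themes.
    fg: (color, text) => (typeof fg === "function" ? String(fg.call(raw, color, text)) : text),
    bold: (text) => (typeof bold === "function" ? String(bold.call(raw, text)) : text),
  };
}
```

With this wrapper, `t.fg("accenT", "x")` becomes a compile error because the color argument is checked against the union.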
---
### Task #43 — Status palette helpers `src/ui/status-colors.ts`
**Source pattern**: `pi-mono` highlight pattern + pi-crew current ad-hoc switch-case
**Đích**: `pi-crew/src/ui/status-colors.ts`
**Rationale**: 5 files (`run-dashboard:65-72`, `crew-widget:89-95`, `live-run-sidebar:35`, `transcript-viewer`, `powerbar-publisher`) each have their own `switch(status){...}` mapping to a color/icon. The mappings are currently inconsistent (e.g. `crew-widget` prefers `runningGlyph`, `run-dashboard` does not).
**API export**:
```typescript
export type RunStatus = "queued" | "running" | "completed" | "failed" | "cancelled" | "blocked" | "stale" | "stopped" | (string & {});
export function colorForStatus(status: RunStatus): CrewThemeColor;
export function iconForStatus(status: RunStatus, options?: { runningGlyph?: string }): string;
export function colorForActivity(activityState: string | undefined): CrewThemeColor;
export function applyStatusColor(theme: CrewTheme, status: RunStatus, text: string): string;
```
**Implementation**:
- `colorForStatus`: `completed→success`, `failed|stale|error→error`, `cancelled|blocked|stopped→warning`, `running→accent`, `queued→muted`, default→dim.
- `iconForStatus`: `completed→✓`, `failed/stale→✗`, `cancelled/stopped→■`, `running→runningGlyph || ▶`, `queued→◦`, `blocked→⏸`, default→·.
**Integration**:
- The 5 UI files replace their switch-case with a single `colorForStatus(status)` call.
- `crew-widget.colorWidgetLine` regex map icon → use `iconForStatus` directly.
**Acceptance**:
- Unit test `test/unit/status-colors.test.ts`: 8 cases, one per status, plus an unknown-status edge case.
- Snapshot of the widget/dashboard render is unchanged (regression test).
**Risk**: Low. Pure mapping function.
**Verification**: `npm run test:unit -- --grep status-colors`.
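The mapping listed under Implementation translates directly into two switch statements. In this sketch `StatusColor` is a local stand-in for the relevant subset of `CrewThemeColor`, and the status parameter is typed as plain `string` for brevity.

```typescript
// Sketch of the mapping described above; StatusColor is a stand-in for CrewThemeColor.
type StatusColor = "success" | "error" | "warning" | "accent" | "muted" | "dim";

function colorForStatus(status: string): StatusColor {
  switch (status) {
    case "completed": return "success";
    case "failed":
    case "stale":
    case "error": return "error";
    case "cancelled":
    case "blocked":
    case "stopped": return "warning";
    case "running": return "accent";
    case "queued": return "muted";
    default: return "dim"; // unknown statuses stay visible but unobtrusive
  }
}

function iconForStatus(status: string, options?: { runningGlyph?: string }): string {
  switch (status) {
    case "completed": return "✓";
    case "failed":
    case "stale": return "✗";
    case "cancelled":
    case "stopped": return "■";
    case "running": return options?.runningGlyph || "▶";
    case "queued": return "◦";
    case "blocked": return "⏸";
    default: return "·";
  }
}
```

Passing `runningGlyph` lets `crew-widget` keep its animated spinner frame while every other caller gets the static `▶`.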
---
### Task #44 — Refactor widgets to use pi-tui Container/Box/Text
**Source pattern**: `pi-mono/packages/tui/src/components/box.ts`, `text.ts`, plus `pi-mono/components/footer.ts` as a reference for how to compose them.
**Target**: `live-run-sidebar.ts`, `run-dashboard.ts` (reduce complexity)
**Rationale**: Both files draw boxes by hand with string concatenation of `╭─╮│├┤╰╯`, calling `pad(truncate(...))` on every line. This breaks easily when the terminal is resized. pi-tui already provides `Container` + `Box` (automatic rounded border), plus `DynamicBorder` from pi-coding-agent.
**Integration**:
- `LiveRunSidebar` → extend `Container`:
```typescript
class LiveRunSidebar extends Container {
  constructor(input) {
    super();
    this.addChild(new DynamicBorder(c => theme.fg("border", c)));
    this.addChild(new Text(theme.bold("pi-crew live sidebar"), 1, 0));
    // ...
  }
  render(width: number): string[] { /* parent handles layout */ }
}
```
- `RunDashboard` likewise: sections use `Spacer(1)` + `Text`.
- Note on `ctx.ui.custom((tui, theme, keys, done) => Component)`: returning a `Container` instance is still fine because `Container` implements `Component`.
**Acceptance**:
- LOC reduced by ≥ 30% across the 2 files.
- Visual snapshot test: render at widths 80 + 120; content matches the baseline (whitespace diffs allowed).
- handleInput logic keeps the same semantics (q/esc/j/k/p/r/s/u/a/i/d/e/o/v).
**Risk**: Medium. If the Container layout does not match the way padding is rendered today, box edges will shift. Mitigation: write the snapshot test before refactoring.
**Verification**: `npx tsc --noEmit` + `npm run test:unit` + manual `team-dashboard` smoke.
---
## Tier 3 — New UI components
Goal: 4 tasks porting the missing UI utilities. Medium-to-high risk. Estimate: 23 days.
### Task #45 — Port `renderDiff` (word-level intra-line)
**Source**: `pi-mono/packages/coding-agent/src/modes/interactive/components/diff.ts`
**Target**: `pi-crew/src/ui/render-diff.ts`
**Rationale**: pi-crew has `code-modify`, `reviewer`, and `verifier` agents that often produce diff artifacts. The transcript viewer + result viewer currently print raw text only. `renderDiff` enables:
- Removed line: red, with inverse on the changed tokens.
- Added line: green, with inverse on the changed tokens.
- Context: dim/gray.
**Dependency check**: the `diff` package (npm). Verify that `pi-crew/package.json` does not already include it; if adding: `npm i diff @types/diff`.
**API export**:
```typescript
export interface RenderDiffOptions { filePath?: string }
export function renderDiff(diffText: string, theme: CrewTheme, options?: RenderDiffOptions): string;
```
**Implementation**: Copy `pi-mono/diff.ts`, take `theme.inverse` from the adapter, and replace `theme.fg("toolDiff*", ...)` (colors already added to `CrewThemeColor` in Task #42).
**Integration**:
- `transcript-viewer.ts`: detect `[Tool: edit]` blocks containing unified diff format → call `renderDiff`.
- Slash command `/team-diff <runId> <taskId>` (optional Task #45.b): render the artifact diff directly.
**Acceptance**:
- Unit test `test/unit/render-diff.test.ts`:
- Single line modification → intra-line word diff with inverse.
- Multi line block → no intra-line, just full-line color.
- Context line preserved.
- Empty diff → empty string.
- Manual: render fixture `before.ts` vs `after.ts` diff trong overlay.
**Risk**: Trung bình. Add deps `diff` (~30KB). Acceptable.
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep render-diff`.
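
To make the "inverse on the changed tokens" requirement concrete, here is a minimal sketch of intra-line highlighting. It is not the pi-mono implementation: instead of the `diff` package it trims the common token prefix and suffix, and the names `tokenize`, `intraLineSegments`, and `paint` are hypothetical.

```typescript
// Hypothetical helpers for illustration; pi-mono's diff.ts may work differently.
type Segment = { text: string; changed: boolean };

// Split into word and whitespace tokens so spacing survives the re-join.
function tokenize(line: string): string[] {
  return line.match(/\s+|[^\s]+/g) ?? [];
}

// Mark every token outside the common prefix/suffix as changed.
function intraLineSegments(oldLine: string, newLine: string): { removed: Segment[]; added: Segment[] } {
  const a = tokenize(oldLine);
  const b = tokenize(newLine);
  let prefix = 0;
  while (prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  let suffix = 0;
  while (
    suffix < a.length - prefix &&
    suffix < b.length - prefix &&
    a[a.length - 1 - suffix] === b[b.length - 1 - suffix]
  ) {
    suffix++;
  }
  const mark = (tokens: string[]): Segment[] =>
    tokens.map((text, i) => ({ text, changed: i >= prefix && i < tokens.length - suffix }));
  return { removed: mark(a), added: mark(b) };
}

// ANSI inverse on/off, the same sequences chalk.inverse emits.
const INVERSE_ON = "\u001b[7m";
const INVERSE_OFF = "\u001b[27m";

function paint(segments: Segment[]): string {
  return segments.map((s) => (s.changed ? `${INVERSE_ON}${s.text}${INVERSE_OFF}` : s.text)).join("");
}
```

A real port would still wrap the painted line in the red/green `toolDiff*` colors; the sketch only shows where the inverse markers land.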
---
### Task #46 — Port `BorderedLoader` + `CountdownTimer`
**Source**: `pi-mono/packages/coding-agent/src/modes/interactive/components/bordered-loader.ts` + `countdown-timer.ts`
**Target**: `pi-crew/src/ui/loaders.ts`
**Rationale**:
- An async `team run` start can take 2–5s to spawn the child, and there is currently no UI feedback.
- `team cancel runId=...` force-kills but shows no countdown before SIGKILL.
- `team-doctor` runs 1–3s of I/O without a loader.
**API export**:
```typescript
export interface CrewBorderedLoaderOptions {
  cancellable?: boolean;
  message: string;
}
export class CrewBorderedLoader extends Container {
  constructor(tui: TUI, theme: CrewTheme, options: CrewBorderedLoaderOptions);
  get signal(): AbortSignal;
  set onAbort(fn: (() => void) | undefined);
  dispose(): void;
}
export interface CountdownTimerOptions {
  timeoutMs: number;
  onTick: (seconds: number) => void;
  onExpire: () => void;
  tui?: TUI;
}
export class CountdownTimer {
  constructor(options: CountdownTimerOptions);
  dispose(): void;
}
```
**Implementation**: Copy the code from pi-mono and route theme references through the adapter. Note that `CancellableLoader`/`Loader` are expected to be exported by pi-tui; verify this before importing.
**Integration** (per use case, can be committed separately):
- `team-tool/run.ts`: before spawning, show a `CrewBorderedLoader` with the message "spawning crew agents...". Once the run has started, dispose the loader and open the sidebar.
- `team-tool/cancel.ts`: create `` CountdownTimer({ timeoutMs: 5000, onTick: s => loader.setMessage(`cancelling in ${s}s, press y to skip`) }) ``.
**Acceptance**:
- Unit test `test/unit/loaders.test.ts`:
  - `CrewBorderedLoader.signal.aborted` is false initially and true after the user triggers Esc.
  - `dispose()` clears the interval and removes listeners.
  - `CountdownTimer` tick → `onTick` is called with decreasing seconds.
  - `CountdownTimer` expires after `timeoutMs` → `onExpire` is called once.
- Manual smoke test in the `team-run` overlay.
**Risk**: Medium. Depends on pi-tui exporting `CancellableLoader`/`Loader` (see tui/index.ts).
**Verification**: `npm run test:unit -- --grep loaders`.
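
The abort contract in the acceptance list (signal not aborted at first, aborted after Esc, `dispose()` clearing the interval) can be sketched without pi-tui. The class below is a hypothetical stand-in, not `CrewBorderedLoader` itself; it assumes Esc arrives as the raw `\u001b` byte, which may differ under other keyboard protocols.

```typescript
// Hypothetical stand-in for the abortable part of a bordered loader.
class AbortableHandle {
  private controller = new AbortController();
  private timer: ReturnType<typeof setInterval> | undefined;
  onAbort: (() => void) | undefined;

  constructor(onFrame: () => void, frameMs = 80) {
    // Spinner frames are driven by an interval that dispose() must clear.
    this.timer = setInterval(onFrame, frameMs);
  }

  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Assumption: Esc is delivered as the single byte "\u001b".
  handleInput(data: string): void {
    if (data === "\u001b" && !this.controller.signal.aborted) {
      this.controller.abort();
      this.onAbort?.();
    }
  }

  dispose(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }
}
```

Guarding on `signal.aborted` keeps `onAbort` from firing twice when the user presses Esc repeatedly.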
---
### Task #47 — Port `truncateToVisualLines` for the transcript
**Source**: `pi-mono/packages/coding-agent/src/modes/interactive/components/visual-truncate.ts`
**Target**: `pi-crew/src/utils/visual.ts` (extends Task #38)
**Rationale**: `transcript-viewer.ts` currently uses a hand-written `wrap()` that does not account for ANSI codes, so lines containing color wrap incorrectly and either overflow the box or render patchily. pi-mono's `truncateToVisualLines` uses `Text.render(width)` from pi-tui to compute visual lines accurately.
**API export** (added to visual.ts):
```typescript
export interface VisualTruncateResult { visualLines: string[]; skippedCount: number }
export function truncateToVisualLines(text: string, maxVisualLines: number, width: number, paddingX?: number): VisualTruncateResult;
```
**Integration**:
- `DurableTextViewer.render` + `DurableTranscriptViewer.render`: replace `body.flatMap(wrap)` with `truncateToVisualLines`.
- Show `... (X lines truncated above)` when `skippedCount > 0`.
**Acceptance**:
- Unit test:
  - Line within the width → returned unchanged, with skippedCount=0.
  - Line exceeding the width → wraps into the correct number of lines and keeps ANSI codes intact.
  - `maxVisualLines = 5` with 10 lines → returns the last 5 lines, with skippedCount = 5.
- Visual smoke test: open a transcript with a long ANSI code block → no overflow.
**Risk**: Low. Pure utility.
**Verification**: `npm run test:unit -- --grep visual-truncate`.
---
### Task #48 — Syntax highlighting for transcript JSONL events
**Source**: `pi-mono/packages/coding-agent/src/modes/interactive/theme/theme.ts` (`highlightCode`, `getLanguageFromPath`)
**Target**: `pi-crew/src/ui/syntax-highlight.ts` (new)
**Rationale**: `transcript-viewer.ts` prints JSON tool args and assistant code blocks as plain text. Highlighting improves readability:
- JSON keys → blue, strings → orange, numbers → green.
- Code in messages: detect the language → highlight.
**Dependency check**: `cli-highlight` is already used in pi-mono. Verify pi-crew's `package.json`; if it is missing: `npm i cli-highlight`.
**API export**:
```typescript
export function highlightCode(code: string, lang: string | undefined, theme: CrewTheme): string[];
export function highlightJson(json: string, theme: CrewTheme): string;
export function detectLanguageFromPath(filePath: string): string | undefined;
```
**Implementation**:
- Copy `highlightCode` + `getLanguageFromPath` from pi-mono.
- Route `theme` references through the adapter (Task #42).
- `highlightJson` is shorthand for `lang="json"`.
**Integration**:
- `formatTranscriptEvent`: when the event is `[Tool: edit]` with JSON args → `highlightJson(stringify(args), theme)`.
- `[Assistant]` content containing a fenced code block → extract the language and highlight.
**Acceptance**:
- Unit test:
  - `highlightJson('{"a":1,"b":"x"}')` → lines contain ANSI color codes.
  - `highlightCode("function f(){}", "typescript")` → keywords are colored.
  - Invalid lang → falls back to plain text.
- Manual: JSON tool args appear in color in `team-transcript`.
**Risk**: Medium. `cli-highlight` is a ~100KB dependency.
**Verification**: `npx tsc --noEmit` + `npm run test:unit -- --grep syntax-highlight`.
---
## Tier 4 — Polish (optional)
### Task #49 (optional) — Animated mascot easter egg `/team-mascot`
**Source**: `pi-mono/packages/coding-agent/src/modes/interactive/components/armin.ts`
**Target**: `pi-crew/src/ui/mascot.ts` + slash command `/team-mascot`
**Rationale**: Branding/morale. Pi has Armin; pi-crew could have its own mascot (e.g. a group of 3 robots).
**Implementation**:
- A separate XBM bitmap (small, ~30×30), or reuse the art logic from armin.
- 7 effects: typewriter, scanline, rain, fade, crt, glitch, dissolve.
**Acceptance**:
- The slash command `/team-mascot` opens an overlay for 5s and then auto-closes.
- No impact on startup time (the asset is lazy-loaded on invocation).
**Risk**: Low. Optional/cosmetic.
**Verification**: Manual smoke test.
---
## Tracking template (copy into the commit message)
```
Phase 4 #NN — <short title>
Source: source/pi-mono/packages/coding-agent/src/<file>.ts (or pi-tui/...)
Target: pi-crew/src/<dir>/<file>.ts
Risk: low | medium | high
Tests added: test/unit/<file>.test.ts
Verification: tsc --noEmit OK; test:unit OK; test:integration <OK|N/A>; bench <numbers>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
```
---
## Suggested execution order
1. **Week 1 — Tier 1 (Performance)**: #38 → #40 → #39 → #41
   - #38 dedupe first (prerequisite for every later refactor).
   - #40 file-coalescer (low risk, immediate I/O savings).
   - #39 render cache (needs #38 for visual.ts).
   - #41 manifest cache (needs #31 fs-watch from Phase 3).
   - Benchmark before/after to demonstrate a ≥ 4× improvement on the render hot path.
2. **Week 2 — Tier 2 (Theme)**: #42 → #43 → #44
   - #42 type-safe adapter (prerequisite for every UI refactor).
   - #43 status palette (low risk, pure mapping).
   - #44 layout primitives (needs snapshot tests before the refactor).
3. **Week 3 — Tier 3 (UI components)**: #45 → #46 → #47 → #48
   - Can run in parallel with several developers. Otherwise follow the order diff → loader → visual-truncate → syntax-highlight.
   - #45 and #48 add runtime dependencies (`diff`, `cli-highlight`); review before merging.
4. **Tier 4 (#49)**: if time remains. Branding/morale, no effect on functionality.
All of Phase 4 is estimated at 4–7 days of focused work and adds 2 runtime dependencies (`diff`, `cli-highlight`) when #45 and #48 are implemented (verify they are not already in package.json before installing).
---
## Target metrics (final Phase 4 verification)
- **Render cost**: `LiveRunSidebar.render(80) × 1000` from ~150ms → ≤ 50ms.
- **Disk I/O**: at a 10Hz tick over 10s, reads of `agents.json` from ~100 → ≤ 25.
- **LOC**: 5 UI files reduced by ≥ 25% (~400 lines).
- **Test count**: 213 unit → ~245 unit (about 32 new tests across 12 tasks).
- **Type safety**: 0 `as unknown as { fg?: ... }` casts in `src/ui/`.
- **New deps**: at most +2 (`diff`, `cli-highlight`), total size +130KB.


@@ -0,0 +1,402 @@
# Phase 5 Refactor Plan — Footer/Selectlist/Hot-reload from pi-mono coding-agent
> Origin: re-read of `source/pi-mono/packages/coding-agent/src/modes/interactive/components/{footer,bordered-loader,dynamic-border,visual-truncate,diff,countdown-timer,extension-selector,theme-selector,custom-message,tool-execution,bash-execution}.ts` + `theme/theme.ts` (28/04/2026).
> Goal: fix the subtle bugs left over from Phase 4, add theme hot-reload, port the footer/select-list pattern, and standardize border and tool-state styling.
> Phase 4 is complete. Baseline: tsc 0 errors, 222 unit + 21 integration tests pass, commit `44fdd02`.
## General conventions
- Do not break the public API (slash commands, tool actions, config schema). All changes are internal.
- After each task: `npx tsc --noEmit` + `npm run test:unit` (+ `test:integration` if render/runtime is affected).
- Do not add new runtime dependencies. Everything is implemented self-contained or through the existing peer dep `@mariozechner/pi-tui`.
- Each task = 1 independent, revertible commit. Name tests after the behavior they cover.
- Prefer backward compatibility: default behavior stays unchanged, and new behavior is opt-in through config.
## Status update
- [x] Task #50 — Fix `truncateToVisualLines` slice-after-merge bug
- [x] Task #51 — Memoize `visibleWidth` LRU cache
- [x] Task #52 — Theme hot-reload subscription
- [x] Task #53 — Theme adapter `inverse` ANSI fallback
- [x] Task #54 — `CrewFooter` component port
- [x] Task #55 — `CrewSelectList` adapter
- [x] Task #56 — `DynamicCrewBorder` reusable + CountdownTimer 1s tick
- [x] Task #57 — Tool state styling for transcript-viewer
---
## Tier 1 — Bug fixes & correctness (low risk, immediate value)
Goal: 2 tasks that fix a bug from Phase 4 and add a small performance gain. Estimate: 0.5 day.
### Task #50 — Fix `truncateToVisualLines` slice-after-merge bug
**Source**: `pi-mono/coding-agent/components/visual-truncate.ts`
**Target**: `pi-crew/src/utils/visual.ts`
**Rationale**: Phase 4 #47 implemented `truncateToVisualLines` with this logic:
```ts
const visualLines = text.split("\n").flatMap((line) =>
  wrapHard(pad(line, ...).trimEnd(), effectiveWidth).slice(0, Math.max(1, maxVisualLines))
);
```
Bug: `slice(0, maxVisualLines)` is applied **per source line** instead of to **all visual lines after merging**. If one source line wraps into N visual lines (N > maxVisualLines), the result keeps the head of that line rather than the tail of the whole output. With several source lines, the total number of visual lines can exceed maxVisualLines.
pi-mono uses the correct pattern: render first, then `slice(-maxVisualLines)`.
**Reference logic**:
```ts
export function truncateToVisualLines(
  text: string,
  maxVisualLines: number,
  width: number,
  paddingX = 0,
): VisualTruncateResult {
  if (!text) return { visualLines: [], skippedCount: 0 };
  const effectiveWidth = Math.max(1, width - paddingX * 2);
  // Wrap every source line first, then merge into one flat list.
  const allVisual = text.split("\n").flatMap((line) =>
    wrapHard(pad(line, effectiveWidth).trimEnd(), effectiveWidth)
  );
  if (allVisual.length <= maxVisualLines) return { visualLines: allVisual, skippedCount: 0 };
  // Slice once, on the merged list, keeping the tail.
  return { visualLines: allVisual.slice(-maxVisualLines), skippedCount: allVisual.length - maxVisualLines };
}
```
**Acceptance**:
- 1 source line wrapping into 5 visual lines, maxVisualLines=2 → returns the last 2 visual lines + skippedCount=3
- 3 source lines × 2 visual lines each = 6 visual lines, maxVisualLines=4 → returns the last 4 + skippedCount=2
- Empty input → `{ visualLines: [], skippedCount: 0 }` (changed from `[""]` to `[]` to match pi-mono)
**Verification**: 2 new unit tests in `test/unit/visual.test.ts`. Verify that the transcript-viewer integration still passes its existing tests.
**Risk**: this changes the semantics of empty input; check that all callers (transcript-viewer, run-dashboard) handle `[]` instead of `[""]`.
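
The two acceptance cases can be worked through with a self-contained sketch. It substitutes a plain fixed-width chunker (`wrapPlain`, hypothetical and without ANSI handling) for the real `wrapHard`/`pad` helpers, and it clamps `maxVisualLines` to at least 1, since `slice(-0)` would otherwise return the whole array. That clamp is an assumption of this sketch, not something the plan specifies.

```typescript
interface VisualTruncateResult { visualLines: string[]; skippedCount: number }

// Simplified stand-in for wrapHard: fixed-width chunks, no ANSI awareness.
function wrapPlain(line: string, width: number): string[] {
  if (line.length === 0) return [""];
  const out: string[] = [];
  for (let i = 0; i < line.length; i += width) out.push(line.slice(i, i + width));
  return out;
}

// Wrap all source lines, merge, then keep only the tail of the merged list.
function truncateTail(text: string, maxVisualLines: number, width: number): VisualTruncateResult {
  if (!text) return { visualLines: [], skippedCount: 0 };
  const effectiveWidth = Math.max(1, width);
  const limit = Math.max(1, maxVisualLines); // assumption: never ask for 0 lines
  const all = text.split("\n").flatMap((line) => wrapPlain(line, effectiveWidth));
  if (all.length <= limit) return { visualLines: all, skippedCount: 0 };
  return { visualLines: all.slice(-limit), skippedCount: all.length - limit };
}

// One source line that wraps into 5 visual lines, limit 2.
const single = truncateTail("aabbccddee", 2, 2);
// Three source lines of 2 visual lines each, limit 4.
const multi = truncateTail("aabb\nccdd\neeff", 4, 2);
```

Here `single` keeps `["dd", "ee"]` with 3 lines skipped, and `multi` keeps `["cc", "dd", "ee", "ff"]` with 2 skipped, matching the acceptance list.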
---
### Task #51 — Memoize `visibleWidth` with an LRU cache
**Source**: caching pattern from pi-tui `utils.ts`
**Target**: `pi-crew/src/utils/visual.ts`
**Rationale**: `visibleWidth(value)` is called in:
- `pad`, `truncateToWidth`, `wrapHard` (on every character iteration)
- `crew-widget.ts colorWidgetLine` (every line, on every 250ms tick)
- `RunDashboard.render` (5-10 times per render)
- Estimated total: 50+ calls/render × 4 renders/sec = 200+ regex ops/sec.
Cache key = the string value, cache value = its width. When the cache is full (256 entries), the oldest entry is evicted (FIFO eviction, so strictly speaking this is not a true LRU).
**API**:
```ts
const widthCache = new Map<string, number>();
const CACHE_LIMIT = 256;
export function visibleWidth(value: string): number {
  const cached = widthCache.get(value);
  if (cached !== undefined) return cached;
  let length = 0;
  for (const char of value.replace(ANSI_PATTERN, "")) {
    if (char !== "\n") length += 1;
  }
  if (widthCache.size >= CACHE_LIMIT) {
    const firstKey = widthCache.keys().next().value;
    if (firstKey !== undefined) widthCache.delete(firstKey);
  }
  widthCache.set(value, length);
  return length;
}
```
**Acceptance**:
- `visibleWidth("foo")` called 1000 times → computed only once (check via a spy on the `regex.exec` count, if a diff bench is available).
- The cache does not leak: with a limit of 256, size = 256 after 1000 unique strings.
- Output is identical to the uncached version (regression test).
**Verification**:
- 1 unit test for a cache hit
- 1 unit test for eviction (insert 257 strings, check size === 256)
- Bench: `visibleWidth(longString) × 10000` → time drops by ≥ 5× (ms log).
**Risk**: cache misses for strings built by concatenation/templates whose content differs on each call (a `Map` compares string keys by value, so identical content still hits). Identify this with a real-world bench.
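
Because the task title says LRU while the eviction described is FIFO, the difference is worth seeing side by side. The sketch below uses a hypothetical `BoundedCache` class (not part of pi-crew) that relies on `Map` insertion order; setting `refreshOnHit` re-inserts a key on every hit, which is what would turn the FIFO behavior into LRU.

```typescript
// Hypothetical bounded cache; relies on Map iterating keys in insertion order.
class BoundedCache<V> {
  private map = new Map<string, V>();
  private limit: number;
  private refreshOnHit: boolean;

  constructor(limit: number, refreshOnHit: boolean) {
    this.limit = limit;
    this.refreshOnHit = refreshOnHit;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined && this.refreshOnHit) {
      // LRU behavior: move the key to the "newest" end on every hit.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (!this.map.has(key) && this.map.size >= this.limit) {
      // Evict the oldest inserted (FIFO) or least recently used (LRU) key.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, value);
  }

  has(key: string): boolean {
    return this.map.has(key);
  }

  get size(): number {
    return this.map.size;
  }
}
```

With a limit of 2 and the sequence set a, set b, get a, set c, the FIFO variant evicts `a` while the LRU variant evicts `b`. For a render loop that keeps measuring the same few strings, LRU would keep hot entries longer at the cost of one extra delete/set per hit.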
---
## Tier 2 — Theme & style consistency
Goal: 2 tasks, hot-reload + inverse fallback. Estimate: 0.5 day.
### Task #52 — Theme hot-reload subscription
**Source**: `pi-mono/coding-agent/theme/theme.ts` `onThemeChange()` + `startThemeWatcher()`
**Target**: `pi-crew/src/ui/theme-adapter.ts`, `src/extension/register.ts`
**Rationale**: pi-mono has a mechanism that watches the custom theme JSON, reloads with a 100ms debounce, and emits a callback. The pi-crew adapter only snapshots the theme once, in `ctx.ui.custom((tui, theme, ...) => Component)`. When the user types `/theme dark` in pi-coding-agent, pi-crew widgets keep the old theme until the component is recreated.
**Approach**:
1. Add `subscribeThemeChange(theme: unknown, callback: () => void): () => void` to theme-adapter.ts. Internally:
   - Test whether the `theme` object exposes an `addEventListener?.("change", ...)` or `onThemeChange?.(...)` API.
   - Fallback: poll `theme.getColorMode?.()` + a key signature every 1s, and fire the callback if it changed.
2. CrewWidgetComponent / LiveRunSidebar / RunDashboard / DurableTextViewer: call `subscribeThemeChange` in the constructor, store the unsubscribe function, and call `this.invalidate()` when the callback fires.
3. On dispose: unsubscribe.
**Acceptance**:
- Mock theme with the `onThemeChange` API → the callback fires within 200ms.
- Mock theme with polling → the callback fires after 1.1s when the signature changes.
- Dispose the component → no further callbacks.
**Verification**: 2 unit tests with mock theme objects. Manual test: run pi with `/theme light` then `/theme dark` and check that RunDashboard re-renders.
**Risk**: polling every 1s × N components adds overhead. Mitigation: one shared global subscription that fans out to components through a singleton subscriber list. Implement the singleton in theme-adapter.
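
The shared-subscription mitigation could look roughly like the sketch below. Everything here is hypothetical: `createThemeWatcher` is an invented name, the theme's `onThemeChange(cb)` is assumed to return an unsubscribe function, and `readSignature` stands in for the "color mode + key signature" probe. The point is only that one native listener or one timer serves any number of components.

```typescript
type Unsubscribe = () => void;

// Hypothetical watcher: one native subscription or one poll timer, fanned out to listeners.
function createThemeWatcher(theme: unknown, readSignature: (theme: unknown) => string, pollMs = 1000) {
  const listeners = new Set<() => void>();
  let stop: Unsubscribe | undefined;
  let lastSignature = "";

  const notify = (): void => {
    for (const listener of [...listeners]) listener();
  };

  // Polling body, exposed so tests can drive it without waiting for the timer.
  const checkNow = (): void => {
    const next = readSignature(theme);
    if (next !== lastSignature) {
      lastSignature = next;
      notify();
    }
  };

  const start = (): void => {
    const native = (theme as { onThemeChange?: unknown } | null | undefined)?.onThemeChange;
    if (typeof native === "function") {
      // Assumption: the native API returns an unsubscribe function (or nothing).
      const off: unknown = native.call(theme, notify);
      stop = typeof off === "function" ? () => { off(); } : () => {};
      return;
    }
    lastSignature = readSignature(theme);
    const timer = setInterval(checkNow, pollMs);
    stop = () => clearInterval(timer);
  };

  return {
    subscribe(callback: () => void): Unsubscribe {
      listeners.add(callback);
      if (stop === undefined) start();
      return () => {
        listeners.delete(callback);
        if (listeners.size === 0 && stop !== undefined) {
          stop();
          stop = undefined;
        }
      };
    },
    checkNow,
  };
}
```

Each component would call `subscribe(() => this.invalidate())` in its constructor and invoke the returned function on dispose; the timer or native listener is released when the last subscriber leaves.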
---
### Task #53 — Theme adapter `inverse` ANSI fallback
**Source**: `pi-mono` uses `chalk.inverse(text)` = `\x1b[7m{text}\x1b[27m`
**Target**: `pi-crew/src/ui/theme-adapter.ts`
**Rationale**: `asCrewTheme` currently passes `inverse` through only if the source theme has it, and otherwise falls back to identity (returns the text unchanged). render-diff uses `theme.inverse?.(value) ?? value`, so when the source theme has no inverse, the intra-line diff highlight is lost entirely. This is a subtle visual bug that no test catches.
**Reference logic**:
```ts
function asInverse(value: unknown): (text: string) => string {
  const fn = asUnaryFn(value);
  if (fn) return fn;
  return (text) => `\u001b[7m${text}\u001b[27m`;
}
```
**Acceptance**:
- `asCrewTheme(undefined).inverse?.("x")` → `"\u001b[7mx\u001b[27m"`.
- `asCrewTheme(realTheme).inverse?.("x")` → output from chalk (test with `includes("\u001b[7m")`).
- `renderDiff` with a minimal theme still highlights through the inverse lookup.
**Verification**: update `loaders.test.ts` or add `theme-adapter.test.ts` with 2 tests (default fallback + provided-theme passthrough).
**Risk**: low, since the change is additive.
---
## Tier 3 — UX components (port patterns from pi-mono)
Goal: 3 tasks, footer + select list + dynamic border. Estimate: 1 day.
### Task #54 — `CrewFooter` component port
**Source**: `pi-mono/coding-agent/components/footer.ts`
**Target**: `pi-crew/src/ui/crew-footer.ts` (new), integrated into `RunDashboard`.
**Rationale**: The pi-mono Footer is a decorative multi-line pattern (pwd+branch, tokens, context %, model). The pi-crew RunDashboard has a 1-line summary that mixes unrelated items. Port it so the visuals match coding-agent.
**Layout (3 lines)**:
```
~/proj (main) • runId • running (dim)
↑in ↓out R cache W cache $cost • 45.3%/200k (dim, % colored)
[badge1] [badge2] ... (extension statuses)
```
**API**:
```ts
export interface CrewFooterData {
  pwd: string;
  branch?: string;
  runId?: string;
  status?: RunStatus;
  usage?: UsageState;
  contextWindow?: number;
  contextPercent?: number;
  badges?: string[]; // raw text per extension status
}
export class CrewFooter {
  constructor(private data: CrewFooterData, private theme: CrewTheme) {}
  setData(data: CrewFooterData): void;
  render(width: number): string[];
  invalidate(): void;
}
```
**Color logic**:
- contextPercent > 90 → `theme.fg("error", ...)`
- contextPercent > 70 → `theme.fg("warning", ...)`
- contextPercent ≤ 70 → no color
**Acceptance**:
- Render for a run with usage tokens → output contains `↑`, `↓`, `$cost`.
- Truncation at small widths → ellipsis `...`.
- contextPercent NaN/undefined → displays `?/window`.
**Verification**:
- `test/unit/crew-footer.test.ts` with 4 tests (basic render, color thresholds, truncation, missing data).
- Integrate into `RunDashboard.renderFooter` (replacing the legacy footer).
**Risk**: RunDashboard layout shift; check the snapshot line count against existing tests.
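
The color thresholds and the `?/window` fallback fit in one small pure function, which also makes the "color thresholds" and "missing data" tests easy to write. This is a sketch under assumptions: `formatContextUsage` and the `Colorize` callback are hypothetical names, and the `k` suffix formatting is inferred from the `45.3%/200k` layout example.

```typescript
// Hypothetical colorizer; in pi-crew this would delegate to theme.fg(color, text).
type Colorize = (color: "error" | "warning", text: string) => string;

function formatContextUsage(percent: number | undefined, windowTokens: number, colorize: Colorize): string {
  // Assumption: windows of 1000+ tokens are shown as "200k", as in the layout example.
  const windowLabel = windowTokens >= 1000 ? `${Math.round(windowTokens / 1000)}k` : `${windowTokens}`;
  if (percent === undefined || Number.isNaN(percent)) return `?/${windowLabel}`;
  const text = `${percent.toFixed(1)}%/${windowLabel}`;
  if (percent > 90) return colorize("error", text);
  if (percent > 70) return colorize("warning", text);
  return text; // 70 and below stays uncolored
}
```

Keeping the thresholds outside `render()` means the footer can be tested with a tagging colorizer instead of asserting on raw ANSI sequences.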
---
### Task #55 — `CrewSelectList` adapter
**Source**: `@mariozechner/pi-tui` `SelectList` (peer dep) + pi-mono `extension-selector.ts`/`theme-selector.ts` patterns
**Target**: `pi-crew/src/ui/crew-select-list.ts`
**Rationale**: RunDashboard handles keyboard navigation by hand (j/k/enter), has no visual highlight for the selected item, and does not support `onPreview`. pi-tui already ships SelectList, but pi-crew does not wrap it yet. An adapter is needed to use SelectList from the pi-tui peer dep (an optional dep, so check that `import { SelectList } from "@mariozechner/pi-tui"` is available).
**Approach**:
1. Detect at runtime: `try { require.resolve("@mariozechner/pi-tui"); }` → use the pi-tui SelectList.
2. Fallback: a simple list component ported from extension-selector.ts (j/k/↑/↓/enter/esc handlers, ` → ` highlight for the selected item).
3. API:
```ts
export interface CrewSelectItem<T = string> {
  value: T;
  label: string;
  description?: string;
}
export class CrewSelectList<T = string> {
  constructor(
    items: CrewSelectItem<T>[],
    theme: CrewTheme,
    options: {
      onSelect: (item: CrewSelectItem<T>) => void;
      onCancel: () => void;
      onPreview?: (item: CrewSelectItem<T>) => void;
      maxHeight?: number;
    }
  ) {}
  render(width: number): string[];
  handleInput(data: string): void;
  invalidate(): void;
  setSelectedIndex(i: number): void;
  getSelected(): CrewSelectItem<T> | undefined;
}
```
```
**Acceptance**:
- Render with 5 items → 5 lines, and the selected one has ` → `.
- handleInput("j") → selected index +1, and the onPreview callback fires.
- handleInput("\n") → the onSelect callback is called with the current item.
- maxHeight=3 with 10 items → scrolls, with `↑ N more`/`↓ N more` indicators.
**Verification**: `test/unit/crew-select-list.test.ts` with 5 tests.
**Risk**: API mismatch if the pi-tui SelectList API changes between versions. Pin the behavior through the adapter; the fallback is always available.
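
For the fallback list, the scrolling and `N more` indicators reduce to a small window calculation. The helpers below are hypothetical (`visibleWindow`, `moveSelection`); they assume the selection is kept roughly centered and that navigation clamps at the ends instead of wrapping, neither of which the plan specifies.

```typescript
interface ListWindow { start: number; end: number; above: number; below: number }

// Compute which slice of items to draw and how many are hidden on each side.
function visibleWindow(total: number, selected: number, maxHeight: number): ListWindow {
  const height = Math.max(1, Math.min(maxHeight, total));
  // Keep the selection roughly centered, clamped to the list bounds.
  const start = Math.max(0, Math.min(selected - Math.floor(height / 2), total - height));
  const end = start + height;
  return { start, end, above: start, below: Math.max(0, total - end) };
}

// j / down-arrow move down, k / up-arrow move up; assumption: clamp, do not wrap.
function moveSelection(selected: number, total: number, key: string): number {
  if (total === 0) return 0;
  if (key === "j" || key === "\u001b[B") return Math.min(total - 1, selected + 1);
  if (key === "k" || key === "\u001b[A") return Math.max(0, selected - 1);
  return selected;
}
```

A renderer would draw `items.slice(start, end)`, add an `↑ N more` line when `above > 0`, and a `↓ N more` line when `below > 0`.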
---
### Task #56 — `DynamicCrewBorder` reusable + CountdownTimer 1s tick
**Source**: `pi-mono/coding-agent/components/dynamic-border.ts` + `countdown-timer.ts`
**Target**: `pi-crew/src/ui/dynamic-border.ts` (new), refactor `loaders.ts`
**Rationale**:
1. **DynamicBorder**: 10 LOC, renders a single line `─×width`. pi-crew draws its own borders in 3 places:
   - `loaders.ts CrewBorderedLoader`: static `┌─┐│└─┘` template
   - `mascot.ts`: builds `╭─╮│╰─╯` by hand
   - `run-dashboard.ts/transcript-viewer.ts`: pad border lines by hand
   → Refactor them to share `DynamicCrewBorder` for horizontal lines, keeping their own corner characters.
2. **CountdownTimer 1s tick**: the timer currently ticks every 250ms (4×/s). pi-mono ticks at exactly 1000ms + `tui.requestRender()`. Ticking 4× as often is wasteful and causes duplicate re-renders.
**API**:
```ts
// dynamic-border.ts
export interface DynamicCrewBorderOptions {
  color?: (s: string) => string;
  char?: string; // default "─"
}
export class DynamicCrewBorder {
  constructor(theme: CrewTheme, options?: DynamicCrewBorderOptions) {}
  render(width: number): string[];
  invalidate(): void;
}
```
CountdownTimer change:
```ts
// trong loaders.ts CountdownTimer
- this.timer = setInterval(() => { ... }, 250);
+ this.timer = setInterval(() => {
+ const seconds = this.secondsLeft();
+ this.onTick(seconds);
+ if (seconds <= 0) this.emitExpire();
+ }, 1000);
```
**Acceptance**:
- DynamicCrewBorder.render(20) → `["─".repeat(20)]` (with color).
- DynamicCrewBorder is used in CrewBorderedLoader, the mascot box, and the run-dashboard separators.
- CountdownTimer onTick is called ~3 times in 3.5s (seconds 3, 2, 1, 0, and no more).
**Verification**:
- 2 unit tests for DynamicCrewBorder (basic render, custom char).
- Update the CountdownTimer test in `loaders.test.ts`: check that the onTick count = ceil(timeoutMs/1000) + 1.
**Risk**: the mascot CountdownTimer (if any) needs the same adjustment. Visual flicker decreases with a 1s tick instead of 250ms.
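
The 1s tick body from the diff above can be exercised deterministically by injecting a clock. `CountdownSketch` and `Clock` are hypothetical names; `tick()` plays the role of the 1000ms interval callback. The sketch emits no tick at construction time, so a 3000ms timeout produces 3 ticks; an implementation that also ticks once immediately would produce ceil(timeoutMs/1000) + 1.

```typescript
interface Clock { now(): number }

// Hypothetical countdown whose interval body can be driven manually in tests.
class CountdownSketch {
  private deadline: number;
  private expired = false;
  private clock: Clock;
  private onTick: (seconds: number) => void;
  private onExpire: () => void;

  constructor(timeoutMs: number, clock: Clock, onTick: (seconds: number) => void, onExpire: () => void) {
    this.clock = clock;
    this.onTick = onTick;
    this.onExpire = onExpire;
    this.deadline = clock.now() + timeoutMs;
  }

  secondsLeft(): number {
    return Math.max(0, Math.ceil((this.deadline - this.clock.now()) / 1000));
  }

  // Body of the 1000ms interval: report remaining seconds, expire exactly once.
  tick(): void {
    if (this.expired) return;
    const seconds = this.secondsLeft();
    this.onTick(seconds);
    if (seconds <= 0) {
      this.expired = true;
      this.onExpire();
    }
  }
}
```

With a fake clock advanced by 1000ms before each `tick()`, a 3000ms countdown reports 2, 1, 0 and then stays silent, which is the "no more" part of the acceptance criterion.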
---
## Tier 4 — Power features
Goal: 1 task, tool state styling. Estimate: 0.25 day.
### Task #57 — Tool state styling for transcript-viewer
**Source**: `pi-mono/coding-agent/components/tool-execution.ts` (toolPendingBg/toolSuccessBg/toolErrorBg state)
**Target**: `pi-crew/src/ui/transcript-viewer.ts`
**Rationale**: transcript-viewer currently renders `[Tool: name] type` as plain text. It does not distinguish:
- partial vs final result
- success vs error (`result.isError`)
- queued vs running
Users scanning a transcript cannot quickly find the tool call that failed.
**Updated logic for `formatTranscriptEvent`**:
```ts
const isError = obj.isError === true || asRecord(obj.result)?.isError === true;
const isPartial = obj.isPartial === true;
const status: RunStatus = isError ? "failed" : isPartial ? "running" : "completed";
const icon = iconForStatus(status, { runningGlyph: "⋯" });
const headerColor = colorForStatus(status);
const header = theme.fg(headerColor, `${icon} [Tool${toolName ? `: ${toolName}` : ""}] ${type}`);
```
**Acceptance**:
- Event with `isError: true` → header has the `✗` icon and the `error` color.
- Event with `isPartial: true` → header has the `⋯`/`▶` icon and the `accent` color.
- Normal event → `✓` icon, `success` color.
- The existing test `formatTranscriptText formats message and tool JSONL into conversation lines` still passes.
**Verification**: add 2 tests for transcript-viewer (error tool, partial tool).
**Risk**: low. The event schema already has `isError`; it only needs to be unwrapped correctly.
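
A self-contained version of the status derivation makes the precedence explicit: an error wins over a partial flag. The helper names (`asRecord`, `toolEventStatus`, `toolHeader`) and the icon table are assumptions for illustration; the real code would go through `iconForStatus`/`colorForStatus` and the theme.

```typescript
type ToolStatus = "failed" | "running" | "completed";

// Narrow an unknown JSONL value to a plain object, or undefined.
function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// isError may sit on the event itself or inside event.result.
function toolEventStatus(event: unknown): ToolStatus {
  const obj = asRecord(event) ?? {};
  const isError = obj.isError === true || asRecord(obj.result)?.isError === true;
  if (isError) return "failed";
  return obj.isPartial === true ? "running" : "completed";
}

// Assumed glyphs, matching the acceptance list.
const STATUS_ICONS: Record<ToolStatus, string> = { failed: "✗", running: "⋯", completed: "✓" };

function toolHeader(event: unknown, toolName: string | undefined, type: string): string {
  const status = toolEventStatus(event);
  return `${STATUS_ICONS[status]} [Tool${toolName ? `: ${toolName}` : ""}] ${type}`;
}
```

For example, an event carrying both `isPartial: true` and `result.isError: true` renders with the `✗` icon.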
---
## Suggested execution order
1. **Day 1 — Tier 1 (bug fix + perf)**: #50 → #51
   - #50 fixes a subtle bug that can affect many screens.
   - #51 cache is independent and does not depend on #50.
2. **Day 1.5 — Tier 2 (theme)**: #52 → #53
   - #53 is quick (additive). #52 needs tests with mock theme objects.
3. **Day 2 — Tier 3 (UX)**: #54 → #55 → #56
   - #54 footer is independent and breaks nothing.
   - #55 select-list is a prerequisite for a future RunDashboard refactor.
   - #56 dynamic-border refactors 3 files (loaders, mascot, dashboard).
4. **Day 2 close — Tier 4 (#57)**: tool state styling, combined with the existing iconForStatus.
All of Phase 5 is estimated at 1.5–2 days of focused work, with **0 new dependencies**.
---
## Target metrics (final Phase 5 verification)
- **truncateToVisualLines correctness**: 0 known bugs. New tests catch slice-after-merge.
- **visibleWidth perf**: cache hit rate ≥ 80% in the tick loop, regex calls reduced ≥ 5× per the bench.
- **Theme reload latency**: < 200ms from the `onThemeChange` callback to the UI re-render.
- **Footer info density**: RunDashboard footer of 2-3 lines, like pi-coding-agent.
- **Border consistency**: 1 DynamicCrewBorder replaces 3 self-rolled patterns.
- **Test count**: 222 unit → ~234 unit (about 12 new tests across 8 tasks).
- **Type safety**: 0 unsafe theme casts (unchanged from Phase 4).
- **New deps**: 0.
---
## Tracking template (per commit message)
```
Phase 5 task #<num>: <title>
<body — what changed, why, refs to source pi-mono>
Verification: tsc --noEmit OK; test:unit OK; test:integration <OK|N/A>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
```


@@ -0,0 +1,662 @@
# Phase 6 Refactor Plan — Robustness after testing 0.1.27/0.1.29 + technical debt from source-runtime-refactor-map
> Origin:
> - Real-world test run `team_20260428152644_2ae0dce7` (parallel-research, 10/10 completed) on pi-crew@0.1.27.
> - Re-read of the source on 28/04/2026 after the 0.1.28 bump (responseTimeoutMs 15s→5m) and 0.1.29 (republish).
> - Remaining findings from `docs/source-runtime-refactor-map.md` (subagent runtime consolidation, model-routing persistence, adaptive planner repair).
>
> Phase 5 is complete (UI/footer/select-list/theme hot-reload). Phase 6 focuses on **runtime hardening + maintainability** without breaking the public API.
## General conventions (unchanged from Phase 5)
- Do not break the public API: tool actions, slash commands, config schema, schema.json.
- After each task: `npx tsc --noEmit` + `npm run test:unit` (plus `test:integration` when runtime/spawn/state is touched).
- Do not add new runtime dependencies beyond the stdlib and the existing peer deps (`pi-coding-agent`, `pi-ai`, `pi-agent-core`, `pi-tui`, `jiti`).
- Each task = 1 independent commit that can be reverted on its own. Test names follow behavior (`describe`/`it` are named after the contract, not the file).
- Default behavior stays unchanged (backward-compatible); behavior improvements go through an opt-in env/config when there is a regression risk.
- Each task has Acceptance + Verification + Risk/Rollback. Before opening a PR, run `npm run ci` (typecheck + test:unit + test:integration + npm pack --dry-run).
## Roadmap overview
| Tier | Workstream | Tasks | Estimate | Priority |
|---|---|---|---|---|
| **1** | Background runner & async robustness | T60–T62 | 0.5 day | P0 — blocks silent-fail risk |
| **1** | Concurrency hard cap | T63 | 0.25 day | P0 — blocks user-override DoS |
| **2** | Resume durability for synthesize/write | T64–T66 | 1 day | P1 — improves reliability |
| **2** | Adaptive planner repair/retry | T67 | 0.5 day | P1 — lowers block rate |
| **2** | Model routing persistence | T68–T69 | 0.5 day | P1 — observability |
| **3** | register.ts modularization | T70–T72 | 1 day | P2 — maintainability |
| **3** | Subagent runtime consolidation | T73–T75 | 1.5 days | P2 — debt per the refactor map |
| **3** | Skills builtin + docs self-contained | T76–T78 | 0.5 day | P3 — polish |
| **4** | Tests, smoke, CHANGELOG | T79–T81 | 0.5 day | P0 (end of phase) |
Total: **22 tasks / ~6.25 days**; can ship as several mini-releases (0.1.30, 0.1.31, …).
## Implementation progress
| Task | Status | Commit / notes |
|---|---|---|
| T60 | ✅ Done | `bfd9bc8` — jiti loader resolution/fail-fast |
| T61 | ✅ Done | `bfd9bc8` — async early-exit guard |
| T62 | ✅ Done | `bfd9bc8` — async startup marker |
| T63 | ✅ Done | `bfd9bc8` — concurrency hard cap + opt-out |
| T64 | ✅ Done | checkpoint phases + child-stdout-final/artifact-written resume recovery |
| T65 | ✅ Done | async notifier marks quiet dead background runners failed with `async.died` |
| T66 | ✅ Done | `5e495dc` — replay pending mailbox on resume |
| T67 | ✅ Done | adaptive plan repair for malformed JSON, oversized plans, and role aliases |
| T68 | ✅ Done | `1f92b8a` — persisted model routing metadata |
| T69 | ✅ Done | `1f92b8a` — agent records carry routing metadata |
| T70 | ✅ Done | `register.ts` split to ≤200 lines with commands, team tool, subagent tools, artifact cleanup modules |
| T71 | ✅ Done | `team-tool.ts` split to ≤300 lines with status/inspect/lifecycle/cancel/plan modules |
| T72 | ✅ Done | `task-runner.ts` split to ≤300 lines with prompt/progress/state/live/result helper modules |
| T73 | ✅ Done | `src/subagents/*` entrypoints added and runtime call-sites migrated |
| T74 | ✅ Done | live-session APIs routed through `src/subagents/live/*` with dynamic task-runner import |
| T75 | ✅ Done | `1004589` + explicit subagent depth/role spawn tests |
| T76 | ✅ Done | `f6ece8e` — built-in coding skills |
| T77 | ✅ Done | `9e54acd` — self-contained architecture docs |
| T78 | ✅ Done | `9e54acd` — runtime flow docs |
| T79 | ✅ Done | multi-shard, no-wrapper spawn, and async restart recovery smokes covered |
| T80 | ✅ Done | package snapshot guards docs/skills/jiti/pi manifest packaging |
| T81 | ✅ Done | changelog release prep notes added; no publish/version bump performed |
---
## Tier 1 — Robustness: blocking silent-fail risk (P0)
### Task #60 — `background-runner.ts` fails fast if the jiti loader does not exist
**Rationale (evidence)**: `getBackgroundRunnerCommand()` in `src/runtime/background-runner.ts` hard-codes the path:
```ts
const jitiRegisterPath = path.join(packageRoot, "node_modules", "jiti", "lib", "jiti-register.mjs");
return { args: ["--import", pathToFileURL(jitiRegisterPath).href, runnerPath, ...], loader: "jiti" };
```
If the user removes `node_modules/jiti` (npm prune, unusual monorepo hoisting, broken install), `spawn(process.execPath, ...)` does not fail in the Node parent. The child exits with an error immediately, but the parent cannot capture it because it has already called `child.unref()` and closed `logFd`. The background log contains only `[pi-crew] background loader=jiti` and then goes silent. The run stays stuck in status `running` until `process-status.hasStaleAsyncProcess` marks it stale (>10 minutes).
**Target**: `src/runtime/background-runner.ts`
**Steps**:
1. Before `spawn`, check `fs.existsSync(jitiRegisterPath)`. If it is missing → throw an `Error` with a clear message:
```
pi-crew background runner cannot start: jiti loader not found at
<jitiRegisterPath>. Reinstall pi-crew (`pi install npm:pi-crew`) or
ensure node_modules/jiti is present.
```
2. The caller (`team-tool/run.ts` via `spawnBackgroundTeamRun`) already has a try/catch; make sure the error propagates to a user notification.
3. Append the error to `events.jsonl` via `appendEvent(eventsPath, { type: "async.failed", message })` before throwing.
4. Extension: add a fallback path that looks for jiti in the parent module's `require.resolve.paths()` (Windows monorepo hoisting). If the primary path is missing, try `path.join(packageRoot, "..", "..", "node_modules", "jiti", "lib", "jiti-register.mjs")` (npm hoisting, 2 levels up). Throw only if both are missing.
**Acceptance**:
- When `node_modules/jiti/lib/jiti-register.mjs` is missing → `spawnBackgroundTeamRun` throws with a message that explains how to reinstall.
- When the user relies on monorepo hoisting (jiti in the workspace root) → it still resolves.
- `events.jsonl` has an `async.failed` entry before the spawn.
- No regression for the case where jiti is present (path 1 hit).
**Tests**: `test/unit/background-runner.fail-fast.test.ts`
- Stub `fs.existsSync` to simulate a miss → assert a throw matching `/jiti loader not found/`.
- Stub the hoisted path as existing → assert the alternative path is used.
- Cleanup does not leak global state (`vi`-style spy + restore).
**Verification**:
```bash
npx tsc --noEmit
node --experimental-strip-types --test test/unit/background-runner.fail-fast.test.ts
```
**Risk/Rollback**: Low risk; this only adds a sanity check before the spawn. Roll back by reverting the commit.
**Security/Perf notes**: No extra I/O in the hot path (only 1 stat when spawning the background runner). Do not log the full path in user-facing messages, to avoid exposing the home directory; use `shortenPath()` from `utils/visual.ts` if available.
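
Steps 1 and 4 together amount to "try the primary path, then the hoisted path, then throw". A minimal sketch under assumptions: `resolveJitiRegisterPath` is a hypothetical name, and the `exists` parameter is injected only so tests can simulate a missing loader without touching the file system. Note that it prints the full primary path, which a real implementation would shorten per the security note.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical resolver: primary location first, then a 2-level hoisted location.
function resolveJitiRegisterPath(
  packageRoot: string,
  exists: (candidate: string) => boolean = fs.existsSync,
): string {
  const relative = ["node_modules", "jiti", "lib", "jiti-register.mjs"];
  const candidates = [
    path.join(packageRoot, ...relative),
    path.join(packageRoot, "..", "..", ...relative),
  ];
  const found = candidates.find((candidate) => exists(candidate));
  if (found === undefined) {
    // Fail fast with an actionable message instead of spawning a doomed child.
    throw new Error(
      `pi-crew background runner cannot start: jiti loader not found at ${candidates[0]}. ` +
        "Reinstall pi-crew (`pi install npm:pi-crew`) or ensure node_modules/jiti is present.",
    );
  }
  return found;
}
```

The caller would invoke this before `spawn`, append the `async.failed` event in its catch block, and rethrow.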
---
### Task #61 — Capture early exit of the background runner (drain `background.log`)
**Rationale**: Currently, after `child.unref(); fs.closeSync(logFd);` the parent forgets the child. If `background-runner.ts` fails with a syntax or import error (a failure other than a missing jiti), the log contains only Node's stderr. The status tool reports `Async: pid=X alive=false` once the process exits, but the manifest status stays `running`. The user has to wait for `hasStaleAsyncProcess` (10 minutes) before the failure is detected.
**Target**: `src/extension/team-tool/run.ts` (caller) and `src/runtime/process-status.ts`
**Steps**:
1. In the caller, store `pid` right after spawn. Schedule a check about 3s later (`setTimeout` + `unref`) that calls `checkProcessLiveness(pid)`:
- If `alive=false` AND the manifest is still `running` AND there is no `async.started` event yet → read `background.log` (last 4KB), append an `async.failed` event with the log tail, and call `updateRunStatus(manifest, "failed", "Background runner exited within 3s; see background.log")`.
2. Cancel the `setTimeout` if the status moves away from `running` in the meantime.
3. Make sure the status is not written twice if the background process already wrote `async.failed` from its catch block.
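The decision in steps 1 to 3 boils down to a small predicate plus a log-tail helper. The sketch below is illustrative only: `EarlyExitProbe`, `shouldMarkEarlyExit`, `tailLog`, and the `RunStatus` union are hypothetical names, and the tail is approximated in characters rather than bytes.

```ts
// Assumed status values; the real manifest type may differ.
type RunStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

interface EarlyExitProbe {
  alive: boolean;           // result of the liveness check for the stored pid
  status: RunStatus;        // manifest status at probe time
  hasStartedEvent: boolean; // true once `async.started` is in events.jsonl
  hasFailedEvent: boolean;  // true if the runner already wrote `async.failed`
}

// Returns true only when the parent should mark the run as failed itself.
function shouldMarkEarlyExit(probe: EarlyExitProbe): boolean {
  if (probe.alive) return false;                // runner is still up
  if (probe.status !== "running") return false; // status already moved on
  if (probe.hasStartedEvent) return false;      // runner got past startup
  if (probe.hasFailedEvent) return false;       // avoid double-writing the failure
  return true;
}

// Keeps only the last `maxChars` characters of the log for the failure event.
function tailLog(log: string, maxChars = 4096): string {
  return log.length <= maxChars ? log : log.slice(log.length - maxChars);
}
```

Keeping the predicate pure makes the "no false positive" acceptance case cheap to cover in a unit test, separate from the integration test that exercises the real timer.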
**Acceptance**:
- Background runner exits immediately → run status changes to `failed` within ≤4s, with a reason that includes the log tail.
- Background runner runs normally → no false positive.
**Tests**: `test/integration/background-early-exit.test.ts`
- Mock `spawnBackgroundTeamRun` so that the child exits immediately (set `PI_TEAMS_MOCK_CHILD_PI=fail-immediate` and extend the mock branch).
**Verification**: `npm run test:integration -- background-early-exit`
**Risk/Rollback**: Needs careful testing against valid async runs; roll back via the feature flag `PI_CREW_ASYNC_EARLY_EXIT_GUARD=0`.
---
### Task #62 — `async.started` event timeout & marker file
**Rationale**: Complements T61. The background runner writes `async.started` to `events.jsonl` on the first line of `main()`. If `events.jsonl` is locked (Windows), the event cannot be appended. The caller currently has no way to wait for confirmation.
**Target**: `src/runtime/async-runner.ts` + `src/runtime/background-runner.ts`
**Steps**:
1. The background runner writes the marker file `state/runs/{runId}/async.pid` containing `{pid, startedAt}` right after `appendEvent("async.started")` succeeds.
2. During the 3s health check, the caller (T61) also reads the marker file: if the marker exists, the runner is treated as started successfully.
3. Add `process-status.hasAsyncStartMarker(runId)`.
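The three test cases named for `hasAsyncStartMarker` (file exists, missing, parse error) suggest a parser along these lines. This is a sketch under assumptions: `parseAsyncStartMarker` is a hypothetical helper that receives the file contents (or `undefined` when the file is missing), and the validation rules for `pid` and `startedAt` are guesses.

```ts
interface AsyncStartMarker {
  pid: number;
  startedAt: string;
}

// Parses the contents of `async.pid`; returns undefined for missing or corrupt markers.
function parseAsyncStartMarker(raw: string | undefined): AsyncStartMarker | undefined {
  if (raw === undefined) return undefined; // file missing
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return undefined;
    const { pid, startedAt } = value as Record<string, unknown>;
    if (typeof pid !== "number" || !Number.isInteger(pid) || pid <= 0) return undefined;
    if (typeof startedAt !== "string") return undefined;
    return { pid, startedAt };
  } catch {
    return undefined; // parse error: treat as "no marker"
  }
}
```

A `hasAsyncStartMarker(runId)` wrapper would then only need to read the file and check whether this function returns a value.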
**Acceptance**: The marker exists after async runner startup; the health check uses the marker when `events.jsonl` is unavailable (Windows lock fallback).
**Tests**: unit tests for `hasAsyncStartMarker` (file exists / missing / parse error).
**Verification**: `npm run test:unit`
---
### Task #63 — Hard cap for `limits.maxConcurrentWorkers`
**Rationale**: `resolveBatchConcurrency()` in `src/runtime/concurrency.ts` uses the user-supplied `limits.maxConcurrentWorkers` **without a cap**. A user config of `limits.maxConcurrentWorkers=64` spawns 64 child Pi processes in parallel, which amounts to a local DoS. `parallel-utils.MAX_PARALLEL_CONCURRENCY=4` applies only to the low-level subagent runner and does not protect the scheduler.
**Target**: `src/runtime/concurrency.ts`, `src/config/defaults.ts`, `src/config/config.ts`
**Steps**:
1. Add `DEFAULT_CONCURRENCY.hardCap = 8` to `defaults.ts`.
2. In `resolveBatchConcurrency`, after `requested = limitMax ?? teamMax ?? workflowMax ?? defaultWorkflowConcurrency`:
```ts
const cap = positiveInteger(input.hardCap) ?? DEFAULT_CONCURRENCY.hardCap;
const effective = Math.min(requested, cap);
```
3. When `effective < requested`, append `;capped:${cap}` to `reason` for observability.
4. Allow users to opt out via `config.limits.allowUnboundedConcurrency=true` (default false; gated by a `limits.unbounded` warning event plus a log line at the start of the run).
5. Update `schema.json` + `config-schema.ts` for the new field.
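Steps 2 to 4 can be combined into one small function. The sketch below is not the real `resolveBatchConcurrency`: `applyHardCap`, `CapInput`, and `DEFAULT_HARD_CAP` are hypothetical stand-ins, and the choice to raise the warning whenever the opt-out flag is set is an assumption.

```ts
const DEFAULT_HARD_CAP = 8; // stands in for DEFAULT_CONCURRENCY.hardCap

interface CapInput {
  requested: number;        // value already resolved from limits/team/workflow
  reason: string;           // reason string built so far
  hardCap?: number;         // optional override of the default cap
  allowUnbounded?: boolean; // mirrors limits.allowUnboundedConcurrency
}

function positiveInteger(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : undefined;
}

function applyHardCap(input: CapInput): { effective: number; reason: string; warn: boolean } {
  if (input.allowUnbounded) {
    // Opt-out: keep the requested value, but signal that a warning event is due.
    return { effective: input.requested, reason: input.reason, warn: true };
  }
  const cap = positiveInteger(input.hardCap) ?? DEFAULT_HARD_CAP;
  const effective = Math.min(input.requested, cap);
  // Only annotate the reason when the cap actually changed the result.
  const reason = effective < input.requested ? `${input.reason};capped:${cap}` : input.reason;
  return { effective, reason, warn: false };
}
```

For example, `applyHardCap({ requested: 12, reason: "limits" })` yields `effective: 8` with reason `limits;capped:8`, while a request of 2 passes through unchanged.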
**Acceptance**:
- `limits.maxConcurrentWorkers=64` (with `allowUnboundedConcurrency` at its default) → effective=8, reason contains `capped:8`.
- `limits.maxConcurrentWorkers=64, allowUnboundedConcurrency=true` → effective=64, with a warning event.
- No regression for reasonable values (≤8).
**Tests**: `test/unit/concurrency.cap.test.ts`
- 4 cases: requested=2 (no cap), requested=12 (cap=8), unbounded flag (no cap), workflow=parallel-research workflowMax=4 (no cap).
**Verification**: `npx tsc --noEmit && node --experimental-strip-types --test test/unit/concurrency.cap.test.ts`
**Risk/Rollback**: May unintentionally reduce throughput for power users. Mitigate with the `allowUnboundedConcurrency` flag. Rollback: revert, plus a major bump if users already depend on the old behavior (not yet known).
**Security/Perf notes**: Protects local memory/CPU; each child Pi consumes ~200MB RAM, so 8 workers ≈ 1.6GB worst case, which is reasonable for a dev machine.
---
## Tier 2 — Advanced reliability (P1)
### Task #64 — Resume detection: synthesize/write checkpoint
**Rationale**: `team-runner.executeTeamRun` cannot tell that a synthesize/write task was partially completed when a crash happens midway. On resume (`team resume runId`), the `synthesize` task re-runs from scratch and calls the LLM again, which costs money. This is risk #5 in the test report.
**Target**: `src/runtime/task-runner.ts`, `src/state/state-store.ts`, `src/state/types.ts`
**Steps**:
1. Extend `TeamTaskState` with `checkpoint?: { phase: "started" | "child-spawned" | "child-stdout-final" | "artifact-written"; updatedAt: string; childPid?: number }`.
2. `runTeamTask` writes a checkpoint via `saveRunTasks` at 4 points:
- Before `runChildPi` (`started`)
- Once `child.pid` is available (`child-spawned` + pid)
- On receiving `isFinalAssistantEvent` (`child-stdout-final`)
- After `writeArtifact` (`artifact-written`)
3. `team-tool.handleResume` inspects the checkpoint:
- If `checkpoint.phase === "artifact-written"` but the status is still `running` → mark `completed` (recovery, no re-run).
- If `checkpoint.phase === "child-stdout-final"` → try to parse the output from the last lines of `transcripts/{taskId}.jsonl`; if a valid `message_end` is present, mark `completed` without re-spawning.
- Otherwise → re-queue.
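The resume rules in step 3 map onto a small decision function. This sketch assumes it is only called for tasks whose status is still `running`; `decideResume`, `ResumeAction`, and the `transcriptHasMessageEnd` flag are hypothetical names, with the flag standing in for the transcript parsing described above.

```ts
type CheckpointPhase = "started" | "child-spawned" | "child-stdout-final" | "artifact-written";
type ResumeAction = "mark-completed" | "requeue";

interface Checkpoint {
  phase: CheckpointPhase;
  updatedAt: string;
  childPid?: number;
}

// Decides what resume should do with an interrupted task.
function decideResume(checkpoint: Checkpoint | undefined, transcriptHasMessageEnd: boolean): ResumeAction {
  switch (checkpoint?.phase) {
    case "artifact-written":
      return "mark-completed"; // output is already on disk, no LLM call needed
    case "child-stdout-final":
      // Recover only if the transcript really contains the final message.
      return transcriptHasMessageEnd ? "mark-completed" : "requeue";
    default:
      // "started", "child-spawned", or a legacy task without a checkpoint.
      return "requeue";
  }
}
```

Treating a missing checkpoint like an early phase gives the backward-compatible behavior required for older task records.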
**Acceptance**:
- Crash after the artifact is fully written → resume marks `completed` without re-running the LLM.
- Crash during stdout streaming → resume tries to recover from the transcript; if that fails, the task re-runs.
- State migration is backward-compatible (old tasks without `checkpoint` → resume behaves as before).
**Tests**: `test/integration/resume-checkpoint.test.ts`
- 3 cases: pre-spawn crash, mid-stream crash, post-artifact crash.
**Verification**: `npm run test:integration -- resume-checkpoint`
**Risk/Rollback**: Touches the durable state shape. Needs a migration rule: if a task has no `checkpoint`, treat it as not started. Rollback: revert and remove the optional field from the types.
---
### Task #65 — Resume for async background runs after a parent crash
**Rationale**: When the parent Pi session crashes, the background runner keeps running and the manifest updates normally. But if the **background runner crashes** (for example corrupted jiti, OOM), nobody marks the run as failed until `hasStaleAsyncProcess` fires 10 minutes later. The status is misleading in the meantime.
**Target**: `src/runtime/process-status.ts`, `src/extension/async-notifier.ts`
**Steps**:
1. Extend `async-notifier.ts.startAsyncRunNotifier`: for each run in `running`, every `notifierIntervalMs` (5s) call `checkProcessLiveness(async.pid)`. If `alive=false` AND the run status is `running` AND there has been no event in the last 30s → append an `async.died` event and call `updateRunStatus(manifest, "failed", "Background runner died unexpectedly; check background.log")`.
2. Add a guard: only do this if there is no `async.completed`/`async.failed` event yet (avoid double-write).
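The notifier check in steps 1 and 2 can be expressed as a predicate over a snapshot of the run. The sketch is illustrative: `LivenessSnapshot`, `shouldMarkDied`, and `QUIET_WINDOW_MS` are hypothetical names, and treating a run with no events at all as "quiet" is an assumption.

```ts
interface LivenessSnapshot {
  alive: boolean;                    // liveness of the pid stored in async.pid
  status: string;                    // manifest status
  lastEventAtMs: number | undefined; // timestamp of the newest event, if any
  terminalEventSeen: boolean;        // async.completed or async.failed already logged
}

const QUIET_WINDOW_MS = 30000; // "no event in the last 30s"

// Returns true when the notifier should mark the run as failed.
function shouldMarkDied(snapshot: LivenessSnapshot, nowMs: number): boolean {
  if (snapshot.alive) return false;
  if (snapshot.status !== "running") return false;
  if (snapshot.terminalEventSeen) return false; // guard against double-write
  if (snapshot.lastEventAtMs === undefined) return true; // nothing was ever logged
  return nowMs - snapshot.lastEventAtMs > QUIET_WINDOW_MS;
}
```

The quiet-window test is what protects against the slow-flush false positive named in the risk note: a dead process with a recent event gets one more notifier tick before being declared failed.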
**Acceptance**: Background runner killed with `kill -9` → status changes to `failed` within ≤30s, with an `async.died` event.
**Tests**: `test/integration/async-died.test.ts` (mock a spawned process that exits at a random time).
**Verification**: `npm run test:integration -- async-died`
**Risk/Rollback**: False positives when the event log flushes slowly. Mitigate: trigger only when the process is not alive AND the last event is older than 30s. Rollback: revert the async-notifier hook.
---
### Task #66 — Mailbox replay on resume
**Rationale**: `state/mailbox` has inbox/outbox JSONL files, but resume does not re-deliver pending messages. This extends risk #5.
**Target**: `src/state/mailbox.ts`, `src/extension/team-tool/api.ts`
**Steps**:
1. On resume, read `mailbox/delivery.json`. Every message with `direction=inbox` that is not yet `acked=true` is re-emitted in the first batch.
2. Add `validate-mailbox repair=true` to the doctor checks to clean up stale messages older than 7 days.
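Both steps are filters over delivery records. The record shape below is an assumption made for illustration: only `direction` and `acked` come from the plan, while `id`, `createdAt`, `pendingInbox`, and `isStale` are hypothetical.

```ts
interface DeliveryRecord {
  id: string;
  direction: "inbox" | "outbox";
  acked?: boolean;
  createdAt: string; // ISO timestamp (assumed field)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Step 1: inbox messages that were never acknowledged must be re-emitted on resume.
function pendingInbox(records: readonly DeliveryRecord[]): DeliveryRecord[] {
  return records.filter((record) => record.direction === "inbox" && record.acked !== true);
}

// Step 2: a record counts as stale once it is older than `maxAgeDays`.
function isStale(record: DeliveryRecord, nowMs: number, maxAgeDays = 7): boolean {
  const createdMs = Date.parse(record.createdAt);
  // Unparseable timestamps are never treated as stale, so repair stays conservative.
  return Number.isFinite(createdMs) && nowMs - createdMs > maxAgeDays * DAY_MS;
}
```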
**Acceptance**: Resume after a crash while the mailbox holds 3 pending messages → all 3 messages are redelivered.
**Tests**: `test/unit/mailbox-replay.test.ts`
**Verification**: `npm run test:unit`
---
### Task #67 — Adaptive planner repair/retry before blocking
**Rationale**: `team-runner.injectAdaptivePlanIfReady` blocks immediately when `__test__parseAdaptivePlan` fails (oversize >12 tasks / malformed JSON / invalid role). The user has to re-run from scratch. The refactor map already notes: "Add adaptive planner repair/retry for invalid JSON instead of immediate block when safe."
**Target**: `src/runtime/team-runner.ts`, `agents/planner.md`
**Steps**:
1. On parse failure, instead of returning `missingPlan: true` right away, attempt a **repair**:
- Malformed JSON → spawn one tiny child Pi (planner role, cheap model such as Haiku/gpt-5-nano) with the prompt: `Fix the following JSON to comply with the adaptive plan schema. Return only ADAPTIVE_PLAN_JSON_START ... ADAPTIVE_PLAN_JSON_END.\n<failed_text>`. Retry cap = 1, timeout 60s.
- Oversize (>12 tasks) → trim trailing phases automatically down to ≤12 tasks and record the event `adaptive.plan_trimmed`.
- Invalid role → map to the nearest role (`reviewer`→`code-reviewer` if the team has it), or skip that task if the phase still has other tasks.
2. If the repair fails → only then block (keeps the current behavior). Record the event `adaptive.plan_repair_failed`.
3. Persist the repair attempt to `metadata/adaptive-repair.json` for debugging.
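Of the three repair paths, the oversize trim is purely mechanical and can be sketched directly. The plan shape here (`PlanPhase`, `PlanTask`) is an assumption, `trimPlan` is a hypothetical name, and the sketch does not rewrite dependencies that point at dropped tasks, which a real implementation would have to handle.

```ts
interface PlanTask { id: string; role: string }
interface PlanPhase { name: string; tasks: PlanTask[] }

const MAX_PLAN_TASKS = 12;

// Keeps tasks in phase order until the budget is used up, dropping the tail.
function trimPlan(phases: readonly PlanPhase[], maxTasks = MAX_PLAN_TASKS): { phases: PlanPhase[]; dropped: number } {
  let remaining = maxTasks;
  let dropped = 0;
  const kept: PlanPhase[] = [];
  for (const phase of phases) {
    const tasks = phase.tasks.slice(0, remaining);
    dropped += phase.tasks.length - tasks.length;
    remaining -= tasks.length;
    // Phases that lose all their tasks are removed entirely.
    if (tasks.length > 0) kept.push({ ...phase, tasks });
  }
  return { phases: kept, dropped };
}
```

A non-zero `dropped` count is the natural trigger for the `adaptive.plan_trimmed` event.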
**Acceptance**:
- Slightly malformed plan JSON (missing `}`) → the repair fixes it → the run continues.
- Plan with 15 tasks → trimmed to 12, and the run continues with a warning.
- Plan with an unknown role → map or skip the task; if it cannot be salvaged, block with a clear explanation.
**Tests**: `test/unit/adaptive-repair.test.ts` (3 fixtures: malformed, oversize, invalid-role).
**Verification**: `npm run test:unit -- adaptive-repair`
**Risk/Rollback**: May cost one extra model call. Mitigate: retry only when the estimated cost is < 0.001 USD (Haiku tier). Rollback: env `PI_CREW_ADAPTIVE_REPAIR=0`.
---
### Task #68 — Persist model routing (requested → selected → fallback chain → reason)
**Rationale**: Refactor map: "Move model routing transparency into persisted task/subagent records: requested model, selected model, fallback chain, fallback reason." Task state currently has only `modelAttempts: ModelAttemptSummary[]` (model + success + error); it persists neither the `requestedModel` originally asked for by the user/agent nor the reason for switching to a fallback.
**Target**: `src/runtime/model-fallback.ts`, `src/state/types.ts`, `src/runtime/task-runner.ts`
**Steps**:
1. Extend `TeamTaskState` with `modelRouting?: { requested?: string; resolved: string; fallbackChain: string[]; reason?: string; usedAttempt: number }`.
2. `buildConfiguredModelCandidates` additionally returns `requestedModel` (the model from agent.md / step.model before fallback).
3. `runTeamTask` writes `modelRouting` alongside `modelAttempts`.
4. `team-tool.handleStatus` renders a `Model routing:` section when present. Dashboard agent rows show `model · ≥requested:claude-sonnet-4-5 → openai-codex/gpt-5.5 (rate-limit)`.
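One way to derive the routing record from the existing attempt list is sketched below. `buildModelRouting` is a hypothetical helper; using the error of the attempt just before the successful one as `reason` is an assumption about how the fallback reason would be sourced.

```ts
interface ModelAttemptSummary { model: string; success: boolean; error?: string }

interface ModelRouting {
  requested?: string;
  resolved: string;
  fallbackChain: string[];
  reason?: string;
  usedAttempt: number;
}

// Builds the persisted routing record; returns undefined if no attempt succeeded.
function buildModelRouting(
  requested: string | undefined,
  fallbackChain: readonly string[],
  attempts: readonly ModelAttemptSummary[],
): ModelRouting | undefined {
  const usedAttempt = attempts.findIndex((attempt) => attempt.success);
  const used = usedAttempt >= 0 ? attempts[usedAttempt] : undefined;
  if (!used) return undefined;
  const routing: ModelRouting = { resolved: used.model, fallbackChain: [...fallbackChain], usedAttempt };
  if (requested !== undefined) routing.requested = requested;
  // Assumed: the failure that triggered the fallback explains why we switched.
  const previous = usedAttempt > 0 ? attempts[usedAttempt - 1] : undefined;
  if (previous?.error !== undefined) routing.reason = previous.error;
  return routing;
}
```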
**Acceptance**:
- Task succeeds on the first attempt → `usedAttempt=0`, and `fallbackChain` holds the configured chain (no markFallback needed).
- Task falls back from A → B because of rate limiting → `reason: "rate-limit"`, `usedAttempt=1`.
- Status output has a `Model routing` line for every task with routing data.
**Tests**: `test/unit/model-routing.test.ts`
**Verification**: `npm run test:unit`
**Risk/Rollback**: The task state shape grows but stays backward-compatible (optional field). Rollback: revert the types and hide the UI.
---
### Task #69 — Subagent records store model routing
**Rationale**: Related to T68, but for `crew-agent-records` (the file-backed agent status shown in the dashboard). It currently has only a `model` field (latest selected); it also needs `requestedModel` + `fallbackChain`.
**Target**: `src/runtime/crew-agent-records.ts`
**Steps**:
1. Extend `CrewAgentRecord` with `routing?: TeamTaskState["modelRouting"]`.
2. `recordFromTask` maps it from `task.modelRouting`.
3. `live-run-sidebar` renders `routing` in the model row.
**Tests**: snapshot in `test/unit/crew-agent-records.test.ts`.
**Verification**: `npm run test:unit`
---
## Tier 3 — Maintainability & debt cleanup (P2)
### Task #70 — Split `register.ts` into sub-modules by lifecycle
**Rationale**: `src/extension/register.ts` (~38KB) mixes lifecycle, RPC, manifest cache, foreground controller, sidebar, widget, mascot, command parsing, subagent manager, and viewers. The AGENTS.md rule says: "Keep `index.ts` minimal; register functionality from `src/extension/register.ts`. Prefer small modules over large orchestrator files." The sub-folders `registration/` and `team-tool/` already exist, but `register.ts` is still large.
**Target**: `src/extension/register.ts` → split
**Steps**:
1. Split into 5 modules:
- `src/extension/registration/lifecycle.ts` — session_start/session_before_switch/session_shutdown handlers + cleanupRuntime.
- `src/extension/registration/widget-loop.ts` — widget interval, sidebar lifecycle (`openLiveSidebar`, `liveSidebarTimer`).
- `src/extension/registration/foreground-runner.ts` — `startForegroundRun` + `foregroundControllers`.
- `src/extension/registration/subagent-tools.ts` — Agent/get_subagent_result/steer_subagent + crew_* aliases.
- `src/extension/registration/commands.ts` — registers all slash commands (`/teams`, `/team-run`, …).
2. What remains of `register.ts` is wiring only (≤200 lines): create state, call the modules.
3. Keep the public API (exports `registerPiTeams`, `__test__subagentSpawnParams`).
**Acceptance**:
- `register.ts` ≤200 lines.
- Each new module ≤300 lines.
- Existing tests pass unchanged.
- Add a snapshot test for the command list (to ensure no command is dropped).
**Tests**: `test/unit/registration.commands-coverage.test.ts` (asserts that 25 commands are registered).
**Verification**: `npx tsc --noEmit && npm run test`
**Risk/Rollback**: Large refactor with regression risk. Mitigate: split into small commits (1 module per commit). Rollback: revert them one at a time.
---
### Task #71 — Split the remaining `team-tool.ts` actions
**Rationale**: `src/extension/team-tool.ts` is ~32KB. `team-tool/{api,run,doctor}.ts` already exist. Still in the main file: `handleStatus`, `handleEvents`, `handleArtifacts`, `handleWorktrees`, `handleResume`, `handleCancel`, `handleSummary`, `handleCleanup`, `handleForget`, `handlePrune`, `handleExport`, `handleImport`, `handleImports`.
**Target**: `src/extension/team-tool.ts` → split
**Steps**:
1. Create `src/extension/team-tool/{status,events,artifacts,resume,lifecycle-actions}.ts`.
2. `team-tool.ts` keeps only the router (`handleTeamTool`) + `handleList`/`handleGet` (already short).
**Acceptance**: `team-tool.ts` ≤300 lines. Each sub-module ≤300 lines.
**Tests**: existing pass.
**Verification**: `npm run test`
---
### Task #72 — Split `task-runner.ts`
**Rationale**: `src/runtime/task-runner.ts` (~28KB) contains prompt building, child-pi orchestration, artifact writing, verification evidence, transcripts, retry logic, and the mailbox bridge.
**Target**: split into:
- `task-runner/prompt-builder.ts` (renderTaskPrompt + readOnlyRoleInstructions + coordinationBridgeInstructions).
- `task-runner/artifact-writer.ts` (writeTaskInputs/Outputs/Transcripts/Diff).
- `task-runner/retry.ts` (model fallback retry loop).
- `task-runner/index.ts` exports `runTeamTask`.
**Acceptance**: Each module ≤300 lines. Public function signatures stay unchanged.
**Tests**: existing pass + a prompt snapshot per role (4 roles).
**Verification**: `npm run test:integration -- task-runner`
---
### Task #73 — Consolidate `child-pi` + `async-runner` + `subagent-manager` into `src/subagents/`
**Rationale**: Refactor map (noted since Phase 0): "Consolidate subagent runtime into `src/subagents/*` or equivalent durable-first module." Currently there are 3 scattered files:
- `src/runtime/child-pi.ts` (435 lines) — spawns the child pi CLI
- `src/runtime/async-runner.ts` (~50 lines) — background entrypoint
- `src/runtime/subagent-manager.ts` (~290 lines) — Agent tool backend
**Target**: create the folder `src/subagents/` containing:
- `src/subagents/spawn.ts` (lifted from child-pi.ts)
- `src/subagents/observer.ts` (ChildPiLineObserver + compactor)
- `src/subagents/manager.ts` (lifted from subagent-manager.ts)
- `src/subagents/async-entry.ts` (lifted from async-runner.ts)
- `src/subagents/index.ts` re-exports the public API
Leave files such as `runtime/child-pi.ts` as thin re-exports (deprecated path) for 1-2 releases, then delete them.
**Acceptance**:
- Old import paths keep working (re-export shim).
- No logic changes; only move + group.
- Existing tests pass.
**Tests**: existing.
**Verification**: `npm run ci`
**Risk/Rollback**: Many files change imports. Mitigate: use IDE rename/move rather than manual edits. Rollback: revert.
---
### Task #74 — Separate the live-session runtime from child-process
**Rationale**: `src/runtime/live-session-runtime.ts` (~14KB) is gated behind an experimental flag but still imports from the main `task-runner`. If someone later enables `PI_CREW_ENABLE_EXPERIMENTAL_LIVE_SESSION`, the intertwined code paths can break easily.
**Target**: move `live-session-runtime.ts` + `live-agent-control/manager` + `live-agent-control-realtime.ts` into `src/subagents/live/` (a new subdirectory introduced by T73).
**Acceptance**: `runtime/runtime-resolver.ts` depends on it only through `subagents/live`. The default flow (child-process) does not import the live module.
**Tests**: existing.
---
### Task #75 — Subagent depth/permission hardening
**Rationale**: `pi-args.checkCrewDepth` already checks the `PI_CREW_DEPTH` env. More testing is needed: a recursive subagent call (Agent tool inside an agent) beyond maxDepth → block with a clear message.
**Target**: `src/subagents/manager.ts`, `src/runtime/pi-args.ts`
**Steps**:
1. Add an explicit test for recursive spawn.
2. Extend `role-permission.ts` so that agents with a `read_only` role cannot call the `Agent`/`crew_agent` tools.
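The two checks can be combined into a single guard, sketched here. Everything except the tool names and the `PI_CREW_DEPTH` idea is hypothetical, including `spawnBlockReason`, the `readOnly` flag, and the assumption that a spawn is blocked once the current depth has reached `maxDepth`.

```ts
interface SpawnRequest {
  tool: string;      // tool the agent is trying to call
  readOnly: boolean; // whether the calling agent's role is read-only
  depth: number;     // current nesting depth (for example parsed from PI_CREW_DEPTH)
  maxDepth: number;  // configured limit
}

const SPAWN_TOOLS: ReadonlySet<string> = new Set(["Agent", "crew_agent"]);

// Returns a human-readable block reason, or undefined when the call may proceed.
function spawnBlockReason(request: SpawnRequest): string | undefined {
  if (!SPAWN_TOOLS.has(request.tool)) return undefined; // not a spawn tool
  if (request.readOnly) {
    return `Read-only roles may not call ${request.tool}.`;
  }
  if (request.depth >= request.maxDepth) {
    return `Subagent depth limit reached (depth ${request.depth}, max ${request.maxDepth}).`;
  }
  return undefined;
}
```

Returning the message instead of throwing lets the caller decide how to surface the block, which helps with the "clear message" requirement.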
**Tests**: `test/unit/subagent-depth.test.ts`, `test/unit/role-permission.spawn.test.ts`.
**Verification**: `npm run test:unit`
---
## Tier 3 — Polish (P3)
### Task #76 — Built-in skills: extract from `Source/awesome-agent-skills` + adapt
**Rationale**: `pi.skills` in package.json declares `./skills`, but the folder contains only `.gitkeep`. 5-10 core skills can be adapted from `Source/awesome-agent-skills/README.md`, `Source/oh-my-claudecode/skills/`, `Source/superpowers/`.
**Target**: `skills/`
**Steps**:
1. Pick 5 skills suited to coding:
- `safe-bash` (gate dangerous commands)
- `verify-evidence` (final assistant must include changed files + verification)
- `git-master` (commit hygiene + Conventional Commits)
- `read-only-explorer` (forbid edits when role is explorer/analyst)
- `task-packet` (enforce scope/inputs/outputs section)
2. Each skill is a `.md` file at `skills/{name}/SKILL.md` + optional helper scripts.
3. Adapt without copying verbatim (stay MIT-compliant and credit sources in NOTICE.md).
4. Reference them from `agents/*.md` via `skills: safe-bash, verify-evidence` frontmatter.
**Acceptance**:
- 5 skill files, each ≤500 lines.
- NOTICE.md updated with source attribution.
- Discovery test: `discover-skills.ts` (does it exist yet? add it if not) returns 5.
**Tests**: `test/unit/skills.discovery.test.ts`.
**Verification**: `npm run test:unit -- skills.discovery`
**Risk/Rollback**: May inflate package size. Mitigate: keep skills small, ≤4KB each.
---
### Task #77 — `docs/architecture.md` self-contained
**Rationale**: `pi-teams/docs/architecture.md` currently points to `../docs/pi-crew-source-review-and-lessons.md`, `../docs/pi-crew-architecture.md`, `../docs/pi-crew-mvp-plan.md`. These files live outside the package and will be broken links after npm publish.
**Target**: `pi-teams/docs/architecture.md`
**Steps**:
1. Inline the core architecture content (3 layers: extension/runtime/state, lifecycle diagram, durable run state, autonomous routing).
2. Remove references to workspace files outside the package.
3. Add an ASCII sequence diagram for the run flow (extension → team-runner → task-runner → child-pi → state).
4. Link to `usage.md`, `resource-formats.md`, `live-mailbox-runtime.md`, `publishing.md` (all inside the package).
**Acceptance**:
- File ≤600 lines, no out-of-package links.
- `npm pack --dry-run` ships all of docs/.
**Verification**: manual review + `npm pack --dry-run`.
---
### Task #78 — `docs/runtime-flow.md` (new) + sequence diagram
**Rationale**: Onboarding contributors need a single diagram/text that describes the full flow. It is currently scattered across architecture.md, source-runtime-refactor-map.md, and refactor-tasks.md.
**Target**: create `pi-teams/docs/runtime-flow.md`
**Steps**:
1. ASCII sequence diagram: user → handleTeamTool(run) → executeTeamRun → resolveBatchConcurrency → runTeamTask → runChildPi → child stdout → ChildPiLineObserver → onJsonEvent → updateRunStatus → notify.
2. A "trigger → handler" table for each action (`run`, `resume`, `cancel`, ...).
3. List the env vars that affect behavior (`PI_TEAMS_*`, `PI_CREW_*`, `PI_CODING_AGENT_DIR`).
**Acceptance**: Document ≤400 lines, self-contained with no further reading required.
---
## Tier 4 — Tests, smoke, release (P0, end of phase)
### Task #79 — Integration smoke: Windows process visibility + multi-shard fanout
**Rationale**: Refactor map: "Add real integration smoke scripts for Windows process visibility, async restart recovery, and multi-shard fanout." The test report the user just sent showed that fanout works, but a repeatable script is still needed.
**Target**: `test/integration/`
**Steps**:
1. `test/integration/windows-no-blank-console.test.ts`: spawn `pi --version` via `pi-spawn.getPiSpawnCommand` with `windowsHide:true` → assert that the process spawned with no console window (heuristic: `child.spawnargs` does not contain `cmd /c start`).
2. `test/integration/multi-shard-fanout.test.ts`: use `expandParallelResearchWorkflow` with a mock `Source/pi-*` fixture (5 dummy directories) → assert that 4 shards are produced, each shard has ≥1 path, and synthesize depends on all shards.
3. `test/integration/async-restart-recovery.test.ts`: spawn a background run, `kill -9` it, call `team status` → marked failed within ≤30s (depends on T65).
**Acceptance**: All 3 tests pass on the Windows CI runner.
**Verification**: `npm run test:integration`
---
### Task #80 — Update `npm pack --dry-run` snapshot + `schema.json`
**Rationale**: After adding a config field (T63 `allowUnboundedConcurrency`), the exported `schema.json` and `config-schema.ts` need to be brought in sync.
**Target**: `schema.json`, `src/schema/config-schema.ts`
**Steps**:
1. Regenerate `schema.json` from the TypeBox schema (via `scripts/generate-schema.ts` if it exists; otherwise update manually + review the diff).
2. Capture the `npm pack --dry-run` file list and snapshot it in a test (`test/unit/package-files.test.ts`).
**Acceptance**: schema.json reflects every config field; the snapshot test verifies that no shipped file is dropped.
---
### Task #81 — CHANGELOG + release prep
**Rationale**: Per global AGENTS.md Section 2, every PR needs Files & Rationale + Tests + Risks/Rollback. Phase 6 will ship across several mini-releases.
**Target**: `CHANGELOG.md`
**Steps**:
1. Add sections grouped by Tier:
- `## 0.1.30 — async/concurrency hardening` (T60-T63, T79).
- `## 0.1.31 — resume durability + adaptive repair` (T64-T67).
- `## 0.1.32 — model routing observability` (T68-T69).
- `## 0.2.0 — refactor: subagent runtime + register split` (T70-T75); a minor bump because the internal API changes.
- `## 0.2.1 — skills + docs` (T76-T78).
2. Each entry follows the format: `### Added / Changed / Fixed / Breaking Changes`.
**Acceptance**: CHANGELOG is complete; the `npm version` script runs cleanly.
---
## Appendix A — Acceptance gate for each mini-release
Before tagging/publishing:
```bash
# Hard gate
npm run typecheck
npm run test:unit
npm run test:integration
npm pack --dry-run
# Soft gate (manual)
/team-doctor # in Pi smoke session
/team-validate
/team-autonomy status
# Cross-platform
# Trigger the ubuntu/windows/macos CI workflow before tagging
```
## Appendix B — Task dependency table
```
T60 ──► T61 ──► T62
T63 (independent) ──┘
T64 ──► T65 ──► T66
T67 (independent)
T68 ──► T69
T70 ──► T71 ──► T72
T73 ──► T74 ──► T75 (needs T70 stable first)
T76 (independent)
T77 ──► T78
T79 depends on T63 (concurrency cap), T65 (async-died)
T80 depends on T63
T81 last
```
## Appendix C — Mapping each task ↔ previously identified risk/follow-up
| Task | Source of requirement |
|---|---|
| T60-T62 | Test report risk #2 + Phase analysis "fail-fast if jiti fails" |
| T63 | Test report risk #4 |
| T64-T66 | Test report risk #5 + refactor map "async restart recovery" |
| T67 | refactor-map "adaptive planner repair/retry" |
| T68-T69 | refactor-map "model routing transparency persisted" |
| T70-T72 | AGENTS.md "small modules" + analysis "register.ts/team-tool.ts/task-runner.ts are bloated" |
| T73-T75 | refactor-map "consolidate subagent runtime into src/subagents/*" |
| T76 | analysis "skills/ is empty" |
| T77-T78 | analysis "architecture doc points outside the package" + onboarding |
| T79 | refactor-map "real integration smoke scripts" |
| T80-T81 | release hygiene |
## Appendix D — "Reply with" template for each PR
Every Phase 6 PR must follow AGENTS.md Section 10:
```
Summary: <1-line impact>
Plan:
- <step 1>
- <step 2>
Files & Rationale:
- src/.../...: <reason>
Tests:
- <test name>: <scenario>
Verification:
- npx tsc --noEmit → Passed
- npm run test:unit → 0 failed / N passed
- npm run test:integration → 0 failed / N passed
- npm pack --dry-run → file list match snapshot
Risks & Rollback:
- <risk>
- <feature flag / revert plan>
Security & Perf Notes:
- <OWASP / RAM / IO>
```
---
**Deployment recommendations**:
1. Follow the Tier order (P0 → P3); do not mix large refactors (T70-T75) with hardening (T60-T67).
2. Ship one mini-release per Tier to get a stable baseline before the next Tier.
3. Before Tier 3 (T70-T75), run the full test suite on Windows + macOS CI to catch cross-platform regressions.
4. After each task: run `/team-doctor` in a Pi session as a smoke test; open the `/team-dashboard` dashboard to confirm nothing is stale.

@@ -0,0 +1,100 @@
# Awesome Agent Skills Distillation for pi-crew
Date: 2026-05-05
Source repo: `source/awesome-agent-skills` at `859172a` after fast-forward pull from `VoltAgent/awesome-agent-skills`.
## Source Character
`awesome-agent-skills` is a curated index/README of external agent skills, not a vendored skill-source tree. pi-crew should not copy external skill text from linked repositories. This distillation uses high-level themes from the index plus selected detailed reads of linked skills, rewritten as pi-crew-native workflows rather than vendored text.
## Detailed Links Read
Accessible raw GitHub links inspected:
- `obra/superpowers`:
- `verification-before-completion/SKILL.md` — evidence before claims; fresh command output required.
- `systematic-debugging/SKILL.md` — no fixes without root-cause investigation; four-phase debug loop.
- `subagent-driven-development/SKILL.md` — fresh subagent context, staged review checkpoints, DONE/NEEDS_CONTEXT/BLOCKED handling.
- `requesting-code-review/SKILL.md` — review early/often with explicit base/head context.
- `receiving-code-review/SKILL.md` — verify feedback before implementing; push back with technical evidence.
- `using-git-worktrees/SKILL.md` — detect existing isolation, prefer native worktree tools, verify clean baseline.
- `finishing-a-development-branch/SKILL.md` — verify tests before merge/PR/discard options.
- `test-driven-development/SKILL.md` — red/green/refactor; tests must fail for the intended reason.
- `writing-skills/SKILL.md` — trigger-only descriptions, progressive skill structure, pressure-test skills.
Blocked/unavailable in this environment:
- `officialskills.sh` pages for Trail of Bits/OpenAI returned HTTP 403 when fetched directly.
- Some README paths have moved or are directory-based; missing paths were not treated as source of truth.
Relevant source themes:
- Trail of Bits: clarification, audit context, differential review, insecure defaults, sharp edges, static analysis, testing handbook.
- OpenAI/Sentry/CodeRabbit/Garry Tan: security review, threat modeling, PR/code review, QA, guardrails, release/deploy verification.
- Obra/NeoLab community skills: subagent-driven development, testing with subagents, worktrees, verification before completion, recursive decomposition, review checkpoints.
- Context-engineering entries: context degradation, compression, memory systems, tool design, evaluation frameworks.
- Skill quality standards: specific descriptions, progressive disclosure, no absolute paths, scoped tools.
- Security notice: skills are curated but not audited; external skill content can contain prompt injection, tool poisoning, malware payloads, or unsafe data handling.
## Added pi-crew Skills
### `requirements-to-task-packet`
Purpose: convert ambiguous work into task packets with assumptions, scope, non-goals, acceptance criteria, verification, and escalation conditions.
Primary roles: `analyst`, `planner`.
### `secure-agent-orchestration-review`
Purpose: security-review workflow for delegation, skill loading, tool access, prompts, artifacts, config, and session/state ownership.
Primary role: `security-reviewer`.
### `multi-perspective-review`
Purpose: structured review protocol separating correctness, security, tests, maintainability, operator experience, and compatibility.
Primary roles: `reviewer`, `critic`.
### `verification-before-done`
Purpose: completion gate requiring targeted checks, typecheck/integration/full test escalation, evidence, artifacts, risks, and rollback notes.
Primary roles: `executor`, `test-engineer`, `verifier`.
### `context-artifact-hygiene`
Purpose: prevent context poisoning, lost-in-middle failures, stale artifacts, absolute-path leakage, and poor handoffs.
Primary roles: `explorer`, `writer`.
### `systematic-debugging`
Purpose: reproduce/trace/hypothesize/fix loop for failing tests, blocked runs, config pollution, provider/runtime errors, and stale state.
Not currently default-mapped to avoid skill-budget bloat; can be requested by `skill: "systematic-debugging"` or added to future debug workflows.
## Default Role Mapping Changes
Updated `src/runtime/skill-instructions.ts` to use the new distilled skills while keeping prompt budgets small:
- `explorer`: `read-only-explorer`, `context-artifact-hygiene`
- `analyst`: `read-only-explorer`, `requirements-to-task-packet`
- `planner`: `delegation-patterns`, `requirements-to-task-packet`
- `critic`: `read-only-explorer`, `multi-perspective-review`
- `executor`: `state-mutation-locking`, `safe-bash`, `verification-before-done`
- `reviewer`: `read-only-explorer`, `multi-perspective-review`
- `security-reviewer`: `secure-agent-orchestration-review`, `ownership-session-security`
- `test-engineer`: `verification-before-done`, `safe-bash`
- `verifier`: `verification-before-done`, `runtime-state-reader`
- `writer`: `context-artifact-hygiene`, `verify-evidence`
## Rationale
The selected skills are generic, pi-crew-native, and immediately useful for team orchestration. Vendor/framework-specific skills from the index were intentionally skipped because pi-crew is a TypeScript Pi extension and should not bake in unrelated platform instructions.
## Follow-up Ideas
- Add workflow-level `skills:` defaults for debug/recovery workflows that include `systematic-debugging`.
- Add a `skill-supply-chain-audit` skill if pi-crew later imports external skill bundles automatically.
- Add documentation to README describing `skill` override usage and project `skills/<name>/SKILL.md` overrides.

@@ -0,0 +1,297 @@
# Research: Extension Examples & Patterns
> Date: 2026-04-29 | Read-only research | Source: `source/pi-mono/packages/coding-agent/examples/extensions/`
## 1. Example Catalog (86 files, 60+ extensions)
### 1.1 Sorted by relevance to pi-crew
| Priority | Example | Relevance |
|---|---|---|
| ⭐⭐⭐ | `subagent/` | Most similar to pi-crew: child Pi spawning, parallel, chain |
| ⭐⭐⭐ | `custom-compaction.ts` | Hook compaction — useful for preserving run state |
| ⭐⭐⭐ | `event-bus.ts` | Cross-extension communication pattern |
| ⭐⭐⭐ | `plan-mode/` | State persistence, dynamic tools, widget management |
| ⭐⭐⭐ | `structured-output.ts` | `terminate: true` — save LLM turns |
| ⭐⭐ | `handoff.ts` | Context transfer to new session |
| ⭐⭐ | `dynamic-tools.ts` | Register tools at runtime |
| ⭐⭐ | `permission-gate.ts` | Gate dangerous operations |
| ⭐⭐ | `trigger-compact.ts` | Proactive compaction monitoring |
| ⭐⭐ | `send-user-message.ts` | sendUserMessage pattern |
| ⭐ | `dirty-repo-guard.ts` | Guard against uncommitted changes |
| ⭐ | `model-status.ts` | Model status in footer |
| ⭐ | `confirm-destructive.ts` | Confirm destructive operations |
## 2. Deep Analysis of Key Examples
### 2.1 subagent/ — The Reference Implementation
**Files:**
- `index.ts` (~530 lines): Main tool with execute + render
- `agents.ts` (~130 lines): Agent discovery (user/project scope)
**Architecture:**
```
subagent tool
├── Single: runSingleAgent() → spawn pi --mode json -p
├── Parallel: mapWithConcurrencyLimit(tasks, 4, runSingleAgent)
└── Chain: sequential loop with {previous} placeholder
```
**Key patterns:**
- Agent discovery: `discoverAgents(cwd, scope)` — scans `.md` files with YAML frontmatter
- Child process: `getPiInvocation()` detects current runtime (node/bun/pi binary)
- Streaming: `onUpdate` callback for partial results during execution
- Render: `renderCall()` + `renderResult()` with collapsed/expanded views
- Abort: AbortSignal propagated to child process
**What pi-crew does better:**
- Durable state (manifest, tasks, events) instead of in-memory only
- Team/workflow abstraction instead of flat agent list
- Task graph with DAG dependencies instead of linear chain
- Async background runner with PID tracking
- Policy engine for limits/retry/escalation
- Mailbox for inter-task communication
- Worktree isolation per task
**What pi-crew could adopt from this:**
- `terminate: true` on final results (the example does not use it either, but it is available)
- `renderCall/Result` custom rendering patterns
- `mapWithConcurrencyLimit` pattern (pi-crew already has similar)
### 2.2 custom-compaction.ts — Custom Compaction
**Pattern:**
```typescript
pi.on("session_before_compact", async (event, ctx) => {
  // 1. Get preparation data
  const { messagesToSummarize, turnPrefixMessages, tokensBefore, firstKeptEntryId } = event.preparation;
  // 2. Use a different (cheaper) model for summarization
  const model = ctx.modelRegistry.find("google", "gemini-2.5-flash");
  // 3. Custom prompt
  const summary = await complete(model, { messages: [...] }, { apiKey, signal });
  // 4. Return custom compaction result
  return {
    compaction: { summary, firstKeptEntryId, tokensBefore }
  };
});
```
**Relevance to pi-crew:**
- Can use cheap model to summarize completed tasks
- Can protect foreground runs from being compacted mid-execution
- Can store structured artifact index in compaction `details`
### 2.3 event-bus.ts — Cross-Extension Communication
**Pattern:**
```typescript
// Extension A: emit events
pi.events.emit("my:notification", { message: "hello", from: "ext-a" });
// Extension B: listen
pi.events.on("my:notification", (data) => {
  currentCtx?.ui.notify(`Event from ${data.from}: ${data.message}`);
});
```
**Relevance to pi-crew:**
- Already used for internal events (`subagent.stuck-blocked`)
- Could publish structured events for other extensions to consume:
- `pi-crew:run:completed`
- `pi-crew:subagent:completed`
- `pi-crew:run:failed`
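One way to keep such public events consistent is a typed event map. The following is a minimal sketch only: the payload fields (`runId`, `taskId`, `taskCount`, `error`) and the `TypedBus` class are assumptions made for illustration, and the in-memory bus merely stands in for `pi.events`.

```typescript
// Hypothetical payloads for the public events listed above.
interface CrewEvents {
  "pi-crew:run:completed": { runId: string; taskCount: number };
  "pi-crew:subagent:completed": { runId: string; taskId: string };
  "pi-crew:run:failed": { runId: string; error: string };
}

type Handler<T> = (data: T) => void;

// Minimal in-memory stand-in for an event bus; not the real pi.events object.
class TypedBus<E> {
  private handlers = new Map<keyof E, Handler<any>[]>();

  on<K extends keyof E>(name: K, handler: Handler<E[K]>): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  emit<K extends keyof E>(name: K, data: E[K]): void {
    for (const handler of this.handlers.get(name) ?? []) handler(data);
  }
}

// Usage: a consumer extension records failed runs.
const bus = new TypedBus<CrewEvents>();
const failures: string[] = [];
bus.on("pi-crew:run:failed", (e) => failures.push(`${e.runId}: ${e.error}`));
bus.emit("pi-crew:run:failed", { runId: "run-1", error: "worker timeout" });
```

With a shared event map, a misspelled event name or a missing payload field becomes a compile-time error for both publisher and consumer.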
### 2.4 plan-mode/ — State Persistence + Dynamic Tools
**Key patterns:**
State persistence:
```typescript
// Save
pi.appendEntry("plan-mode", { enabled, todos, executing });
// Restore on session_start
const entries = ctx.sessionManager.getEntries();
const state = entries
  .filter(e => e.type === "custom" && e.customType === "plan-mode")
  .pop()?.data;
```
Dynamic tools:
```typescript
// Switch between tool sets
if (planModeEnabled) {
  pi.setActiveTools(["read", "bash", "grep", "find", "ls"]);
} else {
  pi.setActiveTools(["read", "bash", "edit", "write"]);
}
```
Tool call gate:
```typescript
pi.on("tool_call", async (event) => {
  if (planModeEnabled && event.toolName === "bash") {
    if (!isSafeCommand(event.input.command)) {
      return { block: true, reason: "..." };
    }
  }
});
```
**Relevance to pi-crew:**
- `pi.appendEntry` pattern for cross-session run awareness
- `pi.setActiveTools` could be used to restrict tools during team runs
- `tool_call` gate for destructive team actions
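The `tool_call` gate above relies on an `isSafeCommand` helper that is not shown. A conservative sketch of what such a helper might look like follows; the allowlist, the operator regex, and the `gateBash` wrapper are assumptions for illustration and may differ from the real plan-mode example.

```typescript
// Hypothetical read-only allowlist; the real example may use different rules.
const SAFE_COMMANDS = new Set(["ls", "cat", "grep", "pwd", "head", "tail", "wc"]);

// Shell operators that could chain, pipe, or redirect into something unsafe.
const UNSAFE_OPERATORS = /[;&|><`]|\$\(/;

function isSafeCommand(command: string): boolean {
  const trimmed = command.trim();
  if (trimmed === "" || UNSAFE_OPERATORS.test(trimmed)) return false;
  const program = trimmed.split(/\s+/)[0];
  return program !== undefined && SAFE_COMMANDS.has(program);
}

// Mirrors the shape of a tool_call result: undefined means "allow".
function gateBash(
  planModeEnabled: boolean,
  command: string,
): { block: true; reason: string } | undefined {
  if (planModeEnabled && !isSafeCommand(command)) {
    return { block: true, reason: `Plan mode: "${command}" is not on the read-only allowlist` };
  }
  return undefined;
}
```

This only inspects the program name and rejects shell operators; it does not validate arguments, so it should be read as a convenience filter rather than a security boundary.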
### 2.5 structured-output.ts — terminate: true
**Pattern:**
```typescript
async execute(_toolCallId, params) {
  return {
    content: [{ type: "text", text: "Done" }],
    details: { headline, summary, actionItems },
    terminate: true, // ← No follow-up LLM turn needed
  };
}
```
**Relevance to pi-crew:**
- `Agent` tool results could use `terminate: true` when background run queued
- `get_subagent_result` could terminate when result is final
- `team` tool status/list/recommend actions could terminate
### 2.6 handoff.ts — Context Transfer to New Session
**Pattern:**
```typescript
// 1. Extract conversation context
const messages = ctx.sessionManager.getBranch()
  .filter(e => e.type === "message")
  .map(e => e.message);
// 2. Generate focused prompt
const prompt = await complete(model, { systemPrompt, messages }, { apiKey });
// 3. Create new session with pre-filled editor
await ctx.newSession({
  parentSession: currentSessionFile,
  withSession: async (replacementCtx) => {
    replacementCtx.ui.setEditorText(prompt);
  },
});
```
**Relevance to pi-crew:**
- When a task in a team run needs isolated context, could handoff to new session
- Parent session tracking via `parentSession`
### 2.7 permission-gate.ts — Dangerous Operation Gate
**Pattern:**
```typescript
pi.on("tool_call", async (event, ctx) => {
  if (event.toolName !== "bash") return;
  if (isDangerousPattern(event.input.command)) {
    const choice = await ctx.ui.select("Allow?", ["Yes", "No"]);
    if (choice !== "Yes") {
      return { block: true, reason: "Blocked by user" };
    }
  }
});
```
**Relevance to pi-crew:**
- Gate destructive team actions (delete, forget, prune)
- Only allow with explicit `confirm: true` parameter
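A parameter-based gate for the destructive team actions could be as small as the sketch below. The `TeamToolInput` shape and the `gateTeamAction` name are assumptions for illustration; only the action names (`delete`, `forget`, `prune`) and the `confirm: true` idea come from the notes above.

```typescript
// Destructive actions named in the text; the input shape is an assumption.
const DESTRUCTIVE_ACTIONS = new Set(["delete", "forget", "prune"]);

interface TeamToolInput {
  action: string;
  confirm?: boolean;
}

// Same result shape as a tool_call handler: undefined means "allow".
type GateResult = { block: true; reason: string } | undefined;

function gateTeamAction(input: TeamToolInput): GateResult {
  if (!DESTRUCTIVE_ACTIONS.has(input.action)) return undefined;
  if (input.confirm === true) return undefined;
  return { block: true, reason: `Action "${input.action}" requires confirm: true` };
}
```

Keeping the check as a pure function makes it usable both inside a `tool_call` handler and inside the tool's own `execute` path.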
### 2.8 trigger-compact.ts — Proactive Compaction
**Pattern:**
```typescript
pi.on("turn_end", (_event, ctx) => {
  const usage = ctx.getContextUsage();
  if (usage?.tokens && usage.tokens > THRESHOLD) {
    ctx.compact({ customInstructions: "..." });
  }
});
```
**Relevance to pi-crew:**
- Monitor context during long team runs
- Auto-compact before hitting overflow errors
- Use compact's callback to track state
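For long team runs, a monitor probably should not request compaction on every turn while usage stays above the threshold. The sketch below shows one possible approach, assuming a hypothetical `CompactionMonitor` class that fires once per upward crossing and re-arms after usage drops back under the threshold.

```typescript
// Hypothetical helper; not part of Pi or pi-crew.
class CompactionMonitor {
  private armed = true;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns true when the caller should trigger compaction for this turn.
  shouldCompact(tokens: number | undefined): boolean {
    if (tokens === undefined) return false;
    if (tokens <= this.threshold) {
      this.armed = true; // usage dropped (e.g. after a compaction): re-arm
      return false;
    }
    if (!this.armed) return false; // already requested for this crossing
    this.armed = false;
    return true;
  }
}
```

In a `turn_end` handler, the result of `shouldCompact(ctx.getContextUsage()?.tokens)` would decide whether to call `ctx.compact(...)`.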
## 3. Pattern Summary
### 3.1 Patterns pi-crew already implements well
| Pattern | pi-crew implementation |
|---|---|
| Child Pi spawning | `SubagentManager` + `spawn.ts` with full process management |
| Parallel execution | `mapConcurrent` in team runner |
| State persistence | Durable file-based (manifest, tasks, events, artifacts) |
| Widget rendering | `CrewWidget`, `LiveRunSidebar`, `Powerbar` |
| Lifecycle hooks | `session_start`, `session_before_switch`, `session_shutdown` |
| Config merge | `loadConfig` with user/project priority |
| Abort propagation | `AbortController` trees in foreground runs |
### 3.2 Patterns pi-crew could adopt
| Pattern | Current status | Recommendation |
|---|---|---|
| `terminate: true` | ❌ Not used | Add to Agent/get_subagent_result |
| `session_before_compact` hook | ❌ Not hooked | Cancel compact during foreground runs |
| Custom compaction model | ❌ Not used | Use Haiku/Gemini Flash for task summaries |
| `pi.events` publish | ⚠️ Internal only | Add public structured events |
| `pi.appendEntry` | ❌ Not used | Cross-session run references |
| `tool_call` permission gate | ❌ Not gated | Gate destructive team actions |
| Config-driven tool registration | ❌ Always all | Register tools per config |
| Working indicator | ❌ Widget only | Use `ctx.ui.setWorkingIndicator` |
| Session name auto-set | ❌ Manual only | Auto-name from team run context |
| `ctx.compact()` proactive | ❌ No monitoring | Monitor + auto-compact at threshold |
## 4. Example: Complete Tool with terminate + render
This shows a hypothetical optimized pi-crew Agent tool:
```typescript
// OPTIMIZED Agent tool pattern
const AgentTool = defineTool({
  name: "Agent",
  label: "Agent",
  description: "Launch a real pi-crew subagent...",
  parameters: Type.Object({
    prompt: Type.String(),
    description: Type.String(),
    subagent_type: Type.String(),
    run_in_background: Type.Optional(Type.Boolean()),
  }),
  async execute(_id, params, signal, _onUpdate, ctx) {
    // ... spawn subagent ...
    if (params.run_in_background) {
      return {
        content: [{ type: "text", text: `Agent queued. ID: ${record.id}` }],
        details: { agentId: record.id, status: "queued" },
        terminate: true, // ← No need for LLM follow-up
      };
    }
    await record.promise;
    const output = readResult(record);
    return {
      content: [{ type: "text", text: output }],
      details: { agentId: record.id, status: record.status },
      terminate: true, // ← Final result, save LLM turn
    };
  },
  renderResult(result, { expanded }, theme) {
    // Custom rendering with colored status icons
    // Collapsed/expanded views
    // Usage stats display
  },
});
```


@@ -0,0 +1,324 @@
# Research: Pi Extension System Deep Dive
> Date: 2026-04-29 | Read-only research | Source: `source/pi-mono/packages/coding-agent/src/core/extensions/`
## 1. Extension System Architecture
The Pi extension system is a plugin framework for the coding agent. Extensions are written in TypeScript,
loaded through jiti (just-in-time TypeScript loading), and can hook into every phase of the agent lifecycle.
```
┌─────────────────────────────────────────────────────────────┐
│ ExtensionAPI ("pi.*") │
│ Event sub: pi.on(event, handler) │
│ Tools: pi.registerTool(def) │
│ Commands: pi.registerCommand(name, opts) │
│ Shortcuts: pi.registerShortcut(key, opts) │
│ Flags: pi.registerFlag(name, opts) │
│ Messages: pi.sendMessage() / pi.sendUserMessage() │
│ State: pi.appendEntry(customType, data) │
│ Provider: pi.registerProvider(name, config) │
│ Event bus: pi.events.emit/on() │
│ Model: pi.setModel() / getThinkingLevel() │
│ Tools mgmt: pi.getActiveTools() / setActiveTools() │
├─────────────────────────────────────────────────────────────┤
│ ExtensionFactory │
│ (pi: ExtensionAPI) => void | Promise<void> │
├─────────────────────────────────────────────────────────────┤
│ loader.ts ──► jiti → TypeScript module loading │
│ runner.ts ──► ExtensionRunner → lifecycle + event emit │
│ types.ts ───► 1545 lines of type definitions │
└─────────────────────────────────────────────────────────────┘
```
## 2. Extension Loading Flow
```
discoverAndLoadExtensions(cwd, agentDir, extensionPaths)
├── Scan directories:
│ ├── ~/.pi/agent/extensions/**/index.ts (user-global)
│ ├── .pi/extensions/**/index.ts (project-local)
│ └── CLI --extension paths (explicit)
├── Create ExtensionRuntime (shared state + action stubs)
├── For each extension file:
│ ├── jiti.import(path) # Load TS module
│ ├── Call default export: factory(pi) # Register handlers/tools/commands
│ └── Collect into Extension object
└── Return LoadExtensionsResult
ExtensionRunner.initialize(session, context, actions)
├── Bind real action implementations to runtime
├── Process queued provider registrations
└── Emit session_start event
```
### 2.1 Discovery priority
Project-local > user-global. When two extensions share a name, the project-local one overrides the user-global one.
### 2.2 Runtime replacement (reload)
On `/reload` or a session switch:
1. `emitSessionShutdownEvent("reload")`
2. Invalidate the old ExtensionRuntime (it throws if a stale extension tries to act)
3. Re-discover and re-load all extensions
4. Re-initialize ExtensionRunner
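The invalidation step can be pictured with a small handle object whose actions start as throwing stubs, get bound to real implementations, and throw again once the runtime is replaced. This is a sketch under assumptions: `RuntimeHandle`, `bind`, and `invalidate` are hypothetical names, not Pi's actual classes or methods.

```typescript
// Hypothetical runtime handle; names are illustrative only.
class RuntimeHandle {
  private stale = false;
  private sendImpl: ((text: string) => void) | undefined;

  // Called by the runner once real actions are available.
  bind(send: (text: string) => void): void {
    this.sendImpl = send;
  }

  // Called on reload; every later call through this handle throws.
  invalidate(): void {
    this.stale = true;
  }

  sendMessage(text: string): void {
    if (this.stale) {
      throw new Error("Extension runtime is stale; it was replaced by a reload");
    }
    if (this.sendImpl === undefined) {
      throw new Error("sendMessage is not available before initialization");
    }
    this.sendImpl(text);
  }
}
```

An extension instance that captured the old handle in a closure fails loudly after reload instead of silently acting on a session it no longer belongs to.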
## 3. Full Event Lifecycle
### 3.1 Event model (28 event types listed below)
**Session events** — session-level lifecycle:
```
session_start ← When a session is created/loaded/reloaded
resources_discover ← Extensions can inject additional paths
session_before_switch ← Before switching sessions (cancellable)
session_before_fork ← Before forking a session (cancellable)
session_before_compact ← Before compaction (can cancel or customize)
session_compact ← After compaction completes
session_before_tree ← Before tree navigation (cancellable)
session_tree ← After tree navigation
session_shutdown ← When the session is torn down (quit/reload/new/resume/fork)
```
**Agent events** — per-prompt:
```
input ← When user input is received (can transform/block)
before_agent_start ← Before the agent loop runs (inject custom message / swap system prompt)
context ← Transform messages before they are sent to the LLM
before_provider_request ← Modify the payload before it is sent to the provider
after_provider_response ← Observe response status/headers
agent_start ← Agent loop starts
agent_end ← Agent loop ends
```
**Turn events** — per-turn:
```
turn_start ← A new turn starts
turn_end ← Turn ends (with message + tool results)
```
**Message events** — per-message:
```
message_start ← Message starts (user/assistant/toolResult)
message_update ← Streaming token-by-token update
message_end ← Message completes
```
**Tool events** — per-tool:
```
tool_call ← Before the tool executes (can block/mutate args)
tool_execution_start ← Tool starts running
tool_execution_update ← Partial/streaming result
tool_execution_end ← Tool completes
tool_result ← After the tool executes (can modify result)
```
**Other:**
```
model_select ← When a model is selected/changed
user_bash ← When the user uses the ! prefix for bash
```
### 3.2 Event result contracts
Each event handler can return a result to influence behavior:
| Event | Result type | Effect |
|---|---|---|
| `input` | `{ action: "continue" \| "transform" \| "handled" }` | Transform/block input |
| `before_agent_start` | `{ message?, systemPrompt? }` | Inject custom message, swap system prompt |
| `context` | `{ messages? }` | Replace context messages |
| `before_provider_request` | `any` | Replace payload |
| `tool_call` | `{ block?, reason? }` | Block tool execution |
| `tool_result` | `{ content?, details?, isError? }` | Modify result |
| `user_bash` | `{ operations?, result? }` | Custom bash execution |
| `session_before_*` | `{ cancel? }` | Cancel session operation |
| `session_before_compact` | `{ cancel?, compaction? }` | Cancel or custom compact |
| `session_before_tree` | `{ cancel?, summary?, customInstructions? }` | Cancel or custom summary |
| `resources_discover` | `{ skillPaths?, promptPaths?, themePaths? }` | Inject resource paths |
## 4. Context Objects Available to Extensions
### 4.1 ExtensionContext (`ctx.*`) — available in every event handler
```typescript
interface ExtensionContext {
  ui: ExtensionUIContext; // UI methods (select, confirm, notify, widgets...)
  hasUI: boolean; // false in print/RPC mode
  cwd: string; // Current working directory
  sessionManager: ReadonlySessionManager; // Session access (read-only)
  modelRegistry: ModelRegistry; // Auth + model discovery
  model: Model<any> | undefined; // Current model
  isIdle(): boolean; // True when the agent is idle (not streaming)
  signal: AbortSignal | undefined; // Current abort signal
  abort(): void; // Abort current operation
  hasPendingMessages(): boolean; // Check message queue
  shutdown(): void; // Graceful shutdown
  getContextUsage(): ContextUsage | undefined; // Token usage
  compact(options?): void; // Trigger compaction
  getSystemPrompt(): string; // Current system prompt
}
```
### 4.2 ExtensionCommandContext — extends ExtensionContext, only in command handlers
```typescript
interface ExtensionCommandContext extends ExtensionContext {
  waitForIdle(): Promise<void>; // Wait for agent to finish
  newSession(options?): Promise<{cancelled}>;
  fork(entryId, options?): Promise<{cancelled}>;
  navigateTree(targetId, options?): Promise<{cancelled}>;
  switchSession(sessionPath, options?): Promise<{cancelled}>;
  reload(): Promise<void>;
}
```
### 4.3 ReplacedSessionContext — after switching to or creating a new session
```typescript
interface ReplacedSessionContext extends ExtensionCommandContext {
  sendMessage(message, options?): Promise<void>;
  sendUserMessage(content, options?): Promise<void>;
}
```
### 4.4 ExtensionUIContext (`ctx.ui.*`) — only when `hasUI=true`
```typescript
interface ExtensionUIContext {
  select(title, options, opts?): Promise<string | undefined>;
  confirm(title, message, opts?): Promise<boolean>;
  input(title, placeholder?, opts?): Promise<string | undefined>;
  notify(message, type?): void;
  custom<T>(factory, options?): Promise<T>; // Custom overlay component
  setWidget(key, content, options?): void; // Widget above/below editor
  setFooter(factory): void; // Custom footer
  setHeader(factory): void; // Custom header
  setEditorComponent(factory): void; // Custom editor
  setStatus(key, text): void; // Status bar
  setTitle(title): void; // Terminal title
  setWorkingMessage(message?): void; // Working loader text
  setWorkingVisible(visible): void; // Show/hide loader
  setWorkingIndicator(options?): void; // Custom loader animation
  setHiddenThinkingLabel(label?): void; // Thinking block label
  onTerminalInput(handler): () => void; // Raw terminal input
  getToolsExpanded(): boolean;
  setToolsExpanded(expanded): void;
  theme: Theme;
  getAllThemes(): {name, path}[];
  getTheme(name): Theme | undefined;
  setTheme(theme): {success, error?};
}
```
## 5. ToolDefinition Contract
```typescript
interface ToolDefinition<TParams extends TSchema, TDetails = unknown, TState = any> {
  name: string; // Unique tool name
  label: string; // Human-readable for UI
  description: string; // For LLM
  parameters: TParams; // TypeBox schema
  promptSnippet?: string; // 1-line for system prompt "Available tools"
  promptGuidelines?: string[]; // Bullets for system prompt "Guidelines"
  renderShell?: "default" | "self"; // Who renders the outer frame
  executionMode?: "sequential" | "parallel"; // Concurrency control
  prepareArguments?: (args: unknown) => Static<TParams>;
  // Core execution
  execute(
    toolCallId: string,
    params: Static<TParams>,
    signal: AbortSignal | undefined,
    onUpdate: AgentToolUpdateCallback<TDetails> | undefined,
    ctx: ExtensionContext,
  ): Promise<AgentToolResult<TDetails>>;
  // Rendering (optional)
  renderCall?(args, theme, context): Component; // Custom call display
  renderResult?(result, options, theme, context): Component; // Custom result display
}
```
### 5.1 `terminate: true` pattern
A tool can set `terminate: true` in its result to end the turn right after the tool call,
saving one follow-up LLM turn:
```typescript
return {
  content: [{ type: "text", text: "Done" }],
  details: { ... },
  terminate: true, // ← Ends the turn; no LLM follow-up needed
};
```
## 6. Provider Registration
An extension can register a custom provider:
```typescript
pi.registerProvider("my-provider", {
  baseUrl: "https://api.example.com",
  apiKey: "PROVIDER_API_KEY",
  api: "anthropic-messages",
  models: [{
    id: "my-model",
    name: "My Model",
    reasoning: false,
    input: ["text", "image"],
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: 200000,
    maxTokens: 16384,
  }],
  // Optional OAuth:
  oauth: {
    name: "My Provider (SSO)",
    async login(callbacks) { ... },
    async refreshToken(credentials) { ... },
    getApiKey(credentials) { return credentials.access; },
  },
});
```
The registration takes effect immediately after `session_start` (no `/reload` needed).
## 7. API Comparison: ExtensionAPI vs ExtensionContext
| Capability | `pi.*` (ExtensionAPI) | `ctx.*` (ExtensionContext) |
|---|---|---|
| Subscribe events | ✅ `pi.on(...)` | ❌ |
| Register tools | ✅ `pi.registerTool()` | ❌ |
| Register commands | ✅ `pi.registerCommand()` | ❌ |
| Register shortcuts | ✅ `pi.registerShortcut()` | ❌ |
| Register flags | ✅ `pi.registerFlag()` | ❌ |
| Register providers | ✅ `pi.registerProvider()` | ❌ |
| Send messages | ✅ `pi.sendMessage()` | ❌ |
| Send user messages | ✅ `pi.sendUserMessage()` | ❌ |
| Append entries | ✅ `pi.appendEntry()` | ❌ |
| Session name | ✅ `pi.setSessionName()` / `getSessionName()` | ❌ |
| Event bus | ✅ `pi.events` | ❌ |
| Get/set active tools | ✅ `pi.getActiveTools()` / `setActiveTools()` | ❌ |
| Get model | ❌ (register-time only) | ✅ `ctx.model` |
| Check idle | ❌ | ✅ `ctx.isIdle()` |
| Abort | ❌ | ✅ `ctx.abort()` |
| Trigger compaction | ❌ | ✅ `ctx.compact()` |
| Context usage | ❌ | ✅ `ctx.getContextUsage()` |
| System prompt | ❌ | ✅ `ctx.getSystemPrompt()` |
| Session manager | ❌ | ✅ `ctx.sessionManager` |
| UI interaction | ❌ | ✅ `ctx.ui` |
| Session control | ❌ | ✅ `ctx.newSession()` / `fork()` (command ctx) |
**Rule of thumb:**
- `pi.*`: Registration-time API (inside the factory function, `session_start`)
- `ctx.*`: Runtime API (inside event handlers, command handlers)
## 8. Key Design Decisions
1. **No sandbox** — Extensions run in same Node.js process, full system access
2. **jiti loader** — TypeScript extensions compiled JIT, no build step
3. **Virtual modules** — For Bun compiled binary, built-in dependencies bundled
4. **Throwing stubs** — Runtime actions start as stubs, real implementations bound by runner
5. **Stale detection** — After reload, old extension instances throw on any API call
6. **Event bus** — Separate from extension events, for cross-extension communication


@@ -0,0 +1,322 @@
# oh-my-pi Distillation for pi-crew
Date: 2026-05-05
Source repo: `Source/oh-my-pi` at `1d898a7fe chore: bump version to 14.5.3`.
## Scope Read
Read-only exploration covered four source areas:
- Agent/provider runtime: `packages/agent`, `packages/ai`.
- Main CLI/session/task implementation: `packages/coding-agent`.
- TUI, extensions, hooks, skills, marketplace, rulebook docs and implementation.
- Native/Rust reliability/performance/release docs and implementation.
Representative files and docs inspected:
- `packages/agent/src/agent-loop.ts`, `packages/agent/src/agent.ts`, `packages/agent/src/types.ts`.
- `packages/ai/src/stream.ts`, `packages/ai/src/model-manager.ts`, `packages/ai/src/utils/{abort,retry,event-stream,overflow}.ts`, provider adapters.
- `packages/coding-agent/src/session/*`, `src/extensibility/{hooks,slash-commands,skills,plugins}/*`, `src/task/*`, `src/edit/*`, prompts.
- `packages/tui/src/tui.ts`, `docs/tui*.md`, `docs/extensions.md`, `docs/hooks.md`, `docs/skills.md`, `docs/marketplace.md`, `docs/rulebook-matching-pipeline.md`.
- `crates/pi-natives/src/{task,shell,pty,fs_cache,glob,fd,grep}.rs`, natives docs, install/release scripts.
This document rewrites the useful ideas as pi-crew-native patterns. It does not vendor or copy source code.
## High-Value Patterns to Adopt
### 1. Separate durable run history from provider/model context
oh-my-pi keeps rich internal session messages separate from LLM-compatible provider messages. Custom events, UI messages, hook entries, and branch/compaction entries can live in durable history, while a conversion layer decides what reaches the model.
pi-crew application:
- Keep `TeamRunManifest`, task records, mailbox messages, artifacts, worker events, and review/verification notes as durable run history.
- Add a projection/conversion step before worker prompt/model invocation:
- `transformRunContextBeforeWorkerStart(...)` for pruning/context injection.
- `convertRunHistoryToWorkerPrompt(...)` for provider/child-Pi compatible text.
- Avoid treating UI/runtime events as prompt text by default.
Benefit: safer compaction, mailbox summarization, and artifact hygiene without losing durable audit history.
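A projection step of this kind might look like the sketch below. The `HistoryEntry` variants and the output format are assumptions for illustration; only the function name `convertRunHistoryToWorkerPrompt` is taken from the list above, and its signature here is hypothetical.

```typescript
// Hypothetical history entry kinds; pi-crew's real records may differ.
type HistoryEntry =
  | { kind: "task_result"; taskId: string; summary: string }
  | { kind: "mailbox"; from: string; text: string }
  | { kind: "ui_event"; widget: string; detail: string }
  | { kind: "runtime_event"; name: string };

// Only entries a worker needs are projected into prompt text.
// UI and runtime events stay in durable history but are skipped here.
function convertRunHistoryToWorkerPrompt(history: readonly HistoryEntry[]): string {
  const lines: string[] = [];
  for (const entry of history) {
    switch (entry.kind) {
      case "task_result":
        lines.push(`Result of ${entry.taskId}: ${entry.summary}`);
        break;
      case "mailbox":
        lines.push(`Message from ${entry.from}: ${entry.text}`);
        break;
      default:
        break; // ui_event and runtime_event are not prompt text by default
    }
  }
  return lines.join("\n");
}
```

Because the durable history is never mutated, the projection rules can change later without losing audit data.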
### 2. Distinguish steering from follow-up
oh-my-pi's agent runtime distinguishes interrupting current work (`steer`) from continuing after the agent would otherwise stop (`followUp`).
pi-crew application:
- Model leader/operator messages as two queues:
- `steeringQueue`: urgent cancellation, nudge, priority change, user answer while worker is active.
- `followUpQueue`: review/verification/documentation after a task reaches a natural stop.
- Default to one-at-a-time delivery to reduce context shock.
- Persist queue entries and delivery status in task mailbox/state.
Benefit: clearer interactive semantics than a single generic respond/resume path.
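The two queues can be sketched as a single mailbox with a delivery rule. Everything here is an assumption for illustration (`LeaderMailbox`, `QueuedMessage`, and the rule that pending steering is delivered before follow-ups once the worker stops); persistence is left out.

```typescript
type QueueKind = "steering" | "followUp";

interface QueuedMessage {
  id: string;
  kind: QueueKind;
  text: string;
  delivered: boolean;
}

// Hypothetical in-memory mailbox; a real one would persist entries and status.
class LeaderMailbox {
  private messages: QueuedMessage[] = [];

  enqueue(id: string, kind: QueueKind, text: string): void {
    this.messages.push({ id, kind, text, delivered: false });
  }

  // One-at-a-time delivery. Steering is eligible at any time;
  // follow-ups only once the worker has reached a natural stop.
  next(workerActive: boolean): QueuedMessage | undefined {
    const pending = this.messages.filter((m) => !m.delivered);
    const steering = pending.find((m) => m.kind === "steering");
    const followUp = workerActive ? undefined : pending.find((m) => m.kind === "followUp");
    const msg = steering ?? followUp;
    if (msg !== undefined) msg.delivered = true;
    return msg;
  }
}
```

Returning a single message per call keeps delivery one-at-a-time, which matches the goal of reducing context shock.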
### 3. Preserve invariants on cancellation and abort
oh-my-pi propagates `AbortSignal` through model streaming and tool execution, distinguishes caller abort from provider-local watchdog abort, and emits synthetic tool results when abort happens after tool calls were started.
pi-crew application:
- Use structured cancel reasons:
- `caller_cancelled`
- `leader_interrupted`
- `provider_timeout`
- `worker_timeout`
- `tool_timeout`
- `shutdown`
- If a worker/tool/action has started but is cancelled, emit a terminal synthetic event/result so task history has no dangling operation.
- Add non-abortable cleanup/finalize phases for artifact preservation and state unlock.
Benefit: fewer stuck `running` tasks and clearer recovery after cancellation.
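To illustrate the "no dangling operation" invariant, the sketch below scans an event list and produces a synthetic terminal event for every operation that started but never finished. The `TaskEvent` shape and `closeDanglingOperations` are hypothetical; the reason strings are the ones listed above.

```typescript
type CancelReason =
  | "caller_cancelled"
  | "leader_interrupted"
  | "provider_timeout"
  | "worker_timeout"
  | "tool_timeout"
  | "shutdown";

// Hypothetical event record; pi-crew's events.jsonl schema may differ.
interface TaskEvent {
  taskId: string;
  operationId: string;
  type: "started" | "completed" | "cancelled";
  reason?: CancelReason;
  synthetic?: boolean;
}

// For every operation that started but never reached a terminal event,
// produce a synthetic "cancelled" event carrying the structured reason.
function closeDanglingOperations(events: readonly TaskEvent[], reason: CancelReason): TaskEvent[] {
  const open = new Map<string, TaskEvent>();
  for (const event of events) {
    if (event.type === "started") open.set(event.operationId, event);
    else open.delete(event.operationId);
  }
  return Array.from(open.values()).map((started) => ({
    taskId: started.taskId,
    operationId: started.operationId,
    type: "cancelled" as const,
    reason,
    synthetic: true,
  }));
}
```

The returned events would be appended to the run's event log during the non-abortable finalize phase.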
### 4. Batch-aware execution with shared vs exclusive operations
oh-my-pi marks tools with concurrency semantics: shared tools can run concurrently, exclusive tools serialize around shared/exclusive peers, and queued tools can be skipped when steering arrives.
pi-crew application:
- Classify worker subtasks or internal operations:
- shared: read-only exploration, status, grep, artifact reads.
- exclusive: edits, package manifests, lockfiles, migration/schema updates, worktree merge.
- Attach `batchId`, `index`, `total`, and `conflictKey` metadata to task execution.
- On new steering, skip not-yet-started low-priority operations with explicit skip reason.
Benefit: safer parallelism and more auditable conflict handling.
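A simple way to reason about shared versus exclusive operations is to plan them into waves. The planner below is a deliberately simplified sketch: the `Operation` shape and `planWaves` are hypothetical, and it ignores `conflictKey`, treating every exclusive operation as conflicting with everything else.

```typescript
interface Operation {
  id: string;
  mode: "shared" | "exclusive";
}

// Greedy wave planner: consecutive shared operations share a wave,
// each exclusive operation gets a wave to itself. Input order is preserved.
function planWaves(ops: readonly Operation[]): string[][] {
  const waves: string[][] = [];
  let current: string[] = [];
  for (const op of ops) {
    if (op.mode === "shared") {
      current.push(op.id);
      continue;
    }
    if (current.length > 0) waves.push(current);
    waves.push([op.id]);
    current = [];
  }
  if (current.length > 0) waves.push(current);
  return waves;
}
```

Operations inside one wave may run concurrently, while waves run one after another; a finer-grained planner could let exclusive operations with different conflict keys overlap.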
### 5. Intent tracing for destructive/tool actions
oh-my-pi optionally injects an intent field into tool schemas, strips it before execution, and keeps it for auditability.
pi-crew application:
- Add optional `_intent`/`intent` metadata to worker tool/action events.
- Require intent for destructive actions: cancel, delete, prune, force cleanup, edits, package publish, worktree removal.
- Store intent in events/artifacts but never pass it to low-level execution APIs if not needed.
Benefit: reviewable why/what for high-risk actions without changing execution payloads.
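Stripping the intent before execution can be done with a small helper. This is a sketch under assumptions: the `_intent` key comes from the list above, while `extractIntent`, `AuditRecord`, and the `requireIntent` flag are hypothetical names.

```typescript
interface AuditRecord {
  action: string;
  intent: string;
}

// Splits `_intent` off the incoming arguments: the intent goes to the audit
// record, the remaining arguments are what a low-level executor would receive.
function extractIntent(
  action: string,
  args: Record<string, unknown>,
  requireIntent: boolean,
): { execArgs: Record<string, unknown>; audit: AuditRecord | undefined } {
  const { _intent, ...execArgs } = args;
  const intent = typeof _intent === "string" ? _intent.trim() : "";
  if (requireIntent && intent === "") {
    throw new Error(`Action "${action}" requires a non-empty _intent`);
  }
  return { execArgs, audit: intent === "" ? undefined : { action, intent } };
}
```

The caller would pass `requireIntent: true` for the destructive actions and store the returned `audit` record alongside the event.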
### 6. Event-first UI with tiny component contract and coalesced rendering
oh-my-pi TUI uses small components (`render(width)`, `handleInput`, `invalidate`) and event-driven, coalesced rendering. Components must be width-safe and lifecycle-clean.
pi-crew application:
- Keep dashboards/widgets as projections from snapshot/event state, not direct filesystem scanners.
- Continue using render scheduler/coalescing; add width-safety tests for all dashboard panes/widgets.
- Components should expose `dispose()` for timers/theme subscriptions.
- UI event stream should be semantic (`task_started`, `worker_status`, `mailbox_updated`) rather than raw file polling.
Benefit: avoids UI freezes and makes live views predictable.
### 7. Two-phase extension lifecycle
oh-my-pi extensions have a registration phase where side-effecting runtime methods are unavailable, followed by an initialized phase with real context/actions.
pi-crew application:
- If pi-crew grows plugin/extension support, split APIs into:
- `registerCrewExtension(api)`: declare teams, workflows, hooks, commands, renderers.
- `initializeCrewExtension(context)`: subscribe to events, perform side effects.
- In headless mode, UI APIs should be explicit no-ops or unavailable via `hasUI`.
- Loader should collect extension errors without breaking builtin teams.
Benefit: fewer load-time side effects and safer third-party extensibility.
### 8. Unified capability inventory/control center
oh-my-pi normalizes extensions, skills, rules, tools, hooks, MCPs, prompts, and slash commands into a shared dashboard model with active/disabled/shadowed states.
pi-crew application:
- Extend `/team-settings` or add `/team-control` to show a unified inventory:
- teams, workflows, agents, skills, hooks/policies, tools, runtime providers.
- Normalize each item to:
- `id`, `kind`, `name`, `description`, `source`, `path`, `state`, `disabledReason`, `shadowedBy`, `raw`.
- Persist disables by stable capability ID, not file path.
Benefit: better operator experience for complex multi-resource setups.
### 9. Hooks as typed lifecycle gates, not ad-hoc shell glue
oh-my-pi hooks cover session lifecycle, before-agent-start, tool-call gates, tool-result transforms, and compaction events. Blocking hooks are scoped; non-blocking hook errors are captured but do not crash streaming.
pi-crew application:
- Define typed crew hooks:
- `before_run_start`
- `before_task_start`
- `task_result`
- `before_cancel`
- `before_publish`
- `session_before_switch`
- `run_recovery`
- Mark hooks as blocking or non-blocking.
- Capture hook errors into diagnostics/status, not uncontrolled exceptions.
Benefit: safer customization for policy/security/release gates.
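The blocking versus non-blocking distinction could be implemented roughly as follows. The `CrewHook` and `HookOutcome` types are hypothetical, hooks are synchronous for brevity, and the choice that a throwing blocking hook fails closed is an assumption of this sketch, not something the source states.

```typescript
type HookResult = { block: true; reason: string } | undefined;

interface CrewHook {
  name: string;
  blocking: boolean;
  run: () => HookResult;
}

interface HookOutcome {
  blockedBy?: string;
  reason?: string;
  diagnostics: string[];
}

function runHooks(hooks: readonly CrewHook[]): HookOutcome {
  const diagnostics: string[] = [];
  for (const hook of hooks) {
    try {
      const result = hook.run();
      // Only blocking hooks are allowed to stop the operation.
      if (hook.blocking && result !== undefined) {
        return { blockedBy: hook.name, reason: result.reason, diagnostics };
      }
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      diagnostics.push(`${hook.name}: ${message}`);
      // Assumption: a failing blocking hook fails closed; others are only recorded.
      if (hook.blocking) {
        return { blockedBy: hook.name, reason: "hook error", diagnostics };
      }
    }
  }
  return { diagnostics };
}
```

Errors from non-blocking hooks end up in `diagnostics` and never propagate as exceptions to the caller.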
### 10. Prompt pipeline should be explicit
oh-my-pi applies slash/custom commands, templates, compaction, file mentions, hook injection, and model validation in a clear order before calling the agent.
pi-crew application:
Define a worker prompt pipeline:
1. Parse orchestration command/control intent.
2. Expand prompt templates/task packet.
3. Attach selected context/artifact/mailbox summaries.
4. Run `before_worker_start` hooks.
5. Persist exact task packet/artifacts.
6. Launch worker.
Benefit: reproducible worker prompts and easier debugging of context injection.
### 11. Session/run history as append-only tree
oh-my-pi persists session entries with parent relationships. Branching/forking moves the current leaf rather than rewriting past history.
pi-crew application:
- Keep `events.jsonl` append-only and add optional `parentEventId` / `attemptId` / `branchId` fields for retries/forks.
- Represent retry attempts as child branches from the original task prompt/result.
- Preserve old failed attempts instead of overwriting task state only.
Benefit: better auditability and replay/debug of retries.
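An append-only tree with parent links can be sketched in memory as below. The field names `parentEventId` and `attemptId` come from the list above; the `RunHistory` class, its validation rules, and the `branch` helper are assumptions for illustration.

```typescript
interface RunEvent {
  id: string;
  parentEventId?: string;
  attemptId?: string;
  type: string;
}

// Hypothetical in-memory view over an append-only event log.
class RunHistory {
  private readonly events: RunEvent[] = [];
  private readonly byId = new Map<string, RunEvent>();

  // Append-only: existing events are never rewritten or removed.
  append(event: RunEvent): void {
    if (this.byId.has(event.id)) throw new Error(`Duplicate event id: ${event.id}`);
    if (event.parentEventId !== undefined && !this.byId.has(event.parentEventId)) {
      throw new Error(`Unknown parent: ${event.parentEventId}`);
    }
    this.events.push(event);
    this.byId.set(event.id, event);
  }

  get length(): number {
    return this.events.length;
  }

  // Walks parent links from a leaf back to the root, returned root-first.
  branch(leafId: string): RunEvent[] {
    const path: RunEvent[] = [];
    let current = this.byId.get(leafId);
    while (current !== undefined) {
      path.unshift(current);
      current = current.parentEventId === undefined ? undefined : this.byId.get(current.parentEventId);
    }
    return path;
  }
}
```

A retry becomes a sibling of the failed attempt under the same prompt event, so both attempts remain visible in the log.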
### 12. Cooperative cancellation token for long loops
oh-my-pi native code uses cancel tokens with deadlines, abort signals, `heartbeat()`, and async wait. Long loops over external-size input must heartbeat at bounded cadence.
pi-crew application:
- Add a TS `CancellationToken` utility for internal long-running loops:
- `heartbeat(stage?: string)`
- `throwIfCancelled()`
- `wait()`
- `abort(reason)`
- Require it in scanners over runs, artifacts, mailboxes, worktrees, and event logs.
Benefit: bounded shutdown/cancel latency and easier stuck-loop diagnostics.
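A minimal TypeScript token along these lines might look like the sketch below. It covers `heartbeat`, `throwIfCancelled`, and `abort` only; `wait()` and deadlines are omitted for brevity, and the class shape, the `CancelledError` type, and the `countEvents` example loop are assumptions rather than an existing pi-crew API.

```typescript
class CancelledError extends Error {}

class CancellationToken {
  private reason: string | undefined;
  private lastStage = "start";
  private beats = 0;

  // The first abort reason wins; later calls are ignored.
  abort(reason: string): void {
    if (this.reason === undefined) this.reason = reason;
  }

  throwIfCancelled(): void {
    if (this.reason !== undefined) {
      throw new CancelledError(`Cancelled (${this.reason}) at stage "${this.lastStage}"`);
    }
  }

  // Records progress, then checks for cancellation.
  heartbeat(stage?: string): void {
    this.beats += 1;
    if (stage !== undefined) this.lastStage = stage;
    this.throwIfCancelled();
  }

  get heartbeatCount(): number {
    return this.beats;
  }
}

// Example loop: heartbeat every 100 entries so cancel latency stays bounded.
function countEvents(lines: readonly string[], token: CancellationToken): number {
  let count = 0;
  for (let i = 0; i < lines.length; i++) {
    if (i % 100 === 0) token.heartbeat("scan-events");
    if (lines[i] !== "") count += 1;
  }
  return count;
}
```

Recording the last stage in the error message is what makes a stuck or cancelled loop easier to diagnose afterwards.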
### 13. Process lifecycle: graceful cancel, forced kill, then non-reuse
oh-my-pi shell/PTY runtime cancels gracefully, waits a grace window, forces abort/kill, drains output for bounded windows, and discards persistent sessions after cancellation/errors.
pi-crew application:
- For child Pi workers:
- send graceful abort/TERM;
- wait `graceMs`;
- force-kill process tree;
- drain stdout/stderr for bounded time;
- mark session non-reusable after timeout/protocol error/cancel.
- Return typed status `{ exitCode, cancelled, timedOut, killed, cleanupErrors }`.
Benefit: more deterministic worker cleanup and fewer zombie/stale runs.
### 14. Reserve control channel before async worker start
oh-my-pi PTY reserves its control channel before async process start, rejects duplicate starts, and always clears state in completion.
pi-crew application:
- Install a `WorkerRunCore`/controller synchronously before spawn returns.
- Expose cancel/steer immediately, even while startup is still in progress.
- Clear controller in `finally` and persist terminal state.
Benefit: closes race windows where operator cannot cancel a starting worker.
### 15. Cache scan entries, not final query results
oh-my-pi native search caches directory entries and applies query-specific filters/scoring later. Empty stale caches trigger rescan; ordering is deterministic.
pi-crew application:
- For run/artifact/mailbox discovery, cache raw entries/stats rather than final UI results.
- Apply active-status/mailbox/health filters after cache retrieval.
- Invalidate cache after state mutation.
- Use deterministic sort keys for dashboards and summaries.
Benefit: faster UI/status with fewer stale semantic bugs.
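The difference between caching raw entries and caching query results can be shown with a small cache wrapper. This is a sketch only: `RunEntry`, `RunEntryCache`, and the injected `scan` function are hypothetical, and the scan is synchronous for brevity.

```typescript
interface RunEntry {
  runId: string;
  status: "running" | "completed" | "failed";
  updatedAt: number;
}

class RunEntryCache {
  private entries: RunEntry[] | undefined;
  private readonly scan: () => RunEntry[];
  scans = 0; // number of real scans performed, for observability

  constructor(scan: () => RunEntry[]) {
    this.scan = scan;
  }

  // Call after any state mutation.
  invalidate(): void {
    this.entries = undefined;
  }

  private raw(): RunEntry[] {
    // An empty cached list is treated as stale and triggers a rescan.
    if (this.entries === undefined || this.entries.length === 0) {
      this.entries = this.scan();
      this.scans += 1;
    }
    return this.entries;
  }

  // Filters are applied per query; results are sorted deterministically
  // (newest first, ties broken by runId).
  query(filter: (e: RunEntry) => boolean): RunEntry[] {
    return this.raw()
      .filter(filter)
      .sort((a, b) => b.updatedAt - a.updatedAt || (a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0));
  }
}
```

Different views (active runs, failed runs, everything) reuse one scan, and none of them can serve a stale filtered answer because filtering always runs against the current raw entries.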
### 16. Blob artifacts and bounded file access
oh-my-pi blob-artifact design uses content addressing, metadata sidecars, streaming writes, size budgets, manifest GC, and path whitelisting.
pi-crew application:
- Introduce content-addressed large artifacts for worker transcripts/screenshots/log chunks.
- Persist metadata sidecars with MIME, source, redaction, run/task IDs, size, hash.
- Keep task prompts/results small by referencing artifact IDs.
- Add GC tied to run retention.
Benefit: avoids bloating task JSON/events and improves artifact security.
### 17. Native/release verification checklist mindset
oh-my-pi release scripts emphasize multi-platform build artifacts, install smoke tests, spoofed-version checks, and runtime loader fallback diagnostics.
pi-crew application:
- For npm releases, keep a release checklist with:
- typecheck;
- unit/integration tests;
- `npm pack --dry-run`;
- install from packed tarball in temp project;
- Pi extension load smoke;
- version/tag/npm consistency check.
Benefit: fewer broken published packages.
## Skill/Rulebook Ideas to Port
oh-my-pi's skills/rulebook ecosystem suggests additional pi-crew resources:
1. `worker-prompt-pipeline` skill: prompt assembly, context projection, before-worker hooks, artifact references.
2. `typed-hook-design` skill: lifecycle gates, blocking vs non-blocking hooks, diagnostics.
3. `process-cancellation-contract` skill: graceful/force kill, synthetic terminal results, non-reuse.
4. `capability-inventory-ux` skill: normalized resource inventory and disable/shadow semantics.
5. `append-only-run-history` skill: event tree, branch/retry provenance.
## Prioritized Backlog for pi-crew
### P0 / High confidence
- Fix current runtime review findings first: waiting final status, respond semantics, no-registry model routing.
- Add structured cancellation reason and terminal synthetic result/event for cancelled workers.
- Centralize worker prompt pipeline and persist exact prompt packets.
- Add width-safety tests for dashboard/widget lines.
### P1 / Medium-term architecture
- Add steering vs follow-up mailbox queues.
- Add typed hook lifecycle for `before_task_start`, `task_result`, `before_cancel`, `session_before_switch`.
- Add capability inventory model for teams/workflows/agents/skills/hooks/tools.
- Add `CancellationToken` for long internal loops and scans.
### P2 / Larger subsystem work
- Append-only run-history tree with attempt/branch parentage.
- Content-addressed blob artifact store with metadata sidecars and GC.
- Worker process controller installed before spawn; process non-reuse after cancel/protocol error.
- Raw scan-entry cache shared by dashboard/status/artifact lookup.
## Anti-Patterns to Avoid
- Building prompts from scattered inline string concatenation without a traceable pipeline.
- Treating UI render as a place to perform heavy filesystem scans.
- Auto-opening modal/right-sidebar UI by default when a compact widget/status line would suffice.
- Dropping queued user-facing results just because session generation changed.
- Cancelling a task without writing a terminal event/result.
- Caching semantic query results that should be recomputed from raw state.
- Letting one bad extension/resource prevent builtin operation.
## Immediate Review Questions for Future Implementation
- Should pi-crew project-local skills be allowed to shadow builtin safety skills by default, or require explicit `project:` namespace?
- Should `respond` enqueue durable work or only deliver to live workers? Current semantics need to become explicit.
- What is the stable capability ID scheme for teams/workflows/agents/skills/hooks?
- Which hook events should be blocking by default and which should be diagnostic-only?
- What artifact size threshold should trigger blob storage instead of embedding content in task/events JSON?

View File

@@ -0,0 +1,548 @@
# Plan: pi-crew Optimization Opportunities
> Date: 2026-04-29 | Revised: 2026-04-29 (after design review)
> Based on: research-pi-coding-agent.md, research-extension-system.md, research-extension-examples.md
## Overview
After a deep read of pi-mono's extension system and all 60+ example extensions, the list below
captures optimization opportunities for pi-crew, classified by effort and impact.
**Revision notes (2026-04-29):**
- Reordered Phase 1 so the compliance-required task (permission gate) comes before the optimization tasks.
- Split `terminate: true` into two sub-tasks because their UX risks differ.
- Moved "custom compaction model" down from Phase 2 to Phase 3 (risk vs. ROI).
- Changed cancel-compaction to **defer + retry** (avoids context overflow).
- Made the compaction threshold scale with `contextWindow` instead of a hardcoded 150k.
- Added a roadmap-level rollback strategy plus supplementary gap research.
## Priority Matrix
```
Impact
  │ HIGH impact    │ HIGH impact
  │ LOW effort     │ MEDIUM effort
  │ ───────────────┼────────────────
  │ MEDIUM impact  │ LOW impact
  │ LOW effort     │ MEDIUM effort
  └─────────────────────────────────→ Effort
```
## Implementation Status (2026-04-29)
Implemented in code:
- Phase 1.4 permission gate for destructive `team` tool calls.
- Phase 1.6 telemetry baseline fields for subagent completion (`turnCount`, `terminated`, `durationMs`).
- Phase 1.2 compaction guard as defer + retry, moved into `src/extension/registration/compaction-guard.ts`.
- Phase 1.1a `terminate: true` for background/queued subagent launches.
- Phase 1.3 public event bus events (`crew.subagent.completed`, `crew.run.completed`, `crew.run.failed`, `crew.run.cancelled`).
- Phase 1.5 auto session naming for new team runs when no custom session name exists.
- Phase 2.1 proactive compaction with dynamic context-window threshold.
- Phase 2.3 Pi session entries for run start/completion (`crew:run-started`, `crew:run-completed`).
- Phase 2.4 config-driven subagent tool aliases via `config.tools`.
- Phase 2.5 foreground working indicator, using optional API compatibility shim because older `pi-coding-agent` type surfaces may not expose `ctx.ui.setWorkingIndicator`.
- Phase 3.3 safe mailbox event bus publication (`crew.mailbox.message`, `crew.mailbox.acknowledged`).
Deferred by design:
- Phase 1.1b foreground `terminate: true` is implemented as opt-in via `config.tools.terminateOnForeground=true`; default remains safe/off pending telemetry.
- Phase 3.4 structured artifact index is implemented for pi-crew-triggered compactions via `crew:artifact-index` session entries plus compaction custom instructions. Direct `CompactionEntry.details` augmentation is not available through the current upstream extension API without replacing default compaction.
- Phase 3.1, 3.3b, 3.5, and 4.2 are now marked won't-do/research-only after deeper risk/ROI analysis.
- Phase 3.2 remains conditional on agent-level opt-in design. Phase 4.1 remains deferred pending format-compat research.
Validation:
- `npm run typecheck` passes.
- `npm test` passes: 283 unit tests + 26 integration tests.
## Roadmap-level Rollback Strategy
- **1 sub-task = 1 commit** that can be reverted independently. Do NOT squash all of Phase 1 into a single commit.
- Each commit must ship its own tests. If one fails in production, `git revert <sha>` must not drag other tasks along.
- Phase 1.6 (telemetry) lands before Phase 1.1 so there is a measurement baseline.
---
## Phase 1: Quick Wins & Compliance (HIGH impact, LOW effort)
Estimated time: 2-3 sessions. **The order has been changed relative to the original research.**
### 1.4 (FIRST) Permission gate for destructive team actions
**Why first:** AGENTS.md states *"Management deletes must require confirm: true; referenced
resources blocked unless force: true"*; this is a **mandatory rule**, not an optimization.
**Files to change:** `src/extension/registration/team-tool.ts` (or a new file)
**Current state:** The handler has a check, but there is no `tool_call` hook, so error messages are inconsistent.
**Optimization:**
```typescript
pi.on("tool_call", async (event, ctx) => {
  if (event.toolName !== "team") return;
  const input = event.input as Record<string, unknown>;
  const destructiveActions = ["delete", "forget", "prune", "cleanup"];
  if (destructiveActions.includes(input.action as string)) {
    if (!input.confirm && !input.force) {
      return {
        block: true,
        reason: `Destructive action '${input.action}' requires confirm=true (or force=true to bypass)`,
      };
    }
  }
});
```
**Note on precedence:** If schema validation already checks `confirm`, **pick exactly one place**:
- Option A: Let the schema validate and drop the hook (simpler).
- Option B: Let the hook validate and remove the check from the handler (consistent error messages).
→ Recommendation: Option B, because the hook gates every entry point (even if a future entry point bypasses the schema).
**Expected benefit:** Compliance with AGENTS.md and a production safety net.
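To make the confirm/force matrix unit-testable, the gate decision could be pulled out of the hook into a pure function. The sketch below is an assumption about how that might look: `gateTeamAction` and `GateDecision` are hypothetical names, and it checks `=== true` strictly, which is slightly tighter than the truthiness check in the hook above.

```typescript
// Hypothetical helper; the real `tool_call` event shape belongs to Pi.
type GateDecision = { block: true; reason: string } | { block: false };

const DESTRUCTIVE_ACTIONS = new Set(["delete", "forget", "prune", "cleanup"]);

function gateTeamAction(input: Record<string, unknown>): GateDecision {
  const action = typeof input.action === "string" ? input.action : "";
  // Non-destructive actions always pass.
  if (!DESTRUCTIVE_ACTIONS.has(action)) return { block: false };
  // Either flag lets the call through (assumption: strict boolean true only).
  if (input.confirm === true || input.force === true) return { block: false };
  return {
    block: true,
    reason: `Destructive action '${action}' requires confirm=true (or force=true to bypass)`,
  };
}
```

The hook body then reduces to calling this function and returning the decision when `block` is true, so the hook and its tests cannot drift apart.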
---
### 1.6 (NEW) Telemetry baseline for terminate impact
**Why before 1.1:** The original plan claimed "30-50% fewer LLM turns", which was only a guess. A real measured baseline is needed.
**Files to change:** `src/runtime/subagent-manager.ts`, `src/extension/register.ts`
**Optimization:** Log `turnCount` + `terminated: boolean` in the `crew.subagent.completed` event:
```typescript
pi.events.emit("crew.subagent.completed", {
  id: record.id,
  runId: record.runId,
  type: record.type,
  status: record.status,
  usage: record.usage,
  turnCount: record.turnCount, // ← NEW
  terminated: record.terminated, // ← NEW (false before Phase 1.1)
  durationMs: record.durationMs, // ← NEW
});
```
**Expected benefit:** Measure before/after Phase 1.1 to determine the real ROI. If turn savings are < 10%, 1.1b may not be worth deploying.
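One way the 10% decision rule could be evaluated from collected events is sketched below. Only `turnCount` and `terminated` come from the payload above; `CompletionSample`, `turnSavingRatio`, and `worthDeploying` are hypothetical names, and comparing simple means is an assumption about how the baseline would be computed.

```typescript
// Hypothetical sample shape built from `crew.subagent.completed` payloads.
interface CompletionSample {
  turnCount: number;
  terminated: boolean;
}

function meanTurns(samples: CompletionSample[]): number {
  return samples.reduce((sum, s) => sum + s.turnCount, 0) / samples.length;
}

// Fraction of turns saved by terminated completions relative to the baseline.
// Returns 0 when either group is empty, since no comparison is possible.
function turnSavingRatio(samples: CompletionSample[]): number {
  const baseline = samples.filter((s) => !s.terminated);
  const treated = samples.filter((s) => s.terminated);
  if (baseline.length === 0 || treated.length === 0) return 0;
  const baselineMean = meanTurns(baseline);
  if (baselineMean === 0) return 0;
  return (baselineMean - meanTurns(treated)) / baselineMean;
}

const MIN_SAVING = 0.1; // the "< 10%" cut-off from the expected-benefit note

function worthDeploying(samples: CompletionSample[]): boolean {
  return turnSavingRatio(samples) >= MIN_SAVING;
}
```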
---
### 1.2 `session_before_compact` guard for foreground runs (DEFER, not CANCEL)
**Files to change:** `src/extension/register.ts`
**Current state:** Compaction is not hooked, so it can fire midway through a foreground run.
**Optimization (revised):** Defer + retry instead of a hard cancel (avoids context overflow):
```typescript
let pendingCompactReason: string | null = null;
pi.on("session_before_compact", async (event, ctx) => {
  if (foregroundControllers.size > 0) {
    pendingCompactReason = "deferred-during-foreground-run";
    ctx.ui.notify("Compaction deferred until foreground run completes", "info");
    return { cancel: true };
  }
});
// Retry once the run has finished:
pi.on("turn_end", (_event, ctx) => {
  if (foregroundControllers.size === 0 && pendingCompactReason) {
    pendingCompactReason = null;
    ctx.compact({
      onComplete: () => ctx.ui.notify("Deferred compaction completed", "info"),
    });
  }
});
```
**Expected benefit:** Prevents context loss during a foreground run while still ensuring compaction eventually runs.
**Risk:** If a run is extremely long and `foregroundControllers` never drops to 0, the context can still overflow. Mitigation: a hard threshold (e.g. 95% of the context window) bypasses deferral and forces compaction.
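The mitigation can be expressed as a small decision function. This is a sketch under assumptions: `decideCompaction` and `CompactionDecision` are hypothetical names, and 0.95 is simply the example ratio from the risk note, not a tuned value.

```typescript
type CompactionDecision = "compact" | "defer" | "force";

const HARD_RATIO = 0.95; // example hard threshold from the risk note

function decideCompaction(
  activeForegroundRuns: number,
  usedTokens: number,
  contextWindow: number,
): CompactionDecision {
  // No foreground run: nothing to protect, compact normally.
  if (activeForegroundRuns === 0) return "compact";
  // Deferral is only safe while there is headroom left in the window.
  if (usedTokens >= contextWindow * HARD_RATIO) return "force";
  return "defer";
}
```

In the `session_before_compact` handler, only the `"defer"` outcome would return `{ cancel: true }`; `"force"` lets compaction proceed even though a foreground run is active.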
---
### 1.1a `terminate: true` for **background queued** results (SAFE)
**Why split:** Background-queued results carry no UX risk; foreground-completed results do (see 1.1b).
**Files to change:** `src/extension/registration/subagent-tools.ts`
**Optimization:**
```typescript
// Agent tool, background case: terminate right after the run is queued
if (params.run_in_background) {
  return {
    ...subagentToolResult(...),
    terminate: true, // ← Saves one LLM turn with no UX risk
  };
}
```
**Expected benefit:** Fewer LLM turns for every background spawn. Verify with the telemetry from 1.6.
---
### 1.3 Public events via `pi.events`
**Files to change:** `src/extension/register.ts`
**Current state:** The event bus is only used for the internal `subagent.stuck-blocked` event.
**Naming convention (revised):** Align with the upstream `dot.kebab` pattern (already used for `subagent.stuck-blocked`):
```typescript
// Document in the README as PUBLIC API:
pi.events.emit("crew.subagent.completed", { ... });
pi.events.emit("crew.run.completed", { runId, team, workflow, status, taskCount, totalUsage });
pi.events.emit("crew.run.failed", { runId, team, workflow, error, failedTaskId });
pi.events.emit("crew.run.cancelled", { runId, team, workflow, status, taskCount });
```
**Versioning:** Note in the README that event payloads are semver-stable as of pi-crew 0.2.0.
**Expected benefit:** Other extensions (logging, notification, metrics) can subscribe.
---
### 1.5 Auto session name from team run context
**Files to change:** `src/extension/registration/team-tool.ts`
**Optimization:**
```typescript
// In the team tool's execute, before starting the run:
pi.setSessionName(`pi-crew: ${team}/${workflow} - ${goal.slice(0, 60)}`);
```
**Expected benefit:** Better session organization when browsing the session list.
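A testable variant of the naming logic might look like the sketch below. `crewSessionName` is a hypothetical helper; the `" - "` separator and the whitespace collapsing are assumptions, while the 60-character cut-off and the "only when no name exists" rule follow this plan.

```typescript
const MAX_GOAL_CHARS = 60; // same cut-off as the snippet above

// Returns the name to set, or null when a custom name already exists.
function crewSessionName(
  existingName: string | undefined,
  team: string,
  workflow: string,
  goal: string,
): string | null {
  if (existingName) return null;
  // Collapse whitespace so multi-line goals stay on one line (assumption).
  const shortGoal = goal.replace(/\s+/g, " ").trim().slice(0, MAX_GOAL_CHARS);
  return `pi-crew: ${team}/${workflow} - ${shortGoal}`;
}
```

The caller would pass the current session name as `existingName` and call `pi.setSessionName(...)` only when the helper returns a string.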
---
### 1.1b (OPT-IN DONE, DEFAULT OFF) `terminate: true` cho **foreground completed** results
**Why default off:** UX risk. If the LLM gets no turn to summarize the result, the user may not understand the output.
**Implementation:** opt-in flag, default safe:
```json
{
  "tools": {
    "terminateOnForeground": true
  }
}
```
When enabled, foreground `Agent`/`crew_agent` completed results set `terminate: true` and persist `record.terminated=true` for telemetry. Decision to make this default-on still requires telemetry evidence:
- Average turn count after Agent foreground completion ≥ 2.
- Output is already self-explanatory enough (measured via user feedback or retry rate).
---
## Phase 2: Medium Effort Optimizations
Estimated time: 2-3 sessions. (One task fewer than in the original plan.)
### 2.1 Proactive compaction monitoring (DYNAMIC threshold)
**Files to change:** New file `src/extension/registration/compaction-guard.ts`
**Current state:** Relies solely on built-in auto-compaction (which may be slow).
**Optimization (revised):** The threshold scales with `contextWindow`:
```typescript
export function registerCompactionGuard(pi: ExtensionAPI) {
  const TRIGGER_RATIO = 0.75; // 75% context window → trigger
  pi.on("turn_end", (_event, ctx) => {
    const usage = ctx.getContextUsage();
    const ctxWindow = ctx.model?.contextWindow ?? 200_000;
    const threshold = ctxWindow * TRIGGER_RATIO;
    if (usage?.tokens && usage.tokens > threshold) {
      // The foreground guard from Phase 1.2 will defer this if needed
      ctx.compact({
        customInstructions: "Prioritize keeping team run state, task results, and artifact references. Keep the conversation context brief.",
        onComplete: () => ctx.ui.notify("Auto-compacted context during team run", "info"),
        onError: (err) => ctx.ui.notify(`Compaction failed: ${err.message}`, "error"),
      });
    }
  });
}
```
**Why a ratio instead of a hardcoded value:** Claude Haiku has 200k, Gemini Pro 2M, GPT-4o 128k, small models 32k. A hardcoded 150k is wrong for 90% of cases.
**Expected benefit:** Avoids context overflow errors when a foreground run gets too long.
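As a worked example of the ratio, the sketch below computes the trigger point for the context sizes mentioned above. `compactionThreshold` is a hypothetical helper; the 0.75 ratio and the 200,000 fallback are the values used in this plan.

```typescript
const TRIGGER_RATIO = 0.75;
const FALLBACK_WINDOW = 200_000; // used when the model does not report a window

function compactionThreshold(contextWindow?: number): number {
  return Math.floor((contextWindow ?? FALLBACK_WINDOW) * TRIGGER_RATIO);
}

// Trigger points for the window sizes discussed above.
const thresholds = [32_000, 128_000, 200_000, 2_000_000].map((window) => ({
  window,
  threshold: compactionThreshold(window),
}));
```

This yields 24,000 tokens for a 32k model, 96,000 for 128k, 150,000 for 200k, and 1,500,000 for 2M. A hardcoded 150k only matches the 200k case: it would never trigger before overflow on the smaller windows and would trigger far too early on a 2M window.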
---
### 2.3 `pi.appendEntry` for cross-session run awareness
**Files to change:** `src/extension/register.ts`
**Optimization:**
```typescript
// When a run starts:
pi.appendEntry("crew:run-started", {
  runId, team, workflow, goal, timestamp: Date.now(),
});
// When a run completes:
pi.appendEntry("crew:run-completed", {
  runId, status, taskCount, totalUsage, timestamp: Date.now(),
});
```
**Expected benefit:**
- After a session reload, the related runs are still known.
- Session exports include run context.
- History is easy to track.
---
### 2.4 Config-driven tool registration
**Files to change:** `src/extension/registration/subagent-tools.ts`
**Current state:** Always registers 6 tool variants (Agent, crew_agent, plus their result and steer tools).
**Optimization:**
```typescript
export function registerSubagentTools(pi: ExtensionAPI, subagentManager: SubagentManager) {
  const cfg = loadConfig(pi.getFlag("cwd") as string || process.cwd());
  // Conflict-safe tools (always registered)
  pi.registerTool(crewAgentTool);
  pi.registerTool(crewAgentResultTool);
  // Claude-style aliases: only if not disabled
  if (cfg.config.tools?.enableClaudeStyleAliases !== false) {
    try { pi.registerTool(agentTool); } catch {}
    try { pi.registerTool(getSubagentResultTool); } catch {}
  }
  // Steer: only if supported
  if (cfg.config.tools?.enableSteer !== false) {
    try { pi.registerTool(crewAgentSteerTool); } catch {}
    try { pi.registerTool(steerSubagentTool); } catch {}
  }
}
```
**Expected benefit:** Avoids polluting the tool namespace and gives users fine-grained control.
---
### 2.5 Custom working indicator in foreground runs
**Files to change:** `src/extension/register.ts`
**Optimization:**
```typescript
// While a foreground run is active:
ctx.ui.setWorkingIndicator({
  frames: ["⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"],
  intervalMs: 80,
});
ctx.ui.setWorkingMessage(
  `Team run: ${completedTasks}/${totalTasks} tasks done...`
);
// When the run ends:
ctx.ui.setWorkingIndicator(); // Restore default
ctx.ui.setWorkingMessage(); // Clear
```
**Compat shim note:** The implementation uses an optional API compatibility shim:
```typescript
(ctx.ui as { setWorkingIndicator?: (...) => void }).setWorkingIndicator?.(...)
```
Reason: some versions/type surfaces of `@mariozechner/pi-coding-agent` do not yet expose
`setWorkingIndicator` on `ExtensionUIContext`. The optional shim preserves backward compatibility and
avoids crashes or runtime type mismatches; if the API is missing, the custom spinner is simply skipped and
`setWorkingMessage()` is still used.
**Expected benefit:** Better UX; the user can tell that a team run is in progress.
---
## Phase 3: Future Considerations (HIGH effort or Risky)
### 3.1 (WON'T DO unless concrete pain point appears) Branch-level task isolation
Use `ctx.fork()` to create a new branch for each task in a team run.
**Decision:** not implemented by default. Worktree isolation already covers the most important part (file-system/task isolation). Branch-level isolation causes branch explosion, complicated navigation UX, and state-sync risk between the flat run manifest/tasks/events and the Pi session tree. Reconsider only if users report specific context-contamination problems that worktree/dependency-context controls cannot solve.
### 3.2 Session handoff for long-running tasks
When a task runs too long, hand off to a new session (pattern from `handoff.ts`) to isolate context.
**Conditional trigger:** enable only for agents/tasks that opt in, e.g. agent frontmatter `handoff: true`, or when a heuristic token estimate exceeds 30% of the context window.
**Result transport:** the child session returns an artifact reference or mailbox message so the parent session can still aggregate results without importing the full transcript.
### 3.3 Mailbox via `pi.events`
#### 3.3a (DONE) Publish mailbox lifecycle events while preserving file-backed mailbox
Implementation publishes safe public events without changing the durable mailbox source of truth:
```typescript
pi.events.emit("crew.mailbox.message", { runId, id, direction, from, to, taskId, source });
pi.events.emit("crew.mailbox.acknowledged", { runId, messageId, delivery });
```
This keeps file-backed mailbox semantics intact while enabling observers/notification extensions.
#### 3.3b (WON'T DO) Replace file-backed mailbox with pure event-bus mailbox
Instead of the file-based mailbox, use the event bus as the primary transport for real-time communication between tasks.
**Decision:** won't do. Latency gain is marginal; durability/restart/replay loss is catastrophic for long-running pi-crew runs. 3.3a gives best-of-both-worlds: durable file-backed mailbox remains source of truth, event bus is an observer/notification layer.
### 3.4 (PROMOTED + DONE) Compaction with structured artifact index
Preserve pi-crew artifact references across compaction.
**Implementation:** `compaction-guard.ts` collects recent run artifacts and:
- appends a structured `crew:artifact-index` session entry for machine-readable continuity;
- adds a markdown artifact index to pi-crew-triggered compaction `customInstructions` so the compaction summary preserves run IDs and artifact paths.
**Note:** Directly augmenting `CompactionEntry.details` is not supported by the current upstream `session_before_compact` result contract unless pi-crew replaces default compaction entirely. We intentionally avoid full custom compaction because summary quality/regression risk is higher.
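The cap mentioned in the risk assessment (10 runs / 80 artifacts) could be applied roughly as sketched below. The shapes are hypothetical, and reading "80 artifacts" as a total across all indexed runs is an assumption; the real format lives in `compaction-guard.ts`.

```typescript
// Hypothetical shapes for illustration only.
interface RunArtifacts {
  runId: string;
  finishedAtMs: number;
  artifactPaths: string[];
}

const MAX_RUNS = 10;
const MAX_ARTIFACTS = 80; // assumed to be a total across all indexed runs

function capArtifactIndex(runs: RunArtifacts[]): RunArtifacts[] {
  // Keep only the most recent runs, newest first.
  const recent = [...runs]
    .sort((a, b) => b.finishedAtMs - a.finishedAtMs)
    .slice(0, MAX_RUNS);
  const capped: RunArtifacts[] = [];
  let remaining = MAX_ARTIFACTS;
  for (const run of recent) {
    if (remaining <= 0) break;
    // Truncate the last run that fits so the total never exceeds the cap.
    const artifactPaths = run.artifactPaths.slice(0, remaining);
    remaining -= artifactPaths.length;
    capped.push({ ...run, artifactPaths });
  }
  return capped;
}
```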
### 3.5 (WON'T DO unless cost telemetry shows pain) Custom compaction with a lightweight model
**Decision:** won't do by default.
- Depends on the user's auth setup for Gemini Flash / Haiku, which pi-crew does not control.
- A bad summary loses context, which affects the whole run.
- ROI is unclear: compaction runs infrequently.
Reconsider only if telemetry/user feedback shows compaction cost is a real pain point. Reference remains `examples/extensions/custom-compaction.ts` upstream.
---
## Phase 4 (NEW): Supplementary Research
Two upstream patterns that the original plan did not exploit:
### 4.1 (DEFER — research format compat first) `resources_discover` event integration
pi-crew could inject builtin agents/teams as native Pi resources (skills/prompts):
```typescript
pi.on("resources_discover", () => ({
  skillPaths: [path.join(__dirname, "..", "agents")],
  promptPaths: [path.join(__dirname, "..", "workflows")],
}));
```
**Decision:** defer. The format compatibility between pi-crew agent markdown and the Pi skill/prompt format needs research before implementation. Key risk: dual exposure UX confusion (same capability reachable via `Agent` tool and native skill/prompt) plus loss of pi-crew durable run semantics if exposed as stateless skills.
### 4.2 (RESEARCH-ONLY) `pi.registerProvider` for a virtual "team" model
Register a team as a virtual provider so users can invoke it with:
```bash
pi --model crew/researcher
```
instead of using the `Agent` tool.
**Decision:** research-only / not an implementation target. Provider API semantics (single LLM stream, context window, thinking levels, token pricing) do not map cleanly to orchestrator semantics (multi-agent task events, aggregate usage/cost, per-worker contexts). Likely requires upstream provider API changes.
---
## Implementation Order (REVISED)
```
Phase 1 (Quick Wins & Compliance):
  [x] 1.4 permission gate destructive team actions ← FIRST (compliance)
  [x] 1.6 telemetry baseline ← SECOND (measure first)
  [x] 1.2 session_before_compact defer (not cancel)
  [x] 1.1a terminate: true on background queued (safe)
  [x] 1.3 public crew.* events
  [x] 1.5 auto session name
  [x] 1.1b terminate: true on foreground (OPT-IN, default off; default-on conditional on telemetry)
Phase 2 (Medium):
  [x] 2.1 proactive compaction (dynamic threshold)
  [x] 2.3 pi.appendEntry cross-session awareness
  [x] 2.4 config-driven tool registration
  [x] 2.5 custom working indicator
Phase 3 (Future / Risky):
  [-] 3.1 branch-level task isolation (WON'T DO unless concrete pain point appears)
  [ ] 3.2 session handoff for long tasks (CONDITIONAL on agent opt-in)
  [x] 3.3a publish mailbox lifecycle events (safe subset)
  [-] 3.3b replace file-backed mailbox with pure event bus (WON'T DO)
  [x] 3.4 structured artifact index in compaction (promoted/done)
  [-] 3.5 custom compaction with cheap model (WON'T DO unless cost telemetry shows pain)
Phase 4 (Research):
  [ ] 4.1 resources_discover integration (DEFER; format compat research first)
  [-] 4.2 virtual team provider (RESEARCH-ONLY)
```
## Files affected
```
PHASE 1:
  src/extension/registration/team-tool.ts ← 1.4 permission gate
  src/extension/registration/subagent-tools.ts ← 1.1a terminate + 1.1b opt-in terminate
  src/extension/register.ts ← 1.2 defer guard, 1.3 events, 1.5 session name
  src/runtime/subagent-manager.ts ← 1.6 telemetry fields
PHASE 2:
  src/extension/registration/compaction-guard.ts ← NEW: 1.2 defer guard + 2.1 proactive + 3.4 artifact index
  src/extension/register.ts ← 2.3 appendEntry, 2.5 working indicator
  src/extension/registration/subagent-tools.ts ← 2.4 config-driven
PHASE 3:
  src/extension/team-tool/api.ts ← 3.3a mailbox lifecycle events
```
## Risk Assessment (REVISED)
| Change | Risk | Mitigation |
|---|---|---|
| Permission gate (1.4) | Block legitimate use | Allow `force=true` bypass; document in README |
| Telemetry (1.6) | Privacy / log size | No PII in subagent telemetry payload; opt-out applied via `config.telemetry.enabled=false`; no sampling currently because payload is small/local event-bus data |
| Defer compaction (1.2) | Endlessly long run → overflow | Hard 95% threshold bypasses deferral |
| `terminate: true` background (1.1a) | None significant | Background runs need no LLM follow-up by design |
| Public events (1.3) | Event storm, breaking change | Rate limit, semver document |
| Auto session name (1.5) | Override user-set name | Applied: only set when no custom name exists (`!pi.getSessionName()`) |
| `terminate: true` foreground (1.1b) | LLM does not summarize when enabled | OPT-IN flag (`config.tools.terminateOnForeground`, default off); default-on requires telemetry evidence |
| Dynamic threshold (2.1) | contextWindow undefined | Default 200_000 fallback |
| Artifact index in compaction (3.4) | Index size bloat / format drift | Cap recent index (10 runs / 80 artifacts), structured `crew:artifact-index` session entry, non-replacing default compaction |
| appendEntry (2.3) | Session bloat | TTL/cleanup strategy |
| Config-driven tools (2.4) | User confused | Default = current behavior, opt-in change |
| Working indicator (2.5) | Conflict with other extensions / older Pi UI type surface | Applied: restore default on finally; compat shim makes `setWorkingIndicator` optional |
| Custom compaction model (3.5) | Bad summary, auth missing | Fall back to default, multi-model retry |
## Testing Strategy
- **Unit tests:**
  - `terminate: true` flag in tool results (1.1a/b).
  - Permission gate blocks/allows correctly across the confirm/force matrix (1.4).
  - Threshold calculation from `contextWindow` (2.1).
  - Telemetry payload schema (1.6).
  - Artifact index payload structure + cap behavior (3.4).
- **Integration tests:**
  - Foreground run + compaction interaction (1.2 defer + 2.1 trigger).
  - Multiple concurrent runs + permission gate (1.4).
  - Event publish/subscribe round-trip (1.3).
  - Compaction with N artifacts includes artifact index in custom instructions (3.4).
- **Manual:**
  - UI behavior with working indicator + session name (1.5, 2.5).
  - Real LLM turn count before/after 1.1b using telemetry data (1.6 → 1.1b decision).
- **Regression:**
  - Run the full suite (`npm test`) after every commit; do not batch a whole Phase.
  - Doctor tests must use `--test-timeout=90000` on Windows.

View File

@@ -0,0 +1,199 @@
# Phase 10: Source Distillation & Development Roadmap
> Synthesized from deep-reads of `pi-mono`, `pi-subagents`, and `pi-crew@melihmucuk` reference fork.
> Date: 2026-05-04
---
## 1. Source Insights
### 1.1 pi-mono (v0.72.1)
| Insight | Impact on pi-crew |
|---|---|
| **Compact read rendering** — AGENTS.md, SKILL.md, Pi docs auto-collapsed in TUI | Our agents' prompts that reference these files still work, but users won't see full content inline. Ensure tool-call descriptions are self-contained. |
| **Session resource cleanup registry** — Providers register cleanup fns; `dispose()` calls all | Our `child-pi.ts` should register cleanup for child processes. Currently we handle SIGINT/beforeExit — align with Pi's new `registerSessionResourceCleanup()`. |
| **Codex WebSocket SSE fallback** — Transparent fallback on WS failure | No direct impact, but note: child Pi processes may switch transports mid-session. |
| **Xiaomi per-region token plan providers** | No impact — provider list is internal to Pi. |
| **Model catalog generator with overrides** | Our `model-fallback.ts` should track new models as Pi adds them. |
### 1.2 pi-subagents (v0.24.0)
| Insight | Impact on pi-crew |
|---|---|
| **Chain directories** — Dedicated `.pi/chains/` and `~/.pi/agent/chains/` | Our workflows are similar but directory-based discovery with `listMarkdownFilesRecursive` is a good pattern. |
| **Supervisor contact** — Children call `contact_supervisor` | Our mailbox system already serves this purpose, but subagent-initiated communication is one-directional. Consider adding `supervisor_contact` event for child→parent. |
| **Model thinking levels** — Respect `thinking` from agent frontmatter | We already have `model-fallback.ts` but don't propagate thinking levels to child Pi. |
| **Session-scoped status** — Filter status by session | Our `run-index.ts` already merges scopes, but individual run status should be session-scoped to avoid cross-contamination. |
| **Foreground kept alive during intercom** | Our `completion-guard.ts` handles some of this, but the pattern of pausing parent while child waits for supervisor is worth aligning. |
| **File-only outputs** — Some subagents only write to files | Our `task-output-context.ts` already supports file-only output extraction. Validate compatibility. |
| **Packaged recursive agents** — Agents can spawn sub-agents | Our task-runner already supports this via child Pi, but we should document the recursive depth guard. |
| **UI simplification** — Removed overlays, consolidated to tool actions | Our dashboard is more advanced but we should ensure TUI simplicity is preserved. |
### 1.3 pi-crew reference fork (melihmucuk v1.0.14)
| Insight | Impact on pi-crew |
|---|---|
| **CrewRuntime singleton** — Process-level, survives session replacement | Our `crew-agent-runtime.ts` is similar but not a true singleton. Consider hardening. |
| **DeliveryCoordinator** — Routes results to owner session, queues when inactive | We lack this pattern. Our result delivery goes through artifacts + notification, but not session-aware routing. |
| **Ownership model**: `abortOwned()` returns `{ abortedIds, missingIds, foreignIds }` | Our `cancel.ts` returns `results[]` but doesn't distinguish foreign IDs. Adopt. |
| **Interactive subagents**: `interactive: true` → `waiting` state, `crew_respond`/`crew_done` | We don't have this. Our agents run to completion. Interactive subagents would enable oracle/planner patterns. |
| **Overflow recovery** — Detect context overflow → compaction → auto_retry → recovered, with 120s timeout | We have no overflow recovery. Child Pi processes that hit context limits silently fail. |
| **3-tier agent discovery with JSON overrides** | Our discovery uses teams/agents/workflows with schema validation. JSON overrides for model/thinking/tools are worth adding. |
| **BootstrapSession** — Excludes own extension, uses `SessionManager.create().newSession()` | Our `child-pi.ts` uses `--extension` flags. Align with Pi 0.65+ `session_start` API. |
| **Bundled subagents inherit parent model** | Our `model-fallback.ts` resolves model chain differently. Consider simplifying. |
---
## 2. Distilled Development Axes
### Axis A: Runtime Hardening (Critical)
**A1. Session-aware result delivery**
- Current: Results go to artifacts + notification router
- Target: Add `DeliveryCoordinator` pattern that routes results to the **owner session** specifically, queues when inactive, flushes on `session_start`
- Why: Prevents result loss when a session is replaced/reloaded; matches Pi's lifecycle
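A minimal sketch of the queue-and-flush behavior described in A1. The class and method names are illustrative assumptions, not the reference fork's API, and the queue is in-memory only; durable persistence is out of scope here.

```typescript
// Hypothetical shapes; not the reference fork's actual types.
interface CrewResult {
  taskId: string;
  summary: string;
}

type Deliver = (result: CrewResult) => void;

class DeliveryCoordinator {
  private active = new Map<string, Deliver>();
  private pending = new Map<string, CrewResult[]>();

  // Deliver immediately when the owner session is active, otherwise queue.
  route(ownerSessionId: string, result: CrewResult): void {
    const deliver = this.active.get(ownerSessionId);
    if (deliver) {
      deliver(result);
      return;
    }
    const queue = this.pending.get(ownerSessionId) ?? [];
    queue.push(result);
    this.pending.set(ownerSessionId, queue);
  }

  // Intended to be called from a `session_start` handler: flush in arrival order.
  onSessionStart(sessionId: string, deliver: Deliver): void {
    this.active.set(sessionId, deliver);
    const queue = this.pending.get(sessionId) ?? [];
    this.pending.delete(sessionId);
    for (const result of queue) deliver(result);
  }

  // After this, results for the session are queued again instead of dropped.
  onSessionEnd(sessionId: string): void {
    this.active.delete(sessionId);
  }
}
```

Keying both maps by owner session ID is what keeps a result from leaking into whichever session happens to be active when the task finishes.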
**A2. Overflow recovery for child processes**
- Current: Child Pi hitting context limits fails silently or with generic errors
- Target: Detect `agent_end` → `compaction_start/end` → `auto_retry_start/end` event sequence; mark task as `"overflow_recovering"` → `"recovered"` or `"failed"`
- Why: Long tasks with large context currently fail unrecoverably
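The A2 target can be modelled as a small state tracker. This sketch assumes the child emits the events in the order listed in the target and that `agent_end` directly followed by `compaction_start` signals an overflow; the tracker shape and function names are hypothetical, and the 120s value is the timeout noted for the reference fork.

```typescript
type ChildEvent =
  | "agent_end"
  | "compaction_start"
  | "compaction_end"
  | "auto_retry_start"
  | "auto_retry_end";

type TaskPhase = "running" | "overflow_recovering" | "recovered" | "failed";

interface RecoveryTracker {
  phase: TaskPhase;
  lastEvent: ChildEvent | null;
  recoveringSinceMs: number | null;
}

const RECOVERY_TIMEOUT_MS = 120_000;

function initialTracker(): RecoveryTracker {
  return { phase: "running", lastEvent: null, recoveringSinceMs: null };
}

function onChildEvent(t: RecoveryTracker, event: ChildEvent, nowMs: number): RecoveryTracker {
  const next: RecoveryTracker = { ...t, lastEvent: event };
  // Assumption: agent_end directly followed by compaction means overflow.
  if (t.phase === "running" && t.lastEvent === "agent_end" && event === "compaction_start") {
    return { ...next, phase: "overflow_recovering", recoveringSinceMs: nowMs };
  }
  // The retry finishing closes the recovery window.
  if (t.phase === "overflow_recovering" && event === "auto_retry_end") {
    return { ...next, phase: "recovered", recoveringSinceMs: null };
  }
  return next;
}

// Call periodically: a recovery that outlives the timeout is marked failed.
function checkTimeout(t: RecoveryTracker, nowMs: number): RecoveryTracker {
  if (
    t.phase === "overflow_recovering" &&
    t.recoveringSinceMs !== null &&
    nowMs - t.recoveringSinceMs > RECOVERY_TIMEOUT_MS
  ) {
    return { ...t, phase: "failed", recoveringSinceMs: null };
  }
  return t;
}
```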
**A3. Interactive subagent protocol**
- Current: All agents run to completion; no mid-run interaction
- Target: `interactive: true` in agent frontmatter → agent pauses after response, enters `waiting` state; parent sends `crew_respond` to continue, `crew_done` to finalize
- Why: Enables oracle (decision evaluation), planner (multi-turn refinement), and any agent that needs human/team guidance mid-task
**A4. Session resource cleanup alignment**
- Current: SIGINT + beforeExit handlers
- Target: Register cleanup via Pi's `registerSessionResourceCleanup()` when available; fall back to current handlers
- Why: Aligns with Pi's new lifecycle; prevents orphan processes on session reload
### Axis B: Discovery & Configuration (High)
**B1. JSON config overrides for agents/teams**
- Current: Agent frontmatter is the sole source of truth
- Target: `~/.pi/agent/pi-crew.json` (global) and `.pi/pi-crew.json` (project) can override `model`, `thinking`, `tools`, `skills` for any agent
- Why: Per-project model tuning without editing bundled agents; environment-specific tool access
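One possible layering for B1 is sketched below. The override file layout (a map keyed by agent name) and the precedence order (frontmatter, then global, then project) are assumptions made for illustration; the roadmap only fixes the file locations and the overridable fields.

```typescript
// Overridable fields named in the B1 target.
interface AgentOverride {
  model?: string;
  thinking?: string;
  tools?: string[];
  skills?: string[];
}

// Assumed file layout: agent name → overrides.
type OverrideFile = Record<string, AgentOverride | undefined>;

// Later layers win field by field: frontmatter < global < project.
function resolveAgentConfig(
  agentName: string,
  frontmatter: AgentOverride,
  globalFile: OverrideFile,
  projectFile: OverrideFile,
): AgentOverride {
  return {
    ...frontmatter,
    ...globalFile[agentName],
    ...projectFile[agentName],
  };
}
```

Whole fields are replaced rather than merged, so a project-level `tools` list fully substitutes the frontmatter list instead of appending to it.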
**B2. Thinking level propagation**
- Current: Agent frontmatter has `model` but no `thinking` field
- Target: Add `thinking` to agent schema; propagate to child Pi via `--thinking` flag or session params
- Why: Aligns with Pi's thinking levels; cost control for expensive models
**B3. Parent model inheritance for bundled agents**
- Current: `model-fallback.ts` has a complex chain with config fallbacks
- Target: Simplify: agent frontmatter model → parent session model → config default
- Why: Reduces configuration burden; bundled agents work with whatever model the parent uses
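The simplified B3 chain fits in a few lines. `resolveModel` and `ModelSource` are hypothetical names; returning the source alongside the model is an added assumption intended to make diagnostics easier.

```typescript
type ModelSource = "frontmatter" | "parent-session" | "config-default";

interface ResolvedModel {
  model: string;
  source: ModelSource;
}

// Resolution order from the B3 target: frontmatter → parent session → config default.
function resolveModel(
  frontmatterModel: string | undefined,
  parentSessionModel: string | undefined,
  configDefault: string,
): ResolvedModel {
  if (frontmatterModel) return { model: frontmatterModel, source: "frontmatter" };
  if (parentSessionModel) return { model: parentSessionModel, source: "parent-session" };
  return { model: configDefault, source: "config-default" };
}
```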
### Axis C: Ownership & Safety (High)
**C1. Foreign-aware ownership model**
- Current: `cancel.ts` returns flat results array
- Target: `cancelOwned(runId, taskIds)` returns `{ abortedIds, missingIds, foreignIds }`; tool responses clearly distinguish "you can't abort foreign tasks"
- Why: Prevents confusion in multi-session scenarios; security improvement
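The three-way outcome in C1 amounts to partitioning the requested task IDs. The sketch below is illustrative: its signature differs from the `cancelOwned(runId, taskIds)` shape in the target because it takes the task table and caller session explicitly, and `TaskRecord` is a hypothetical type.

```typescript
// Hypothetical task record; only the owner field matters for the partition.
interface TaskRecord {
  taskId: string;
  ownerSessionId: string;
}

interface CancelOutcome {
  abortedIds: string[];
  missingIds: string[];
  foreignIds: string[];
}

function cancelOwned(
  tasks: Map<string, TaskRecord>,
  callerSessionId: string,
  taskIds: string[],
  abort: (taskId: string) => void,
): CancelOutcome {
  const outcome: CancelOutcome = { abortedIds: [], missingIds: [], foreignIds: [] };
  for (const taskId of taskIds) {
    const task = tasks.get(taskId);
    if (!task) {
      outcome.missingIds.push(taskId);
    } else if (task.ownerSessionId !== callerSessionId) {
      // Never abort work owned by another session; report it instead.
      outcome.foreignIds.push(taskId);
    } else {
      abort(taskId);
      outcome.abortedIds.push(taskId);
    }
  }
  return outcome;
}
```

A tool response built from this outcome can state separately which tasks were cancelled, which did not exist, and which belong to another session.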
**C2. Supervisor contact event (child→parent)**
- Current: Mailbox is parent→child only; child can write artifacts
- Target: Add `supervisor_contact` event type where child signals "I need a decision" with structured data; parent can respond via mailbox or `steer_subagent`
- Why: Enables interactive subagent protocol (A3); currently children are fire-and-forget
**C3. Session-scoped status filtering**
- Current: `run-index.ts` merges project + user scope runs
- Target: Default status/inspect to session-scoped; cross-scope access only via explicit `scope:` parameter
- Why: Prevents accidental cross-contamination; matches pi-subagents' session scoping
### Axis D: Compatibility & Polish (Medium)
**D1. Compact read rendering awareness**
- Current: Agent prompts reference AGENTS.md, SKILL.md, etc.
- Target: Ensure agent prompts are self-contained enough that collapsed reads don't lose critical instructions; add fallback descriptions in team/workflow frontmatter
- Why: Pi v0.72+ collapses these files in TUI; agents still receive full content via tool calls
**D2. Pi 0.65+ API alignment**
- Current: `child-pi.ts` uses CLI flags (`--model`, `--extension`, etc.)
- Target: When Pi SDK exposes `SessionManager.create()` + `session_start` event in extension API, migrate child session creation to programmatic API
- Why: More reliable than CLI flag parsing; better lifecycle control; Pi is moving toward SDK-first
**D3. UI simplification**
- Current: Full dashboard with 6 panes
- Target: Ensure each pane works as a standalone tool action; no pane depends on another's state. Consider adding compact/expanded modes.
- Why: pi-subagents removed overlays entirely; our dashboard should be usable without full TUI
### Axis E: Observability Gaps (Medium)
**E1. Overflow recovery metrics**
- Add `tasks_overflow_recovering` and `tasks_overflow_recovered` counters to MetricRegistry
**E2. Interactive subagent state tracking**
- Add `tasks_waiting` state to heartbeat/watcher; track wait duration
**E3. Foreign ownership audit logging**
- Log foreign access attempts with session ID; detect potential conflicts
---
## 3. Priority Matrix
| Priority | Item | Axis | Effort | Impact |
|---|---|---|---|---|
| 🔴 P0 | A1: Session-aware result delivery | A | M | High — prevents result loss |
| 🔴 P0 | A2: Overflow recovery for child processes | A | M | High — long tasks currently fail silently |
| 🟡 P1 | C1: Foreign-aware ownership model | C | S | High — security + UX |
| 🟡 P1 | A4: Session resource cleanup alignment | A | S | Medium — aligns with Pi lifecycle |
| 🟡 P1 | B1: JSON config overrides | B | M | Medium — per-project customization |
| 🟡 P1 | B2: Thinking level propagation | B | S | Medium — cost control |
| 🟡 P1 | D1: Compact read rendering awareness | D | S | Medium — compatibility |
| 🟢 P2 | A3: Interactive subagent protocol | A | L | High — enables oracle/planner |
| 🟢 P2 | B3: Parent model inheritance | B | S | Low — simplification |
| 🟢 P2 | C2: Supervisor contact event | C | M | Medium — depends on A3 |
| 🟢 P2 | C3: Session-scoped status | C | S | Low — UX improvement |
| 🟢 P2 | D2: Pi 0.65+ API alignment | D | L | Low — future-proofing |
| 🟢 P2 | D3: UI simplification | D | M | Low — nice to have |
| 🔵 P3 | E1-E3: Observability gaps | E | S | Low — monitoring |
---
## 4. Implementation Order (Proposed)
### Phase 10a: Runtime Hardening (P0 + P1)
1. **A1: DeliveryCoordinator** — session-aware result routing
2. **A2: OverflowRecoveryTracker** — detect context overflow → compaction → retry
3. **C1: Foreign-aware ownership** — `abortOwned()` with foreign detection
4. **A4: Session resource cleanup** — `registerSessionResourceCleanup()` adapter
### Phase 10b: Discovery & Configuration (P1)
5. **B1: JSON config overrides** — `.pi/pi-crew.json` per-project settings
6. **B2: Thinking level propagation** — `thinking` frontmatter field
7. **D1: Compact read awareness** — self-contained agent prompts
### Phase 10c: Interactive Protocol (P2)
8. **A3: Interactive subagent** — `waiting` state + `crew_respond`/`crew_done` pattern
9. **C2: Supervisor contact event** — child→parent communication channel
10. **B3: Parent model inheritance** — simplified resolve chain
### Phase 10d: Polish & Compatibility (P2-P3)
11. **C3: Session-scoped status** — default filter to session
12. **D3: UI compact/expanded modes** — standalone pane usability
13. **E1-E3: Observability gaps** — overflow, waiting, foreign metrics
14. **D2: Pi 0.65+ API alignment** — programmatic session creation (when SDK available)
---
## 5. Key Code References
| Pattern | Source File | Key symbols / notes |
|---|---|---|
| Compact read rendering | `pi-mono/packages/coding-agent/src/core/tools/read.ts` | `CompactReadClassification`, `formatCompactReadCall()` |
| Session resource cleanup | `pi-mono/packages/ai/src/session-resources.ts` | `registerSessionResourceCleanup()`, `cleanupSessionResources()` |
| Codex WS SSE fallback | `pi-mono/packages/ai/src/providers/openai-codex-responses.ts` | `isWebSocketSseFallbackActive()` |
| Chain directories | `pi-subagents/src/agents/agents.ts` | `getUserChainDir()`, `resolveNearestProjectChainDirs()` |
| Supervisor contact | `pi-subagents/src/runs/shared/supervisor-contact.ts` | `contact_supervisor` event |
| Thinking levels | `pi-subagents/src/agents/agents.ts` | frontmatter `thinking` field |
| Session scoping | `pi-subagents/src/runs/foreground/foreground-run-queue.ts` | session-scoped filtering |
| CrewRuntime singleton | `pi-crew-ref/extension/runtime/crew-runtime.ts` | Process-level singleton |
| DeliveryCoordinator | `pi-crew-ref/extension/runtime/delivery-coordinator.ts` | Owner-session routing |
| Ownership model | `pi-crew-ref/extension/integration/tools/crew-abort.ts` | `abortOwned()` |
| Interactive subagent | `pi-crew-ref/extension/runtime/subagent-state.ts` | `waiting` state |
| Overflow recovery | `pi-crew-ref/extension/runtime/overflow-recovery.ts` | `OverflowRecoveryTracker` |
| Bootstrap session | `pi-crew-ref/extension/bootstrap-session.ts` | Extension exclusion, parent model |


@@ -0,0 +1,201 @@
# Phase 10+ Deep Distillation — Round 2
**Date**: 2026-05-04
**Sources**: `pi-mono` v0.72.1 (`324aa1d`), `pi-subagents` v0.24.0 (`3ee17de`), `pi-crew` ref v1.0.14 (`c0631a3`)
## Executive Summary
After a second deep read of the runtime internals of all three repos, we found **15 new insights** that are not yet implemented in pi-crew. They are grouped into 4 axes: Runtime Architecture, Extension API Adoption, Observability/Reliability, and Developer Experience.
---
## Axis F: Runtime Architecture Alignment
### F1. Process-Level Singleton for CrewRuntime ⭐⭐⭐
**Source**: pi-crew ref `crew-runtime.ts`
**Finding**: A module-level singleton (`export const crewRuntime = new CrewRuntime()`) lives for the entire process lifetime. When Pi replaces the extension instance (session switch), the singleton survives because of the Node.js module cache. A new extension instance only needs to call `crewRuntime.activateSession(binding)`.
**Current pi-crew**: Each session creates fresh state. There is no survive-across-session mechanism yet.
**Action**: Refactor `SubagentManager` into a process-level singleton with an `activateSession()` pattern, so that in-flight child processes survive session switches.
### F2. Fire-and-Forget Spawn with Immediate ID Return ⭐⭐⭐
**Source**: pi-crew ref `crew-runtime.ts`
**Finding**: `spawn()` creates state → returns the ID immediately → runs `spawnSession()` asynchronously (fire-and-forget). The caller does not block.
**Current pi-crew**: `runChildPi` is an async call that the task runner must await.
**Action**: Split spawn into synchronous ID allocation plus asynchronous execution. The task runner fires and forgets, then polls status via the event log.
### F3. Final Drain Window Pattern ⭐⭐
**Source**: pi-subagents `execution.ts`
**Finding**: When `message_end` arrives with `stopReason === "stop"` and no tool calls → start a 1s grace timer → SIGTERM → 3s → SIGKILL. This lets the child process flush its final output.
**Current pi-crew**: Child Pi uses a simple timeout, with no grace period after the completion signal.
**Action**: Implement a `FINAL_STOP_GRACE_MS` drain window in `child-pi.ts`.
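The grace → SIGTERM → SIGKILL sequence described above might look like the sketch below. Only `FINAL_STOP_GRACE_MS` and the 1s/3s timings come from the finding; `HARD_KILL_MS`, the `DrainableChild` interface, and the injectable `schedule` parameter are hypothetical names chosen so the timing logic can be exercised without real processes.

```typescript
type Signal = "SIGTERM" | "SIGKILL";

// Minimal view of a child process; a Node ChildProcess could be adapted to it.
interface DrainableChild {
  hasExited(): boolean;
  kill(signal: Signal): void;
}

type Schedule = (fn: () => void, delayMs: number) => void;

const FINAL_STOP_GRACE_MS = 1000; // grace period after the final message
const HARD_KILL_MS = 3000; // assumed name: wait between SIGTERM and SIGKILL

// Call this once the final `message_end` (stop, no tool calls) is observed.
function startFinalDrain(
  child: DrainableChild,
  schedule: Schedule = (fn, delayMs) => { setTimeout(fn, delayMs); },
): void {
  schedule(() => {
    if (child.hasExited()) return; // exited on its own during the grace window
    child.kill("SIGTERM");
    schedule(() => {
      // Escalate only if the polite signal was not enough.
      if (!child.hasExited()) child.kill("SIGKILL");
    }, HARD_KILL_MS);
  }, FINAL_STOP_GRACE_MS);
}
```

Injecting the scheduler keeps the escalation order testable with a fake clock; production code would rely on the default `setTimeout`-based scheduler.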
### F4. Atomic JSON Writes for Status Persistence ⭐⭐
**Source**: pi-subagents `async-execution.ts`
**Finding**: `writeAtomicJson()` writes a temp file → rename. This avoids torn writes when the process crashes mid-write.
**Current pi-crew**: Direct `JSON.stringify` + `writeFileSync`, which risks torn writes.
**Action**: Implement a `writeAtomicJson()` utility. Apply it to status.json and manifest writes.
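A minimal sketch of the temp-file-then-rename pattern, using only Node's `fs` module. The PID-suffixed temp name is an assumption (the pattern described in pi-subagents uses a plain `.tmp` suffix), and the scratch-directory usage at the bottom exists only to make the example runnable.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write JSON to a sibling temp file, then rename it over the target.
// A rename within one directory replaces the target in a single step on POSIX
// filesystems, so readers see either the old file or the complete new one.
function writeAtomicJson(path: string, value: unknown): void {
  // Assumed naming: the PID suffix keeps two writer processes from sharing a temp file.
  const tmpPath = `${path}.${process.pid}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(value, null, 2));
  renameSync(tmpPath, path);
}

// Usage: write a status file in a scratch directory twice and read it back.
const scratchDir = mkdtempSync(join(tmpdir(), "crew-atomic-"));
const statusPath = join(scratchDir, "status.json");
writeAtomicJson(statusPath, { state: "running" });
writeAtomicJson(statusPath, { state: "completed" });
console.log(JSON.parse(readFileSync(statusPath, "utf8")).state);
```

This sketch does not call `fsync`, so it protects against torn writes on process crash but makes no durability promise across power loss.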
### F5. Two-Level Process Hierarchy for Async ⭐
**Source**: pi-subagents `subagent-runner.ts`
**Finding**: The orchestrator spawns a runner (detached) → the runner spawns Pi children. The runner tracks PIDs and writes status.json; the orchestrator polls status.json.
**Current pi-crew**: Async runs only fire in the background, with no intermediate runner process.
**Action**: (Low priority) Consider adding an intermediate runner for reliable async tracking.
### F6. Stale Run Reconciler — Three-Phase Pattern ⭐⭐
**Source**: pi-subagents `stale-run-reconciler.ts`
**Finding**: 3-phase: (1) check result file exists → use it, (2) check PID liveness, (3) for dead PIDs → repair immediately, for alive PIDs → fail only if stale > 24h.
**Current pi-crew**: Has `crash-recovery.ts`, but no full 3-phase reconciliation yet.
**Action**: Upgrade crash recovery to the 3-phase pattern: result-check → PID-check → stale-threshold.
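The three phases can be expressed as one pure decision function plus a liveness probe. This is a sketch under assumptions: the `RunProbe` shape and the decision names are hypothetical, and `isPidAlive` relies on the standard Node behavior that `process.kill(pid, 0)` tests for existence without sending a signal.

```typescript
type ReconcileDecision = "use-result" | "repair-dead" | "fail-stale" | "keep-waiting";

// Hypothetical probe collected per run before deciding.
interface RunProbe {
  hasResultFile: boolean; // phase 1 input
  pidAlive: boolean; // phase 2 input
  lastUpdateMs: number; // epoch ms of the last status write
}

const STALE_THRESHOLD_MS = 24 * 60 * 60 * 1000; // 24h, as in the finding

// Signal 0 performs the existence/permission check without delivering a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

function reconcileRun(probe: RunProbe, nowMs: number): ReconcileDecision {
  // Phase 1: a result file wins regardless of process state.
  if (probe.hasResultFile) return "use-result";
  // Phase 2: a dead PID without a result is repaired immediately.
  if (!probe.pidAlive) return "repair-dead";
  // Phase 3: a live PID only fails once it has been silent for over 24h.
  return nowMs - probe.lastUpdateMs > STALE_THRESHOLD_MS ? "fail-stale" : "keep-waiting";
}
```

Keeping the decision pure means the filesystem and PID checks can be mocked in unit tests, while the caller applies the chosen repair.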
---
## Axis G: Extension API Adoption
### G1. `session_before_compact` Hook — Custom Compaction ⭐⭐⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: The `session_before_compact` hook returns `{ cancel?, compaction?: CompactionResult }`. Extensions can **completely replace** the compaction logic, including structured details (artifact indices, version markers). This is the most powerful extensibility point.
**Current pi-crew**: `compaction-guard.ts` only detects compaction events; it does not intervene.
**Action**: Implement a `session_before_compact` handler that provides structured compaction instead of raw text summarization. Preserve team run state across compaction.
### G2. `session_before_switch` Hook — Pre-Switch State Save ⭐⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: `session_before_switch` fires before Pi switches sessions (new/resume) and returns `{ cancel? }`. pi-crew can save in-memory state to a file before the switch.
**Current pi-crew**: Does not hook into session switches. State is lost on switch.
**Action**: Hook `session_before_switch` to flush pending deliveries and save a subagent state snapshot.
### G3. `resources_discover` Hook — Dynamic Agent/Team Discovery ⭐⭐⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: The `resources_discover` event returns `{ additionalSkillPaths?, additionalPromptPaths?, additionalThemePaths? }`. Extensions can dynamically inject resources.
**Current pi-crew**: Discovery only reads from the filesystem; nothing is dynamic.
**Action**: Hook `resources_discover` to inject team-specific skills/prompts based on config. Example: auto-inject the `safe-bash` skill for projects that have a `package.json`.
### G4. `before_agent_start` — System Prompt Override ⭐⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: Can inject `message` and/or override `systemPrompt` before agent loop begins. Powerful for child agents.
**Current pi-crew**: The child Pi system prompt is built from the task packet and is not overridden via a hook.
**Action**: (Low priority — already handled via task packet prompt builder)
### G5. `tool_result` Event — Post-Execution Output Modification ⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: Can modify tool output `content`, `details`, `isError` after execution. Useful for enrichment/filtering.
**Current pi-crew**: Does not hook into tool results.
**Action**: Hook `tool_result` for the `team` tool to enrich output with structured metadata (run URL, artifact count, duration).
### G6. `input` Event — User Input Interception ⭐
**Source**: pi-mono `extensions/types.ts`
**Finding**: Can transform user input text/images or fully handle it (`action: "continue" | "transform" | "handled"`).
**Current pi-crew**: Does not intercept user input.
**Action**: Hook `input` to detect `@team-name` mentions → auto-route to a team run.
---
## Axis H: Observability & Reliability Gaps
### H1. Completion Mutation Guard ⭐⭐
**Source**: pi-subagents `completion-guard.ts`
**Finding**: After a subagent returns "success", check whether the task is an "implementation" task with **no file edits** → mutate the completion into a warning. This avoids false-positive completions.
**Current pi-crew**: A task completes when the child Pi exits 0. The actual work done is not verified.
**Action**: Implement a completion guard: verify that artifacts exist, files changed, or the output is non-trivial.
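One way to express such a guard is a pure function over evidence gathered after the child exits. Everything in this sketch is hypothetical: the `CompletionEvidence` fields, the task kinds, the status names, and the 40-character threshold are illustrative choices, not the pi-subagents implementation.

```typescript
// Hypothetical evidence collected after a child exits.
interface CompletionEvidence {
  exitCode: number;
  taskKind: "implementation" | "research" | "review"; // assumed task kinds
  filesChanged: number;
  artifactCount: number;
  outputChars: number;
}

interface CompletionVerdict {
  status: "failed" | "completed" | "completed_with_warning";
  reason?: string;
}

function guardCompletion(evidence: CompletionEvidence, minOutputChars = 40): CompletionVerdict {
  if (evidence.exitCode !== 0) {
    return { status: "failed", reason: `exit code ${evidence.exitCode}` };
  }
  // An implementation task that edited nothing is a likely false positive.
  if (evidence.taskKind === "implementation" && evidence.filesChanged === 0) {
    return { status: "completed_with_warning", reason: "implementation task produced no file edits" };
  }
  // No artifacts and near-empty output: nothing verifiable was produced.
  if (evidence.artifactCount === 0 && evidence.outputChars < minOutputChars) {
    return { status: "completed_with_warning", reason: "no artifacts and near-empty output" };
  }
  return { status: "completed" };
}
```

Downgrading to a warning rather than a failure keeps the run moving while still surfacing the suspicious completion to the operator.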
### H2. Snapshot-Before-Emit Pattern ⭐
**Source**: pi-subagents `execution.ts`
**Finding**: The progress object is snapshotted (spread) before each `onUpdate` callback, which avoids mutation during the callback.
**Current pi-crew**: Task state is mutated directly, and events emit references to it.
**Action**: Snapshot task state before emitting events to avoid race conditions.
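A small sketch of the pattern, assuming a hypothetical `TaskProgress` shape. Note that a plain spread is shallow, so nested arrays or objects need their own copy; the example copies `recentTools` explicitly for that reason.

```typescript
// Hypothetical progress shape for illustration.
interface TaskProgress {
  taskId: string;
  toolCalls: number;
  recentTools: string[];
}

type ProgressListener = (snapshot: Readonly<TaskProgress>) => void;

function emitProgress(progress: TaskProgress, listeners: ProgressListener[]): void {
  // Copy the object and its nested array so later mutations by the runner
  // cannot change what a listener already received.
  const snapshot: TaskProgress = { ...progress, recentTools: [...progress.recentTools] };
  for (const listener of listeners) listener(snapshot);
}
```

Listeners that store the snapshot (for example, to append it to an event log later) then keep the values from the moment of emission.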
### H3. Intercom Bridge with Delivery Confirmation ⭐⭐
**Source**: pi-subagents `intercom-bridge.ts`
**Finding**: Bidirectional intercom: `deliverSubagentResultIntercomEvent()` emits an event → waits for confirmation with a 500ms timeout. Agent injection pattern: mutate the config to add the `contact_supervisor` tool + instructions.
**Current pi-crew**: Has `supervisor-contact.ts`, which parses contact requests from stdout, but there is no bidirectional confirmation.
**Action**: If Pi exposes an intercom API, upgrade supervisor contact to bidirectional with delivery confirmation.
### H4. writeAtomicJson Utility ⭐⭐
**Source**: pi-subagents (pervasive)
**Finding**: Atomic file writes used everywhere: status, manifest, results. Pattern: `writeFileSync(path + ".tmp", data) → renameSync(path + ".tmp", path)`.
**Action**: Shared utility in `src/utils/atomic-write.ts`.
---
## Axis I: Developer Experience
### I1. Tool Presentation — Emoji + Grouping ⭐
**Source**: pi-crew ref `tool-presentation.ts`
**Finding**: `crew_spawn` renders "🚀 Spawning {agent}...", `crew_respond` renders "💬 Sending response...". Grouped tool calls have custom collapse UI.
**Current pi-crew**: Tool output is plain text.
**Action**: Add emoji prefixes and structured formatting to tool output.
### I2. renderCall/renderResult for the Team Tool ⭐⭐
**Source**: pi-mono `tools/index.ts`
**Finding**: `ToolDefinition` supports `renderCall` and `renderResult` callbacks returning TUI Components. Allows rich rendering in Pi terminal UI.
**Current pi-crew**: No custom renderers.
**Action**: Implement `renderCall` for the `team` tool to show a spinner/agent list instead of raw JSON. Implement `renderResult` to show a summary dashboard.
### I3. Prompt Snippet + Guidelines in Tool Definition ⭐
**Source**: pi-mono `tools/index.ts`
**Finding**: `promptSnippet` — one-liner in system prompt. `promptGuidelines` — bullets appended to system prompt. Tools without `promptSnippet` are excluded from LLM awareness.
**Current pi-crew**: The tool description lives only in the JSON schema description.
**Action**: When Pi supports `promptSnippet`/`promptGuidelines` in custom tools, adopt them to improve LLM tool usage.
---
## Priority Matrix
| ID | Feature | Impact | Effort | Priority |
|---|---|---|---|---|
| F1 | Process-level singleton | High | High | P1 |
| F2 | Fire-and-forget spawn | Medium | Medium | P2 |
| F3 | Final drain window | Medium | Low | P2 |
| F4 | Atomic JSON writes | High | Low | P1 |
| F5 | Two-level async hierarchy | Low | High | P3 |
| F6 | 3-phase stale reconciliation | Medium | Medium | P2 |
| G1 | Custom compaction hook | High | Medium | P1 |
| G2 | Pre-switch state save | Medium | Low | P2 |
| G3 | Dynamic resource discovery | High | Medium | P1 |
| G4 | System prompt override | Low | Low | P3 |
| G5 | Post-execution output mod | Low | Low | P3 |
| G6 | User input interception | Medium | Medium | P3 |
| H1 | Completion mutation guard | High | Low | P1 |
| H2 | Snapshot-before-emit | Medium | Low | P2 |
| H3 | Bidirectional intercom | Medium | High | P3 |
| H4 | writeAtomicJson utility | High | Low | P1 |
| I1 | Tool presentation emojis | Low | Low | P3 |
| I2 | Custom TUI renderers | High | High | P2 (when API available) |
| I3 | Prompt snippet/guidelines | Medium | Low | P3 (when API available) |
---
## Recommended Implementation Order
### Phase 11a: Reliability Foundations (F4 + H4 + H1 + H2)
- `src/utils/atomic-write.ts` — writeAtomicJson utility
- Apply atomic writes to all manifest/state writes
- Completion mutation guard for task results
- Snapshot-before-emit for task state events
### Phase 11b: Extension API Hooks (G1 + G2 + G3)
- `session_before_compact` handler — structured compaction
- `session_before_switch` handler — pre-switch state flush
- `resources_discover` handler — dynamic skill/prompt injection
### Phase 11c: Runtime Architecture (F1 + F2 + F3)
- Refactor SubagentManager → process-level singleton
- Fire-and-forget spawn pattern
- Final drain window for child process cleanup
### Phase 11d: Reconciliation & Recovery (F6 + H3)
- 3-phase stale run reconciliation
- Upgrade supervisor contact toward bidirectional (if API available)
---
## Already Implemented (Phase 10a-10d) ✅
- DeliveryCoordinator (session-aware routing with queue/flush)
- OverflowRecoveryTracker (compaction → retry state machine)
- Foreign-aware cancel (ownership detection)
- Session resource cleanup adapter
- Interactive subagent waiting state + respond action
- Supervisor contact parsing from child stdout
- Parent model inheritance
- Session-scoped run listing
- Observability metrics for overflow/waiting/supervisor
- Skills override + .pi/pi-crew.json config path


@@ -0,0 +1,819 @@
# Phase 8 — Operator Experience: Interactive Mailbox, Health Pane, Smart Notifications
> A natural continuation of Phase 7 (UI Optimization). Goal: turn the dashboard from a "viewer" into an "operator console" — actions can be performed directly from the UI, without switching to the CLI. Path X chosen (Phase 8 = Theme A; Phase 9 = Theme B+C, Observability+Reliability, deferred).
**Open Questions Resolution (Q1-Q6 resolved; see Section 7 for details):**
- Q1=(b) Include a compose preview pane | Q2=(c) JSONL sink when `telemetry.enabled` | Q3=(b) Cross-day quiet-hours wrap
- Q4=(c) Full action menu R/K/D on the health pane | Q5=(c) Confirm only destructive actions | Q6=(a) ESC discard + confirm-if-long guard
## 0. Implementation Status
- [x] 8.0 Foundation: keybinding contract + action dispatcher + RunActionResult shape + ConfirmOverlay primitive
- [x] 8.1.A Mailbox detail overlay (passive list view, no actions yet)
- [x] 8.1.B Mailbox ack action (hotkey `A` on the selected message)
- [x] 8.1.C Mailbox nudge action (hotkey `N` + agent picker)
- [x] 8.1.D Mailbox compose action (hotkey `C` + form overlay) — Q6: ESC discard + confirm-if-long (>50 chars)
- [x] 8.1.E Mailbox compose preview pane (key `P` toggle, render markdown read-only) — Q1
- [x] 8.1.F Mailbox ackAll destructive action (hotkey `Shift+X`) — Q5: requires confirm overlay
- [x] 8.2.A Heartbeat aggregator (`heartbeat-aggregator.ts`)
- [x] 8.2.B Health pane (pane index `5`) in the dashboard
- [x] 8.2.C Auto-recovery prompt (stuck worker > N minutes → toast + confirm) — throttled 5min/run
- [x] 8.2.D Health pane action menu — `R` recovery (foreground only), `K` kill stale workers, `D` diagnostic export — Q4
- [x] 8.3.A Notification router (severity classifier + dedup window)
- [x] 8.3.B Notification quiet-hours (cross-day wrap parser) + batching config — Q3
- [x] 8.3.C Toast badge counter in the widget/powerbar (counts unacknowledged notifications)
- [x] 8.3.D Notification JSONL sink with 7-day rotation, gated by `telemetry.enabled` — Q2
- [x] 8.4 Wire `register.ts` + `commands.ts`
- [x] 8.5 Tests: unit + integration
## 1. Roadmap-Level Decisions
| # | Decision | Chosen | Rationale |
|---|---|---|---|
| D1 | Do mailbox actions run directly or dispatch to the team API? | **Dispatch** via `handleTeamTool({action:"api", config:{operation:...}})` | Reuses the existing API (`ack-message`, `send-message`, `nudge-agent`); zero state-machine duplication; locks/events are preserved |
| D2 | Overlay form vs inline edit? | **Overlay form** (modal-like, anchor center) | The dashboard sidebar is too narrow for text input; the overlay isolates focus; ESC cancels easily |
| D3 | Is the health pane a new pane (`5`) or a tab in progress? | **New pane `5`** | Avoids polluting the progress pane; lets the user toggle it independently; consistent with existing 1-4 |
| D4 | Notification sink: optional opt-in or default-on? | **Default-on when `telemetry.enabled !== false`** (Q2=c) | Matches the Phase 6 telemetry pattern; debug-friendly; users opt out via the shared telemetry config. Path: `<crewRoot>/state/notifications/{YYYY-MM-DD}.jsonl`, 7-day rotation |
| D5 | Quiet-hours format + cross-day? | **HH:MM-HH:MM in the config's local timezone, with cross-day wrap support** (Q3=b) | For a single range such as `"22:00-07:00"` the parser detects wrap-around on its own; more intuitive than a multi-range array |
| D6 | Compose-form fields scope? | **Phase 8: from/to/body/taskId + preview pane** (Q1=b) | Preview key `P` toggles a read-only markdown render; thread/attachment deferred to Phase 9 |
| D7 | Do the new actions break existing keybindings? | **No** — new keys: `A/N/C/P/Shift+X` (mailbox), `R/K/D` (health), `H/X` (notification); current keys (`s/u/a/i/d/m/e/o/v/r/p/1-4/k/j`) are unchanged (lowercase `r` still = reload root, uppercase `R` = recovery in health pane only) | Backward-compat; context-scoped uppercase |
| D8 | Mailbox detail panel: inline expand or separate overlay? | **Separate overlay** (opens on Enter in the mailbox pane) | The main pane keeps its density; the overlay is scrollable |
| D9 | Health pane action mode: prompt-only vs full menu? | **Full action menu (Q4=c)**: `R` recovery (foreground-only), `K` kill stale workers, `D` diagnostic export | Operator power-user toolkit; for async runs `R/K` are disabled with a hint; `D` is extremely useful for bug reports |
| D10 | Foundation 8.0: separate RunActionDispatcher or inline? | **Separate module** `src/ui/run-action-dispatcher.ts` | Reusable by child overlays; easy to test; does not bloat the dashboard |
| D11 | Compose ESC behavior? | **Discard + confirm-if-long** (Q6=a) | ESC does not save a draft; if body > 50 characters → confirm overlay `Y=discard, N=continue editing`; draft persistence deferred to Phase 9 |
| D12 | Confirm overlay: per-action ad-hoc or reusable primitive? | **Reusable primitive** `src/ui/overlays/confirm-overlay.ts` | Q5=c destructive actions (ackAll/recovery/diagnostic-export-with-secrets) need a consistent UX; reused for every confirm |
| D13 | Auto-recovery throttle window? | **5 minutes/run/condition-type** | Avoids a notification storm when a run stays dead for a long time; `recovery_dead_workers` is throttled separately from `recovery_missing_heartbeat` |
| D14 | Diagnostic export `D` format & destination? | **JSON + redact secrets** into `<crewRoot>/artifacts/{runId}/diagnostic-{timestamp}.json` | Self-contained snapshot (manifest + tasks + recent events + heartbeat summary); confirm before write if the artifact dir already has a diagnostic file less than 1 minute old |
| D15 | Preview pane render scope (Q1=b)? | **Read-only markdown render**: bold/italic/code-block/list — no images/links | Enough for the operator to read the content before sending; no full markdown engine needed; reuse the existing transcript-viewer markdown helper if there is one |
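Decision D5 (a single `HH:MM-HH:MM` range with cross-day wrap) can be sketched as below. The function names are hypothetical, and treating `start === end` as "quiet hours disabled" is an assumption the table does not settle.

```typescript
// Parse "HH:MM" into minutes since local midnight.
function parseClockMinutes(hhmm: string): number {
  const match = /^(\d{2}):(\d{2})$/.exec(hhmm);
  if (!match) throw new Error(`Invalid time: ${hhmm}`);
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  if (hours > 23 || minutes > 59) throw new Error(`Invalid time: ${hhmm}`);
  return hours * 60 + minutes;
}

// True when `now` (local time) falls inside a range such as "22:00-07:00".
function isInQuietHours(range: string, now: Date): boolean {
  const [startText, endText, ...rest] = range.split("-");
  if (startText === undefined || endText === undefined || rest.length > 0) {
    throw new Error(`Invalid quiet-hours range: ${range}`);
  }
  const start = parseClockMinutes(startText);
  const end = parseClockMinutes(endText);
  const current = now.getHours() * 60 + now.getMinutes();
  if (start === end) return false; // assumed: an empty range disables quiet hours
  if (start < end) return current >= start && current < end; // same-day range
  return current >= start || current < end; // range wraps past midnight
}
```

The wrap case needs no special syntax: a start later than the end is enough to signal that the range crosses midnight.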
## 2. Phase Breakdown
### Phase 8.0 — Foundation (2 dev-days, +0.5 for ConfirmOverlay)
**New files:**
- `src/ui/run-action-dispatcher.ts` — wrapper that calls `handleTeamTool` with `runId` + `operation` and normalizes the result to `{ ok, message, data }`.
- `src/ui/keybinding-map.ts` — central registry mapping `data` (raw stdin) → action name; exports `KEY_RESERVED` so child overlays can check for conflicts.
- `src/ui/overlays/confirm-overlay.ts` — **(Q5)** reusable confirm primitive, anchor center, auto-focus `N` (safe default), Y/Enter=confirm, N/ESC=cancel. ~80 LOC.
**Modified:**
- `src/ui/run-dashboard.ts` — refactor `handleInput` to use `keybinding-map`; no change to existing behavior.
**Skeleton:**
```ts
// run-action-dispatcher.ts
import type { ExtensionContext } from "@mariozechner/pi-coding-agent";
import { handleTeamTool } from "../extension/team-tool.ts";
export interface RunActionResult {
  ok: boolean;
  message: string;
  data?: unknown;
}
export async function dispatchMailboxAck(ctx: ExtensionContext, runId: string, messageId: string): Promise<RunActionResult> {
  try {
    const r = await handleTeamTool({ action: "api", runId, config: { operation: "ack-message", messageId } }, ctx);
    return { ok: r.metadata?.status === "ok", message: r.text, data: r };
  } catch (error) {
    return { ok: false, message: error instanceof Error ? error.message : String(error) };
  }
}
export async function dispatchMailboxNudge(ctx: ExtensionContext, runId: string, agentId: string, message: string): Promise<RunActionResult> { /* ... */ }
export async function dispatchMailboxCompose(ctx: ExtensionContext, runId: string, payload: { from: string; to: string; body: string; taskId?: string; direction: "inbox" | "outbox" }): Promise<RunActionResult> { /* ... */ }
export async function dispatchMailboxAckAll(ctx: ExtensionContext, runId: string): Promise<RunActionResult> { /* read-mailbox → loop ack-message */ }
export async function dispatchHealthRecovery(ctx: ExtensionContext, runId: string): Promise<RunActionResult> { /* foreground-interrupt API */ }
export async function dispatchKillStaleWorkers(ctx: ExtensionContext, runId: string): Promise<RunActionResult> { /* mark dead heartbeats; emit event */ }
export async function dispatchDiagnosticExport(ctx: ExtensionContext, runId: string): Promise<RunActionResult> { /* read-manifest + list-tasks + read-events limit=200 + heartbeat summary → write artifact */ }
```
```ts
// keybinding-map.ts (Q4 + Q5 expanded)
export const DASHBOARD_KEYS = {
  close: ["q", "\u001b"],
  select: ["\r", "\n", "s"],
  pane: { agents: ["1"], progress: ["2"], mailbox: ["3"], output: ["4"], health: ["5"] },
  // Mailbox detail overlay context
  mailbox: { ack: ["A"], nudge: ["N"], compose: ["C"], preview: ["P"], ackAll: ["X"], openDetail: ["\r", "\n"] },
  // Health pane context (Q4=c full menu)
  health: { recovery: ["R"], killStale: ["K"], diagnosticExport: ["D"] },
  // Notification context
  notification: { dismissAll: ["H"] }, // 'H' for Hush
} as const;
```
```ts
// confirm-overlay.ts
export interface ConfirmOptions {
  title: string;
  body?: string;
  dangerLevel?: "low" | "medium" | "high"; // colors theme accent
  defaultAction?: "confirm" | "cancel"; // default "cancel"
}
export class ConfirmOverlay {
  constructor(private opts: ConfirmOptions, private done: (confirmed: boolean) => void, private theme: unknown) {}
  render(width: number): string[] { /* anchor-center box, dim Y/N hint */ }
  handleInput(data: string): void {
    if (data === "y" || data === "Y" || data === "\r" || data === "\n") return this.done(true);
    if (data === "n" || data === "N" || data === "\u001b" || data === "q") return this.done(false);
  }
}
```
**Tests:**
- `test/unit/run-action-dispatcher.test.ts` (7 test cases — 4 mailbox dispatchers + 3 health dispatchers, mock `handleTeamTool`).
- `test/unit/confirm-overlay.test.ts` (4 cases: render, Y confirms, N cancels, default cancel safety).
---
### Phase 8.1 — Mailbox Interactivity
#### 8.1.A Mailbox detail overlay (1 dev-day)
**New file:**
- `src/ui/overlays/mailbox-detail-overlay.ts` — class `MailboxDetailOverlay` implements a Pi UI custom widget; renders 2 columns (inbox | outbox); ↑/↓ select; Enter expands the body; ESC/q closes.
**Updates:**
- `src/ui/dashboard-panes/mailbox-pane.ts` — the last line changes from "use /team-api ..." to `"Press Enter on mailbox pane to open detail (A=ack, N=nudge, C=compose)"`.
- `src/ui/run-dashboard.ts` — when `activePane === "mailbox"` and the user presses Enter, return `{action: "mailbox-detail"}` instead of closing.
- `src/extension/registration/commands.ts` — handle `selection.action === "mailbox-detail"` → open `MailboxDetailOverlay` via `ctx.ui.custom`.
**Skeleton:**
```ts
export class MailboxDetailOverlay {
  private inbox: MailboxMessage[] = [];
  private outbox: MailboxMessage[] = [];
  private selected = 0;
  private side: "inbox" | "outbox" = "inbox";
  constructor(private opts: { runId: string; cwd: string; ctx: ExtensionContext; done: (sel?: MailboxAction) => void; theme: unknown }) {
    this.refresh();
  }
  private refresh(): void { /* read mailbox via team api */ }
  render(width: number): string[] { /* 2-col layout, highlight selected */ }
  handleInput(data: string): void { /* arrow nav, A/N/C dispatch via this.opts.done */ }
}
export interface MailboxAction {
  type: "ack" | "nudge" | "compose" | "reply";
  messageId?: string;
  agentId?: string;
}
```
**Tests:** `test/unit/mailbox-detail-overlay.test.ts` — 4 cases (render empty, render with items, key navigation, action dispatch).
#### 8.1.B Ack action (0.75 dev-day)
**Logic:** in `MailboxDetailOverlay.handleInput`, key `A` (uppercase, to avoid a conflict with `a`=artifacts at the dashboard root) → `done({type:"ack", messageId: selectedMessage.id})`.
**Update `commands.ts`:** after the overlay closes, if action.type === "ack" → call `dispatchMailboxAck(ctx, runId, action.messageId!)` → toast the result.
**Acceptance:** successful ack → mailbox pane re-renders with a lower attention count within < 250ms (snapshot cache invalidated on the `crew.mailbox.acknowledged` event).
#### 8.1.C Nudge action (0.75 dev-day)
**Logic:** key `N` → open the agent picker overlay (reusing the pattern from the existing `LiveRunSidebar`); after selection → message input → dispatch `dispatchMailboxNudge`.
**New file:** `src/ui/overlays/agent-picker-overlay.ts` (small, 80-120 LOC).
**Acceptance:** nudge → `crew.mailbox.message` event fires → snapshot invalidated → mailbox pane attention count increases correctly.
#### 8.1.D Compose form (1.25 dev-day)
**New file:** `src/ui/overlays/mailbox-compose-overlay.ts` — form with 4 fields (from/to/body/taskId), Tab navigation, Enter submit, ESC cancel.
**Detailed behavior (Q6=a):**
- Tab/Shift+Tab: cycle between fields.
- Multi-line body: Ctrl+Enter → newline; Enter on the body field with non-empty content → submit.
- ESC when body ≤ 50 characters → discard immediately, close overlay.
- ESC when body > 50 characters → open `ConfirmOverlay` with title `"Discard draft?"` and body `"Body has N chars. Y=discard, N=continue editing"`. Cancel default = continue editing (safe).
- Submit validation: body required (non-whitespace), to required, from defaults to `"operator"` if empty.
- Direction toggle: Tab to the checkbox `[ ] Send to outbox` → Space toggles.
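The ESC and submit rules above reduce to two small pure functions. This is a sketch only: the function names, the `ComposeFields` shape, and the error strings are hypothetical, and the 50-character limit is taken from the rules as stated.

```typescript
const LONG_DRAFT_CHARS = 50;

type EscapeOutcome = "discard" | "confirm-discard";

// ESC discards short drafts immediately and asks before dropping long ones.
function onComposeEscape(body: string): EscapeOutcome {
  return body.length > LONG_DRAFT_CHARS ? "confirm-discard" : "discard";
}

interface ComposeFields {
  from: string;
  to: string;
  body: string;
}

type ComposeValidation =
  | { ok: true; fields: ComposeFields }
  | { ok: false; error: string };

// Submit validation: body and `to` are required; `from` defaults to "operator".
function validateCompose(input: ComposeFields): ComposeValidation {
  if (input.body.trim() === "") return { ok: false, error: "body is required" };
  if (input.to.trim() === "") return { ok: false, error: "to is required" };
  const from = input.from.trim() === "" ? "operator" : input.from.trim();
  return { ok: true, fields: { from, to: input.to.trim(), body: input.body } };
}
```

Keeping these checks outside the overlay class lets the unit tests cover the thresholds without rendering anything.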
**Dispatch logic:** `dispatchMailboxCompose` with `direction` from the checkbox (default `"inbox"` — the operator sends into the run's inbox).
**Tests:** `test/unit/mailbox-compose-overlay.test.ts` — 9 cases (render, tab nav, ESC short discard, ESC long → confirm overlay, confirm overlay cancel = stay editing, confirm overlay confirm = discard, Enter submit, validation empty body, validation empty to).
#### 8.1.E Compose preview pane (0.75 dev-day) — Q1=b
**New file:** `src/ui/overlays/mailbox-compose-preview.ts` — read-only markdown render of the current body field; shares state with `mailbox-compose-overlay.ts`.
**Layout:** the compose overlay splits horizontally when preview is active: 60% form / 40% preview pane (the pane renders markdown read-only and cannot take focus).
**Render scope (D15):** bold (`**`), italic (`*`), code-block (`` ``` ``), inline code (`` ` ``), unordered list (`-`), numbered list (`1.`), heading (`#`/`##`/`###`). Skip images/links (out of scope; render link text only).
**Behavior:**
- Key `P` toggle preview on/off (state in compose overlay).
- Preview updates in real time as the body changes (100ms debounce to avoid re-rendering on every keystroke).
- When preview is active, the header help line updates to: `"P close preview · Tab cycle · Enter submit · ESC discard"`.
**Skeleton:**
```ts
// mailbox-compose-preview.ts
export function renderComposePreview(body: string, width: number, theme: CrewTheme): string[] {
  const tokens = tokenizeMarkdown(body); // simple tokenizer ~80 LOC
  return tokens.flatMap((t) => renderToken(t, width, theme));
}
function tokenizeMarkdown(body: string): MdToken[] { /* line-by-line scan */ }
type MdToken = { type: "heading" | "code-block" | "list-item" | "paragraph"; level?: number; text: string };
```
**Tests:** `test/unit/mailbox-compose-preview.test.ts` — 6 cases (plain text, bold/italic, code block, list, heading, mixed content).
#### 8.1.F Mailbox ackAll (0.5 dev-day) — Q5=c destructive
**Logic:** in `MailboxDetailOverlay.handleInput`, key `Shift+X` (raw stdin `"X"` uppercase) → open `ConfirmOverlay`:
- Title: `"Acknowledge all N unread messages?"`
- Body: `"This cannot be undone. Y=ack all, N=cancel."`
- DangerLevel: `"medium"`.
Confirm `Y` → `dispatchMailboxAckAll(ctx, runId)` (the dispatcher loops `ack-message` over each id) → toast result `"Acknowledged N messages."`.
**Acceptance:** ackAll in a run with 10 unread → all marked acknowledged within < 2s; mailbox pane attention → 0; emits 10x `crew.mailbox.acknowledged` events.
**Tests:** `test/unit/mailbox-detail-overlay.test.ts` adds 3 cases (Shift+X opens confirm, confirm Y dispatches loop, confirm N stays).
---
### Phase 8.2 — Health Pane & Recovery
#### 8.2.A Heartbeat aggregator (1 dev-day)
**New file:** `src/ui/heartbeat-aggregator.ts`
```ts
export interface HeartbeatSummary {
runId: string;
totalTasks: number;
healthy: number; // alive=true, lastSeenAt < threshold
stale: number; // lastSeenAt > stale threshold (default 60s)
dead: number; // lastSeenAt > dead threshold (default 5min) or alive=false
missing: number; // task is running but has no heartbeat record
worstStaleMs: number;
}
export function summarizeHeartbeats(snapshot: RunUiSnapshot, opts?: { staleMs?: number; deadMs?: number; now?: number }): HeartbeatSummary { /* ... */ }
```
**Tests:** `test/unit/heartbeat-aggregator.test.ts` — 6 cases (all healthy, mixed, all dead, missing record, custom threshold, edge `lastSeenAt=now`).
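The bucketing rules in the interface comments can be sketched as below. This is only an illustration under stated assumptions: the real `RunUiSnapshot` shape is not shown in this plan, so `SnapshotLike`, `SnapshotTask`, and `HeartbeatRecord` are hypothetical stand-ins, and the sketch assumes a task without a heartbeat counts as `missing` only when its status is `"running"`.

```typescript
// Hypothetical stand-ins for the real snapshot types (assumed shapes).
interface HeartbeatRecord { alive: boolean; lastSeenAt: number } // epoch ms
interface SnapshotTask { taskId: string; status: string; heartbeat?: HeartbeatRecord }
interface SnapshotLike { runId: string; tasks: SnapshotTask[] }

interface HeartbeatSummary {
  runId: string;
  totalTasks: number;
  healthy: number;
  stale: number;
  dead: number;
  missing: number;
  worstStaleMs: number;
}

function summarizeHeartbeats(
  snapshot: SnapshotLike,
  opts: { staleMs?: number; deadMs?: number; now?: number } = {},
): HeartbeatSummary {
  const staleMs = opts.staleMs ?? 60_000; // default 60s
  const deadMs = opts.deadMs ?? 300_000; // default 5min
  const now = opts.now ?? Date.now();
  const summary: HeartbeatSummary = {
    runId: snapshot.runId,
    totalTasks: snapshot.tasks.length,
    healthy: 0, stale: 0, dead: 0, missing: 0, worstStaleMs: 0,
  };
  for (const task of snapshot.tasks) {
    const hb = task.heartbeat;
    if (!hb) {
      // Assumption: only running tasks are expected to have a heartbeat.
      if (task.status === "running") summary.missing++;
      continue;
    }
    const age = Math.max(0, now - hb.lastSeenAt);
    if (!hb.alive || age > deadMs) summary.dead++;
    else if (age > staleMs) summary.stale++;
    else { summary.healthy++; continue; }
    // Track the oldest heartbeat among stale and dead workers.
    summary.worstStaleMs = Math.max(summary.worstStaleMs, age);
  }
  return summary;
}
```

Passing `now` explicitly, as the planned signature allows, lets the six unit cases run against fixed timestamps without mocking the clock.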
#### 8.2.B Health pane (0.75 dev-day)
**New file:** `src/ui/dashboard-panes/health-pane.ts`
```ts
export function renderHealthPane(snapshot: RunUiSnapshot | undefined, opts?: { staleMs?: number; deadMs?: number; isForeground?: boolean }): string[] {
if (!snapshot) return ["Health pane: snapshot unavailable"];
const summary = summarizeHeartbeats(snapshot, opts);
const lines: string[] = [
`Health: ${summary.healthy}/${summary.totalTasks} healthy · stale=${summary.stale} · dead=${summary.dead} · missing=${summary.missing}`,
];
if (summary.worstStaleMs > 0) lines.push(`Worst stale: ${Math.round(summary.worstStaleMs / 1000)}s ago`);
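// Q4=c: show full action menu hint
const actionHints: string[] = [];
// R and K act on live workers, so both are offered only for foreground runs (async runs get the warning line instead)
const canAct = opts?.isForeground !== false;
if ((summary.dead > 0 || summary.missing > 0) && canAct) actionHints.push("R recovery");
if ((summary.dead > 0 || summary.stale > 0) && canAct) actionHints.push("K kill stale");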
actionHints.push("D diagnostic export");
if (actionHints.length > 0) lines.push(`Actions: ${actionHints.join(" · ")}`);
if (summary.dead > 0 && opts?.isForeground === false) lines.push("(Async run: R/K disabled — use kill <pid> manually)");
return lines;
}
```
**Update `run-dashboard.ts`:**
- Add `"health"` to the `Pane` type.
- Key `5` → `activePane = "health"`.
- The render switch case calls `renderHealthPane` with `isForeground` set to `!selectedRun.async`.
- In `handleInput`, when `activePane === "health"`:
  - `R` → emit `{action: "health-recovery", runId}` (the handler checks foreground and opens a ConfirmOverlay).
  - `K` → emit `{action: "health-kill-stale", runId}` (the handler opens a ConfirmOverlay when dead + stale > 5, per 8.2.D).
  - `D` → emit `{action: "health-diagnostic-export", runId}` (the handler checks for an existing diagnostic < 1min old → confirm overwrite).
- The header help line updates to: `"1 agents 2 progress 3 mailbox 4 output 5 health • s/u/a/i actions • R/K/D health"`.
**Tests:** `test/unit/health-pane.test.ts` — 6 cases (no snapshot, all healthy → only D hint, dead foreground → R+K+D, dead async → only D + warning, mixed states, foreground false hint visible).
#### 8.2.C Auto-recovery toast (0.5 dev-day) — Q4 simplified
**Logic:** the existing `RenderScheduler.tick` callback calls `summarizeHeartbeats`; the first time `dead > 0` or `missing > 0`, it fires a toast through `notification-router` (8.3.A) with severity `"warning"`:
- Title: `"Run {runId} has {N} dead workers"`.
- Body: `"Open dashboard → 5 health → R recovery / K kill stale / D diagnostic"`.
**Throttle (D13):** dedup id = `recovery_dead_workers_${runId}`; the router dedups for 5 minutes per run per condition type. `recovery_missing_heartbeat` uses a separate id so both alerts can fire in parallel when both conditions occur.
**Tests:** `test/integration/health-recovery.test.ts`: simulate a stale heartbeat and verify that a single toast is emitted; a second emit inside the window is dropped; a second emit after 5min fires again.
#### 8.2.D Health action handlers (1.5 dev-day) — Q4=c full menu
**Update `src/extension/registration/commands.ts`:** handle 3 new actions from the dashboard:
```ts
// pseudo-code
if (selection.action === "health-recovery") {
const run = manifestCache.get(selection.runId);
if (run?.async) { ctx.ui.notify("Recovery only available for foreground runs.", "warning"); return; }
const confirmed = await openConfirmOverlay(ctx, { title: "Interrupt foreground run?", body: "Tasks will be marked failed. Y=interrupt, N=cancel.", dangerLevel: "high" });
if (!confirmed) return;
const r = await dispatchHealthRecovery(ctx, selection.runId);
ctx.ui.notify(r.message, r.ok ? "info" : "error");
}
if (selection.action === "health-kill-stale") {
// Guard against a missing snapshot instead of asserting non-null
const snapshot = snapshotCache.get(selection.runId);
if (!snapshot) { ctx.ui.notify("Run snapshot unavailable.", "warning"); return; }
const summary = summarizeHeartbeats(snapshot);
if (summary.dead + summary.stale > 5) {
const confirmed = await openConfirmOverlay(ctx, { title: `Kill ${summary.dead + summary.stale} stale workers?`, dangerLevel: "medium" });
if (!confirmed) return;
}
const r = await dispatchKillStaleWorkers(ctx, selection.runId);
ctx.ui.notify(r.message, r.ok ? "info" : "error");
}
if (selection.action === "health-diagnostic-export") {
// `run` from the health-recovery branch is out of scope here, so look it up again
const run = manifestCache.get(selection.runId);
if (!run) { ctx.ui.notify("Run not found.", "error"); return; }
// D14: check existing diag in last 1min
const diagDir = path.join(run.artifactsRoot, "diagnostic");
const recentDiag = listRecentDiagnostic(diagDir, 60_000);
if (recentDiag) {
const confirmed = await openConfirmOverlay(ctx, { title: "Recent diagnostic exists", body: `File ${recentDiag} created < 1min ago. Overwrite?`, defaultAction: "cancel" });
if (!confirmed) return;
}
const r = await dispatchDiagnosticExport(ctx, selection.runId);
// Report the export path only on success; otherwise surface the dispatcher's error message
ctx.ui.notify(r.ok ? `Diagnostic exported to ${r.data}` : r.message, r.ok ? "info" : "error");
}
```
**New file:** `src/runtime/diagnostic-export.ts`: collects manifest + tasks + recent events (limit 200) + heartbeat summary + agent status snapshot; redacts secrets from env/config (block list: `*token*`, `*key*`, `*password*`, `*secret*`); writes JSON to `<crewRoot>/artifacts/{runId}/diagnostic-{ISO-timestamp}.json`.
**Skeleton:**
```ts
// diagnostic-export.ts
export interface DiagnosticReport {
runId: string;
exportedAt: string;
manifest: TeamRunManifest;
tasks: TeamTaskState[];
recentEvents: TeamEvent[];
heartbeat: HeartbeatSummary;
agents: { taskId: string; status: AgentStatus }[];
envRedacted: Record<string, string>; // env vars with secrets masked as "***"
}
export async function exportDiagnostic(ctx: ExtensionContext, runId: string): Promise<{ path: string; report: DiagnosticReport }> { /* ... */ }
function redactSecrets(obj: unknown): unknown { /* recursive replace values where key matches block list */ }
```
**Tests:**
- `test/unit/diagnostic-export.test.ts` — 5 cases (basic export, secret redaction, missing run errors, file path generation, JSON validity).
- Smoke: export → open file → verify all fields are present and 0 secrets remain.
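The `redactSecrets` stub could look roughly like the sketch below. It assumes plain JSON-like input (objects, arrays, primitives) and uses the four block-list patterns named in this section as case-insensitive substring matches; `SECRET_KEY_PATTERN` is an illustrative name. Substring matching over-redacts on purpose (a key such as `keyboardLayout` would also be masked), which seems the safer failure mode for a diagnostic file.

```typescript
// Assumed block list from this section: *token*, *key*, *password*, *secret*.
const SECRET_KEY_PATTERN = /token|key|password|secret/i;

function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redactSecrets(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      // Mask the whole value when the key matches; otherwise recurse into it.
      out[key] = SECRET_KEY_PATTERN.test(key) ? "***" : redactSecrets(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

The function returns a new structure and leaves the input untouched, so the live manifest and config objects are never mutated by an export.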
---
### Phase 8.3 — Smart Notifications
#### 8.3.A Notification router (1 dev-day)
**New file:** `src/extension/notification-router.ts`
```ts
export type Severity = "info" | "warning" | "error" | "critical";
export interface NotificationDescriptor {
id?: string; // dedup key; a repeat of the same id within the window is dropped
severity: Severity;
source: string; // "run-completed" | "subagent-stuck" | "health" | ...
runId?: string;
title: string;
body?: string;
timestamp?: number;
}
export interface NotificationRouterOptions {
dedupWindowMs?: number; // default 30000
batchWindowMs?: number; // default 0 (no batching by default)
quietHours?: string; // "22:00-07:00" local
severityFilter?: Severity[]; // default: ["warning", "error", "critical"]
sink?: (n: NotificationDescriptor) => void; // optional file/stream sink
}
export class NotificationRouter {
constructor(private opts: NotificationRouterOptions = {}, private deliver: (n: NotificationDescriptor) => void) {}
enqueue(n: NotificationDescriptor): void { /* dedup check, severity filter, quiet-hours skip, batch buffer, sink */ }
flush(): void { /* deliver batched */ }
dispose(): void { /* clear timers */ }
}
```
**Wrap `sendFollowUp`:** in `register.ts`, replace the 2 `sendFollowUp(...)` call sites with `notificationRouter.enqueue({...})`. The router decides whether to deliver through `sendFollowUp`.
**Tests:** `test/unit/notification-router.test.ts` — 8 cases (dedup, severity filter, quiet hours mock clock, batch, sink invocation, dispose cleanup).
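The dedup and severity checks inside `enqueue` might be structured like the following sketch. `DedupGate` and `Notice` are hypothetical names used only for illustration, and the sketch assumes the dedup window is measured from the last delivered notification (a dropped duplicate does not extend the window); the plan does not state which behaviour is intended.

```typescript
type Severity = "info" | "warning" | "error" | "critical";
interface Notice { id?: string; severity: Severity; title: string }

// Hypothetical helper: decides whether a notification passes filter + dedup.
class DedupGate {
  private readonly lastDelivered = new Map<string, number>();
  private readonly windowMs: number;
  private readonly allowed: Severity[];
  private readonly now: () => number;

  constructor(
    windowMs: number = 30_000,
    allowed: Severity[] = ["warning", "error", "critical"],
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.windowMs = windowMs;
    this.allowed = allowed;
    this.now = now;
  }

  shouldDeliver(n: Notice): boolean {
    if (!this.allowed.includes(n.severity)) return false; // severity filter
    if (n.id === undefined) return true; // no dedup key, always deliver
    const t = this.now();
    const prev = this.lastDelivered.get(n.id);
    if (prev !== undefined && t - prev < this.windowMs) return false; // duplicate
    this.lastDelivered.set(n.id, t);
    return true;
  }
}
```

Injecting the clock through the constructor is one alternative to overriding `globalThis.Date.now` in tests; either approach would serve the planned mock-clock cases.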
#### 8.3.B Quiet-hours + batching config (0.75 dev-day) — Q3=b cross-day wrap
**Update `src/schema/config-schema.ts`:**
```ts
notifications: Type.Optional(Type.Object({
enabled: Type.Optional(Type.Boolean()),
severityFilter: Type.Optional(Type.Array(Type.Union([Type.Literal("info"), Type.Literal("warning"), Type.Literal("error"), Type.Literal("critical")]))),
dedupWindowMs: Type.Optional(Type.Integer({ minimum: 1000 })),
batchWindowMs: Type.Optional(Type.Integer({ minimum: 0 })),
quietHours: Type.Optional(Type.String({ pattern: "^\\d{2}:\\d{2}-\\d{2}:\\d{2}$" })),
sinkRetentionDays: Type.Optional(Type.Integer({ minimum: 1, maximum: 90 })), // Q2=c, default 7
})),
```
**Update `src/config/defaults.ts`:** sane defaults (`severityFilter: ["warning","error","critical"]`, `dedupWindowMs: 30_000`, `batchWindowMs: 0`, `sinkRetentionDays: 7`).
**Update `src/config/config.ts`:** parse + merge like the other sections.
**Cross-day parser (Q3=b):** in `notification-router.ts`, as an isolated helper for easy testing:
```ts
// notification-router.ts (excerpt)
export function parseHHMMRange(range: string): { startMin: number; endMin: number } {
const [s, e] = range.split("-").map((part) => {
const [hh, mm] = part.split(":").map(Number);
return hh * 60 + mm;
});
return { startMin: s, endMin: e };
}
export function isInQuietHours(range: string, now: Date = new Date()): boolean {
const { startMin, endMin } = parseHHMMRange(range);
const cur = now.getHours() * 60 + now.getMinutes();
if (startMin === endMin) return false; // empty range
// Q3=b: cross-day wrap when start > end
return startMin <= endMin
? (cur >= startMin && cur < endMin)
: (cur >= startMin || cur < endMin);
}
```
**Tests:** `test/unit/notification-router.test.ts` adds the following parser cases:
- `"09:00-17:00"` at 12:00 → quiet (true).
- `"09:00-17:00"` at 22:00 → not quiet (false).
- `"22:00-07:00"` at 23:30 → quiet (cross-day true).
- `"22:00-07:00"` at 03:00 → quiet (cross-day true).
- `"22:00-07:00"` at 12:00 → not quiet (false).
- Edge: `"00:00-23:59"` at 12:00 → quiet (always-quiet within day).
- Edge: `"00:00-00:00"` → always not quiet (empty range).
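The schema pattern above accepts any two digits per field, so a value such as `"25:99-07:00"` would pass validation and reach the parser. A stricter variant could reject out-of-range fields, as in this sketch. `parseRangeStrict` and `isQuietAt` are hypothetical names, `isQuietAt` takes minutes since local midnight so it can be checked without a clock, and treating an invalid range as "not quiet" is an assumption, not a documented decision.

```typescript
const RANGE = /^(\d{2}):(\d{2})-(\d{2}):(\d{2})$/;

// Returns null for malformed or out-of-range input (e.g. hour 25, minute 99).
function parseRangeStrict(range: string): { startMin: number; endMin: number } | null {
  const m = RANGE.exec(range);
  if (!m) return null;
  const sh = Number(m[1]);
  const sm = Number(m[2]);
  const eh = Number(m[3]);
  const em = Number(m[4]);
  if (sh > 23 || eh > 23 || sm > 59 || em > 59) return null;
  return { startMin: sh * 60 + sm, endMin: eh * 60 + em };
}

// minuteOfDay = hours * 60 + minutes in local time.
function isQuietAt(range: string, minuteOfDay: number): boolean {
  const parsed = parseRangeStrict(range);
  if (parsed === null) return false; // assumption: invalid config never silences alerts
  const { startMin, endMin } = parsed;
  if (startMin === endMin) return false; // empty range
  return startMin < endMin
    ? minuteOfDay >= startMin && minuteOfDay < endMin // same-day window
    : minuteOfDay >= startMin || minuteOfDay < endMin; // wraps past midnight
}
```

With this shape, each listed test case is a single call, for example `isQuietAt("22:00-07:00", 23 * 60 + 30)` for the 23:30 cross-day case.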
#### 8.3.C Toast badge integration (0.75 dev-day)
**Logic:** `NotificationRouter.deliver` → in addition to calling `sendFollowUp`, increments the unread count in `widgetState.notificationCount`. The count resets when the user opens mailbox detail or presses `H` (Hush: dismisses the visible notification badge).
**Update `crew-widget.ts`:** the render model adds `🔔${count}` when `count > 0`. To avoid emoji compatibility issues, fall back to `[!${count}]` when the terminal does not support emoji (detected via `process.env.TERM`).
**Update `powerbar-publisher.ts`:** the `pi-crew-active` segment text appends ` 🔔${count}` (or the fallback) when active.
**Tests:** `test/unit/widget-notification-badge.test.ts` — 5 cases (no count, count=1, count>9, dismiss reset, terminal fallback).
#### 8.3.D Notification JSONL sink (0.5 dev-day) — Q2=c
**New file:** `src/extension/notification-sink.ts`
**Logic:** when config `telemetry.enabled !== false`, NotificationRouter delivery also calls `sink.write(descriptor)`. The sink writes to `<crewRoot>/state/notifications/{YYYY-MM-DD}.jsonl` (1 file/day, append-only).
**Rotation:** start-of-day check (lazy, on the first write of the day) → delete files older than `notifications.sinkRetentionDays` (default 7).
**Skeleton:**
```ts
// notification-sink.ts
export interface NotificationSink {
write(n: NotificationDescriptor): void;
dispose(): void;
}
export function createJsonlSink(crewRoot: string, retentionDays: number): NotificationSink {
const dir = path.join(crewRoot, "state", "notifications");
let lastRotateDate = "";
return {
write(n) {
const today = new Date().toISOString().slice(0, 10);
if (today !== lastRotateDate) {
rotateOldFiles(dir, retentionDays);
lastRotateDate = today;
}
fs.mkdirSync(dir, { recursive: true });
fs.appendFileSync(path.join(dir, `${today}.jsonl`), JSON.stringify({ ...n, timestamp: n.timestamp ?? Date.now() }) + "\n");
},
dispose() { /* no-op */ },
};
}
function rotateOldFiles(dir: string, retentionDays: number): void {
if (!fs.existsSync(dir)) return;
const cutoff = Date.now() - retentionDays * 24 * 60 * 60 * 1000;
for (const file of fs.readdirSync(dir)) {
if (!file.endsWith(".jsonl")) continue;
const stat = fs.statSync(path.join(dir, file));
if (stat.mtimeMs < cutoff) fs.unlinkSync(path.join(dir, file));
}
}
```
**Wiring in `register.ts`:** instantiate the sink when `telemetry.enabled !== false` and pass it into the `NotificationRouter` options. Dispose it in `cleanupRuntime`.
**Tests:** `test/unit/notification-sink.test.ts`: 5 cases (basic write, daily rotation, retention prune, no rotation within the same day, telemetry disabled = no-op).
---
### Phase 8.4 — Wiring (0.75 dev-day)
**Update `src/extension/register.ts`:**
- Instantiate `NotificationRouter` alongside `runSnapshotCache`; check `loadConfig.telemetry?.enabled !== false` to decide whether to pass a `JsonlSink`.
- Pass the router into the `subagentManager` callback (line 64-86) instead of calling `sendFollowUp` directly.
- Pass the router into the `RenderScheduler` callback for the 8.2.C auto-recovery alert.
- Pass `getRunSnapshotCache` + `notificationRouter` into the `commands.ts` deps.
- Dispose the router + sink in `cleanupRuntime`.
**Update `src/extension/registration/commands.ts`:**
- Handle `selection.action === "mailbox-detail"` → open `MailboxDetailOverlay`, dispatch the action result, toast.
- Handle `selection.action === "health-recovery" | "health-kill-stale" | "health-diagnostic-export"` (Q4=c); the detailed flow is in 8.2.D.
- Pass `getRunSnapshotCache` to the overlay (needed to re-render after an action).
- Pass `confirmOverlayFactory` so the handlers can reuse `ConfirmOverlay`.
---
### Phase 8.5 — Tests + Validation (2 dev-day)
**Unit (~52 new cases):**
- `run-action-dispatcher.test.ts` (7)
- `confirm-overlay.test.ts` (4)
- `mailbox-detail-overlay.test.ts` (7, including the 3 ackAll Shift+X cases)
- `mailbox-compose-overlay.test.ts` (8)
- `mailbox-compose-preview.test.ts` (6) — Q1
- `agent-picker-overlay.test.ts` (4)
- `heartbeat-aggregator.test.ts` (6)
- `health-pane.test.ts` (6) — Q4 expanded
- `diagnostic-export.test.ts` (5) — Q4
- `notification-router.test.ts` (8 + 4 quiet-hours parser cases = 12) — Q3
- `notification-sink.test.ts` (5) — Q2
- `widget-notification-badge.test.ts` (5)
**Integration (~6 new cases):**
- `test/integration/mailbox-action-roundtrip.test.ts`: open dashboard → ack → snapshot invalidated → count decreases.
- `test/integration/mailbox-ackall-confirm.test.ts`: ackAll triggers ConfirmOverlay → confirm → loop acks 10 messages.
- `test/integration/notification-dedup.test.ts`: emit the same event 5 times within 30s → 1 toast.
- `test/integration/notification-quiet-hours.test.ts`: set quietHours `"22:00-07:00"`, mock now=23:30 → 0 toast; mock now=12:00 → 1 toast.
- `test/integration/notification-sink-rotation.test.ts`: write 8 days → oldest file deleted on day 8.
- `test/integration/health-recovery-foreground.test.ts`: foreground run dead → R action → ConfirmOverlay → confirm → foreground-interrupt fired.
- `test/integration/health-diagnostic-export.test.ts`: D action → diagnostic file written with secrets redacted; a second export within 1min → ConfirmOverlay overwrite.
**Acceptance before commit:**
- `npm test` ≥ 351 unit (current 299 + 52), 35 integration (current 29 + 6); 0 fail. Verified current suite: 351 unit + 44 integration.
- `npm run typecheck` clean.
- Manual smoke coverage (8 scenarios, section 6) captured as automated smoke in `test/integration/phase8-smoke.test.ts`.
## 3. Wave Organization (parallel-friendly; updated with Q1-Q6)
```
Wave 1 (parallel, 2.5 days)
├─ 8.0 Foundation (dispatcher + keybinding-map + ConfirmOverlay)
├─ 8.3.A NotificationRouter primitive
└─ 8.2.A Heartbeat aggregator
Wave 2 (sequential, 5 days) — depends on Wave 1
├─ 8.1.A Mailbox detail overlay
├─ 8.1.B Ack action
├─ 8.1.C Nudge action
├─ 8.1.D Compose form (Q6 ESC discard + confirm-if-long)
├─ 8.1.E Compose preview pane (Q1)
└─ 8.1.F ackAll Shift+X destructive (Q5)
Wave 3 (parallel, 4 days) — depends on Wave 1
├─ 8.2.B Health pane
├─ 8.2.C Auto-recovery toast (throttled 5min D13)
├─ 8.2.D Health action handlers R/K/D (Q4) + diagnostic-export module
├─ 8.3.B Quiet-hours cross-day parser (Q3) + batching config
├─ 8.3.C Toast badge widget/powerbar
└─ 8.3.D JSONL sink + retention (Q2)
Wave 4 (sequential, 2.75 days)
├─ 8.4 Wire register.ts + commands.ts (router, sink, action handlers)
└─ 8.5 Tests + smoke validation (52 new unit + 6 new integration)
```
**Total estimate: 14-18 dev-days** (vs Phase 7 baseline 18 days). Effort grows by 3.35 days over the original 11-14d plan because the options chosen for Q1-Q6 enrich the scope. Phase 8 is still smaller than Phase 7 because it is mostly UI overlays + an event router and does not touch the state machine.
## 4. Files Affected (updated with Q1-Q6)
### New (31 files)
| Path | Purpose | Est LOC |
|---|---|---|
| `src/ui/run-action-dispatcher.ts` | Wrapper team-tool calls (7 dispatchers) | ~140 |
| `src/ui/keybinding-map.ts` | Key registry (mailbox/health/notification scopes) | ~70 |
| `src/ui/overlays/confirm-overlay.ts` | **(Q5)** Reusable confirm primitive | ~80 |
| `src/ui/overlays/mailbox-detail-overlay.ts` | 2-col mailbox view + ackAll | ~250 |
| `src/ui/overlays/mailbox-compose-overlay.ts` | Compose form + ESC guard | ~210 |
| `src/ui/overlays/mailbox-compose-preview.ts` | **(Q1)** Markdown preview pane | ~120 |
| `src/ui/overlays/agent-picker-overlay.ts` | Agent selector | ~110 |
| `src/ui/heartbeat-aggregator.ts` | Heartbeat summary fn | ~70 |
| `src/ui/dashboard-panes/health-pane.ts` | Health pane renderer with action hints | ~80 |
| `src/extension/notification-router.ts` | Router + dedup + quiet-hours parser **(Q3)** | ~220 |
| `src/extension/notification-sink.ts` | **(Q2)** JSONL sink + retention rotation | ~100 |
| `src/runtime/diagnostic-export.ts` | **(Q4)** Diagnostic JSON exporter + secret redaction | ~140 |
| `test/unit/run-action-dispatcher.test.ts` | | ~140 |
| `test/unit/confirm-overlay.test.ts` | | ~80 |
| `test/unit/mailbox-detail-overlay.test.ts` | | ~180 |
| `test/unit/mailbox-compose-overlay.test.ts` | | ~180 |
| `test/unit/mailbox-compose-preview.test.ts` | **(Q1)** | ~120 |
| `test/unit/agent-picker-overlay.test.ts` | | ~80 |
| `test/unit/heartbeat-aggregator.test.ts` | | ~120 |
| `test/unit/health-pane.test.ts` | Q4 expanded scenarios | ~140 |
| `test/unit/diagnostic-export.test.ts` | **(Q4)** | ~110 |
| `test/unit/notification-router.test.ts` | + 4 cross-day parser cases | ~260 |
| `test/unit/notification-sink.test.ts` | **(Q2)** | ~100 |
| `test/unit/widget-notification-badge.test.ts` | | ~80 |
| `test/integration/mailbox-action-roundtrip.test.ts` | | ~120 |
| `test/integration/mailbox-ackall-confirm.test.ts` | **(Q5)** | ~100 |
| `test/integration/notification-dedup.test.ts` | | ~90 |
| `test/integration/notification-quiet-hours.test.ts` | **(Q3)** mock clock | ~110 |
| `test/integration/notification-sink-rotation.test.ts` | **(Q2)** | ~110 |
| `test/integration/health-recovery-foreground.test.ts` | **(Q4)** | ~120 |
| `test/integration/health-diagnostic-export.test.ts` | **(Q4)** | ~120 |
### Modified (10 files)
| Path | Change |
|---|---|
| `src/ui/run-dashboard.ts` | Refactor `handleInput` to use keybinding-map; add the "health" pane on key `5`; help line; emit `health-recovery/health-kill-stale/health-diagnostic-export` actions (Q4) |
| `src/ui/dashboard-panes/mailbox-pane.ts` | Update help text to suggest A/N/C/Enter/Shift+X (ackAll) |
| `src/ui/crew-widget.ts` | Render notification badge `🔔N` (fallback `[!N]` for terminals without emoji support) |
| `src/ui/powerbar-publisher.ts` | Append the badge to the `pi-crew-active` segment |
| `src/extension/register.ts` | Instantiate NotificationRouter + JsonlSink (gated by telemetry); wrap `sendFollowUp`; pass into RenderScheduler + commands deps |
| `src/extension/registration/commands.ts` | Handle `mailbox-detail` + 3 health actions (Q4); open overlays; reuse ConfirmOverlay (Q5) |
| `src/extension/team-tool/api.ts` | (no change) dispatchers reuse existing operations |
| `src/schema/config-schema.ts` | Add `notifications` section + `sinkRetentionDays` (Q2) |
| `src/config/{config.ts,defaults.ts}` | Parse + defaults for notifications (severityFilter, dedupWindowMs, batchWindowMs, quietHours, sinkRetentionDays) |
| `package.json` | Bump version `0.1.33` → `0.1.34` |
### Docs (update only when the user requests it, per project rule)
- `docs/architecture.md`: add the sections "Operator Actions", "Notification Router", "Diagnostic Export".
## 5. Risk Assessment (updated with Q1-Q6)
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Overlay hijacks the Pi UI's stdin | Med | High | Reuse the `LiveRunSidebar` pattern (already working); test with the `pi-ui-compat.ts` shim |
| Keybinding conflict with Pi global hotkeys | Low | Med | Uppercase `A/N/C/P/H/X` (mailbox), `R/K/D` (health) are context-scoped; lowercase Pi defaults are untouched |
| Notification spam with many concurrent runs | Med | Low-Med | Dedup window 30s default; severity filter excludes "info"; quiet-hours wrap (Q3) |
| Quiet-hours cross-day parser bug | Low | Med | Q3=b: 7 unit test cases including cross-midnight; mock clock pattern |
| MailboxDetailOverlay re-render slow | Low | Low | Reuse the signature pattern from `RunDashboard`; cache lines |
| Race when acking while the snapshot is refreshing | Low | Med | Dispatch awaits, then invalidates the cache; render scheduler debounce 75ms |
| `sendFollowUp` swap breaks the existing flow | Low | High | Wrap, do not replace; router is default-on only when `notifications.enabled !== false`; fall back to calling raw `sendFollowUp` if the router throws |
| Config schema breaking change | Low | High | New section `notifications` purely optional; missing → defaults |
| **(Q1)** Compose preview pane re-render bottleneck (debounce miss) | Low | Low | Debounce 100ms; cache last rendered tokens; tokenizer < 1ms for a 5KB body |
| **(Q1)** Markdown tokenizer edge case (nested code in list) | Med | Low | Reuse an existing parser pattern if one exists; 6 unit test edge cases; preview is "best-effort" |
| **(Q2)** Sink disk full / write fail | Low | Low | `appendFileSync` errors are swallowed via `logInternalError`; a sink failure does not crash the router |
| **(Q2)** Retention prune deletes a file that is being tailed | Low | Low | Only prune `.jsonl` files older than the cutoff; daily rotation ensures today's file is never touched |
| **(Q2)** PII in notification body leaks into the sink | Med | Med | Sink reuses the secret redactor from `diagnostic-export.ts` (Q4); router tags PII fields if needed |
| **(Q4)** `R` recovery accidentally interrupts a healthy run | Low | High | ConfirmOverlay with `dangerLevel: "high"` + default cancel; foreground-only check; clear "tasks marked failed" warning |
| **(Q4)** `K` kill stale workers races with worker self-recovery | Low | Med | Mark dead heartbeats first → emit event → release claims; a worker that detects a token mismatch exits on its own |
| **(Q4)** Diagnostic export overwrites an artifact dir that is in use | Low | Med | D14: check existing diag < 1min → ConfirmOverlay overwrite; timestamp suffix unique |
| **(Q4)** Diagnostic secret redaction misses a new key pattern | Med | High | Block list: `*token*`, `*key*`, `*password*`, `*secret*`, `*credential*`, `*auth*`; reviewed through a test fixture with 20 key patterns |
| **(Q5)** ConfirmOverlay default `Y` accidentally confirms a destructive action | Low | High | Default action = "cancel"; first focus is `[N]`; ESC = cancel; UI hint underlines N |
| **(Q6)** ESC discard confirm fatigue (users complain about confirming on every ESC) | Low | Low | Threshold of 50 characters (configurable if user feedback calls for it); short body → discard immediately |
| **(Q6)** Body multi-line Ctrl+Enter not detected on Windows | Med | Low | Test with `pi-ui-compat.ts`; fall back to `Alt+Enter` if Ctrl+Enter detection fails |
## 6. Testing Strategy (updated with Q1-Q6)
**Unit-level (Wave 1-3):**
- Mock `handleTeamTool` → assert that each of the 7 dispatchers returns the correct `{ok, message}`.
- Render the overlay with a fixture snapshot → assert the line layout.
- Heartbeat aggregator: parameterized test with fixture timestamps (6 cases).
- Health pane: 6 cases covering foreground/async/healthy/dead/stale variations (Q4).
- Notification router: mock clock (`globalThis.Date.now` override, following the Phase 7 pattern); 8 base cases + 4 cross-day parser (Q3).
- Sink: rotation, retention, telemetry-disabled no-op (Q2).
- Diagnostic export: secret redaction with a 20-key fixture; JSON schema validation (Q4).
- Confirm overlay: 4 cases verify default-cancel safety (Q5).
- Compose preview: 6 cases markdown render (Q1).
**Integration (Wave 4) — 7 scenarios:**
- `mailbox-action-roundtrip.test.ts`: open dashboard → ack → snapshot invalidated → count decreases.
- `mailbox-ackall-confirm.test.ts` (Q5): Shift+X → ConfirmOverlay → confirm → loop ack 10 messages → all `acknowledged`.
- `notification-dedup.test.ts`: emit the same `crew.run.failed` 5x within 30s → `sendFollowUp` mock called once.
- `notification-quiet-hours.test.ts` (Q3): quiet `"22:00-07:00"` mock now=23:30 → 0 toast; mock now=12:00 → 1 toast.
- `notification-sink-rotation.test.ts` (Q2): write 8 days of files with fake mtimes → oldest deleted on day 8.
- `health-recovery-foreground.test.ts` (Q4): foreground run with 2 dead workers → R action → ConfirmOverlay confirm → `foreground-interrupt` API called → tasks marked failed.
- `health-diagnostic-export.test.ts` (Q4): D action → file written with 0 secrets in the JSON; a second export within 1min → ConfirmOverlay overwrite.
**Smoke manual (8 scenarios):**
1. Run `team run` with 1 foreground task → open `/team-dashboard` → key `3` mailbox → Enter → key `N` nudge → verify `events.jsonl` contains `agent.nudged`.
2. Start 2 runs and wait for them to finish → verify that 1-2 toasts are received (dedup).
3. Set `notifications.quietHours = "00:00-23:59"` → verify 0 toasts.
4. **(Q1)** In the compose form, type a markdown body with bold/list/code → key `P` preview → verify it renders correctly.
5. **(Q5)** ackAll on a run with 5 unread → ConfirmOverlay appears → N cancel → 0 messages acked.
6. **(Q4)** Foreground run with a worker stuck > 1min → key `5` health → key `R` → ConfirmOverlay → Y → tasks failed; key `D` → diagnostic file written.
7. **(Q2)** Disable telemetry → run + emit notification → verify `<crewRoot>/state/notifications/` does not exist.
8. **(Q6)** Compose a 100-char body → ESC → ConfirmOverlay appears → N → still editing.
**Performance budget:**
- Mailbox overlay first render < 50ms with 100 messages.
- Compose preview render < 30ms with a 5KB markdown body (Q1).
- Notification router enqueue overhead < 1ms.
- Sink write < 5ms (single append) (Q2).
- Health pane render < 5ms for 50 tasks.
- Diagnostic export completes in < 200ms for a run with 50 tasks + 200 events (Q4).
## 7. Open Questions — RESOLVED (Path X chosen)
| Q | Question | Choice | Implementation reference |
|---|---|---|---|
| **Q1** | Does the compose form need a rendered preview before submit? | **(b) Yes, preview pane** | 8.1.E `mailbox-compose-preview.ts`, key `P` toggle, read-only markdown render (bold/italic/code/list/heading), debounce 100ms. D15. +0.75d |
| **Q2** | Should the notification sink write to `<crewRoot>/state/notifications.jsonl` by default? | **(c) Sink when `telemetry.enabled !== false`** | 8.3.D `notification-sink.ts`, JSONL `<crewRoot>/state/notifications/{YYYY-MM-DD}.jsonl`, rotate `sinkRetentionDays` default 7. D4. +0.5d |
| **Q3** | Quiet-hours cross-day wrap? | **(b) Wrap parser** | 8.3.B `parseHHMMRange` + `isInQuietHours` cross-day logic; 7 unit cases including `"22:00-07:00"`. D5. +0.25d |
| **Q4** | Inline recovery action button in the health pane? | **(c) Full action menu R/K/D** | 8.2.D `R` recovery (foreground-only), `K` kill stale workers, `D` diagnostic export with secret redaction; 3 confirm flows. D9, D14. +1.5d |
| **Q5** | Confirm ack/nudge for destructive actions? | **(c) Confirm only destructive actions (ackAll/recovery/diag-overwrite)** | 8.0 `ConfirmOverlay` reusable primitive; 8.1.F ackAll Shift+X with confirm. D12. +0.25d |
| **Q6** | Should the compose form persist a draft on ESC? | **(a) ESC discard + confirm-if-long** | 8.1.D ESC behavior: body ≤ 50 chars → discard; > 50 → ConfirmOverlay. Draft persistence deferred to Phase 9. D11. +0.1d |
**Total effort delta from Q1-Q6: ~3.35 dev-days** → bump from 11-14d to 14-18d.
**Q1-Q6 goal met:** every scope-shaping decision is settled; the team can start Wave 1 without being blocked on clarification midway.
## 8. Dependencies & Sequencing
```
Phase 7 (DONE) ─────► Phase 8.0 Foundation
┌──────────┼──────────┐
▼ ▼ ▼
8.1 Mailbox 8.2 Health 8.3 Notif
│ │ │
└──────────┼──────────┘
8.4 Wiring
8.5 Tests
```
**Hard prerequisites from Phase 7:** ✅ `RunSnapshotCache`, `RenderScheduler`, dashboard panes are already in place.
## 9. Effort Summary (updated with Q1-Q6)
| Wave | Items | Dev-days | Parallelizable |
|---|---|---|---|
| 1 | 8.0 (Foundation + ConfirmOverlay Q5) + 8.3.A (Router) + 8.2.A (Heartbeat) | 2.5 | Yes (3 streams) |
| 2 | 8.1.A → B → C → D (Q6) → E (Q1) → F (Q5) | 5 | No (sequential UX, share overlay state) |
| 3 | 8.2.B + 8.2.C + 8.2.D (Q4 R/K/D) + 8.3.B (Q3) + 8.3.C + 8.3.D (Q2) | 4 | Yes (5 streams) |
| 4 | 8.4 (Wire) + 8.5 (Tests) | 2.75 | No |
| **Total** | **17 sub-phases** | **14-18** | — |
**Compared with the original plan:** +3.35 dev-days, +5 sub-phases, +8 new files, +27 unit cases, +4 integration cases.
## 10. Acceptance Checklist (Wave 4 exit criteria) — Updated
- [x] All 8.0 → 8.5 checkboxes in section 0 (Implementation Status) are ticked `[x]`.
- [x] `npm test` ≥ **351 unit** (current 299 + 52 new), ≥ **35 integration** (current 29 + 6 new), 0 fail. Verified: 351 unit + 44 integration pass.
- [x] `npm run typecheck` clean.
- [x] Manual smoke **8 scenarios** pass (section 6). Verified via automated smoke suite `test/integration/phase8-smoke.test.ts`.
- [x] Performance budget met: mailbox overlay <50ms, compose preview <30ms, sink write <5ms, diagnostic export <200ms. Verified microbench: mailbox 6.39ms, preview 1.61ms, health 0.29ms, sink 2.12ms, diagnostic 4.83ms.
- [x] No regression: the existing 299 unit + 29 integration tests still pass.
- [x] Config breaking? **No.** Schema additive (`notifications` section optional).
- [x] Bump `package.json` version `0.1.33` → `0.1.34`.
- [x] Q1-Q6 implementations match the decisions table in section 7.
- [x] Secret redaction (Q4): test fixture with recursive key/value redaction pass; audit log avoids known token fixture.
## 11. Out of Scope (defer Phase 9+)
> The Phase 9 plan has been created separately at [`research-phase9-observability-reliability-plan.md`](./research-phase9-observability-reliability-plan.md).
- **Telemetry/Metrics backbone** (Counter/Gauge/Histogram + correlation ID + OTLP/Prometheus export) → **Phase 9 (Theme B)** per Path X plan.
- **Run reliability** — auto-retry executor + crash recovery + deadletter + heartbeat watcher → **Phase 9 (Theme C)**.
- Cross-run mailbox routing (operator-broadcast) — **Phase 10+**.
- Mailbox threading / reply chains — **Phase 10+**.
- **Compose draft persistence (Q6 b/c options)** → deferred to Phase 9 if users complain.
- Multi-host run aggregation — **Phase 10+**.
- Slack/Discord webhook sink (router supports it via custom sink, but no built-in adapter) — **Phase 10+**.
- Markdown preview with images/links rendered (Q1 D15 skip) → **Phase 10+**.
### Path X roadmap summary
| Phase | Theme | Effort | Plan file |
|---|---|---|---|
| 6 | `.crew/` migration + autonomous policy | ~12d | `refactor-tasks-phase6.md` (DONE) |
| 7 | UI Optimization | ~18d | `research-ui-optimization-plan.md` (DONE) |
| **8** | **Operator Experience (Theme A)** | **14-18d** | **THIS FILE — ✅ DONE (verified 351 unit + 44 integration pass, version 0.1.34)** |
| **9** | **Observability + Reliability (B+C)** | **19.5-22.5d** | `research-phase9-observability-reliability-plan.md` (post-review updated 2026-04-29) |
| 10+ | TBD: Perf baseline, distributed | — | Future |
---
## 12. Implementation Kickoff Checklist (Pre-Wave 1)
Before starting Wave 1, verify:
- [x] Phase 7 is committed (snapshot cache + render scheduler + 4 panes). Included in `phase-8-operator-experience` release commit.
- [x] `npm test` baseline pass (299 unit + 29 integration). Verified current suite: 351 unit + 44 integration pass.
- [x] `npm run typecheck` clean.
- [x] Q1-Q6 are settled (done; see the table in section 7).
- [x] New branch `phase-8-operator-experience` from main.
- [x] Read once: `src/extension/team-tool/api.ts` (already has the ack-message/send-message/nudge-agent operations; NO modification needed).
- [x] Read once: `src/ui/run-dashboard.ts:handleInput` to understand the current key dispatch pattern.
- [x] Read once: `src/ui/live-run-sidebar.ts` as a template for the overlay implementation.
**Ready to implement Phase 8 Path X.**



@@ -0,0 +1,357 @@
# Research: pi-mono coding-agent Deep Read
> Date: 2026-04-29 | Read-only research | Source: `source/pi-mono/packages/coding-agent/`
## 1. Role in the monorepo
`@mariozechner/pi-coding-agent` is the most central package in pi-mono. It contains the `pi` CLI binary,
the entire agent session lifecycle, the extension host system, 3 run modes, 7 built-in tools, session
persistence, compaction, branch summarization, and an SDK for programmatic usage.
Package version: `0.70.5` (in lockstep with the entire monorepo).
## 2. Source structure
```
src/
├── cli.ts # Binary entry point (shebang #!/usr/bin/env node)
├── main.ts # CLI logic: parse args, dispatch mode (731 lines)
├── index.ts # Public API exports (~250 lines of re-exports)
├── config.ts # Path constants (agentDir, VERSION, APP_NAME)
├── cli/ # CLI subsystems
│ ├── args.ts # Argument parsing (yargs-style)
│ ├── file-processor.ts # @file argument expansion
│ ├── initial-message.ts # Build initial prompt from args/stdin
│ ├── list-models.ts # --list-models output
│ └── session-picker.ts # Interactive session selection
├── core/ # ═══ CORE LAYER ═══
│ ├── agent-session.ts # AgentSession class (3099 lines), the CENTRAL piece
│ ├── agent-session-runtime.ts # AgentSessionRuntime wrapper (session replacement)
│ ├── agent-session-services.ts # Services that create a cwd-bound runtime
│ ├── sdk.ts # createAgentSession() public factory (~408 lines)
│ ├── session-manager.ts # Session file I/O, entries, tree (1425 lines)
│ ├── settings-manager.ts # settings.json manager (~1069 lines)
│ ├── system-prompt.ts # System prompt builder (172 lines)
│ ├── resource-loader.ts # Load extensions/skills/prompts/themes (~920 lines)
│ ├── model-registry.ts # Model + auth registry
│ ├── model-resolver.ts # Model resolution / scope / fallback
│ ├── keybindings.ts # Keybinding manager (KeybindingsManager)
│ ├── messages.ts # AgentMessage type definitions + converters
│ ├── bash-executor.ts # Bash execution abstraction layer
│ ├── prompt-templates.ts # File-based prompt templates (@file expansion)
│ ├── skills.ts # Skill loading + formatting for system prompt
│ ├── slash-commands.ts # 21 built-in slash commands
│ ├── event-bus.ts # Shared event bus for cross-extension communication
│ ├── footer-data-provider.ts # Footer data provider (git branch + extension statuses)
│ ├── auth-storage.ts # API key / OAuth credential storage
│ ├── auth-guidance.ts # User-facing auth error messages
│ ├── extensions/ # ═══ EXTENSION SYSTEM ═══
│ │ ├── types.ts # Type surface (1545 lines)
│ │ ├── loader.ts # jiti-based extension loader (~607 lines)
│ │ ├── runner.ts # ExtensionRunner lifecycle manager (~1024 lines)
│ │ ├── wrapper.ts # Tool wrapping utilities
│ │ └── index.ts # Re-exports (~170 lines)
│ ├── compaction/ # ═══ COMPACTION ═══
│ │ ├── compaction.ts # Context compaction logic (~840 lines)
│ │ ├── branch-summarization.ts # Tree navigation summarization (~356 lines)
│ │ ├── utils.ts # File ops tracking + serialization
│ │ └── index.ts
│ └── tools/ # ═══ BUILT-IN TOOLS ═══
│ ├── index.ts # Tool registry + factories (~198 lines)
│ ├── read.ts # File reading with truncation
│ ├── bash.ts # Shell command execution
│ ├── edit.ts # Exact text replacement
│ ├── write.ts # File creation/overwrite
│ ├── grep.ts # Regex search
│ ├── find.ts # File name search
│ ├── ls.ts # Directory listing
│ ├── file-mutation-queue.ts # Serialized file writes
│ ├── truncate.ts # Output truncation strategies
│ └── render-utils.ts
├── modes/ # ═══ RUN MODES ═══
│ ├── index.ts # Re-exports
│ ├── interactive/ # Interactive TUI mode (5470 lines)
│ │ ├── interactive-mode.ts # Main TUI loop + all slash commands
│ │ ├── components/ # 30+ TUI components (assistant messages, diffs, editors...)
│ │ └── theme/ # Theme engine (JSON-based, hot-reload)
│ ├── print-mode.ts # Non-interactive / JSON output mode
│ └── rpc/ # JSON-RPC mode for embedding (parent-child protocol)
│ ├── rpc-mode.ts # RPC server loop
│ ├── rpc-client.ts # RPC client for SDK/programmatic use
│ ├── rpc-types.ts # JSON-RPC message types
│ └── jsonl.ts # JSONL output formatting
└── utils/ # Shared utilities
├── clipboard.ts # Clipboard integration
├── frontmatter.ts # YAML frontmatter parser
├── shell.ts # Shell detection/config
├── paths.ts # Path utilities
└── sleep.ts # Promise-based sleep
```
## 3. Main files by line count
| File | Lines | Description |
|---|---|---|
| `modes/interactive/interactive-mode.ts` | 5470 | Interactive TUI + all 21 slash command handlers |
| `core/agent-session.ts` | 3099 | AgentSession class: prompt, compaction, bash, model management |
| `core/extensions/types.ts` | 1545 | Entire type surface for the extension system |
| `core/session-manager.ts` | 1425 | Session file I/O, entry types, tree operations |
| `core/settings-manager.ts` | ~1069 | JSON settings management (global + project) |
| `core/extensions/runner.ts` | ~1024 | ExtensionRunner: event emission, context binding |
| `core/resource-loader.ts` | ~920 | Unified loader for extensions/skills/prompts/themes |
| `core/compaction/compaction.ts` | ~840 | Compaction logic + cut-point detection |
| `main.ts` | 731 | CLI entry: arg parsing → mode dispatch |
| `core/extensions/loader.ts` | ~607 | jiti-based TypeScript module loading |
## 4. Main execution flow
### 4.1 Startup sequence (`main.ts`)
```
main(args)
├── parseArgs(args) # Parse CLI flags
├── resolveAppMode() # interactive | print | json | rpc
├── runMigrations() # Upgrade old session formats
├── createSessionManager() # new/fork/continue/resume/in-memory
├── createAgentSessionRuntime(createRuntime) # Build full runtime
│ └── createRuntime(cwd, agentDir, sessionManager)
│ ├── createAgentSessionServices() # authStorage, modelRegistry, resourceLoader
│ ├── resolveModelScope() # --models flag → scoped models
│ ├── buildSessionOptions() # model, thinking, tools, scopedModels
│ └── createAgentSessionFromServices() → AgentSession
├── readPipedStdin() # Pipe support
├── prepareInitialMessage() # text + images
└── dispatch:
├── interactive → new InteractiveMode(runtime).run()
├── print/json → runPrintMode(runtime, {...})
└── rpc → runRpcMode(runtime)
```
### 4.2 AgentSession.prompt() lifecycle
```
session.prompt(text)
├── parseSkillBlock() # <skill name="..." location="...">
├── expandPromptTemplate() # @file expansion
├── emitInput() # Extension can transform/block input
├── emitBeforeAgentStart() # Extension can inject custom message / swap system prompt
├── agent.runAgentLoop()
│ ├── context → extension transform messages
│ ├── before_provider_request → extension modify payload
│ ├── streamSimple(model, context, ...)
│ ├── after_provider_response → extension observe response
│ ├── tool_call → extension intercept/block/mutate args
│ ├── tool_execution_start/update/end
│ ├── tool_result → extension modify result
│ └── auto-compaction check (after turn_end)
└── emitAgentEnd()
```
### 4.3 Run modes
| Mode | Class/Function | Characteristics |
|---|---|---|
| **Interactive** | `InteractiveMode` (5470 lines) | Full TUI: chat history, editor, widgets, themes, overlays, keybindings |
| **Print/JSON** | `runPrintMode()` | Pipe/script: plain text or JSON mode, no TUI |
| **RPC** | `runRpcMode()` | JSON-RPC 2.0 over stdin/stdout; used as the child-process protocol |
## 5. AgentSession class in detail
### 5.1 Properties
```typescript
class AgentSession {
readonly agent: Agent; // Core agent instance
readonly sessionManager: SessionManager; // Session file I/O
readonly settingsManager: SettingsManager;// Settings
// Model access
get model(): Model<any> | undefined;
get thinkingLevel(): ThinkingLevel;
get scopedModels(): Array<{model, thinkingLevel}>;
// Tool access
get toolNames(): string[]; // Currently active tools
get tools(): ToolInfo[]; // All registered tools with metadata
getAllTools(): ToolInfo[];
// Context
getContextUsage(): ContextUsage | undefined;
isIdle(): boolean;
// Core operations
prompt(text, options?): Promise<void>; // Send user message
abort(): void; // Abort current operation
shutdown(): void; // Graceful shutdown
// Model management
cycleModel(forward?): ModelCycleResult; // Ctrl+P cycling
setModel(model): Promise<boolean>; // Switch model
setThinkingLevel(level): void;
// Compaction
compact(options?): void; // Manual compaction
getSessionStats(): SessionStats; // Usage stats
}
```
### 5.2 Internal state machine
Key internal flags:
- `_steeringMessages[]` / `_followUpMessages[]`: Queued messages
- `_compactionAbortController` / `_autoCompactionAbortController`: Compaction control
- `_overflowRecoveryAttempted`: Context overflow recovery flag
- `_retryAttempt` / `_retryPromise`: Auto-retry state
- `_bashAbortController` / `_pendingBashMessages[]`: Bash execution state
- `_turnIndex`: Current turn counter
### 5.3 Tool hooks
`_installAgentToolHooks()` installs interceptors on the Agent instance:
- `beforeToolCall`: Check if extension wants to intercept/block
- `onToolResult`: Check if extension wants to modify result
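As a rough illustration of how these two interceptors could wrap a single tool execution, consider the sketch below. The type names (`ToolCallSketch`, `BeforeDecision`, `ToolHooksSketch`) and the `runToolWithHooks` helper are hypothetical and are not the actual pi-mono API; the real contract is defined in `core/extensions/types.ts`.

```typescript
// Hypothetical shapes for illustration only; not the real pi-mono types.
type ToolCallSketch = { name: string; args: Record<string, unknown> };
type BeforeDecision =
  | { block: true; reason: string }
  | { block: false; args?: Record<string, unknown> };

interface ToolHooksSketch {
  beforeToolCall?(call: ToolCallSketch): BeforeDecision;
  onToolResult?(call: ToolCallSketch, result: string): string;
}

function runToolWithHooks(
  call: ToolCallSketch,
  hooks: ToolHooksSketch,
  execute: (call: ToolCallSketch) => string,
): string {
  // 1. Give the extension a chance to block the call or rewrite its args.
  const decision: BeforeDecision = hooks.beforeToolCall?.(call) ?? { block: false };
  if (decision.block) return `blocked: ${decision.reason}`;
  const effective: ToolCallSketch = { name: call.name, args: decision.args ?? call.args };
  // 2. Run the tool, then let the extension modify the result.
  const raw = execute(effective);
  return hooks.onToolResult?.(effective, raw) ?? raw;
}
```

With no hooks installed the helper degrades to a plain `execute(call)`, which is the property an interceptor layer of this kind usually wants to preserve.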
## 6. Session Persistence (`session-manager.ts`)
### 6.1 Session file format
JSONL file (`.pi/sessions/{id}.jsonl`) with the following entry types:
| Entry Type | Purpose | Fields |
|---|---|---|
| `session` | Header | version, id, timestamp, cwd, parentSession |
| `message` | AgentMessage (user/assistant/toolResult) | message |
| `thinking_level_change` | Thinking level change | thinkingLevel |
| `model_change` | Model switch | provider, modelId |
| `compaction` | Compaction summary | summary, firstKeptEntryId, tokensBefore, details |
| `branch_summary` | Branch navigation | summary, fromId, details |
| `custom_message` | Extension-defined for LLM context | customType, content, display, details |
| `custom` | Extension state (not in LLM context) | customType, data |
Current version: `CURRENT_SESSION_VERSION = 3`
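To make the file format concrete, here is a minimal sketch of reading such a JSONL file into typed-by-tag records. It assumes only what the table states (one JSON object per line, each with a `type` field); `RawSessionEntry`, `parseSessionJsonl`, and `countEntriesByType` are illustrative names, and skipping malformed lines is an assumption rather than confirmed `SessionManager` behavior.

```typescript
// Minimal sketch: every entry is assumed to carry a string `type` field.
interface RawSessionEntry {
  type: string;
  [field: string]: unknown;
}

function parseSessionJsonl(text: string): RawSessionEntry[] {
  const entries: RawSessionEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const value: unknown = JSON.parse(line);
      if (typeof value === "object" && value !== null && typeof (value as RawSessionEntry).type === "string") {
        entries.push(value as RawSessionEntry);
      }
    } catch {
      // Assumption: a malformed or truncated line is skipped, not fatal.
    }
  }
  return entries;
}

// Tally entries per type, e.g. to see how many compactions a session has had.
function countEntriesByType(entries: RawSessionEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) counts[entry.type] = (counts[entry.type] ?? 0) + 1;
  return counts;
}
```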
### 6.2 Session tree
- Each session has a `parentSession` reference (set when forking)
- `SessionManager.forkFrom()` creates a new session
- `buildSessionContext()` builds messages from entries (including compaction and branch summary entries)
- `navigateTree()` moves between branches within the same session
## 7. Compaction System
### 7.1 Auto-compaction (`compaction/compaction.ts`)
Default settings:
```
reserveTokens: 16384 # Reserved for the system prompt + LLM response
keepRecentTokens: 20000 # Keep the most recent messages
```
Process:
1. `shouldCompact()`: checks context usage after each turn
2. `findCutPoint()`: finds the cut position based on file operations
3. `prepareCompaction()`: builds messagesToSummarize + turnPrefixMessages
4. `compact()`: serialize → LLM summarize → return CompactionResult
5. SessionManager saves a `CompactionEntry` + creates a new session (reload)
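The first step of the process can be sketched as a simple headroom check. The defaults come from the settings listed above; the exact threshold rule (compact once fewer than `reserveTokens` remain in the window) is an assumption about how `shouldCompact()` behaves, and `shouldCompactSketch` is an illustrative name.

```typescript
// Defaults are taken from the settings above; the threshold rule is an assumption.
interface CompactionSettingsSketch {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

const DEFAULT_COMPACTION: CompactionSettingsSketch = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Compact once the context no longer leaves `reserveTokens` of headroom.
function shouldCompactSketch(
  contextTokens: number,
  contextWindow: number,
  settings: CompactionSettingsSketch = DEFAULT_COMPACTION,
): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}
```

Under this rule, a model with a 200000-token window would trigger compaction once usage exceeds 200000 - 16384 = 183616 tokens.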
### 7.2 Branch summarization (`compaction/branch-summarization.ts`)
When the user navigates the session tree, a summary of the current branch is created:
- `collectEntriesForBranchSummary()`: collects the entries that need summarizing
- `prepareBranchEntries()`: extracts messages + file operations
- `generateBranchSummary()`: calls the LLM to produce the summary
### 7.3 Cut-point strategy
The cut-point is chosen based on:
- File operations: prefer cutting at a point with no pending file modifications
- Assistant messages: never cut in the middle of tool calls
- Keep recent tokens: keep at least the last `keepRecentTokens` tokens
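A simplified sketch of two of these rules (the recent-token budget and not splitting a tool call from its result) is shown below. It deliberately ignores the file-operation heuristic, and `SizedMessage` and `findCutIndexSketch` are hypothetical names, so this should be read as an approximation of the idea rather than the actual `findCutPoint()` logic.

```typescript
interface SizedMessage {
  role: "user" | "assistant" | "toolResult";
  tokens: number; // estimated token count of this message
}

// Returns the index of the first message to keep; everything before it is summarized.
function findCutIndexSketch(messages: SizedMessage[], keepRecentTokens: number): number {
  let kept = 0;
  let cut = messages.length;
  // Walk backwards until the kept tail reaches the token budget.
  for (let i = messages.length - 1; i >= 0 && kept < keepRecentTokens; i--) {
    kept += messages[i]?.tokens ?? 0;
    cut = i;
  }
  // Never start the kept tail on a tool result: move back to the message that issued the call.
  while (cut > 0 && messages[cut]?.role === "toolResult") cut--;
  return cut;
}
```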
## 8. Built-in Tools
7 tools, each with 2 representations:
- `AgentTool`: runtime execution contract
- `ToolDefinition`: type-safe definition with schema + render
| Tool | File | Key params | Characteristics |
|---|---|---|---|
| `read` | `tools/read.ts` | path, offset, limit | Head/tail truncation, image support |
| `bash` | `tools/bash.ts` | command, timeout | AbortController, timeout |
| `edit` | `tools/edit.ts` | path, edits[{oldText,newText}] | Exact replacement, multi-edit |
| `write` | `tools/write.ts` | path, content | Overwrite/create |
| `grep` | `tools/grep.ts` | pattern, path | Regex search |
| `find` | `tools/find.ts` | pattern, path | File name glob |
| `ls` | `tools/ls.ts` | path | Directory listing |
**File mutation queue** (`file-mutation-queue.ts`): Serializes write operations to prevent
parallel tool conflicts. Used internally by edit/write tools.
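One common way to get this kind of serialization is a per-path promise chain. The sketch below shows the general technique under that assumption; `enqueueFileMutation` and `mutationQueues` are illustrative names, and the real `file-mutation-queue.ts` may be structured differently.

```typescript
// One promise chain per path; tasks on the same path run strictly in order.
const mutationQueues = new Map<string, Promise<void>>();

function enqueueFileMutation<T>(path: string, task: () => Promise<T>): Promise<T> {
  const previous = mutationQueues.get(path) ?? Promise.resolve();
  const result = previous.then(() => task());
  // Store a never-rejecting tail so one failed write does not poison the queue.
  const tail: Promise<void> = result.then(
    () => undefined,
    () => undefined,
  );
  mutationQueues.set(path, tail);
  // Drop the entry once the queue for this path has drained.
  void tail.then(() => {
    if (mutationQueues.get(path) === tail) mutationQueues.delete(path);
  });
  return result;
}
```

Mutations on different paths stay fully parallel; only writers targeting the same path wait for each other.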
## 9. Settings Manager (`settings-manager.ts`)
Manages `settings.json` with the following sections:
| Section | Key settings | Default |
|---|---|---|
| `compaction` | enabled, reserveTokens, keepRecentTokens | true, 16384, 20000 |
| `retry` | enabled, maxRetries, baseDelayMs | true, 3, 2000 |
| `retry.provider` | timeoutMs, maxRetries, maxRetryDelayMs | (SDK defaults) |
| `terminal` | showImages, imageWidthCells, clearOnShrink, showTerminalProgress | true, 60, false, false |
| `images` | autoResize, blockImages | true, false |
| `thinkingBudgets` | minimal, low, medium, high | (per-level defaults) |
| `markdown` | codeBlockIndent | " " |
Scope: global (`~/.pi/agent/settings.json`) + project-local (`.pi/settings.json`).
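How the two scopes combine can be sketched as a recursive merge. This assumes project-local values override global ones and that nested sections merge key by key; the source only states that both scopes exist, so treat the precedence rule and the names `SettingsObject` and `mergeSettingsSketch` as assumptions.

```typescript
type SettingsObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is SettingsObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: project-local settings win over global ones, section by section.
function mergeSettingsSketch(globalSettings: SettingsObject, projectSettings: SettingsObject): SettingsObject {
  const merged: SettingsObject = { ...globalSettings };
  for (const [key, value] of Object.entries(projectSettings)) {
    const base = merged[key];
    merged[key] = isPlainObject(base) && isPlainObject(value)
      ? mergeSettingsSketch(base, value) // merge nested sections such as `compaction`
      : value; // scalars and arrays are replaced wholesale
  }
  return merged;
}
```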
## 10. Slash Commands
21 built-in commands (`slash-commands.ts`):
| Command | Purpose |
|---|---|
| `settings` | Open settings menu |
| `model` | Select model (selector UI) |
| `scoped-models` | Enable/disable models for Ctrl+P |
| `export` | Export session (HTML/JSONL) |
| `import` | Import session from JSONL |
| `share` | Share as GitHub gist |
| `copy` | Copy last message |
| `name` | Set session display name |
| `session` | Show session info + stats |
| `changelog` | Show changelog |
| `hotkeys` | Show keyboard shortcuts |
| `fork` | Fork from previous message |
| `clone` | Duplicate session |
| `tree` | Navigate session tree |
| `login`/`logout` | Auth management |
| `new` | Start new session |
| `compact` | Manual compaction |
| `resume` | Resume different session |
| `reload` | Reload extensions/skills/themes |
| `quit` | Exit |
## 11. RPC Mode
JSON-RPC 2.0 protocol over stdin/stdout:
```typescript
// Request
{ "jsonrpc": "2.0", "id": 1, "method": "prompt", "params": { "text": "..." } }
// Response
{ "jsonrpc": "2.0", "id": 1, "result": { "messages": [...], "usage": {...} } }
// Notification (no id)
{ "jsonrpc": "2.0", "method": "event", "params": { "type": "message_start", ... } }
```
This is the main protocol for parent-child communication in pi-subagents and pi-crew.
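On the parent side, each line read from the child's stdout has to be told apart as a response or a notification. The sketch below takes the message shapes shown above at face value; `RpcIncomingSketch` and `classifyRpcLine` are hypothetical names, and the actual types live in `rpc-types.ts`.

```typescript
type RpcIncomingSketch =
  | { kind: "response"; id: number | string; result?: unknown; error?: unknown }
  | { kind: "notification"; method: string; params?: unknown }
  | { kind: "invalid"; raw: string };

function tryParseJson(line: string): unknown {
  try {
    return JSON.parse(line);
  } catch {
    return undefined;
  }
}

function classifyRpcLine(line: string): RpcIncomingSketch {
  const parsed = tryParseJson(line);
  if (typeof parsed !== "object" || parsed === null) return { kind: "invalid", raw: line };
  const msg = parsed as Record<string, unknown>;
  if (msg["jsonrpc"] !== "2.0") return { kind: "invalid", raw: line };
  const id = msg["id"];
  // A response echoes the request id and carries either a result or an error.
  if ((typeof id === "number" || typeof id === "string") && ("result" in msg || "error" in msg)) {
    return { kind: "response", id, result: msg["result"], error: msg["error"] };
  }
  // A notification has a method but no id.
  const method = msg["method"];
  if (typeof method === "string" && id === undefined) {
    return { kind: "notification", method, params: msg["params"] };
  }
  return { kind: "invalid", raw: line };
}
```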
## 12. Notable points
1. **Interactive mode is too large** (5470 lines): it contains most of the slash command implementations
2. **AgentSession is too large** (3099 lines): mixed concerns (prompt, compaction, bash, lifecycle)
3. **Extension type surface** (1545 lines): very comprehensive but complex
4. **Lockstep versioning**: all packages share version 0.70.5
5. **jiti-based extension loading**: lets TypeScript extensions run without a compile step
6. **Virtual modules**: for the Bun-compiled binary, dependencies are pre-bundled


@@ -0,0 +1,174 @@
# Research: `source/pi-crew` as New Reference Source
Date: 2026-04-29
Reference source: `D:/my/my_project/source/pi-crew` (`@melihmucuk/pi-crew@1.0.14`, commit `c0631a3`)
Current target: `D:/my/my_project/pi-crew` (`pi-crew@0.1.34`)
Research run: `team_20260429091311_8047706b`
> Note: the parallel research run produced useful artifacts, but child workers were marked failed because they did not exit within 5s after their final assistant message. The source audit content was still captured in result/shared artifacts.
## Executive Summary
`source/pi-crew` is a compact, in-process subagent orchestration extension. It is not a team/workflow engine; instead, it focuses on fast non-blocking subagent sessions, owner-routed steering-message delivery, interactive subagents, and context-overflow recovery. It is valuable as a reference for **session-native subagent runtime**, **delivery semantics**, and **minimal interactive worker UX**.
Current `pi-crew` is more powerful and durable: child Pi workers, teams/workflows, task graph scheduling, worktrees, mailbox, event logs, dashboard, notifications, and recovery state. The best path is not replacement; it is selective porting of patterns into `pi-crew`'s existing `live-session-runtime` / `SubagentManager` as an optional session-native lane.
## Source File Map
| Area | Reference files |
|---|---|
| Extension entry/session hooks | `source/pi-crew/extension/index.ts` |
| Runtime singleton | `source/pi-crew/extension/runtime/crew-runtime.ts` |
| Delivery routing | `source/pi-crew/extension/runtime/delivery-coordinator.ts` |
| State model/registry | `source/pi-crew/extension/runtime/subagent-state.ts`, `source/pi-crew/extension/runtime/subagent-registry.ts` |
| Overflow recovery | `source/pi-crew/extension/runtime/overflow-recovery.ts` |
| Session bootstrap | `source/pi-crew/extension/bootstrap-session.ts` |
| Agent discovery | `source/pi-crew/extension/agent-discovery.ts` |
| Tool registration | `source/pi-crew/extension/integration/register-tools.ts`, `source/pi-crew/extension/integration/tools/*.ts` |
| Message renderers | `source/pi-crew/extension/integration/register-renderers.ts` |
| Message formatting | `source/pi-crew/extension/subagent-messages.ts` |
| Status widget | `source/pi-crew/extension/status-widget.ts` |
| Architecture doc | `source/pi-crew/docs/architecture.md` |
## Architecture Observations
### Reference `source/pi-crew`
- Process-level singleton `CrewRuntime` survives Pi runtime/session replacement and rebinds on `session_start`.
- Subagents are in-process SDK `AgentSession`s created with `createAgentSession()`.
- Parent/child linkage uses `SessionManager.newSession({ parentSession })`.
- Subagent resource loading filters out the pi-crew extension through `extensionsOverride` to prevent recursive `crew_spawn` loops.
- Results are delivered through Pi-native `sendMessage()` with explicit idle/streaming semantics.
- Interactive subagents are first-class: `interactive: true` workers enter `waiting`; parent continues with `crew_respond`; cleanup is explicit with `crew_done`.
- Overflow recovery tracks `agent_end`, `compaction_start/end`, and `auto_retry_start/end` events around `session.prompt()`.
- State is in-memory only; subagent session files remain for post-hoc `/resume` inspection.
### Current `pi-crew`
- Primary runtime is child Pi process execution with durable `.crew/state` manifests and artifacts.
- It has workflow/team abstractions, task graphs, worktree support, event log, mailbox, dashboard panes, render scheduler, notifications, and diagnostic exports.
- It already has `live-session-runtime.ts`, but the current product surface centers on durable child-process workers rather than interactive in-process subagents.
## Extension API Patterns Worth Reusing
| Pattern | Reference source | Why it matters for current `pi-crew` |
|---|---|---|
| Owner-routed delivery by `sessionManager.getSessionId()` | `delivery-coordinator.ts` | Avoids sending async worker results to the wrong active session after `/resume`, `/new`, `/fork`, or multi-session use. |
| Idle vs streaming delivery split | `subagent-messages.ts`, `delivery-coordinator.ts` | Prevents messages from getting stuck: idle sessions need `triggerTurn`; streaming sessions need `deliverAs: "steer"`. |
| Deferred pending flush via `setTimeout(0)` | `delivery-coordinator.ts` | Avoids lost JSONL/custom-message persistence during resume before listeners reconnect. |
| `extensionsOverride` filter | `bootstrap-session.ts` | Required for any in-process worker lane to prevent recursive subagent spawning. |
| Fire-and-forget interactive response | `crew-respond.ts`, `crew-runtime.ts` | Lets parent stay responsive while an interactive worker continues in background. |
| No duplicate done message | `crew-done.ts` | Avoids repeating the last subagent response during cleanup. |
| Source-specific abort reasons | `crew-abort.ts`, `index.ts` shutdown handlers | Better diagnostics than generic "aborted by user". |
| Emergency unrestricted abort command | `register-command.ts` | Useful escape hatch distinct from owner-scoped tool actions. |
| Overflow tracker around SDK prompt | `overflow-recovery.ts` | Better UX for context overflow/compaction/retry in session-native workers. |
## Key Differences / Non-Goals
| Dimension | Reference `source/pi-crew` | Current `pi-crew` |
|---|---|---|
| Runtime | In-process `AgentSession` | Child Pi processes + durable orchestration |
| State | In-memory map | Durable manifests/event logs/artifacts |
| Scope | Flat subagent spawn/respond/done | Teams, workflows, task graph, worktrees |
| Result UX | Pi steering/custom messages | Tool results, mailbox, dashboard, async status |
| Interactive workers | Native | Not yet first-class |
| Worktree isolation | None | First-class |
| Replay/restart | Limited | Strong durable recovery |
Do **not** replace the current runtime wholesale. Reference `source/pi-crew` lacks durable state, worktrees, workflow scheduling, artifact indexing, and the Phase 8 operator experience. Its best value is a narrower session-native execution lane and delivery correctness patterns.
## Recommendations
### P0 — Adopt Delivery Semantics for Async/Live Results
Implement or adapt a small owner-routed delivery coordinator in current `pi-crew`:
- Key by owner `sessionId`, not session file.
- Queue pending messages when owner inactive.
- On `session_start`, flush pending messages on next macrotask.
- Use idle/streaming split:
- idle: `sendMessage(payload, { triggerTurn: true })`
- streaming: `sendMessage(payload, { deliverAs: "steer", triggerTurn: true })`
- Keep current mailbox/event-log as durable source of truth; use delivery coordinator only for live UX.
Likely target files:
- `pi-crew/src/extension/register.ts`
- `pi-crew/src/runtime/subagent-manager.ts`
- `pi-crew/src/runtime/live-session-runtime.ts`
- `pi-crew/src/extension/notification-router.ts`
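The P0 rules above can be sketched as a small coordinator. Everything here is illustrative: `DeliveryCoordinatorSketch`, `OwnerSessionSketch`, and the injectable `schedule` callback are hypothetical names, and `sendMessage` is modeled only with the two option shapes quoted in the list, not with the full Pi API.

```typescript
type DeliveryOptionsSketch = { triggerTurn: true; deliverAs?: "steer" };

interface OwnerSessionSketch {
  sessionId: string;
  isStreaming(): boolean;
  sendMessage(payload: string, options: DeliveryOptionsSketch): void;
}

class DeliveryCoordinatorSketch {
  private active: OwnerSessionSketch | undefined;
  private readonly pending = new Map<string, string[]>();
  private readonly schedule: (flush: () => void) => void;

  // The scheduler is injectable so tests can flush synchronously.
  constructor(schedule: (flush: () => void) => void = (flush) => void setTimeout(flush, 0)) {
    this.schedule = schedule;
  }

  deliver(ownerSessionId: string, payload: string): void {
    if (this.active?.sessionId === ownerSessionId) {
      this.send(this.active, payload);
      return;
    }
    // Owner is not the active session: queue until it starts again.
    const queue = this.pending.get(ownerSessionId) ?? [];
    queue.push(payload);
    this.pending.set(ownerSessionId, queue);
  }

  onSessionStart(session: OwnerSessionSketch): void {
    this.active = session;
    // Flush on the next macrotask so session listeners have time to reconnect.
    this.schedule(() => {
      if (this.active !== session) return; // replaced again before the flush ran
      const queue = this.pending.get(session.sessionId) ?? [];
      this.pending.delete(session.sessionId);
      for (const payload of queue) this.send(session, payload);
    });
  }

  private send(session: OwnerSessionSketch, payload: string): void {
    session.sendMessage(
      payload,
      session.isStreaming() ? { deliverAs: "steer", triggerTurn: true } : { triggerTurn: true },
    );
  }
}
```

The pending map here is in-memory only, which matches the recommendation that the mailbox/event log stays the durable source of truth and the coordinator serves live UX.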
### P1 — Add Optional Session-Native Subagent Lane
Build an opt-in lane on top of existing `live-session-runtime.ts` rather than changing the default child-process runtime:
- `runtime.mode = "child-process" | "live-session" | "auto"` already exists conceptually; tighten semantics.
- Use `SessionManager.newSession({ parentSession })` and `createAgentSession()` for in-process workers.
- Filter `pi-crew` out of subagent resource loader extensions.
- Persist minimal metadata to existing `.crew/state` so dashboards/recovery still work.
This can reduce process startup overhead and blank console issues, while preserving child-process isolation as the safe default.
### P1 — Introduce Interactive Worker Semantics
Add first-class interactive subagents without disrupting teams:
- New status: `waiting` for interactive background workers.
- `crew_agent_respond` / `crew_agent_done` or extend existing `crew_agent_steer` semantics.
- Fire-and-forget response: parent tool returns immediately; worker response arrives as mailbox/steering message.
- `done` performs cleanup only; no duplicate response.
Likely target files:
- `pi-crew/src/runtime/crew-agent-records.ts`
- `pi-crew/src/runtime/subagent-manager.ts`
- `pi-crew/src/extension/registration/subagent-tools.ts`
- `pi-crew/src/state/mailbox.ts`
- `pi-crew/src/ui/dashboard-panes/agents-pane.ts`
### P2 — Port Overflow Recovery Tracker for Live Sessions
For session-native workers, wrap `AgentSession.prompt()` with an event tracker similar to `source/pi-crew/extension/runtime/overflow-recovery.ts`:
- Track `compaction_start/end` and `auto_retry_start/end`.
- Report recovered context overflow separately from hard failure.
- Emit durable event-log records and dashboard health hints.
This should not apply to child Pi workers directly; they already have process/transcript supervision.
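A minimal sketch of such a tracker follows, assuming the session emits the event names listed above in start/end pairs. `OverflowTrackerSketch` and its three-way outcome are illustrative; the reference `overflow-recovery.ts` may classify outcomes differently.

```typescript
type RecoveryEventSketch =
  | "compaction_start"
  | "compaction_end"
  | "auto_retry_start"
  | "auto_retry_end"
  | "agent_end";

type PromptOutcomeSketch = "ok" | "recovered" | "failed";

class OverflowTrackerSketch {
  private recoveries = 0; // recovery phases that started
  private unfinished = 0; // recovery phases started but not yet ended
  private ended = false;  // whether agent_end was observed

  record(event: RecoveryEventSketch): void {
    switch (event) {
      case "compaction_start":
      case "auto_retry_start":
        this.recoveries++;
        this.unfinished++;
        break;
      case "compaction_end":
      case "auto_retry_end":
        this.unfinished = Math.max(0, this.unfinished - 1);
        break;
      case "agent_end":
        this.ended = true;
        break;
    }
  }

  // Call after session.prompt() settles; `promptThrew` is true if it rejected.
  outcome(promptThrew: boolean): PromptOutcomeSketch {
    if (promptThrew || !this.ended || this.unfinished > 0) return "failed";
    return this.recoveries > 0 ? "recovered" : "ok";
  }
}
```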
### P2 — Improve Abort Reason Taxonomy
Adopt explicit abort source reasons across all worker paths:
- tool-triggered abort
- command-triggered emergency abort
- session quit cleanup
- session replacement detach/deactivate
- watchdog timeout
- stale heartbeat kill
This improves diagnostics, notification routing, and Phase 9 reliability work.
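One lightweight way to encode this taxonomy is a string-literal union with an exhaustive label table, so adding a new abort source forces every consumer to handle it. The identifiers below (`AbortSourceSketch`, `ABORT_LABELS`, `describeAbort`) and the label wording are hypothetical.

```typescript
type AbortSourceSketch =
  | "tool"
  | "emergency_command"
  | "session_quit"
  | "session_replaced"
  | "watchdog_timeout"
  | "stale_heartbeat";

interface AbortRecordSketch {
  source: AbortSourceSketch;
  agentId: string;
}

// Record<...> makes the compiler reject a missing label for any source.
const ABORT_LABELS: Record<AbortSourceSketch, string> = {
  tool: "aborted by tool call",
  emergency_command: "aborted by emergency command",
  session_quit: "aborted during session quit cleanup",
  session_replaced: "detached after session replacement",
  watchdog_timeout: "killed by watchdog timeout",
  stale_heartbeat: "killed after stale heartbeat",
};

function describeAbort(record: AbortRecordSketch): string {
  return `${record.agentId}: ${ABORT_LABELS[record.source]}`;
}
```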
## Risks
- In-process sessions reduce OS/process isolation; failures or leaks may affect the parent Pi process.
- `extensionsOverride` is mandatory; missing it risks recursive subagent spawning.
- Pi SDK internals may shift; keep this lane optional and covered by integration tests.
- Delivery semantics must not bypass durable mailbox/event log; live messages are convenience, not source of truth.
- Interactive workers can linger in memory; require TTL/status visibility and explicit cleanup.
## Suggested Follow-Up Plan
1. Write a focused design doc: `docs/research-session-native-runtime-plan.md`.
2. Spike delivery coordinator only; no runtime swap.
3. Add tests for idle/streaming/inactive owner delivery behavior.
4. Add optional `live-session` worker lane behind config.
5. Add interactive worker status/actions after live delivery is stable.
## Research Artifacts
- `D:/my/my_project/.crew/artifacts/team_20260429091311_8047706b/results/01_discover.txt`
- `D:/my/my_project/.crew/artifacts/team_20260429091311_8047706b/results/02_explore-shard-1.txt`
- `D:/my/my_project/.crew/artifacts/team_20260429091311_8047706b/results/03_explore-shard-2.txt`
- `D:/my/my_project/.crew/artifacts/team_20260429091311_8047706b/results/04_explore-shard-3.txt`
- `D:/my/my_project/.crew/artifacts/team_20260429091311_8047706b/batches/01_discover+02_explore-shard-1+03_explore-shard-2+04_explore-shard-3.md`


@@ -0,0 +1,480 @@
# Research: UI Optimization Plan
> Phase 7 plan derived from `parallel-research` run `team_20260429053958_6497405a`.
> Source artifacts:
> - `.crew/artifacts/team_20260429053958_6497405a/shared/research-summary.md`
> - `.crew/artifacts/team_20260429053958_6497405a/shared/04_synthesize.md`
> - `.crew/artifacts/team_20260429053958_6497405a/shared/01_discover.md`
> - `.crew/artifacts/team_20260429053958_6497405a/shared/02_explore-shard-1.md`
> - `.crew/artifacts/team_20260429053958_6497405a/shared/03_explore-shard-2.md`
## Overview
pi-crew already exposes the runtime data needed for a strong TUI: manifests, `tasks.json`, `agents.json`, per-agent `status.json`, `events.jsonl`, `output.log`, transcripts, and durable mailbox state. The gaps are in the UI layer:
1. Widget recreated on every timer tick (`crew-widget.ts:267-272`).
2. Live signatures miss `progress / toolUses / usage / recent output` so cached lines stay stale.
3. Multiple UI surfaces re-read the same files independently (no shared snapshot).
4. `/team-dashboard` is static — only reload via key `r`.
5. `transcript-viewer.ts` calls `readFileSync` inside `render()` on every paint.
6. Mailbox API/runtime exists but no first-class panel/badges.
7. Pi UI integration uses untyped private-like casts (`requestRender`, `setWorkingIndicator`).
The plan below sequences fixes by highest ROI and lowest risk first, locks down the snapshot contract before refactoring surfaces, and defers anything depending on uncertain pi-mono compatibility.
## Implementation Status
> Track status here. Use `[x]` for done, `[ ]` for pending, `[-]` for won't-do/deferred.
- [x] Phase 0 — Pi UI compatibility shim
- [x] Phase 1.A — Persistent widget instance
- [x] Phase 1.B — `RunUiSnapshot` + `RunSnapshotCache`
- [x] Phase 1.C — Freshness signatures (progress / tool / usage / mtimes)
- [x] Phase 2 — Refactor widget / sidebar / dashboard / powerbar onto snapshot
- [x] Phase 3.A — `/team-dashboard` live component
- [x] Phase 3.B — Dashboard panes (agents, progress, mailbox, transcript)
- [x] Phase 4.A — Transcript viewer cache (mtime/size keyed)
- [x] Phase 4.B — Transcript bounded-tail mode
- [x] Phase 5.A — Adaptive/coalesced render scheduler
- [x] Phase 5.B — Powerbar fallback strategy + docs
- [x] Phase 5.C — Performance tests (large runs / large transcripts)
## Roadmap-Level Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Snapshot contract before refactor | Lock `RunUiSnapshot` interface in Phase 1.B before any consumer refactor | Avoid concurrent rename/conflict in widget/sidebar/dashboard |
| Persistent widget independent of snapshot | Phase 1.A done before 1.B | Quick win, doesn't block snapshot work, removes biggest CPU/flicker churn |
| Compatibility shim placed first (Phase 0) | Centralize `requestRender / setStatus / custom / setWidget` casts in `src/ui/pi-ui-compat.ts` | Every later phase consumes it; avoids re-casting in each module |
| Transcript fix split (4.A then 4.B) | Cache + invalidate first, tail-mode second | Cache by `mtime+size` is S effort and removes blocking `readFileSync` per-render; tail mode is M-L and can land later |
| Event-driven refresh deferred to Phase 5.A | Subscribe `crew.run.* / crew.subagent.* / crew.mailbox.*` only after snapshot is stable | Avoids listener leak risk during rapid refactor |
| RPC mode | Best-effort, not first-class | RPC drops function widgets; we emit string fallback via shim |
| Powerbar | Always-fallback to `setStatus`/widget; document event contract | No confirmed pi-mono consumer found in research |
| Memory safety | LRU cap 8 active + 16 recent runs in snapshot cache | Prevent leak when user browses many runs |
## Phase 0 — Pi UI Compatibility Shim
**Goal:** Eliminate ad-hoc `(ctx.ui as { requestRender?: ... })` casts; provide one typed entry-point per UI capability.
**Deliverables:**
- New file `src/ui/pi-ui-compat.ts` exporting:
- `requestRender(ctx)` — feature-detected.
- `setWorkingIndicator(ctx, opts?)` — feature-detected, no-op fallback.
- `setExtensionWidget(ctx, key, factory, options)` — wraps `setWidget`, accepts `{ persist?: boolean }` flag.
- `showCustom(ctx, ...)` — wraps `ctx.ui.custom` with overlay options.
- `setStatusFallback(ctx, key, lines, segment?)` — used when powerbar consumer is absent.
- Replace existing inline casts in `crew-widget.ts`, `register.ts`, `live-run-sidebar.ts`, `powerbar-publisher.ts`.
**Files affected:**
- `src/ui/pi-ui-compat.ts` (new)
- `src/ui/crew-widget.ts`
- `src/ui/live-run-sidebar.ts`
- `src/ui/powerbar-publisher.ts`
- `src/extension/register.ts`
**Tests:**
- Unit test asserting fallback when host lacks `requestRender` / `setWorkingIndicator`.
- Snapshot of cast removal via grep test (no `as { requestRender` left in `src/`).
**Effort:** S (0.5-1 day) · **Risk:** Low
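A feature-detected entry point of the kind Phase 0 describes could look like the sketch below. It is a guess at the shape of `src/ui/pi-ui-compat.ts`, not its actual contents: `hasMethod` and `requestRenderCompat` are illustrative names, and the boolean return value is an added convenience for callers and tests.

```typescript
// Generic runtime check that also narrows the type for the compiler.
function hasMethod<K extends string>(
  target: unknown,
  name: K,
): target is Record<K, (...args: unknown[]) => unknown> {
  return (
    typeof target === "object" &&
    target !== null &&
    typeof (target as Record<string, unknown>)[name] === "function"
  );
}

// Returns true if the host supported requestRender and it was invoked.
function requestRenderCompat(ctx: { ui?: unknown }): boolean {
  const ui = ctx.ui;
  if (!hasMethod(ui, "requestRender")) return false; // older host: silent no-op
  ui.requestRender();
  return true;
}
```

Keeping the detection in one helper means call sites no longer need inline casts on `ctx.ui`, which is what the grep-based test in this phase checks for.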
## Phase 1.A — Persistent Widget Instance
**Goal:** Stop calling `setWidget` every timer tick; only call when placement/visibility/key changes.
**Approach:**
- Extend `CrewWidgetState` with `lastPlacement: string`, `lastVisibility: "hidden" | "visible"`, `lastKey: string`.
- `updateCrewWidget` decides: if state matches and component instance exists → only invalidate via shim's `requestRender()`; do NOT call `setWidget`.
- Component reads `runs` lazily inside `render(width)` using existing `activeWidgetRuns` (later replaced by snapshot in Phase 2).
**Files affected:**
- `src/ui/crew-widget.ts`
- `src/extension/register.ts` (timer interval handler)
**Tests (unit):**
- `updateCrewWidget` called N times with unchanged placement → `setWidget` invoked exactly once (count via mock).
- Switching placement triggers exactly 1 additional `setWidget`.
- Hide/clear path still calls `setWidget(WIDGET_KEY, undefined, ...)`.
**Effort:** S-M (1 day) · **Risk:** Low
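The decision logic for Phase 1.A can be sketched as follows. The host interface is a simplified stand-in for the Pi UI (the real `setWidget` signature may differ), and `updateCrewWidgetSketch`, `WidgetHostSketch`, and `WidgetRequestSketch` are hypothetical names.

```typescript
interface WidgetHostSketch {
  setWidget(key: string, render: (() => string[]) | undefined, options: { placement: string }): void;
  requestRender(): void;
}

interface CrewWidgetStateSketch {
  lastPlacement?: string;
  lastVisibility?: "hidden" | "visible";
  lastKey?: string;
}

interface WidgetRequestSketch {
  key: string;
  placement: string;
  visible: boolean;
  render: () => string[]; // assumed to read run data lazily on each paint
}

function updateCrewWidgetSketch(
  host: WidgetHostSketch,
  state: CrewWidgetStateSketch,
  next: WidgetRequestSketch,
): void {
  const visibility = next.visible ? "visible" : "hidden";
  const unchanged =
    state.lastKey === next.key &&
    state.lastPlacement === next.placement &&
    state.lastVisibility === visibility;
  if (unchanged) {
    // The same instance is still mounted: only ask the host to repaint.
    if (next.visible) host.requestRender();
    return;
  }
  // Placement, visibility, or key changed: (re)register or clear the widget.
  host.setWidget(next.key, next.visible ? next.render : undefined, { placement: next.placement });
  state.lastKey = next.key;
  state.lastPlacement = next.placement;
  state.lastVisibility = visibility;
}
```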
## Phase 1.B — `RunUiSnapshot` + `RunSnapshotCache`
**Status:** Done in Wave 2 via `src/ui/snapshot-types.ts` and `src/ui/run-snapshot-cache.ts`.
**Goal:** Single read pass per run; share results across widget/sidebar/dashboard/powerbar.
**Locked interface (do not change without bumping plan):**
```ts
export interface RunUiProgress {
total: number;
completed: number;
running: number;
failed: number;
queued: number;
}
export interface RunUiUsage {
tokensIn: number;
tokensOut: number;
toolUses: number;
}
export interface RunUiMailbox {
inboxUnread: number;
outboxPending: number;
needsAttention: number;
}
export interface RunUiSnapshot {
runId: string;
cwd: string;
fetchedAt: number;
signature: string; // stable hash; differs only when content changed
manifest: TeamRunManifest;
tasks: TeamTaskState[];
agents: CrewAgentRecord[];
progress: RunUiProgress;
usage: RunUiUsage;
mailbox: RunUiMailbox;
recentEvents: TeamEvent[]; // last N (config N=20)
recentOutputLines: string[]; // last N lines, capped at MAX_TAIL_BYTES
}
export interface RunSnapshotCache {
get(runId: string): RunUiSnapshot | undefined;
refresh(runId: string): RunUiSnapshot; // forces re-read
refreshIfStale(runId: string): RunUiSnapshot; // re-read only if mtime/size changed or TTL exceeded
invalidate(runId?: string): void; // invalidate one or all
snapshotsByKey(): Map<string, RunUiSnapshot>; // for dashboard list rendering
}
```
**Cache rules:**
- Key by `runId`.
- Stored entry includes `tasksMtime`, `tasksSize`, `agentsMtime`, `agentsSize`, `manifestMtime`, `mailboxMtime`, `outputMtime`.
- TTL = 250ms (matches existing `crew-agent-records` reader cache).
- LRU: max 8 active + 16 recent entries; evict on insert beyond limit.
- All `JSON.parse` wrapped in `try/catch`; on parse fail return previous valid entry (never crash render).
**Files affected:**
- `src/ui/run-snapshot.ts` (new)
- `src/ui/run-snapshot-cache.ts` (new)
- `src/ui/snapshot-types.ts` (new — exported types)
**Tests (unit):**
- `refreshIfStale` returns same entry when mtimes unchanged.
- File rewrite changes `signature`.
- Parse error returns last valid snapshot, no throw.
- LRU eviction at boundary.
**Effort:** M-L (2-3 days) · **Risk:** Medium
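
A reduced sketch of the cache rules, assuming a single stamped file per run and one combined LRU limit (the plan tracks several files and separate active/recent limits). The class name `SnapshotCacheSketch` and the injected `CacheDeps` are hypothetical and exist only to make the staleness logic easy to exercise.

```ts
interface FileStamp { mtimeMs: number; size: number; }
interface CacheEntry<T> { value: T; stamp: FileStamp; fetchedAt: number; }

// Injected I/O so the staleness logic can be tested without touching disk.
interface CacheDeps {
  stat(runId: string): FileStamp;
  readText(runId: string): string;
  now(): number;
}

class SnapshotCacheSketch<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private deps: CacheDeps;
  private ttlMs: number;
  private maxEntries: number;

  constructor(deps: CacheDeps, ttlMs: number = 250, maxEntries: number = 24) {
    this.deps = deps;
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  has(runId: string): boolean {
    return this.entries.has(runId);
  }

  refreshIfStale(runId: string): T | undefined {
    const prev = this.entries.get(runId);
    const stamp = this.deps.stat(runId);
    const now = this.deps.now();
    if (
      prev !== undefined &&
      prev.stamp.mtimeMs === stamp.mtimeMs &&
      prev.stamp.size === stamp.size &&
      now - prev.fetchedAt <= this.ttlMs
    ) {
      this.touch(runId, prev);
      return prev.value;
    }
    try {
      const value = JSON.parse(this.deps.readText(runId)) as T;
      this.touch(runId, { value, stamp, fetchedAt: now });
      return value;
    } catch {
      // Parse or read failure: keep serving the last valid value, never throw.
      return prev?.value;
    }
  }

  // Map preserves insertion order, so re-inserting marks the entry most recent.
  private touch(runId: string, entry: CacheEntry<T>): void {
    this.entries.delete(runId);
    this.entries.set(runId, entry);
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }
}
```

Because a failed parse leaves the previous entry and its old stamp in place, the next call retries the read, which suits a file that is caught mid-rewrite.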
## Phase 1.C — Freshness Signatures
**Goal:** Make widget/sidebar invalidate when progress/tool/tokens/output change, not just status.
**Changes:**
- `CrewWidgetComponent.buildSignature` includes per-agent `progress.completed`, `progress.total`, `currentTool`, `usage.tokensOut`, `lastOutputMtime`.
- `LiveRunSidebar.buildSignature` similarly includes progress/tool/usage; add `mailbox.inboxUnread`.
- Signatures derived from `RunUiSnapshot.signature` once Phase 1.B is in.
**Files affected:**
- `src/ui/crew-widget.ts`
- `src/ui/live-run-sidebar.ts`
**Tests (unit):**
- Two snapshots with same status but different progress → different signatures.
- Mock progress event → render output line count/contents change.
**Effort:** S (0.5 day) · **Risk:** Low
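
One way to build such a signature is sketched below. The `AgentFreshness` shape and the use of SHA-1 over a JSON row list are assumptions for illustration; the actual `buildSignature` methods may concatenate fields differently.

```ts
import { createHash } from "node:crypto";

// Hypothetical per-agent view of the fields the plan lists for freshness.
interface AgentFreshness {
  id: string;
  status: string;
  progressCompleted: number;
  progressTotal: number;
  currentTool?: string;
  tokensOut: number;
  lastOutputMtime: number;
}

function buildSignature(agents: AgentFreshness[], inboxUnread: number = 0): string {
  // Sort by id so the signature does not depend on read order.
  const rows = [...agents]
    .sort((a, b) => a.id.localeCompare(b.id))
    .map((a) => [
      a.id,
      a.status,
      a.progressCompleted,
      a.progressTotal,
      a.currentTool ?? "",
      a.tokensOut,
      a.lastOutputMtime,
    ]);
  // JSON encoding avoids delimiter collisions between adjacent fields.
  const input = JSON.stringify({ rows, inboxUnread });
  return createHash("sha1").update(input).digest("hex");
}
```

Two snapshots with the same status but different `progressCompleted` hash differently, so the surface invalidates even though no status flipped.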
## Phase 2 — Refactor Surfaces onto Snapshot
**Status:** Done in Wave 2 for widget/sidebar/dashboard/powerbar, with fallback direct reads preserved when no cache is supplied.
**Goal:** Replace independent FS reads in widget / sidebar / dashboard / powerbar with `RunSnapshotCache`.
**Deliverables:**
- `crew-widget.ts` reads via `cache.refreshIfStale(runId)`.
- `live-run-sidebar.ts` same.
- `run-dashboard.ts` calls `cache.snapshotsByKey()` once per render.
- `powerbar-publisher.ts` derives segment text from snapshot.
- Remove direct `agentsFor`/`readTasks`/`readManifest` reads from UI modules.
**Files affected:**
- `src/ui/crew-widget.ts`
- `src/ui/live-run-sidebar.ts`
- `src/ui/run-dashboard.ts`
- `src/ui/powerbar-publisher.ts`
**Tests (unit):**
- One render of all four surfaces with N=10 runs triggers ≤ N cache reads (use spy).
- Snapshot reuse across surfaces in same tick (counter assert).
**Effort:** M (2 days) · **Risk:** Medium
## Phase 3.A — Live `/team-dashboard`
**Goal:** Dashboard auto-refreshes while open, preserves selection, separates active vs recent runs.
**Changes:**
- Convert `RunDashboard` from one-shot render to TUI overlay component owning its own timer (250-1000ms adaptive).
- Internal state: `selectedRunId`, `activeTab`, `cachedSnapshots` (via `RunSnapshotCache`).
- Hotkey `r` no longer needed but kept as manual force-refresh.
**Files affected:**
- `src/ui/run-dashboard.ts`
- `src/extension/registration/commands.ts` (dashboard handler now overlay-based)
**Tests (unit + integration):**
- Component receives mocked snapshot updates → re-renders without losing `selectedRunId`.
- Active runs list updates when manifest status flips.
**Effort:** M (2 days) · **Risk:** Medium
## Phase 3.B — Dashboard Panes (agents · progress · mailbox · transcript)
**Goal:** First-class panel/tabs surfacing data already in snapshot.
**Tabs:**
1. **Agents** — table (agent · status · current tool · tokens · last activity).
2. **Progress / Events** — last N events with role badge and timestamps.
3. **Mailbox** — inbox unread, outbox pending, needs-attention; row actions: nudge/ack via existing `team-tool/api.ts` (`send-message`, `ack-message`).
4. **Transcript / Output** — opens existing `DurableTranscriptViewer` (post Phase 4.A).
**Files affected:**
- `src/ui/run-dashboard.ts`
- `src/ui/dashboard-panes/` (new directory: agents-pane, progress-pane, mailbox-pane, transcript-pane)
- `src/extension/team-tool/api.ts` (no API change; UI calls existing `read-mailbox`, `send-message`, `ack-message`)
**Tests (unit):**
- Mailbox pane shows badge counts from snapshot.
- Pane switching preserves selection within pane.
- Action `ack` triggers API call once and refreshes snapshot.
**Effort:** M-L (3 days) · **Risk:** Medium
## Phase 4.A — Transcript Viewer Cache
**Goal:** Stop blocking `readFileSync` inside `render()`; eliminate full-parse per paint.
**Changes:**
- New `TranscriptCacheEntry { path, mtime, size, lines, parsedAt }` keyed by `(runId, taskId)`.
- `readRunTranscript` consults cache; only re-reads if `mtime` or `size` changed.
- `DurableTranscriptViewer.render` reads `cache.lines`, never the disk directly.
- TTL 500ms safety net.
**Files affected:**
- `src/ui/transcript-viewer.ts`
- `src/ui/transcript-cache.ts` (new)
**Tests (unit):**
- Two consecutive renders with unchanged file → 1 disk read.
- File grows → new cached lines, signature changes.
- Parse failure preserves last good cache.
**Effort:** S (0.5 day) · **Risk:** Low
## Phase 4.B — Bounded-Tail Mode
**Goal:** Default to last N bytes/events to keep latency bounded for large transcripts.
**Approach:**
- Default `maxTailBytes = 256 KB`.
- Tail strategy: `fs.statSync` → `fs.openSync` → read last N bytes → discard partial first line if file exceeds N.
- Add hotkey `f` to "load full transcript on demand"; show byte counter.
- Auto-scroll toggle (`a`) preserved.
**Files affected:**
- `src/ui/transcript-viewer.ts`
- `src/ui/transcript-cache.ts` (extend)
**Config:**
- `config.ui.transcriptTailBytes` (optional, default 262144).
**Tests (unit):**
- 1MB file → only ~256KB worth of lines parsed.
- Force-full mode loads everything.
- Tail re-aligns when first newline straddles boundary.
**Effort:** M (2 days) · **Risk:** Medium
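
The tail strategy can be sketched with Node's synchronous `fs` calls. The function name `readTailLines` and the `TailResult` shape are hypothetical; the sketch always drops the first line of a truncated window, even if the window happens to start exactly on a line boundary.

```ts
import * as fs from "node:fs";

interface TailResult {
  lines: string[];
  truncated: boolean; // true when only the tail of the file was read
  bytesRead: number;
}

function readTailLines(filePath: string, maxTailBytes: number = 256 * 1024): TailResult {
  const { size } = fs.statSync(filePath);
  const start = Math.max(0, size - maxTailBytes);
  const buffer = Buffer.alloc(size - start);
  const fd = fs.openSync(filePath, "r");
  let bytesRead = 0;
  try {
    // readSync may return fewer bytes than requested, so loop until done.
    while (bytesRead < buffer.length) {
      const n = fs.readSync(fd, buffer, bytesRead, buffer.length - bytesRead, start + bytesRead);
      if (n === 0) break;
      bytesRead += n;
    }
  } finally {
    fs.closeSync(fd);
  }
  let from = 0;
  if (start > 0) {
    // Work on bytes, not characters, so a split UTF-8 sequence is discarded
    // together with the partial first line.
    const firstNewline = buffer.indexOf(0x0a);
    from = firstNewline === -1 ? bytesRead : firstNewline + 1;
  }
  const text = buffer.toString("utf8", from, bytesRead);
  // Blank lines are dropped, which is fine for JSONL transcripts.
  const lines = text.split("\n").filter((line) => line.length > 0);
  return { lines, truncated: start > 0, bytesRead };
}
```

A force-full mode then amounts to calling the same function with `maxTailBytes` set to at least the file size.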
## Phase 5.A — Adaptive Render Scheduler
**Goal:** Replace fixed 1000ms timers with event-driven refresh + low-frequency fallback.
**Approach:**
- Single `RenderScheduler` listening on `pi.events` for `crew.run.*`, `crew.subagent.*`, `crew.mailbox.*`.
- On event → invalidate snapshot + `requestRender` (debounced 50-100ms via animation-frame analog).
- Fallback timer 750ms (reduced from 1000ms) only triggers if no event in window.
- All listeners disposed on extension unload + run completion.
**Files affected:**
- `src/ui/render-scheduler.ts` (new)
- `src/extension/register.ts` (replace `setInterval` block)
**Tests (unit):**
- Event burst coalesces to single `requestRender` within debounce window.
- Listeners removed after `dispose()` (counter on event emitter).
- Fallback timer fires only when no events in interval.
**Effort:** M (1.5 days) · **Risk:** Low-Medium
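
A minimal sketch of the coalescing and disposal behavior, assuming `pi.events` behaves like a Node `EventEmitter`. The factory name `createRenderScheduler`, the injectable `Schedule` type, and the explicit event-name list are assumptions: Node emitters do not match `crew.run.*` wildcards, so concrete names are passed in.

```ts
import { EventEmitter } from "node:events";

type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default timer source; tests can inject a fake Schedule instead.
const timeoutSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

interface RenderSchedulerOptions {
  debounceMs?: number;
  fallbackMs?: number;
  schedule?: Schedule;
}

function createRenderScheduler(
  events: EventEmitter,
  eventNames: string[],
  requestRender: () => void,
  options: RenderSchedulerOptions = {},
): { dispose(): void } {
  const debounceMs = options.debounceMs ?? 75;
  const fallbackMs = options.fallbackMs ?? 750;
  const schedule = options.schedule ?? timeoutSchedule;
  let cancelDebounce: Cancel | undefined;
  let cancelFallback: Cancel | undefined;
  let disposed = false;

  function render(): void {
    if (disposed) return;
    cancelDebounce?.();
    cancelDebounce = undefined;
    requestRender();
    armFallback(); // restart the quiet-period window after every paint
  }

  function armFallback(): void {
    cancelFallback?.();
    cancelFallback = schedule(render, fallbackMs);
  }

  function onEvent(): void {
    // A render is already pending: coalesce this event into it.
    if (disposed || cancelDebounce) return;
    cancelDebounce = schedule(render, debounceMs);
  }

  for (const name of eventNames) events.on(name, onEvent);
  armFallback();

  return {
    dispose(): void {
      disposed = true;
      for (const name of eventNames) events.off(name, onEvent);
      cancelDebounce?.();
      cancelFallback?.();
    },
  };
}
```

Snapshot invalidation is left out; in this sketch it would happen inside `onEvent` before the debounce is armed.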
## Phase 5.B — Powerbar Fallback Strategy
**Goal:** Don't depend on an external `powerbar:*` consumer.
**Changes:**
- Detect listener via `pi.events.listenerCount?.("powerbar:register-segment")`.
- If 0 listeners: emit AND mirror to `ctx.ui.setStatus("pi-crew", text)`.
- Document event contract in `docs/architecture.md`.
**Files affected:**
- `src/ui/powerbar-publisher.ts`
- `docs/architecture.md`
**Tests (unit):**
- No consumer → `setStatus` called.
- Consumer registered → only event emitted, no `setStatus`.
**Effort:** S-M (0.5-1 day) · **Risk:** Medium (depends on listener-count API availability)
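
The detection rule can be sketched as below. The `EventBus` and `StatusUi` interfaces, the emitted event name, and the payload shape are placeholders, since the plan only names `powerbar:register-segment`. When `listenerCount` is unavailable the sketch treats that as "no consumer" and mirrors to the status line.

```ts
// Hypothetical minimal views of pi.events and ctx.ui for this example.
interface EventBus {
  emit(event: string, payload: unknown): unknown;
  listenerCount?(event: string): number;
}

interface StatusUi {
  setStatus(key: string, text: string): void;
}

const SEGMENT_EVENT = "powerbar:update-segment"; // assumed event name

// Returns true when the text was mirrored to the status line.
function publishCrewSegment(events: EventBus, ui: StatusUi, text: string): boolean {
  events.emit(SEGMENT_EVENT, { id: "pi-crew", text });
  const consumers = events.listenerCount?.("powerbar:register-segment") ?? 0;
  const mirrored = consumers === 0;
  if (mirrored) ui.setStatus("pi-crew", text);
  return mirrored;
}
```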
## Phase 5.C — Performance Tests
**Goal:** Catch regressions on large runs / transcripts.
**Suite:**
- 50 simulated runs, 200 events each → render dashboard, assert ≤ 50 disk reads / render cycle.
- 5MB transcript → tail mode reads ≤ 1MB, full mode allowed.
- 100 widget update calls without state change → ≤ 1 `setWidget` invocation.
**Files affected:**
- `test/integration/ui-performance.test.ts` (new)
**Effort:** M (1.5 days) · **Risk:** Low
## Implementation Order
> Recommended: do quick wins (Phase 0, 1.A, 1.C, 4.A) in parallel as 4 small PRs before starting Phase 1.B (snapshot foundation).
```
Wave 1 (parallel, all S effort):
[x] Phase 0 — Pi UI compat shim
[x] Phase 1.A — Persistent widget
[x] Phase 1.C — Freshness signatures (use ad-hoc fields until snapshot lands)
[x] Phase 4.A — Transcript cache
Wave 2 (sequential):
[x] Phase 1.B — RunUiSnapshot foundation
[x] Phase 2 — Refactor surfaces onto snapshot
[x] Phase 5.A — Adaptive render scheduler
Wave 3 (parallel after Wave 2):
[x] Phase 3.A — Live dashboard
[x] Phase 3.B — Dashboard panes
[x] Phase 4.B — Transcript tail mode
Wave 4 (cleanup):
[x] Phase 5.B — Powerbar fallback
[x] Phase 5.C — Perf tests
```
## Files Affected (grouped)
**New files:**
- `src/ui/pi-ui-compat.ts`
- `src/ui/run-snapshot.ts`
- `src/ui/run-snapshot-cache.ts`
- `src/ui/snapshot-types.ts`
- `src/ui/transcript-cache.ts`
- `src/ui/render-scheduler.ts`
- `src/ui/dashboard-panes/agents-pane.ts`
- `src/ui/dashboard-panes/progress-pane.ts`
- `src/ui/dashboard-panes/mailbox-pane.ts`
- `src/ui/dashboard-panes/transcript-pane.ts`
- `test/integration/ui-performance.test.ts`
**Modified files:**
- `src/ui/crew-widget.ts`
- `src/ui/live-run-sidebar.ts`
- `src/ui/run-dashboard.ts`
- `src/ui/powerbar-publisher.ts`
- `src/ui/transcript-viewer.ts`
- `src/extension/register.ts`
- `src/extension/registration/commands.ts`
- `docs/architecture.md`
**Read-only references:**
- `src/runtime/crew-agent-records.ts`
- `src/state/mailbox.ts`
- `src/extension/team-tool/api.ts`
## Risk Assessment
| Risk | Phase | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| Snapshot cache memory leak with many runs | 1.B | Medium | High | LRU cap (8 active + 16 recent), eviction unit test |
| Race between `agents.json` rewrite and UI read | 1.B | Medium | Medium | `try/catch JSON.parse` + return last valid snapshot |
| Listener leak from event-driven refresh | 5.A | Medium | Medium | Centralize in `RenderScheduler.dispose()`, integration test counts listeners post-shutdown |
| Persistent widget breaks on placement change edge cases | 1.A | Low | Medium | Diff against `lastPlacement/lastKey/lastVisibility` triple |
| Transcript tail-mode misaligns at chunk boundary | 4.B | Medium | Low | Discard partial-first-line; unit test with files at `n*chunkSize ± 1` |
| Pi RPC mode silently drops widgets | 0/2 | High | Low | Shim falls back to `setStatus` string lines |
| Powerbar consumer never appears | 5.B | High | Low | Always emit + always set status fallback |
| `requestRender` removed in future pi-mono | 0 | Low | Medium | Compat shim already feature-detects |
| Snapshot signature collision (different state, same hash) | 1.B | Low | Medium | Include mtimes + sizes + counts in hash input |
| Test suite runtime grows from perf tests | 5.C | Medium | Low | Run perf separately via dedicated script when needed |
| Concurrent refactor of widget/sidebar/dashboard while contract evolves | 1.B → 2 | Medium | High | Lock interface in 1.B PR before opening Phase 2 PR |
| Mailbox pane spams renders on incoming messages | 3.B / 5.A | Medium | Low | Debounce via `RenderScheduler`, batch mailbox events |
## Testing Strategy
**Unit (Wave 1):**
- Compat shim feature-detect fallback (Phase 0).
- `setWidget` called once per state change (Phase 1.A).
- Signature includes progress/tool/usage diff (Phase 1.C).
- Transcript cache reuses entry when mtime unchanged (Phase 4.A).
**Unit (Wave 2):**
- Snapshot cache: TTL, LRU, parse-error fallback, signature stability.
- Surface refactor: 4 surfaces share ≤ 1 read per run per tick.
- Scheduler: event coalesce, dispose, fallback timer.
**Unit (Wave 3):**
- Dashboard live refresh preserves selection.
- Pane switching state, mailbox badge counts, ack action.
- Tail-mode boundary alignment, force-full toggle.
**Integration:**
- 50-run dashboard render ≤ 50 disk reads (Phase 5.C).
- 5MB transcript tail ≤ 1MB read.
- Long-lived run (10 min simulated) without listener growth.
**Manual smoke:**
- Open `/team-dashboard`, switch panes, send mailbox message, ack from UI.
- Resize terminal, switch placement above/below editor.
- Reload extension; ensure all timers/listeners cleared.
**Regression baseline:**
- Existing 286 unit + 26 integration tests must remain green at every wave.
- Run `npm run typecheck && npm run test:unit && npm run test:integration` before each PR merge.
## Open Questions
1. **Powerbar consumer status** — is any pi-mono extension/host expected to consume `powerbar:*` events? (Decides Phase 5.B aggressiveness; default plan: always-fallback.)
2. **Target scale** — how many concurrent runs / what max transcript size should we optimize for? Plan assumes 8 active runs and 256KB tail by default.
3. **RPC mode priority** — must function widgets work in RPC, or is graceful string fallback acceptable? Plan assumes best-effort string fallback.
4. **Phase 1.B contract freeze** — once the interface ships, downstream phases depend on it. Should we publish it as `RunUiSnapshotV1` and treat changes as breaking?
## Effort Summary
| Wave | Phases | Effort | Dependency |
|---|---|---|---|
| 1 (parallel) | 0, 1.A, 1.C, 4.A | ~2.5 days total | None |
| 2 (sequential) | 1.B → 2 → 5.A | ~5.5 days | Wave 1 done |
| 3 (parallel) | 3.A, 3.B, 4.B | ~7 days | Wave 2 done |
| 4 (parallel) | 5.B, 5.C | ~3 days | Wave 3 done |
| **Total** | 12 phases | **~18 dev-days** | — |
> Quick-win path (Wave 1 only) delivers ~70% of perceived UI improvement (no flicker, fresh signatures, no transcript blocking) at <15% of total effort.


@@ -0,0 +1,134 @@
# pi-crew Resource Formats
## Agent files
Location:
```text
agents/{name}.md # builtin (in this package)
~/.pi/agent/agents/{name}.md # user-global
.crew/agents/{name}.md # project (new layout)
.pi/teams/agents/{name}.md # project (legacy layout when .pi/ exists)
```
Format:
```md
---
name: executor
description: Implement planned code changes
model: claude-sonnet-4-5
fallbackModels: openai/gpt-5-mini, anthropic/claude-sonnet-4
thinking: high
tools: read, grep, find, ls, bash, edit, write
extensions: /path/to/extension.ts
skills: safe-bash
systemPromptMode: replace
inheritProjectContext: true
inheritSkills: false
triggers: auth, tests
useWhen: multi-file implementation with tests
avoidWhen: one-line typo
cost: cheap
category: implementation
---
System prompt body.
```
Optional routing metadata fields:
| Field | Meaning |
| --- | --- |
| `triggers` | Comma-separated terms that should route work to this agent/team |
| `useWhen` | Comma-separated natural-language use cases |
| `avoidWhen` | Comma-separated cases where the agent/team should not be used |
| `cost` | `free`, `cheap`, or `expensive` hint for autonomous routing |
| `category` | Free-form grouping such as `frontend`, `security`, `docs` |
## Team files
Location:
```text
teams/{name}.team.md # builtin (in this package)
~/.pi/agent/teams/{name}.team.md # user-global (shared with pi-mono)
.crew/teams/{name}.team.md # project (new layout)
.pi/teams/teams/{name}.team.md # project (legacy layout when .pi/ exists)
```
Format:
```md
---
name: implementation
description: Full implementation team
defaultWorkflow: implementation
workspaceMode: single
maxConcurrency: 3
triggers: implementation, refactor
useWhen: multi-file implementation
cost: cheap
category: implementation
---
- explorer: agent=explorer map the codebase
- planner: agent=planner create plan
- executor: agent=executor implement
- verifier: agent=verifier verify
```
Role line:
```text
- {role-name}: agent={agent-name} [model={provider/model}] [skills={a,b}|false] [maxConcurrency={n}] optional description
```
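
A small parser for the role line, as a sketch of how the bracketed options could be read. `parseRoleLine` and the `TeamRole` shape are hypothetical, and the rule that options must precede the description is an assumption drawn from the template above, not from the pi-crew source.

```ts
interface TeamRole {
  name: string;
  agent: string;
  model?: string;
  skills?: string[] | false;
  maxConcurrency?: number;
  description: string;
}

const ROLE_RE = /^-\s+([^:\s]+):\s*(.*)$/;
const OPTION_RE = /^(agent|model|skills|maxConcurrency)=(\S+)$/;

function parseRoleLine(line: string): TeamRole | undefined {
  const match = ROLE_RE.exec(line.trim());
  if (!match) return undefined;
  const words = match[2].split(/\s+/).filter(Boolean);
  const role: TeamRole = { name: match[1], agent: "", description: "" };
  let index = 0;
  // Consume leading key=value options; the first other word starts the description.
  for (; index < words.length; index++) {
    const option = OPTION_RE.exec(words[index]);
    if (!option) break;
    const [, key, value] = option;
    if (key === "agent") role.agent = value;
    else if (key === "model") role.model = value;
    else if (key === "skills") role.skills = value === "false" ? false : value.split(",").filter(Boolean);
    else {
      const n = Number.parseInt(value, 10);
      if (Number.isInteger(n) && n > 0) role.maxConcurrency = n;
    }
  }
  role.description = words.slice(index).join(" ");
  // agent= is the only required option in the template.
  return role.agent ? role : undefined;
}
```

For example, `- explorer: agent=explorer map the codebase` yields the role `explorer` bound to agent `explorer` with the description `map the codebase`.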
## Workflow files
Location:
```text
workflows/{name}.workflow.md # builtin (in this package)
~/.pi/agent/workflows/{name}.workflow.md # user-global
.crew/workflows/{name}.workflow.md # project (new layout)
.pi/teams/workflows/{name}.workflow.md # project (legacy layout when .pi/ exists)
```
Format:
```md
---
name: default
description: Explore, plan, execute, verify
---
## explore
role: explorer
Explore for: {goal}
## plan
role: planner
dependsOn: explore
output: plan.md
Create a plan for: {goal}
```
Step fields:
| Field | Meaning |
| --- | --- |
| `role` | Team role to run |
| `dependsOn` | Comma-separated step IDs |
| `parallelGroup` | Optional grouping metadata |
| `output` | Output file name or `false` |
| `reads` | Comma-separated read files or `false` |
| `model` | Step model override |
| `skills` | Comma-separated skills or `false` |
| `progress` | `true`/`false` |
| `worktree` | `true`/`false` metadata |
| `verify` | `true`/`false` verification marker |
Each step starts with `## step-id` followed by recognized step metadata such as `role:` before the blank line. Level-2 headings inside task bodies are preserved unless they look like a step section with recognized metadata; use `###` or lower for maximum compatibility.
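
The heading rule can be sketched as a line scanner: a `## id` line opens a step only when the next line is a recognized metadata field. `parseWorkflowSteps` and `WorkflowStep` are names made up for this example, and the real parser may treat edge cases (frontmatter, repeated fields) differently.

```ts
const STEP_FIELDS = [
  "role", "dependsOn", "parallelGroup", "output", "reads",
  "model", "skills", "progress", "worktree", "verify",
];
const FIELD_RE = new RegExp(`^(${STEP_FIELDS.join("|")}):\\s*(.*)$`);
const HEADING_RE = /^##\s+(\S+)\s*$/;

interface WorkflowStep {
  id: string;
  meta: Record<string, string>;
  body: string;
}

function parseWorkflowSteps(markdown: string): WorkflowStep[] {
  const lines = markdown.split(/\r?\n/);
  const drafts: { id: string; meta: Record<string, string>; body: string[] }[] = [];
  let inMeta = false;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const heading = HEADING_RE.exec(line);
    // Only a heading immediately followed by known metadata starts a step.
    if (heading && FIELD_RE.test(lines[i + 1] ?? "")) {
      drafts.push({ id: heading[1], meta: {}, body: [] });
      inMeta = true;
      continue;
    }
    const current = drafts[drafts.length - 1];
    if (!current) continue; // frontmatter or text before the first step
    const field = inMeta ? FIELD_RE.exec(line) : null;
    if (field) {
      current.meta[field[1]] = field[2].trim();
      continue;
    }
    inMeta = false; // first non-metadata line (usually blank) ends the metadata block
    current.body.push(line);
  }
  return drafts.map((d) => ({ id: d.id, meta: d.meta, body: d.body.join("\n").trim() }));
}
```

A `## Notes` heading followed by ordinary prose stays inside the surrounding task body, which matches the preservation rule described above.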


@@ -0,0 +1,148 @@
# pi-crew Runtime Flow
This document is a compact map of the runtime paths used by `pi-crew`.
## Main sequence
```text
User / model
│ calls team({ action: "run", ... }) or /team-run
handleTeamTool()
│ validates schema and routes action
handleRun()
├─ discoverTeams/discoverWorkflows/discoverAgents
├─ validateWorkflowForTeam
├─ expandParallelResearchWorkflow when applicable
├─ createRunManifest + tasks.json + goal artifact
├─ if async=true ─────────────────────────────────────────────┐
│ spawnBackgroundTeamRun() │
│ ├─ resolve jiti-register.mjs │
│ ├─ fail-fast if jiti missing │
│ ├─ node --import jiti-register.mjs background-runner.ts │
│ └─ parent schedules early-exit guard │
│ ▼
│ background-runner.ts
│ ├─ append async.started
│ ├─ write async.pid startup marker
│ ├─ rediscover team/workflow/agents
│ └─ executeTeamRun()
└─ if foreground/default
├─ startForegroundRun schedules session-bound run, or
└─ executeTeamRun inline for scaffold/non-scheduled paths
executeTeamRun()
├─ write run.running
├─ materialize queued/running agent records lazily
├─ build task graph index
├─ while queued tasks exist
│ ├─ taskGraphSnapshot
│ ├─ resolveBatchConcurrency
│ ├─ getReadyTasks
│ ├─ append task.progress batch event
│ ├─ mapConcurrent ready batch
│ │ └─ runTeamTask()
│ │ ├─ prepare workspace/worktree
│ │ ├─ build task packet
│ │ ├─ render prompt + dependency context
│ │ ├─ choose model candidates from Pi config
│ │ ├─ spawn child Pi process
│ │ ├─ ChildPiLineObserver parses stdout/stderr
│ │ ├─ append per-agent events/output
│ │ ├─ update agent progress/task state
│ │ ├─ parse final JSONL/session usage
│ │ └─ write result/log/transcript/metadata artifacts
│ ├─ merge task updates monotonically
│ ├─ optional adaptive plan injection
│ ├─ save tasks/agents/progress
│ └─ write batch artifact
├─ policy closeout
└─ run.completed / run.failed / run.blocked / run.cancelled
```
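
The inner loop of `executeTeamRun()` can be approximated as follows. This is a simplified sketch under stated assumptions: the `Task` shape, the reduced status set, and the `hardCap` parameter are illustrative, and the real `resolveBatchConcurrency`, `getReadyTasks`, and `mapConcurrent` take more inputs (workspace mode, team limits, events).

```ts
type TaskStatus = "queued" | "running" | "completed" | "failed";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// A queued task is ready once every dependency has completed.
function getReadyTasks(tasks: Task[]): Task[] {
  const byId = new Map(tasks.map((t) => [t.id, t] as const));
  return tasks.filter(
    (t) => t.status === "queued" && t.dependsOn.every((dep) => byId.get(dep)?.status === "completed"),
  );
}

// Runs fn over items with at most `limit` calls in flight.
async function mapConcurrent<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// Returns the ids of each executed batch, in order.
async function runTaskGraph(
  tasks: Task[],
  hardCap: number,
  runTask: (task: Task) => Promise<TaskStatus>,
): Promise<string[][]> {
  const batches: string[][] = [];
  while (tasks.some((t) => t.status === "queued")) {
    const ready = getReadyTasks(tasks);
    if (ready.length === 0) break; // remaining tasks wait on failed dependencies
    const limit = Math.max(1, Math.min(hardCap, ready.length));
    batches.push(ready.map((t) => t.id));
    for (const t of ready) t.status = "running";
    const results = await mapConcurrent(ready, limit, runTask);
    ready.forEach((t, i) => { t.status = results[i]; });
  }
  return batches;
}
```

With an explore → plan → two executors → verify graph, this yields four batches, and only the executor batch runs more than one task at a time.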
## Action router
| Action | Handler | Purpose |
|---|---|---|
| `run` | `team-tool/run.ts` | Create and execute a run, foreground or async. |
| `status` | `team-tool.ts` | Show manifest/tasks/agents/events and mark stale async runs failed. |
| `summary` | `session-summary.ts`/summary handler | Write/read run summary artifact. |
| `events` | `team-tool.ts` | Tail durable run events. |
| `artifacts` | `team-tool.ts` | List run artifacts. |
| `resume` | `team-tool.ts` | Requeue failed/cancelled/skipped/running tasks. |
| `cancel` | `team-tool.ts` | Mark queued/running tasks cancelled and request foreground interrupt. |
| `forget` | `run-maintenance.ts` | Delete run state/artifacts with confirmation. |
| `prune` | `run-maintenance.ts` | Remove old finished runs with confirmation. |
| `export` | `run-export.ts` | Create portable run bundle. |
| `import` / `imports` | `run-import.ts` / `import-index.ts` | Store/list imported bundles. |
| `config` | `config.ts` + config action | Show/update user/project config. |
| `doctor` | `team-tool/doctor.ts` | Platform/resource/runtime diagnostics. |
| `validate` | `validate-resources.ts` | Validate agents/teams/workflows. |
| `recommend` | `team-recommendation.ts` | Suggest team/workflow/action for a goal. |
| management | `management.ts` | Create/update/delete/rename teams, agents, workflows. |
| API | `team-tool/api.ts` | File-backed observability/control/mailbox API. |
## Worker modes
| Mode | Behavior |
|---|---|
| `child-process` | Default. Launches real child `pi` processes per task. |
| `scaffold` | Explicit dry-run. No child Pi worker execution. |
| `live-session` | Experimental/gated in-process/live agent path. |
| `auto` | Resolves to child-process unless config/env requests otherwise. |
## Important files
```text
src/extension/register.ts Pi extension entry/wiring
src/extension/team-tool/run.ts run creation and foreground/async split
src/runtime/background-runner.ts detached async entrypoint
src/runtime/async-runner.ts background spawn command/options
src/runtime/team-runner.ts workflow/task graph scheduler
src/runtime/task-runner.ts single task execution
src/runtime/child-pi.ts child Pi process and output observer
src/runtime/model-fallback.ts configured model candidates/routing
src/runtime/concurrency.ts batch concurrency decisions
src/runtime/process-status.ts pid/liveness/stale detection
src/state/state-store.ts manifest/tasks persistence
src/state/event-log.ts JSONL run events
src/runtime/crew-agent-records.ts aggregate + per-agent status files
```
## Environment variables
| Env | Effect |
|---|---|
| `PI_CREW_EXECUTE_WORKERS=0` | Disable real workers, use scaffold behavior. |
| `PI_TEAMS_EXECUTE_WORKERS=0` | Legacy alias for worker disable. |
| `PI_CREW_ENABLE_EXPERIMENTAL_LIVE_SESSION=1` | Allow experimental live-session runtime. |
| `PI_CREW_MOCK_LIVE_SESSION=success` | Test hook for live-session mock. |
| `PI_TEAMS_MOCK_CHILD_PI` | Test hook for mocked child Pi execution. |
| `PI_CREW_DEPTH`, `PI_CREW_MAX_DEPTH` | Canonical subagent recursion guard. |
| `PI_TEAMS_DEPTH`, `PI_TEAMS_MAX_DEPTH` | Legacy recursion guard aliases. |
| `PI_TEAMS_HOME` | Override user config/state home in tests. |
| `PI_TEAMS_PI_BIN` | Override child `pi` executable. |
| `PI_CODING_AGENT_DIR` | Override Pi settings/models directory for model discovery. |
| `PI_CREW_ASYNC_EARLY_EXIT_GUARD=0` | Disable 3s background early-exit guard. |
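
A sketch of how canonical and legacy variables could be resolved. The precedence (canonical `PI_CREW_*` before legacy `PI_TEAMS_*`), the `depth < maxDepth` comparison, and the `defaultMaxDepth` parameter are assumptions made for this example and are not confirmed by the table above.

```ts
type Env = Record<string, string | undefined>;

// Returns the first non-empty value among the given variable names.
function firstDefined(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Workers stay enabled unless either variable is explicitly "0".
function workersEnabled(env: Env): boolean {
  return firstDefined(env, ["PI_CREW_EXECUTE_WORKERS", "PI_TEAMS_EXECUTE_WORKERS"]) !== "0";
}

// Assumed guard semantics: spawning is allowed while depth < maxDepth.
function canSpawnSubagent(env: Env, defaultMaxDepth: number): boolean {
  const depth = Number(firstDefined(env, ["PI_CREW_DEPTH", "PI_TEAMS_DEPTH"]) ?? "0");
  const maxDepth = Number(firstDefined(env, ["PI_CREW_MAX_DEPTH", "PI_TEAMS_MAX_DEPTH"]) ?? defaultMaxDepth);
  if (!Number.isFinite(depth) || !Number.isFinite(maxDepth)) return false;
  return depth < maxDepth;
}
```

In practice the functions would be called with `process.env`; passing a plain object keeps them easy to test.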
## State transition summary
```text
queued/planning/running ── completed
├─ failed
├─ blocked
└─ cancelled
```
Task states follow the same durable contract plus `skipped`. Terminal states are monotonic during parallel merge.
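
One reading of "monotonic during parallel merge" is sketched below: a status may only advance, and the first terminal status recorded for a task wins. The ranking table and `mergeTaskStatus` are assumptions for illustration; explicit actions such as `resume` requeue tasks outside this merge path.

```ts
type TaskStatus = "queued" | "running" | "completed" | "failed" | "blocked" | "cancelled" | "skipped";

// queued < running < any terminal state; all terminal states share one rank.
const RANK: Record<TaskStatus, number> = {
  queued: 0,
  running: 1,
  completed: 2,
  failed: 2,
  blocked: 2,
  cancelled: 2,
  skipped: 2,
};

function mergeTaskStatus(current: TaskStatus, incoming: TaskStatus): TaskStatus {
  // Only strictly higher ranks replace the current status, so a late
  // "running" update never overwrites a terminal result.
  return RANK[incoming] > RANK[current] ? incoming : current;
}
```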
## Observability tips
- Use `/team-dashboard` for a UI overview.
- Use `team status runId=...` for canonical state and stale async detection.
- Read `background.log` for early import/spawn errors.
- Read `events.jsonl` for event chronology.
- Read `agents/{taskId}/status.json` for per-agent model/progress/tool status.
- Read `artifacts/{runId}/transcripts/{taskId}.jsonl` for raw child Pi transcript.


@@ -0,0 +1,107 @@
# pi-crew runtime refactor source map
This document records the source projects used as the baseline for the pi-crew subagent/runtime refactor. The goal is to avoid ad-hoc fixes in critical process orchestration paths and instead align pi-crew with proven Pi extension patterns.
## Source/pi-subagents
Primary source for child-process worker execution.
- `pi-spawn.ts`: robust Pi CLI resolution on Windows and package installs.
- `async-execution.ts`: detached async runner with `windowsHide: true` to avoid blank console windows.
- `subagent-runner.ts`: streaming child Pi process runner, output capture, result extraction.
- `post-exit-stdio-guard.ts`: guards for child processes that exit before stdio fully closes.
- `result-watcher.ts` and `async-job-tracker.ts`: durable async job/result observation patterns.
- `model-fallback.ts`: model fallback policy independent of hardcoded provider assumptions.
- `subagent-control.ts`, `run-status.ts`: status and control semantics.
pi-crew alignment:
- Background runner and child worker spawn options now explicitly set `windowsHide: true`.
- Parallel research no longer gates all shard workers behind a single discover worker.
- Further work should consolidate `child-pi.ts`, `async-runner.ts`, and `subagent-manager.ts` into a durable-first subagent runtime module.
## Source/pi-subagents2
Primary source for higher-level agent management and UI patterns.
- `src/agent-manager.ts`: agent lifecycle registry boundaries.
- `src/agent-runner.ts`: invocation/run abstraction separate from UI registration.
- `src/model-resolver.ts`: cleaner model resolution responsibility.
- `src/output-file.ts`: output file abstraction.
- `src/ui/agent-widget.ts`, `src/ui/conversation-viewer.ts`: compact live status and transcript viewing.
pi-crew alignment:
- Keep `Agent`/`crew_agent` tools as thin adapters over a durable manager.
- Avoid storing essential run mapping in memory only.
- Keep UI active-only and file-backed.
## Source/pi-mono
Primary source for Pi extension API/lifecycle constraints.
- `packages/coding-agent/src/core/extensions/types.ts`: extension context/tool contracts.
- `packages/coding-agent/src/core/extensions/runner.ts`: extension execution boundaries.
- `packages/coding-agent/src/core/model-registry.ts`: available model discovery.
- `packages/coding-agent/src/modes/interactive/interactive-mode.ts`: session lifecycle/UI behavior.
pi-crew alignment:
- Treat session-bound foreground workers differently from explicit async background workers.
- Do not assume hardcoded providers/models.
- Use Pi-native UI calls without modal auto-open by default.
## Source/pi-powerbar, pi-plan, pi-diff-review, pi-extensions*
Sources for UI and small-extension patterns.
- `pi-powerbar/src/powerbar/*`: low-noise status segment publishing.
- `pi-plan/src/plan-action-ui.ts`: action-oriented UI without persistent heavy overlays.
- `pi-diff-review/src/*`: command/tool registration and review UX patterns.
- `pi-extensions2/files-widget/*`: file-backed UI composition and navigation.
pi-crew alignment:
- Keep persistent widget active-only.
- Prefer manual dashboard/transcript commands for history.
- Avoid expensive render scans and auto-opening focus-capturing overlays.
## Source/oh-my-pi
Primary source for broader agent runtime, UI, extension, hook, skill, native process, and release patterns.
Detailed distillation: `docs/research-oh-my-pi-distillation.md`.
Next implementation roadmap: `docs/next-upgrade-roadmap.md`.
Key patterns to apply:
- Separate durable run history from worker/provider prompt context.
- Distinguish steering (interrupt active work) from follow-up (continue after idle).
- Preserve cancellation invariants with structured cancel reasons and synthetic terminal events.
- Use shared/exclusive execution semantics and intent tracing for risky actions.
- Keep TUI components small, width-safe, event-driven, coalesced, and lifecycle-clean.
- Split extension/plugin lifecycle into register vs initialized side-effect phases.
- Normalize teams/workflows/agents/skills/hooks/tools into a capability inventory with disabled/shadowed states.
- Add typed lifecycle hooks for crew operations.
- Move toward append-only run history with attempt/branch provenance.
- Use cooperative cancellation tokens and two-phase process teardown for workers.
- Cache raw scan entries, not final semantic query results.
- Consider content-addressed blob artifacts for large worker outputs/log chunks.
## Current refactor checkpoints
- [x] Hide Windows console windows for background runner and child Pi workers.
- [x] Make parallel research shard workers start in parallel instead of depending on a single discover worker.
- [x] Keep direct-agent reconstruction gated by `workflow === "direct-agent"` only.
- [x] Persist subagent records and recover terminal results after restart.
- [x] Fail fast for unrecoverable persisted records without `runId` instead of hanging.
- [x] Persist direct-agent model override into task state for background/resume reconstruction.
For the current prioritized upgrade backlog, see `docs/next-upgrade-roadmap.md`.
## Remaining larger subsystem work
- Consolidate subagent runtime into `src/subagents/*` or equivalent durable-first module.
- Move model routing transparency into persisted task/subagent records: requested model, selected model, fallback chain, fallback reason.
- Add real integration smoke scripts for Windows process visibility, async restart recovery, and multi-shard fanout.
- Add adaptive planner repair/retry for invalid JSON instead of immediate block when safe.


@@ -0,0 +1,238 @@
# pi-crew Usage
## Config
Optional config path:
```text
~/.pi/agent/extensions/pi-crew/config.json
```
Create a default config:
```bash
node ./pi-crew/install.mjs
```
Supported fields:
```json
{
"asyncByDefault": false,
"executeWorkers": true,
"notifierIntervalMs": 5000,
"requireCleanWorktreeLeader": true,
"autonomous": {
"profile": "suggested",
"enabled": true,
"injectPolicy": true,
"preferAsyncForLongTasks": false,
"allowWorktreeSuggestion": true
},
"runtime": {
"mode": "auto",
"groupJoin": "smart",
"groupJoinAckTimeoutMs": 300000,
"completionMutationGuard": "warn",
"requirePlanApproval": false
},
"ui": {
"widgetPlacement": "aboveEditor",
"widgetMaxLines": 8,
"powerbar": true,
"dashboardPlacement": "center",
"dashboardWidth": 72,
"dashboardLiveRefreshMs": 1000,
"autoOpenDashboard": false,
"autoOpenDashboardForForegroundRuns": false,
"showModel": true,
"showTokens": true,
"showTools": true
}
}
```
## Local Pi smoke test
```bash
cd pi-crew
npm run smoke:pi
```
Then open Pi and run:
```text
/team-doctor
/team-validate
/team-autonomy status
```
## Default run: real worker execution
By default, `pi-crew` launches each task as a separate child Pi worker process. The parent Pi session orchestrates; workers execute independently and stream output to durable run state.
```json
{
"action": "run",
"team": "default",
"goal": "Implement login with tests"
}
```
## Scaffold / dry run
Use scaffold mode only when you want durable prompts/artifacts without launching child workers.
```json
{
"action": "run",
"team": "default",
"goal": "Plan only",
"config": {
"runtime": { "mode": "scaffold" }
}
}
```
## Async run
```json
{
"action": "run",
"team": "implementation",
"goal": "Refactor auth module",
"async": true
}
```
Check status:
```json
{
"action": "status",
"runId": "team_..."
}
```
Background `Agent`/`crew_agent` subagents wake the parent Pi session when they complete, so the parent can call `get_subagent_result`/`crew_agent_result` and continue without waiting for another user prompt.
## State and API safety
State paths are validated before read/write operations. Run ids, imported bundles, artifact and transcript references, mailbox files, and agent control/log files must stay inside their expected `.crew` roots, and symlink escapes are rejected. Read-only mailbox APIs return default state without creating mailbox files when no messages exist.
Group-join result delivery uses the normal outbox mailbox and normal `/team-api ... ack-message`. `runtime.groupJoinAckTimeoutMs` only emits observability (`agent.group_join.ack_timeout`) and does not block run completion.
`runtime.completionMutationGuard` defaults to `warn`. Use `off` to disable or `fail` to fail implementation-style workers that complete without observed mutation tool calls.
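
A containment check of this kind can be sketched with `fs.realpathSync` and `path.relative`. The helper name `resolveInsideRoot` is hypothetical and this is not the validator pi-crew ships; it only illustrates rejecting `..` traversal and symlink escapes, including for files that do not exist yet.

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// lstat does not follow the final link, so a dangling symlink still counts as existing.
function entryExists(target: string): boolean {
  try {
    fs.lstatSync(target);
    return true;
  } catch {
    return false;
  }
}

function resolveInsideRoot(root: string, relativePath: string): string {
  const realRoot = fs.realpathSync(root);
  const candidate = path.resolve(realRoot, relativePath);
  // Walk up to the nearest existing ancestor so new files can be validated too.
  let existing = candidate;
  while (!entryExists(existing)) existing = path.dirname(existing);
  // realpathSync resolves symlinks and throws on dangling links.
  const realExisting = fs.realpathSync(existing);
  const realCandidate = path.join(realExisting, path.relative(existing, candidate));
  const rel = path.relative(realRoot, realCandidate);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes crew root: ${relativePath}`);
  }
  return realCandidate;
}
```

The check is a point-in-time validation; a caller that needs stronger guarantees would still have to guard against the tree changing between the check and the actual read or write.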
## Worktree mode
```json
{
"action": "run",
"team": "implementation",
"goal": "Refactor API layer",
"workspaceMode": "worktree"
}
```
The leader repository must be clean. Per-task worktrees are created under the project crew root (`.crew/` for new projects, `.pi/teams/` when the repo already has `.pi/`):
```text
<crewRoot>/worktrees/{runId}/{taskId}
```
Cleanup:
```json
{
"action": "cleanup",
"runId": "team_..."
}
```
Dirty worktrees are preserved unless `force: true` is provided.
## Slash commands
```text
/teams
/team-run default "Implement login with tests"
/team-run --team=implementation --workflow=implementation --async "Refactor auth"
/team-cancel team_...
/team-run --worktree default "Change API safely"
/team-status team_...
/team-summary team_...
/team-resume team_...
/team-events team_...
/team-artifacts team_...
/team-worktrees team_...
/team-cleanup team_...
/team-forget team_... --confirm
/team-export team_...
/team-import .crew/artifacts/team_.../export/run-export.json # or .pi/teams/artifacts/... on legacy layout
/team-imports
/team-prune --keep=20 --confirm
/team-manager
/team-dashboard
/team-api team_... read-mailbox direction=outbox
/team-api team_... send-message direction=outbox taskId=task_... to=worker body="hello"
/team-api team_... validate-mailbox repair=true
/team-init
/team-init --copy-builtins
/team-config
/team-config autonomous.profile=assisted autonomous.preferAsyncForLongTasks=true --project
/team-config --unset=autonomous.preferAsyncForLongTasks --project
/team-autonomy status
/team-autonomy on
/team-autonomy off
/team-autonomy manual
/team-autonomy suggested
/team-autonomy assisted
/team-autonomy aggressive
/team-validate
/team-help
/team-doctor
```
## Management
Create resources:
```json
{
"action": "create",
"resource": "team",
"config": {
"name": "Backend Team",
"description": "Backend work",
"scope": "project",
"defaultWorkflow": "default",
"roles": [{ "name": "executor", "agent": "executor" }]
}
}
```
Rename an agent and update team references:
```json
{
"action": "update",
"resource": "agent",
"agent": "worker",
"scope": "project",
"updateReferences": true,
"config": { "name": "better-worker" }
}
```
Delete requires confirmation:
```json
{
"action": "delete",
"resource": "team",
"team": "backend-team",
"scope": "project",
"confirm": true
}
```