Files
pi-config/extensions/pi-crew/docs/research-ui-optimization-plan.md

21 KiB
Raw Blame History

Research: UI Optimization Plan

Phase 7 plan derived from parallel-research run team_20260429053958_6497405a. Source artifacts:

  • .crew/artifacts/team_20260429053958_6497405a/shared/research-summary.md
  • .crew/artifacts/team_20260429053958_6497405a/shared/04_synthesize.md
  • .crew/artifacts/team_20260429053958_6497405a/shared/01_discover.md
  • .crew/artifacts/team_20260429053958_6497405a/shared/02_explore-shard-1.md
  • .crew/artifacts/team_20260429053958_6497405a/shared/03_explore-shard-2.md

Overview

pi-crew already exposes the runtime data needed for a strong TUI: manifests, tasks.json, agents.json, per-agent status.json, events.jsonl, output.log, transcripts, and durable mailbox state. The gaps are in the UI layer:

  1. Widget recreated on every timer tick (crew-widget.ts:267-272).
  2. Live signatures miss progress / toolUses / usage / recent output so cached lines stay stale.
  3. Multiple UI surfaces re-read the same files independently (no shared snapshot).
  4. /team-dashboard is static — only reload via key r.
  5. transcript-viewer.ts calls readFileSync inside render() on every paint.
  6. Mailbox API/runtime exists but no first-class panel/badges.
  7. Pi UI integration uses untyped private-like casts (requestRender, setWorkingIndicator).

The plan below sequences fixes for highest ROI and lowest risk first, lockdown the snapshot contract before refactoring surfaces, and defers anything depending on uncertain pi-mono compatibility.

Implementation Status

Track status here. Use [x] for done, [ ] for pending, [-] for won't-do/deferred.

  • Phase 0 — Pi UI compatibility shim
  • Phase 1.A — Persistent widget instance
  • Phase 1.B — RunUiSnapshot + RunSnapshotCache
  • Phase 1.C — Freshness signatures (progress / tool / usage / mtimes)
  • Phase 2 — Refactor widget / sidebar / dashboard / powerbar onto snapshot
  • Phase 3.A — /team-dashboard live component
  • Phase 3.B — Dashboard panes (agents, progress, mailbox, transcript)
  • Phase 4.A — Transcript viewer cache (mtime/size keyed)
  • Phase 4.B — Transcript bounded-tail mode
  • Phase 5.A — Adaptive/coalesced render scheduler
  • Phase 5.B — Powerbar fallback strategy + docs
  • Phase 5.C — Performance tests (large runs / large transcripts)

Roadmap-Level Decisions

Decision Choice Rationale
Snapshot contract before refactor Lock RunUiSnapshot interface in Phase 1.B before any consumer refactor Avoid concurrent rename/conflict in widget/sidebar/dashboard
Persistent widget independent of snapshot Phase 1.A done before 1.B Quick win, doesn't block snapshot work, removes biggest CPU/flicker churn
Compatibility shim placed first (Phase 0) Centralize requestRender / setStatus / custom / setWidget casts in src/ui/pi-ui-compat.ts Every later phase consumes it; avoids re-casting in each module
Transcript fix split (4.A then 4.B) Cache + invalidate first, tail-mode second Cache by mtime+size is S effort and removes blocking readFileSync per-render; tail mode is M-L and can land later
Event-driven refresh deferred to Phase 5.A Subscribe crew.run.* / crew.subagent.* / crew.mailbox.* only after snapshot is stable Avoids listener leak risk during rapid refactor
RPC mode Best-effort, not first-class RPC drops function widgets; we emit string fallback via shim
Powerbar Always-fallback to setStatus/widget; document event contract No confirmed pi-mono consumer found in research
Memory safety LRU cap 8 active + 16 recent runs in snapshot cache Prevent leak when user browses many runs

Phase 0 — Pi UI Compatibility Shim

Goal: Eliminate ad-hoc (ctx.ui as { requestRender?: ... }) casts; provide one typed entry-point per UI capability.

Deliverables:

  • New file src/ui/pi-ui-compat.ts exporting:
    • requestRender(ctx) — feature-detected.
    • setWorkingIndicator(ctx, opts?) — feature-detected, no-op fallback.
    • setExtensionWidget(ctx, key, factory, options) — wraps setWidget, accepts { persist?: boolean } flag.
    • showCustom(ctx, ...) — wraps ctx.ui.custom with overlay options.
    • setStatusFallback(ctx, key, lines, segment?) — used when powerbar consumer is absent.
  • Replace existing inline casts in crew-widget.ts, register.ts, live-run-sidebar.ts, powerbar-publisher.ts.

Files affected:

  • src/ui/pi-ui-compat.ts (new)
  • src/ui/crew-widget.ts
  • src/ui/live-run-sidebar.ts
  • src/ui/powerbar-publisher.ts
  • src/extension/register.ts

Tests:

  • Unit test asserting fallback when host lacks requestRender / setWorkingIndicator.
  • Snapshot of cast removal via grep test (no as { requestRender left in src/).

Effort: S (0.51 day) · Risk: Low

Phase 1.A — Persistent Widget Instance

Goal: Stop calling setWidget every timer tick; only call when placement/visibility/key changes.

Approach:

  • Extend CrewWidgetState with lastPlacement: string, lastVisibility: "hidden" | "visible", lastKey: string.
  • updateCrewWidget decides: if state matches and component instance exists → only invalidate via shim's requestRender(); do NOT call setWidget.
  • Component reads runs lazily inside render(width) using existing activeWidgetRuns (later replaced by snapshot in Phase 2).

Files affected:

  • src/ui/crew-widget.ts
  • src/extension/register.ts (timer interval handler)

Tests (unit):

  • updateCrewWidget called N times with unchanged placement → setWidget invoked exactly once (count via mock).
  • Switching placement triggers exactly 1 additional setWidget.
  • Hide/clear path still calls setWidget(WIDGET_KEY, undefined, ...).

Effort: SM (1 day) · Risk: Low

Phase 1.B — RunUiSnapshot + RunSnapshotCache

Status: Done in Wave 2 via src/ui/snapshot-types.ts and src/ui/run-snapshot-cache.ts.

Goal: Single read pass per run; share results across widget/sidebar/dashboard/powerbar.

Locked interface (do not change without bumping plan):

export interface RunUiProgress {
    total: number;
    completed: number;
    running: number;
    failed: number;
    queued: number;
}

export interface RunUiUsage {
    tokensIn: number;
    tokensOut: number;
    toolUses: number;
}

export interface RunUiMailbox {
    inboxUnread: number;
    outboxPending: number;
    needsAttention: number;
}

export interface RunUiSnapshot {
    runId: string;
    cwd: string;
    fetchedAt: number;
    signature: string;        // stable hash; differs only when content changed
    manifest: TeamRunManifest;
    tasks: TeamTaskState[];
    agents: CrewAgentRecord[];
    progress: RunUiProgress;
    usage: RunUiUsage;
    mailbox: RunUiMailbox;
    recentEvents: TeamEvent[];     // last N (config N=20)
    recentOutputLines: string[];   // last N lines, capped at MAX_TAIL_BYTES
}

export interface RunSnapshotCache {
    get(runId: string): RunUiSnapshot | undefined;
    refresh(runId: string): RunUiSnapshot;            // forces re-read
    refreshIfStale(runId: string): RunUiSnapshot;     // re-read only if mtime/size changed or TTL exceeded
    invalidate(runId?: string): void;                 // invalidate one or all
    snapshotsByKey(): Map<string, RunUiSnapshot>;     // for dashboard list rendering
}

Cache rules:

  • Key by runId.
  • Stored entry includes tasksMtime, tasksSize, agentsMtime, agentsSize, manifestMtime, mailboxMtime, outputMtime.
  • TTL = 250ms (matches existing crew-agent-records reader cache).
  • LRU: max 8 active + 16 recent entries; evict on insert beyond limit.
  • All JSON.parse wrapped in try/catch; on parse fail return previous valid entry (never crash render).

Files affected:

  • src/ui/run-snapshot.ts (new)
  • src/ui/run-snapshot-cache.ts (new)
  • src/ui/snapshot-types.ts (new — exported types)

Tests (unit):

  • refreshIfStale returns same entry when mtimes unchanged.
  • File rewrite changes signature.
  • Parse error returns last valid snapshot, no throw.
  • LRU eviction at boundary.

Effort: ML (23 days) · Risk: Medium

Phase 1.C — Freshness Signatures

Goal: Make widget/sidebar invalidate when progress/tool/tokens/output change, not just status.

Changes:

  • CrewWidgetComponent.buildSignature includes per-agent progress.completed, progress.total, currentTool, usage.tokensOut, lastOutputMtime.
  • LiveRunSidebar.buildSignature similarly includes progress/tool/usage; add mailbox.inboxUnread.
  • Signatures derived from RunUiSnapshot.signature once Phase 1.B is in.

Files affected:

  • src/ui/crew-widget.ts
  • src/ui/live-run-sidebar.ts

Tests (unit):

  • Two snapshots with same status but different progress → different signatures.
  • Mock progress event → render output line count/contents change.

Effort: S (0.5 day) · Risk: Low

Phase 2 — Refactor Surfaces onto Snapshot

Status: Done in Wave 2 for widget/sidebar/dashboard/powerbar, with fallback direct reads preserved when no cache is supplied.

Goal: Replace independent FS reads in widget / sidebar / dashboard / powerbar with RunSnapshotCache.

Deliverables:

  • crew-widget.ts reads via cache.refreshIfStale(runId).
  • live-run-sidebar.ts same.
  • run-dashboard.ts calls cache.snapshotsByKey() once per render.
  • powerbar-publisher.ts derives segment text from snapshot.
  • Remove direct agentsFor/readTasks/readManifest reads from UI modules.

Files affected:

  • src/ui/crew-widget.ts
  • src/ui/live-run-sidebar.ts
  • src/ui/run-dashboard.ts
  • src/ui/powerbar-publisher.ts

Tests (unit):

  • One render of all four surfaces with N=10 runs triggers ≤ N cache reads (use spy).
  • Snapshot reuse across surfaces in same tick (counter assert).

Effort: M (2 days) · Risk: Medium

Phase 3.A — Live /team-dashboard

Goal: Dashboard auto-refreshes while open, preserves selection, separates active vs recent runs.

Changes:

  • Convert RunDashboard from one-shot render to TUI overlay component owning its own timer (2501000ms adaptive).
  • Internal state: selectedRunId, activeTab, cachedSnapshots (via RunSnapshotCache).
  • Hotkey r no longer needed but kept as manual force-refresh.

Files affected:

  • src/ui/run-dashboard.ts
  • src/extension/registration/commands.ts (dashboard handler now overlay-based)

Tests (unit + integration):

  • Component receives mocked snapshot updates → re-renders without losing selectedRunId.
  • Active runs list updates when manifest status flips.

Effort: M (2 days) · Risk: Medium

Phase 3.B — Dashboard Panes (agents · progress · mailbox · transcript)

Goal: First-class panel/tabs surfacing data already in snapshot.

Tabs:

  1. Agents — table (agent · status · current tool · tokens · last activity).
  2. Progress / Events — last N events with role badge and timestamps.
  3. Mailbox — inbox unread, outbox pending, needs-attention; row actions: nudge/ack via existing team-tool/api.ts (send-message, ack-message).
  4. Transcript / Output — opens existing DurableTranscriptViewer (post Phase 4.A).

Files affected:

  • src/ui/run-dashboard.ts
  • src/ui/dashboard-panes/ (new directory: agents-pane, progress-pane, mailbox-pane, transcript-pane)
  • src/extension/team-tool/api.ts (no API change; UI calls existing read-mailbox, send-message, ack-message)

Tests (unit):

  • Mailbox pane shows badge counts from snapshot.
  • Pane switching preserves selection within pane.
  • Action ack triggers API call once and refreshes snapshot.

Effort: ML (3 days) · Risk: Medium

Phase 4.A — Transcript Viewer Cache

Goal: Stop blocking readFileSync inside render(); eliminate full-parse per paint.

Changes:

  • New TranscriptCacheEntry { path, mtime, size, lines, parsedAt } keyed by (runId, taskId).
  • readRunTranscript consults cache; only re-reads if mtime or size changed.
  • DurableTranscriptViewer.render reads cache.lines, never the disk directly.
  • TTL 500ms safety net.

Files affected:

  • src/ui/transcript-viewer.ts
  • src/ui/transcript-cache.ts (new)

Tests (unit):

  • Two consecutive renders with unchanged file → 1 disk read.
  • File grow → new cached lines, signature changes.
  • Parse failure preserves last good cache.

Effort: S (0.5 day) · Risk: Low

Phase 4.B — Bounded-Tail Mode

Goal: Default to last N bytes/events to keep latency bounded for large transcripts.

Approach:

  • Default maxTailBytes = 256 KB.
  • Tail strategy: fs.statSyncfs.openSync → read last N bytes → discard partial first line if file exceeds N.
  • Add hotkey f to "load full transcript on demand"; show byte counter.
  • Auto-scroll toggle (a) preserved.

Files affected:

  • src/ui/transcript-viewer.ts
  • src/ui/transcript-cache.ts (extend)

Config:

  • config.ui.transcriptTailBytes (optional, default 262144).

Tests (unit):

  • 1MB file → only ~256KB worth of lines parsed.
  • Force-full mode loads everything.
  • Tail re-aligns when first newline straddles boundary.

Effort: M (2 days) · Risk: Medium

Phase 5.A — Adaptive Render Scheduler

Goal: Replace fixed 1000ms timers with event-driven refresh + low-frequency fallback.

Approach:

  • Single RenderScheduler listening on pi.events for crew.run.*, crew.subagent.*, crew.mailbox.*.
  • On event → invalidate snapshot + requestRender (debounced 50100ms via animation-frame analog).
  • Fallback timer 750ms (reduced from 1000ms) only triggers if no event in window.
  • All listeners disposed on extension unload + run completion.

Files affected:

  • src/ui/render-scheduler.ts (new)
  • src/extension/register.ts (replace setInterval block)

Tests (unit):

  • Event burst coalesces to single requestRender within debounce window.
  • Listeners removed after dispose() (counter on event emitter).
  • Fallback timer fires only when no events in interval.

Effort: M (1.5 days) · Risk: LowMedium

Phase 5.B — Powerbar Fallback Strategy

Goal: Don't depend on an external powerbar:* consumer.

Changes:

  • Detect listener via pi.events.listenerCount?.("powerbar:register-segment").
  • If 0 listeners: emit AND mirror to ctx.ui.setStatus("pi-crew", text).
  • Document event contract in docs/architecture.md.

Files affected:

  • src/ui/powerbar-publisher.ts
  • docs/architecture.md

Tests (unit):

  • No consumer → setStatus called.
  • Consumer registered → only event emitted, no setStatus.

Effort: SM (0.51 day) · Risk: Medium (depends on listener-count API availability)

Phase 5.C — Performance Tests

Goal: Catch regressions on large runs / transcripts.

Suite:

  • 50 simulated runs, 200 events each → render dashboard, assert ≤ 50 disk reads / render cycle.
  • 5MB transcript → tail mode reads ≤ 1MB, full mode allowed.
  • 100 widget update calls without state change → ≤ 1 setWidget invocation.

Files affected:

  • test/integration/ui-performance.test.ts (new)

Effort: M (1.5 days) · Risk: Low

Implementation Order

Recommended: do quick wins (Phase 0, 1.A, 1.C, 4.A) in parallel as 4 small PRs before starting Phase 1.B (snapshot foundation).

Wave 1 (parallel, all S effort):
  [x] Phase 0  — Pi UI compat shim
  [x] Phase 1.A — Persistent widget
  [x] Phase 1.C — Freshness signatures (use ad-hoc fields until snapshot lands)
  [x] Phase 4.A — Transcript cache

Wave 2 (sequential):
  [x] Phase 1.B — RunUiSnapshot foundation
  [x] Phase 2   — Refactor surfaces onto snapshot
  [x] Phase 5.A — Adaptive render scheduler

Wave 3 (parallel after Wave 2):
  [x] Phase 3.A — Live dashboard
  [x] Phase 3.B — Dashboard panes
  [x] Phase 4.B — Transcript tail mode

Wave 4 (cleanup):
  [x] Phase 5.B — Powerbar fallback
  [x] Phase 5.C — Perf tests

Files Affected (grouped)

New files:

  • src/ui/pi-ui-compat.ts
  • src/ui/run-snapshot.ts
  • src/ui/run-snapshot-cache.ts
  • src/ui/snapshot-types.ts
  • src/ui/transcript-cache.ts
  • src/ui/render-scheduler.ts
  • src/ui/dashboard-panes/agents-pane.ts
  • src/ui/dashboard-panes/progress-pane.ts
  • src/ui/dashboard-panes/mailbox-pane.ts
  • src/ui/dashboard-panes/transcript-pane.ts
  • test/integration/ui-performance.test.ts

Modified files:

  • src/ui/crew-widget.ts
  • src/ui/live-run-sidebar.ts
  • src/ui/run-dashboard.ts
  • src/ui/powerbar-publisher.ts
  • src/ui/transcript-viewer.ts
  • src/extension/register.ts
  • src/extension/registration/commands.ts
  • docs/architecture.md

Read-only references:

  • src/runtime/crew-agent-records.ts
  • src/state/mailbox.ts
  • src/extension/team-tool/api.ts

Risk Assessment

Risk Phase Likelihood Impact Mitigation
Snapshot cache memory leak with many runs 1.B Medium High LRU cap (8 active + 16 recent), eviction unit test
Race between agents.json rewrite and UI read 1.B Medium Medium try/catch JSON.parse + return last valid snapshot
Listener leak from event-driven refresh 5.A Medium Medium Centralize in RenderScheduler.dispose(), integration test counts listeners post-shutdown
Persistent widget breaks on placement change edge cases 1.A Low Medium Diff against lastPlacement/lastKey/lastVisibility triple
Transcript tail-mode misaligns at chunk boundary 4.B Medium Low Discard partial-first-line; unit test with files at n*chunkSize ± 1
Pi RPC mode silently drops widgets 0/2 High Low Shim falls back to setStatus string lines
Powerbar consumer never appears 5.B High Low Always emit + always set status fallback
requestRender removed in future pi-mono 0 Low Medium Compat shim already feature-detects
Snapshot signature collision (different state, same hash) 1.B Low Medium Include mtimes + sizes + counts in hash input
Test suite runtime grows from perf tests 5.C Medium Low Run perf separately via dedicated script when needed
Concurrent refactor of widget/sidebar/dashboard while contract evolves 1.B → 2 Medium High Lock interface in 1.B PR before opening Phase 2 PR
Mailbox pane spams renders on incoming messages 3.B / 5.A Medium Low Debounce via RenderScheduler, batch mailbox events

Testing Strategy

Unit (Wave 1):

  • Compat shim feature-detect fallback (Phase 0).
  • setWidget called once per state change (Phase 1.A).
  • Signature includes progress/tool/usage diff (Phase 1.C).
  • Transcript cache reuses entry when mtime unchanged (Phase 4.A).

Unit (Wave 2):

  • Snapshot cache: TTL, LRU, parse-error fallback, signature stability.
  • Surface refactor: 4 surfaces share ≤ 1 read per run per tick.
  • Scheduler: event coalesce, dispose, fallback timer.

Unit (Wave 3):

  • Dashboard live refresh preserves selection.
  • Pane switching state, mailbox badge counts, ack action.
  • Tail-mode boundary alignment, force-full toggle.

Integration:

  • 50-run dashboard render ≤ 50 disk reads (Phase 5.C).
  • 5MB transcript tail ≤ 1MB read.
  • Long-lived run (10 min simulated) without listener growth.

Manual smoke:

  • Open /team-dashboard, switch panes, send mailbox message, ack from UI.
  • Resize terminal, switch placement above/below editor.
  • Reload extension; ensure all timers/listeners cleared.

Regression baseline:

  • Existing 286 unit + 26 integration tests must remain green at every wave.
  • Run npm run typecheck && npm run test:unit && npm run test:integration before each PR merge.

Open Questions

  1. Powerbar consumer status — is any pi-mono extension/host expected to consume powerbar:* events? (Decides Phase 5.B aggressiveness; default plan: always-fallback.)
  2. Target scale — how many concurrent runs / what max transcript size should we optimize for? Plan assumes 8 active runs and 256KB tail by default.
  3. RPC mode priority — must function widgets work in RPC, or is graceful string fallback acceptable? Plan assumes best-effort string fallback.
  4. Phase 1.B contract freeze — once the interface ships, downstream phases depend on it. Should we publish it as RunUiSnapshotV1 and treat changes as breaking?

Effort Summary

Wave Phases Effort Dependency
1 (parallel) 0, 1.A, 1.C, 4.A ~2.5 days total None
2 (sequential) 1.B → 2 → 5.A ~5.5 days Wave 1 done
3 (parallel) 3.A, 3.B, 4.B ~7 days Wave 2 done
4 (parallel) 5.B, 5.C ~3 days Wave 3 done
Total 12 phases ~18 dev-days

Quick-win path (Wave 1 only) delivers ~70% of perceived UI improvement (no flicker, fresh signatures, no transcript blocking) at <15% of total effort.