Files
pi-config/extensions/pi-crew/docs/research-oh-my-pi-distillation.md

15 KiB

oh-my-pi Distillation for pi-crew

Date: 2026-05-05 Source repo: Source/oh-my-pi at 1d898a7fe chore: bump version to 14.5.3.

Scope Read

Read-only exploration covered four source areas:

  • Agent/provider runtime: packages/agent, packages/ai.
  • Main CLI/session/task implementation: packages/coding-agent.
  • TUI, extensions, hooks, skills, marketplace, rulebook docs and implementation.
  • Native/Rust reliability/performance/release docs and implementation.

Representative files and docs inspected:

  • packages/agent/src/agent-loop.ts, packages/agent/src/agent.ts, packages/agent/src/types.ts.
  • packages/ai/src/stream.ts, packages/ai/src/model-manager.ts, packages/ai/src/utils/{abort,retry,event-stream,overflow}.ts, provider adapters.
  • packages/coding-agent/src/session/*, src/extensibility/{hooks,slash-commands,skills,plugins}/*, src/task/*, src/edit/*, prompts.
  • packages/tui/src/tui.ts, docs/tui*.md, docs/extensions.md, docs/hooks.md, docs/skills.md, docs/marketplace.md, docs/rulebook-matching-pipeline.md.
  • crates/pi-natives/src/{task,shell,pty,fs_cache,glob,fd,grep}.rs, natives docs, install/release scripts.

This document rewrites the useful ideas as pi-crew-native patterns. It does not vendor or copy source code.

High-Value Patterns to Adopt

1. Separate durable run history from provider/model context

oh-my-pi keeps rich internal session messages separate from LLM-compatible provider messages. Custom events, UI messages, hook entries, and branch/compaction entries can live in durable history, while a conversion layer decides what reaches the model.

pi-crew application:

  • Keep TeamRunManifest, task records, mailbox messages, artifacts, worker events, and review/verification notes as durable run history.
  • Add a projection/conversion step before worker prompt/model invocation:
    • transformRunContextBeforeWorkerStart(...) for pruning/context injection.
    • convertRunHistoryToWorkerPrompt(...) for provider/child-Pi compatible text.
  • Avoid treating UI/runtime events as prompt text by default.

Benefit: safer compaction, mailbox summarization, and artifact hygiene without losing durable audit history.

2. Distinguish steering from follow-up

oh-my-pi's agent runtime distinguishes interrupting current work (steer) from continuing after the agent would otherwise stop (followUp).

pi-crew application:

  • Model leader/operator messages as two queues:
    • steeringQueue: urgent cancellation, nudge, priority change, user answer while worker is active.
    • followUpQueue: review/verification/documentation after a task reaches a natural stop.
  • Default to one-at-a-time delivery to reduce context shock.
  • Persist queue entries and delivery status in task mailbox/state.

Benefit: clearer interactive semantics than a single generic respond/resume path.

3. Preserve invariants on cancellation and abort

oh-my-pi propagates AbortSignal through model streaming and tool execution, distinguishes caller abort from provider-local watchdog abort, and emits synthetic tool results when abort happens after tool calls were started.

pi-crew application:

  • Use structured cancel reasons:
    • caller_cancelled
    • leader_interrupted
    • provider_timeout
    • worker_timeout
    • tool_timeout
    • shutdown
  • If a worker/tool/action has started but is cancelled, emit a terminal synthetic event/result so task history has no dangling operation.
  • Add non-abortable cleanup/finalize phases for artifact preservation and state unlock.

Benefit: fewer stuck running tasks and clearer recovery after cancellation.

4. Batch-aware execution with shared vs exclusive operations

oh-my-pi marks tools with concurrency semantics: shared tools can run concurrently, exclusive tools serialize around shared/exclusive peers, and queued tools can be skipped when steering arrives.

pi-crew application:

  • Classify worker subtasks or internal operations:
    • shared: read-only exploration, status, grep, artifact reads.
    • exclusive: edits, package manifests, lockfiles, migration/schema updates, worktree merge.
  • Attach batchId, index, total, and conflictKey metadata to task execution.
  • On new steering, skip not-yet-started low-priority operations with explicit skip reason.

Benefit: safer parallelism and more auditable conflict handling.

5. Intent tracing for destructive/tool actions

oh-my-pi optionally injects an intent field into tool schemas, strips it before execution, and keeps it for auditability.

pi-crew application:

  • Add optional _intent/intent metadata to worker tool/action events.
  • Require intent for destructive actions: cancel, delete, prune, force cleanup, edits, package publish, worktree removal.
  • Store intent in events/artifacts but never pass it to low-level execution APIs if not needed.

Benefit: reviewable why/what for high-risk actions without changing execution payloads.

6. Event-first UI with tiny component contract and coalesced rendering

oh-my-pi TUI uses small components (render(width), handleInput, invalidate) and event-driven, coalesced rendering. Components must be width-safe and lifecycle-clean.

pi-crew application:

  • Keep dashboards/widgets as projections from snapshot/event state, not direct filesystem scanners.
  • Continue using render scheduler/coalescing; add width-safety tests for all dashboard panes/widgets.
  • Components should expose dispose() for timers/theme subscriptions.
  • UI event stream should be semantic (task_started, worker_status, mailbox_updated) rather than raw file polling.

Benefit: avoids UI freezes and makes live views predictable.

7. Two-phase extension lifecycle

oh-my-pi extensions have a registration phase where side-effecting runtime methods are unavailable, followed by an initialized phase with real context/actions.

pi-crew application:

  • If pi-crew grows plugin/extension support, split APIs into:
    • registerCrewExtension(api): declare teams, workflows, hooks, commands, renderers.
    • initializeCrewExtension(context): subscribe to events, perform side effects.
  • In headless mode, UI APIs should be explicit no-ops or unavailable via hasUI.
  • Loader should collect extension errors without breaking builtin teams.

Benefit: fewer load-time side effects and safer third-party extensibility.

8. Unified capability inventory/control center

oh-my-pi normalizes extensions, skills, rules, tools, hooks, MCPs, prompts, and slash commands into a shared dashboard model with active/disabled/shadowed states.

pi-crew application:

  • Extend /team-settings or add /team-control to show a unified inventory:
    • teams, workflows, agents, skills, hooks/policies, tools, runtime providers.
  • Normalize each item to:
    • id, kind, name, description, source, path, state, disabledReason, shadowedBy, raw.
  • Persist disables by stable capability ID, not file path.

Benefit: better operator experience for complex multi-resource setups.

9. Hooks as typed lifecycle gates, not ad-hoc shell glue

oh-my-pi hooks cover session lifecycle, before-agent-start, tool-call gates, tool-result transforms, and compaction events. Blocking hooks are scoped; non-blocking hook errors are captured but do not crash streaming.

pi-crew application:

  • Define typed crew hooks:
    • before_run_start
    • before_task_start
    • task_result
    • before_cancel
    • before_publish
    • session_before_switch
    • run_recovery
  • Mark hooks as blocking or non-blocking.
  • Capture hook errors into diagnostics/status, not uncontrolled exceptions.

Benefit: safer customization for policy/security/release gates.

10. Prompt pipeline should be explicit

oh-my-pi applies slash/custom commands, templates, compaction, file mentions, hook injection, and model validation in a clear order before calling the agent.

pi-crew application:

Define a worker prompt pipeline:

  1. Parse orchestration command/control intent.
  2. Expand prompt templates/task packet.
  3. Attach selected context/artifact/mailbox summaries.
  4. Run before_worker_start hooks.
  5. Persist exact task packet/artifacts.
  6. Launch worker.

Benefit: reproducible worker prompts and easier debugging of context injection.

11. Session/run history as append-only tree

oh-my-pi persists session entries with parent relationships. Branching/forking moves the current leaf rather than rewriting past history.

pi-crew application:

  • Keep events.jsonl append-only and add optional parentEventId / attemptId / branchId fields for retries/forks.
  • Represent retry attempts as child branches from the original task prompt/result.
  • Preserve old failed attempts instead of overwriting task state only.

Benefit: better auditability and replay/debug of retries.

12. Cooperative cancellation token for long loops

oh-my-pi native code uses cancel tokens with deadlines, abort signals, heartbeat(), and async wait. Long loops over external-size input must heartbeat at bounded cadence.

pi-crew application:

  • Add a TS CancellationToken utility for internal long-running loops:
    • heartbeat(stage?: string)
    • throwIfCancelled()
    • wait()
    • abort(reason)
  • Require it in scanners over runs, artifacts, mailboxes, worktrees, and event logs.

Benefit: bounded shutdown/cancel latency and easier stuck-loop diagnostics.

13. Process lifecycle: graceful cancel, forced kill, then non-reuse

oh-my-pi shell/PTY runtime cancels gracefully, waits a grace window, forces abort/kill, drains output for bounded windows, and discards persistent sessions after cancellation/errors.

pi-crew application:

  • For child Pi workers:
    • send graceful abort/TERM;
    • wait graceMs;
    • force-kill process tree;
    • drain stdout/stderr for bounded time;
    • mark session non-reusable after timeout/protocol error/cancel.
  • Return typed status { exitCode, cancelled, timedOut, killed, cleanupErrors }.

Benefit: more deterministic worker cleanup and fewer zombie/stale runs.

14. Reserve control channel before async worker start

oh-my-pi PTY reserves its control channel before async process start, rejects duplicate starts, and always clears state in completion.

pi-crew application:

  • Install a WorkerRunCore/controller synchronously before spawn returns.
  • Expose cancel/steer immediately, even while startup is still in progress.
  • Clear controller in finally and persist terminal state.

Benefit: closes race windows where operator cannot cancel a starting worker.

15. Cache scan entries, not final query results

oh-my-pi native search caches directory entries and applies query-specific filters/scoring later. Empty stale caches trigger rescan; ordering is deterministic.

pi-crew application:

  • For run/artifact/mailbox discovery, cache raw entries/stats rather than final UI results.
  • Apply active-status/mailbox/health filters after cache retrieval.
  • Invalidate cache after state mutation.
  • Use deterministic sort keys for dashboards and summaries.

Benefit: faster UI/status with fewer stale semantic bugs.

16. Blob artifacts and bounded file access

oh-my-pi blob-artifact design uses content addressing, metadata sidecars, streaming writes, size budgets, manifest GC, and path whitelisting.

pi-crew application:

  • Introduce content-addressed large artifacts for worker transcripts/screenshots/log chunks.
  • Persist metadata sidecars with MIME, source, redaction, run/task IDs, size, hash.
  • Keep task prompts/results small by referencing artifact IDs.
  • Add GC tied to run retention.

Benefit: avoids bloating task JSON/events and improves artifact security.

17. Native/release verification checklist mindset

oh-my-pi release scripts emphasize multi-platform build artifacts, install smoke tests, spoofed-version checks, and runtime loader fallback diagnostics.

pi-crew application:

  • For npm releases, keep a release checklist with:
    • typecheck;
    • unit/integration tests;
    • npm pack --dry-run;
    • install from packed tarball in temp project;
    • Pi extension load smoke;
    • version/tag/npm consistency check.

Benefit: fewer broken published packages.

Skill/Rulebook Ideas to Port

oh-my-pi's skills/rulebook ecosystem suggests additional pi-crew resources:

  1. worker-prompt-pipeline skill: prompt assembly, context projection, before-worker hooks, artifact references.
  2. typed-hook-design skill: lifecycle gates, blocking vs non-blocking hooks, diagnostics.
  3. process-cancellation-contract skill: graceful/force kill, synthetic terminal results, non-reuse.
  4. capability-inventory-ux skill: normalized resource inventory and disable/shadow semantics.
  5. append-only-run-history skill: event tree, branch/retry provenance.

Prioritized Backlog for pi-crew

P0 / High confidence

  • Fix current runtime review findings first: waiting final status, respond semantics, no-registry model routing.
  • Add structured cancellation reason and terminal synthetic result/event for cancelled workers.
  • Centralize worker prompt pipeline and persist exact prompt packets.
  • Add width-safety tests for dashboard/widget lines.

P1 / Medium-term architecture

  • Add steering vs follow-up mailbox queues.
  • Add typed hook lifecycle for before_task_start, task_result, before_cancel, session_before_switch.
  • Add capability inventory model for teams/workflows/agents/skills/hooks/tools.
  • Add CancellationToken for long internal loops and scans.

P2 / Larger subsystem work

  • Append-only run-history tree with attempt/branch parentage.
  • Content-addressed blob artifact store with metadata sidecars and GC.
  • Worker process controller installed before spawn; process non-reuse after cancel/protocol error.
  • Raw scan-entry cache shared by dashboard/status/artifact lookup.

Anti-Patterns to Avoid

  • Building prompts from scattered inline string concatenation without a traceable pipeline.
  • Treating UI render as a place to perform heavy filesystem scans.
  • Auto-opening modal/right-sidebar UI by default when a compact widget/status line would suffice.
  • Dropping queued user-facing results just because session generation changed.
  • Cancelling a task without writing a terminal event/result.
  • Caching semantic query results that should be recomputed from raw state.
  • Letting one bad extension/resource prevent builtin operation.

Immediate Review Questions for Future Implementation

  • Should pi-crew project-local skills be allowed to shadow builtin safety skills by default, or require explicit project: namespace?
  • Should respond enqueue durable work or only deliver to live workers? Current semantics need to become explicit.
  • What is the stable capability ID scheme for teams/workflows/agents/skills/hooks?
  • Which hook events should be blocking by default and which should be diagnostic-only?
  • What artifact size threshold should trigger blob storage instead of embedding content in task/events JSON?