181 lines
9.5 KiB
Markdown
181 lines
9.5 KiB
Markdown
# pi-crew Architecture
|
|
|
|
`pi-crew` is a Pi package for coordinated multi-agent work. It is intentionally durable-first: every run is represented on disk, every task has a state record, and child workers stream progress into JSONL/status files so foreground sessions, background jobs, dashboards, and later restarts all read the same source of truth.
|
|
|
|
## Layers
|
|
|
|
```text
|
|
Pi extension layer
|
|
register tools, slash commands, widget/dashboard, notifier, lifecycle cleanup
|
|
|
|
Runtime layer
|
|
team runner, task graph scheduler, child Pi process runner, async runner,
|
|
model fallback, policy engine, worktree manager, live-session experimental path
|
|
|
|
State layer (project root resolves to <crewRoot>:
|
|
- .crew/ when no .pi/ exists in the repo (default)
|
|
- .pi/teams/ when the repo already has .pi/ (legacy reuse))
|
|
<crewRoot>/state/runs/{runId}/manifest.json
|
|
<crewRoot>/state/runs/{runId}/tasks.json
|
|
<crewRoot>/state/runs/{runId}/events.jsonl
|
|
<crewRoot>/state/runs/{runId}/agents/{taskId}/status.json
|
|
<crewRoot>/artifacts/{runId}/...
|
|
```
|
|
|
|
## Run flow
|
|
|
|
```text
|
|
user/team tool
|
|
│
|
|
▼
|
|
handleTeamTool(action=run)
|
|
├─ discover agents/teams/workflows
|
|
├─ validate team/workflow refs
|
|
├─ create run manifest + task graph
|
|
├─ write goal artifact
|
|
└─ choose foreground/session-bound or async/background mode
|
|
│
|
|
├─ foreground: startForegroundRun() schedules executeTeamRun()
|
|
│
|
|
└─ async: spawnBackgroundTeamRun()
|
|
├─ node --import jiti-register.mjs background-runner.ts
|
|
├─ background-runner writes async.started + async.pid marker
|
|
└─ executeTeamRun()
|
|
├─ resolve ready task batch
|
|
├─ resolveBatchConcurrency() with hard cap
|
|
├─ runTeamTask() per task
|
|
│ ├─ build prompt + dependency context
|
|
│ ├─ choose configured Pi model candidates
|
|
│ ├─ spawn child `pi` worker
|
|
│ ├─ observe JSONL/stdout progress
|
|
│ ├─ persist agent status/events/output
|
|
│ └─ write result/log/transcript artifacts
|
|
├─ merge task updates monotonically
|
|
├─ write progress artifacts
|
|
└─ synthesize policy closeout
|
|
```
|
|
|
|
## Extension layer
|
|
|
|
`src/extension/register.ts` wires the package into Pi:
|
|
|
|
- `team` tool and management actions.
|
|
- Conflict-safe subagent tools: `crew_agent`, `crew_agent_result`, `crew_agent_steer`.
|
|
- Claude-style aliases: `Agent`, `get_subagent_result`, `steer_subagent` when available.
|
|
- Slash commands including `/team-run`, `/team-status`, `/team-dashboard`, `/team-doctor`, `/team-config`, `/team-summary`.
|
|
- Active-only widget and optional dashboard/sidebar UI.
|
|
- Foreground run scheduling and shutdown cleanup.
|
|
- Async completion notifier and session-start active-run summary.
|
|
|
|
The extension layer should remain thin: user input is normalized into tool parameters, then delegated to runtime/state modules.
|
|
|
|
## Runtime layer
|
|
|
|
### Team runner
|
|
|
|
`src/runtime/team-runner.ts` drives workflow execution. It reads queued tasks, computes the ready set from the task graph, applies concurrency limits, runs a batch, then merges results back into the latest task state. Terminal task states are monotonic: stale parallel snapshots must not regress completed/failed/cancelled/skipped tasks back to queued/running.
|
|
|
|
### Task runner
|
|
|
|
`src/runtime/task-runner.ts` executes one task. It prepares workspace/worktree context, renders a task prompt, chooses model candidates from Pi configuration, launches a child Pi process by default, and writes result artifacts. Scaffold mode is explicit dry-run only.
|
|
|
|
### Child Pi runtime
|
|
|
|
`src/runtime/child-pi.ts` is the default worker runtime. It:
|
|
|
|
- launches real `pi` child processes,
|
|
- hides Windows console windows with `windowsHide: true`,
|
|
- streams JSONL output into transcripts,
|
|
- compacts noisy message updates,
|
|
- isolates observer callback failures so progress persistence cannot kill orchestration,
|
|
- applies post-exit stdio guards for late output.
|
|
|
|
### Async background runner
|
|
|
|
`src/runtime/async-runner.ts` spawns detached background runs. Installed packages use an absolute `jiti-register.mjs` loader path because Node strip-types refuses TypeScript under `node_modules`. The runner fail-fasts if jiti is missing, and writes `async.pid` once startup begins so the parent can distinguish a healthy start from an early import crash.
|
|
|
|
### Concurrency and policy
|
|
|
|
`src/runtime/concurrency.ts` picks batch size from explicit limits, team settings, workflow settings, or built-in defaults. User-provided `limits.maxConcurrentWorkers` is hard-capped by default to prevent local DoS; `limits.allowUnboundedConcurrency=true` is an explicit opt-out and emits an observability event.
|
|
|
|
`src/runtime/policy-engine.ts` applies closeout and safety policy decisions such as limit exceeded, failed task blocking, stale workers, and green-contract failures.
|
|
|
|
### Model routing
|
|
|
|
Model choice is based on Pi's current configuration/model registry, not hardcoded providers. Task and agent records persist model attempts and routing metadata so dashboards/status can show requested model, selected model, fallback chain, and fallback reason.
|
|
|
|
## State layer
|
|
|
|
Run state is under `<crewRoot>` (`.crew/` for new projects, or `.pi/teams/` when the repo already has `.pi/`):
|
|
|
|
```text
|
|
<crewRoot>/state/runs/{runId}/
|
|
manifest.json run metadata/status/artifacts/async pid
|
|
tasks.json task graph and per-task status
|
|
events.jsonl append-only run events
|
|
events.jsonl.seq event sequence cache
|
|
agents.json aggregate agent cache
|
|
async.pid background startup marker
|
|
agents/{taskId}/
|
|
status.json per-agent status source
|
|
events.jsonl per-agent event stream
|
|
output.log compact worker output
|
|
sidechain.output.jsonl
|
|
live-control.jsonl
|
|
```
|
|
|
|
Artifacts are under:
|
|
|
|
```text
|
|
<crewRoot>/artifacts/{runId}/
|
|
goal.md
|
|
prompts/{taskId}.md
|
|
results/{taskId}.txt
|
|
logs/{taskId}.log
|
|
transcripts/{taskId}.jsonl
|
|
metadata/*.json
|
|
progress.md
|
|
summary.md
|
|
```
|
|
|
|
`<crewRoot>` resolution is centralised in `src/utils/paths.ts#projectCrewRoot()`:
|
|
|
|
- if `<repoRoot>/.pi/` already exists, return `<repoRoot>/.pi/teams/` (legacy reuse, no parallel `.crew/`)
|
|
- otherwise return `<repoRoot>/.crew/` (default for fresh projects)
|
|
|
|
User-global fallback (when no project root is detected) lives under `~/.pi/agent/extensions/pi-crew/`.
|
|
|
|
Atomic writes use temp-file replace with retry for transient Windows `EPERM`/`EBUSY`/`EACCES`. JSONL append paths are best-effort where used for observers/progress; write failures must not crash child output parsing.
|
|
|
|
## UI and observability
|
|
|
|
- The persistent widget shows active runs only.
|
|
- Stale async runs with dead background pids are hidden from the active widget.
|
|
- `/team-status` is the canonical detailed state view and can mark stale active async runs failed.
|
|
- `/team-dashboard` provides live history/details from `RunSnapshotCache`, with panes for agents, progress/events, mailbox attention, recent output, health, and metrics.
|
|
- Phase 9 observability uses a per-session `MetricRegistry` (`Counter`, `Gauge`, `Histogram`) wired to `crew.*` events via unsubscribe-returning `events.on()` handlers. The registry is disposed on session shutdown/reload; no global metric singleton is used.
|
|
- Metrics can be inspected with `/team-metrics` or `team api metrics-snapshot`, exported as redacted daily JSONL under `<crewRoot>/state/metrics/` when telemetry is enabled, formatted for Prometheus, or pushed to an opt-in OTLP HTTP endpoint.
|
|
- Heartbeat observability is split between dashboard summaries and a background `HeartbeatWatcher`: healthy/warn/stale/dead gradient metrics are emitted, first-dead detections notify operators, and consecutive dead ticks can append deadletter entries.
|
|
- Powerbar publishing is optional and event-compatible: pi-crew emits `powerbar:register-segment` for `pi-crew-active` / `pi-crew-progress`, emits `powerbar:update` payloads (`id`, `text`, optional `suffix`, `bar`, `color`), and mirrors status through `ctx.ui.setStatus("pi-crew", ...)` when no powerbar listener is detected.
|
|
- Transcript viewer is file-backed so it works for foreground and async runs; it defaults to bounded tail reads and can load full content on demand.
|
|
|
|
## Lifecycle and cleanup
|
|
|
|
Foreground runs are session-bound and should be interrupted on session shutdown or session switch. Only explicit `async: true` runs are allowed to survive the Pi session. Runtime cleanup is registered through Pi lifecycle hooks and a global reload cleanup guard.
|
|
|
|
## Configuration
|
|
|
|
Key config sections:
|
|
|
|
- `runtime`: `auto`, `child-process`, `scaffold`, experimental `live-session`.
|
|
- `limits`: concurrency/task/depth safety controls.
|
|
- `ui`: widget/dashboard/powerbar/model-token display settings.
|
|
- `observability`: in-memory metrics, heartbeat watcher interval, metric file retention.
|
|
- `telemetry`: opt-out switch for local telemetry sinks.
|
|
- `reliability`: opt-in auto-retry/auto-recover defaults and deadletter threshold.
|
|
- `otlp`: opt-in OTLP HTTP metric export.
|
|
- `agents`: builtin overrides for models/fallbacks/tools.
|
|
- `autonomous`: policy injection/profile for proactive team delegation.
|
|
|
|
See `usage.md`, `resource-formats.md`, `runtime-flow.md`, and `live-mailbox-runtime.md` for operational details.
|