| name | description |
|---|---|
| async-worker-recovery | Background worker, heartbeat, stale-run, crash-recovery, and deadletter workflow. Use when debugging stuck/dead workers or changing async run reliability. |
async-worker-recovery
Use this skill when a pi-crew run is stuck, stale, interrupted, or has dead workers.
Source patterns distilled
- pi-subagents async patterns: detached runner, status files, result watcher, stale PID reconciler
- pi-crew runtime: `src/runtime/background-runner.ts`, `async-runner.ts`, `heartbeat-watcher.ts`, `worker-heartbeat.ts`, `crash-recovery.ts`, `stale-reconciler.ts`, `deadletter.ts`, `delivery-coordinator.ts`
- UI recovery controls: `src/ui/run-dashboard.ts`, `src/ui/dashboard-panes/health-pane.ts`, `src/ui/run-action-dispatcher.ts`
Rules
- Distinguish historical dead-heartbeat events from current active failures. Check manifest/task status and event timestamps.
- Heartbeat warnings should only apply to currently running/waiting work, never terminal runs/tasks.
- Stale reconciliation order: result/terminal evidence → PID liveness → stale threshold/active evidence.
- Reconcile state under run lock and re-read inside the lock before repair.
- Deadletter entries are evidence, not automatic proof of permanent failure; inspect attempts and later completion events.
- For background runs, verify PID liveness and background log before declaring stuck.
- Session delivery should queue while inactive and flush only to the current generation/session.
- Do not poll in sleep loops waiting for async completion if the system has a watcher/result notification path.
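The reconciliation order and the lock rule above can be sketched as a small pure function plus a wrapper. This is a minimal illustration, not the pi-crew implementation: `RunSnapshot`, `Verdict`, `classifyRun`, `reconcileUnderLock`, and the status names are hypothetical, and the real manifest types in `stale-reconciler.ts` may differ. In particular, treating a dead PID as `"dead"` before consulting the stale threshold is one reading of the stated order.

```typescript
// Hypothetical shapes for illustration only; real pi-crew types may differ.
type RunStatus = "running" | "waiting" | "completed" | "failed" | "cancelled";

interface RunSnapshot {
  status: RunStatus;
  hasResultFile: boolean;   // result/terminal evidence found on disk
  pidAlive: boolean | null; // null when no PID was recorded
  lastSeenAt: number;       // heartbeat timestamp, ms since epoch
  lastActivityAt: number;   // progress timestamp, ms since epoch
}

type Verdict = "terminal" | "alive" | "stale" | "dead";

const TERMINAL = new Set<RunStatus>(["completed", "failed", "cancelled"]);

// Applies the checks in the documented order:
// result/terminal evidence, then PID liveness, then stale threshold.
function classifyRun(run: RunSnapshot, now: number, staleMs: number): Verdict {
  // 1. Terminal evidence wins, so old heartbeat gaps never flag finished work.
  if (run.hasResultFile || TERMINAL.has(run.status)) return "terminal";
  // 2. A recorded PID that is no longer alive means the worker is gone.
  if (run.pidAlive === false) return "dead";
  // 3. Otherwise compare the most recent activity against the threshold.
  const lastEvidence = Math.max(run.lastSeenAt, run.lastActivityAt);
  return now - lastEvidence > staleMs ? "stale" : "alive";
}

// Re-reads the run inside the lock so the repair acts on fresh state,
// not on a snapshot taken before the lock was acquired.
function reconcileUnderLock(
  withRunLock: <T>(fn: () => T) => T, // assumed lock helper
  loadRun: () => RunSnapshot,         // assumed manifest loader
  repair: (verdict: Verdict) => void, // assumed repair callback
  now: number,
  staleMs: number,
): Verdict {
  return withRunLock(() => {
    const fresh = loadRun();
    const verdict = classifyRun(fresh, now, staleMs);
    if (verdict === "stale" || verdict === "dead") repair(verdict);
    return verdict;
  });
}
```

Keeping the classification pure makes the ordering easy to unit test, and the wrapper only repairs when the fresh read still shows a problem.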
Operator checklist
- Load manifest/tasks and recent events.
- Check `manifest.async.pid` and process liveness.
- Check heartbeat `lastSeenAt`, progress `lastActivityAt`, and terminal status.
- Inspect deadletter and diagnostic report.
- Choose recovery: resume, retry, kill stale, diagnostic, or no-op historical notification.
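The PID liveness step in the checklist can be sketched with Node's `process.kill(pid, 0)`, which tests for a process without sending a signal. The helper names `isPidAlive` and `checkAsyncPid` and the `ManifestLike` shape are assumptions for illustration. Only the `manifest.async.pid` path comes from the checklist itself.

```typescript
// Hypothetical minimal manifest shape; only `async.pid` is taken from the checklist.
interface ManifestLike {
  async?: { pid?: number };
}

// Returns true if a process with this PID currently exists.
function isPidAlive(pid: number): boolean {
  // Guard non-positive values: pid 0 and negative pids target process groups.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0 only checks for existence
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it;
    // any other error (typically ESRCH) is treated as "not running".
    return (err as { code?: string }).code === "EPERM";
  }
}

type PidCheck = "no-pid" | "alive" | "dead";

// Summarizes the PID step of the checklist for a loaded manifest.
function checkAsyncPid(manifest: ManifestLike): PidCheck {
  const pid = manifest.async?.pid;
  if (pid === undefined) return "no-pid";
  return isPidAlive(pid) ? "alive" : "dead";
}
```

A live PID alone does not prove the worker is healthy, since the operating system can reuse PIDs. That is why the checklist pairs this check with heartbeat timestamps and the background log.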
Verification
cd pi-crew
npx tsc --noEmit
node --experimental-strip-types --test test/unit/heartbeat-watcher.test.ts test/unit/stale-reconciler.test.ts test/unit/deadletter.test.ts test/integration/async-restart-recovery.test.ts
npm test