pi-config/extensions/pi-crew/skills/verification-before-done/SKILL.md

---
name: verification-before-done
description: Use when about to claim work is complete, fixed, passing, reviewed, committed, or ready to hand off.
---

# verification-before-done

Core principle: evidence before claims. A worker report, green-looking log, or previous run is not fresh verification.

Distilled from detailed reads of agent-skill patterns for verification-before-completion, TDD, review reception, and QA workflows.

## Gate Function

Before any completion claim:

1. Identify the command or inspection that proves the claim.
2. Run the full command fresh, or explicitly state why a command cannot be run.
3. Read the output, including exit code and failure counts.
4. Compare the output to the claim.
5. Report the claim only with the evidence.

## Claim-to-Evidence Table

| Claim | Requires | Not sufficient |
|---|---|---|
| Tests pass | Fresh test output with zero failures | Prior run, “should pass” |
| Typecheck passes | Typecheck command exit 0 | Lint or targeted tests only |
| Bug fixed | Original symptom/regression test passes | Code changed |
| Requirements met | Checklist against request/plan | Generic test success |
| Agent completed | Worker output plus artifact/diff/state inspection | Worker says DONE |
| Safe to commit | Relevant checks pass and status reviewed | Partial local confidence |

## Verification Ladder

Choose the smallest reliable gate, then escalate when risk requires it:

1. Read-only inspection for plans/reviews.
2. Targeted unit test for touched behavior.
3. Typecheck for TypeScript/schema/API changes.
4. Integration test for runtime, subprocess, state, filesystem, UI, config, or session behavior.
5. Full suite before commit/release or broad changes.
6. Real Pi smoke only when safe and needed.

## Done Report

Include:

- changed files or read-only status;
- commands run and pass/fail result;
- artifacts, run IDs, logs, or state paths inspected;
- behavior actually verified;
- skipped checks and why;
- risks and rollback notes.

## Red Flags

Stop before saying done if you are using words like “should”, “probably”, “looks”, “seems”, “I think”, or if you are trusting an agent report without checking evidence.