68 lines
2.4 KiB
Markdown
68 lines
2.4 KiB
Markdown
---
|
||
name: systematic-debugging
|
||
description: Use when encountering a bug, test failure, blocked run, provider error, stale state, crash, or unexpected behavior before proposing fixes.
|
||
---
|
||
|
||
# systematic-debugging
|
||
|
||
Core principle: no fixes without root-cause investigation first. Symptom patches create new bugs and hide the real failure.
|
||
|
||
Distilled from detailed reads of systematic-debugging, root-cause tracing, TDD, and error-analysis skill patterns.
|
||
|
||
## Four Phases
|
||
|
||
### 1. Root Cause Investigation
|
||
|
||
Before any fix:
|
||
|
||
- read error messages, stack traces, failing assertions, task status, and logs completely;
|
||
- reproduce narrowly and record the exact command/steps;
|
||
- check recent diffs, commits, config changes, dependency changes, and environment differences;
|
||
- trace data/control flow across component boundaries;
|
||
- add temporary diagnostics only when they answer a specific question.
|
||
|
||
For pi-crew, trace:
|
||
|
||
```text
|
||
user/tool params → config resolution → team/workflow/agent discovery → model/runtime routing → child args/env → state/events/artifacts → status/UI
|
||
```
|
||
|
||
### 2. Pattern Analysis
|
||
|
||
- Find a similar working path in the codebase.
|
||
- Compare working vs broken behavior field-by-field.
|
||
- Identify dependencies: config home, project root markers, env vars, locks, stale caches, provider model capabilities.
|
||
- Do not assume small differences are irrelevant.
|
||
|
||
### 3. Hypothesis and Test
|
||
|
||
- State one hypothesis: “I think X is the root cause because Y.”
|
||
- Test one variable at a time with the smallest read-only probe or targeted test.
|
||
- If wrong, discard the hypothesis instead of piling on fixes.
|
||
- After three failed fixes, question architecture or assumptions before continuing.
|
||
|
||
### 4. Implementation
|
||
|
||
- Add or identify a failing regression test when practical.
|
||
- Fix the root cause, not the symptom.
|
||
- Avoid “while I’m here” refactors.
|
||
- Verify targeted behavior, then broader gates.
|
||
|
||
## Evidence to Collect
|
||
|
||
- failing command and exit code;
|
||
- relevant manifest/tasks/events/mailbox files;
|
||
- effective config paths and redacted config;
|
||
- child Pi args/env after redaction;
|
||
- git diff and recent commits;
|
||
- provider/model/thinking resolution;
|
||
- async timing/race indicators.
|
||
|
||
## Anti-patterns
|
||
|
||
- Fixing before reproducing.
|
||
- Assuming real user global config cannot pollute tests.
|
||
- Treating provider errors as only transient network failures.
|
||
- Removing guards because they reveal a blocked state.
|
||
- Editing unrelated layers before checking the hypothesis.
|