Files
pi-config/extensions/pi-crew/skills/systematic-debugging/SKILL.md

2.4 KiB
Raw Blame History

name, description
name description
systematic-debugging Use when encountering a bug, test failure, blocked run, provider error, stale state, crash, or unexpected behavior before proposing fixes.

systematic-debugging

Core principle: no fixes without root-cause investigation first. Symptom patches create new bugs and hide the real failure.

Distilled from detailed reads of systematic-debugging, root-cause tracing, TDD, and error-analysis skill patterns.

Four Phases

1. Root Cause Investigation

Before any fix:

  • read error messages, stack traces, failing assertions, task status, and logs completely;
  • reproduce narrowly and record the exact command/steps;
  • check recent diffs, commits, config changes, dependency changes, and environment differences;
  • trace data/control flow across component boundaries;
  • add temporary diagnostics only when they answer a specific question.

For pi-crew, trace:

user/tool params → config resolution → team/workflow/agent discovery → model/runtime routing → child args/env → state/events/artifacts → status/UI

2. Pattern Analysis

  • Find a similar working path in the codebase.
  • Compare working vs broken behavior field-by-field.
  • Identify dependencies: config home, project root markers, env vars, locks, stale caches, provider model capabilities.
  • Do not assume small differences are irrelevant.

3. Hypothesis and Test

  • State one hypothesis: “I think X is the root cause because Y.”
  • Test one variable at a time with the smallest read-only probe or targeted test.
  • If wrong, discard the hypothesis instead of piling on fixes.
  • After three failed fixes, question architecture or assumptions before continuing.

4. Implementation

  • Add or identify a failing regression test when practical.
  • Fix the root cause, not the symptom.
  • Avoid “while Im here” refactors.
  • Verify targeted behavior, then broader gates.

Evidence to Collect

  • failing command and exit code;
  • relevant manifest/tasks/events/mailbox files;
  • effective config paths and redacted config;
  • child Pi args/env after redaction;
  • git diff and recent commits;
  • provider/model/thinking resolution;
  • async timing/race indicators.

Anti-patterns

  • Fixing before reproducing.
  • Assuming real user global config cannot pollute tests.
  • Treating provider errors as only transient network failures.
  • Removing guards because they reveal a blocked state.
  • Editing unrelated layers before checking the hypothesis.