Add plannotator extension v0.19.10

2026-05-07 11:38:14 +10:00
parent e914bc59c9
commit f37e4565ff
91 changed files with 35103 additions and 0 deletions


@@ -0,0 +1,574 @@
---
name: plannotator-compound
disable-model-invocation: true
description: >
Analyze a user's Plannotator plan archive to extract denial patterns, feedback
taxonomy, evolution over time, and actionable prompt improvements — then produce
a polished HTML dashboard report. Falls back to Claude Code ExitPlanMode denial
reasons when Plannotator data is unavailable.
---
# Compound Planning Analysis
You are conducting a comprehensive research analysis of a user's Plannotator plan
archive. The goal: extract patterns from their denied plans, reduce
them into actionable insights, and produce an elegant HTML dashboard report.
This is a multi-phase process. Each phase must complete fully before the next begins.
Research integrity is paramount — every file must be read, no skipping.
## Source Selection
Before starting the analysis, determine which data source is available.
1. **Plannotator mode (first-class)** — Check `~/.plannotator/plans/`. If it
exists and contains `*-denied.md` files, use this mode. The entire workflow
below is written for Plannotator data.
2. **Claude Code fallback mode** — If the Plannotator archive is absent or
contains no denied plans, check `~/.claude/projects/`. If present, read
[references/claude-code-fallback.md](references/claude-code-fallback.md)
before continuing. That reference explains how to use the bundled parser at
[scripts/extract_exit_plan_mode_outcomes.py](scripts/extract_exit_plan_mode_outcomes.py)
to extract denial reasons from Claude Code JSONL transcripts. Every phase
below has a short note explaining what changes in fallback mode — the
reference file has the details.
3. **Neither available** — Ask the user for their Plannotator plans directory or
Claude Code projects directory. Do not guess.
## Phase 0: Locate Plans & Check for Previous Reports
Use the mode chosen in Source Selection above.
**Plannotator mode:** Verify the plans directory contains `*-denied.md` files. If
none exist, switch to the Claude Code fallback mode before giving up.
**Claude Code fallback mode:** Run the bundled parser per the fallback reference to
build the denial-reason dataset. Create `/tmp/compound-planning/` if needed.
In either mode, proceed to Previous Report Detection below.
### Previous Report Detection
After locating the plans directory, check for existing reports:
```
ls ~/.plannotator/plans/compound-planning-report*.html
```
Reports follow a versioned naming scheme:
- First report: `compound-planning-report.html`
- Subsequent reports: `compound-planning-report-v2.html`, `compound-planning-report-v3.html`, etc.
If one or more reports exist, determine the **latest** one (highest version number).
Get its filesystem modification date using `stat` (macOS: `stat -f %Sm -t %Y-%m-%d <file>`;
Linux: `stat -c %y <file> | cut -d' ' -f1`). This is the **cutoff date**.
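For example, a minimal sketch (the `-v2` filename is illustrative):
```bash
# Highest existing version number (no output means only the unversioned first report exists)
ls ~/.plannotator/plans/compound-planning-report*.html 2>/dev/null \
  | grep -oE 'v[0-9]+' | tr -d 'v' | sort -n | tail -1

# Modification date of the latest report: this is the cutoff date
stat -f %Sm -t %Y-%m-%d ~/.plannotator/plans/compound-planning-report-v2.html    # macOS
stat -c %y ~/.plannotator/plans/compound-planning-report-v2.html | cut -d' ' -f1 # Linux
```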
Present the user with a choice:
> "I found a previous report (`compound-planning-report-v{N}.html`) last updated
> on {CUTOFF_DATE}. I can either:
>
> 1. **Incremental** — Only analyze files dated after {CUTOFF_DATE}, saving tokens
> and building on previous findings
> 2. **Full** — Re-analyze the entire archive from scratch
>
> Which would you prefer?"
Wait for the user's response before proceeding.
**If incremental:** Filter all subsequent phases to only process files with dates
after the cutoff date. The new report version will note in its header narrative that
it covers the period from {CUTOFF_DATE} to present, and reference the previous
report for earlier findings. The inventory (Phase 1) should still count ALL files
for overall stats, but clearly separate "new since last report" counts.
**If full:** Proceed normally with all files, but still use the next version number
for the output filename.
**If no previous report exists:** Proceed normally. The output filename will be
`compound-planning-report.html` (no version suffix for the first report).
## Phase 1: Inventory
Count and report the dataset. **Always count ALL files** for overall stats,
regardless of whether this is an incremental or full run:
```
- *-approved.md files (count)
- *-denied.md files (count)
- Date range (earliest to latest date found in filenames)
- Total days spanned
- Revision rate: denied / (approved + denied) — this is the "X% of plans
revised before coding" stat used in dashboard section 1
```
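A minimal sketch of the count and rate computation, assuming the standard
Plannotator plans directory:
```bash
approved=$(ls ~/.plannotator/plans/*-approved.md 2>/dev/null | wc -l)
denied=$(ls ~/.plannotator/plans/*-denied.md 2>/dev/null | wc -l)
# Revision rate = denied / (approved + denied)
awk -v a="$approved" -v d="$denied" \
  'BEGIN { if (a + d > 0) printf "%d approved, %d denied: %.1f%% revised before coding\n", a, d, 100 * d / (a + d) }'
```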
**Note:** Ignore `*.annotations.md` files entirely. Denied files already contain
the full plan text plus all reviewer feedback appended after a `---` separator.
Annotation files are redundant subsets of this content — reading both would
double-count feedback.
**If incremental mode:** After the total counts, separately report the counts for
files dated after the cutoff date only:
```
New since {CUTOFF_DATE}:
- *-denied.md files: X (of Y total)
- New date range: {CUTOFF_DATE} to {LATEST_DATE}
- New days spanned: N
```
If fewer than 3 new denied files exist since the cutoff, warn the user:
> "Only {N} new denied plans since the last report. The incremental analysis may
> be thin. Would you like to proceed or switch to a full analysis?"
Also run `wc -l` across all `*-approved.md` files to get average lines per
approved plan. This tells the user whether their plans are staying lightweight
or bloating over time. You do not need to read approved plan contents — just
their line counts. If possible, break this down by time period (e.g., monthly)
to show whether plan size changed.
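For instance, a sketch of the line-count pass (no plan contents are read):
```bash
# Per-file line counts plus an average; wc's summary row is labeled "total"
wc -l ~/.plannotator/plans/*-approved.md \
  | awk '$2 != "total" { sum += $1; n++ } END { if (n) printf "avg %.0f lines across %d approved plans\n", sum / n, n }'
```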
Dates appear in filenames in YYYY-MM-DD format, sometimes as a prefix
(2026-01-07-name-approved.md) and sometimes embedded (name-2026-03-15-approved.md).
Extract dates from all filenames.
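One pattern handles both shapes, as in this sketch:
```bash
# First YYYY-MM-DD found anywhere in each denied-plan filename, sorted chronologically
for f in ~/.plannotator/plans/*-denied.md; do
  basename "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -1
done | sort
```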
Tell the user what you found and that you're beginning the extraction.
**Claude Code fallback mode:** The Plannotator inventory fields above do not apply.
Follow the inventory instructions in
[references/claude-code-fallback.md](references/claude-code-fallback.md) instead —
report the denial-reason dataset assembled by the parser.
## Phase 2: Map — Parallel Extraction
This is the most time-intensive phase. You must read EVERY `*-denied.md` file
**in scope**. Do not skip files. Do not summarize early.
**In scope** means: all denied files for a full analysis, or, for an incremental run,
only denied files whose embedded YYYY-MM-DD date is strictly after the cutoff date.
**Claude Code fallback mode:** The parser output is the clean source dataset. Read
the fallback reference for the extraction prompt and batching strategy specific to
JSON part files. Do not go back to raw `.jsonl` logs unless the parser fails or the
user asks for audit-level verification.
**Important:** Only read `*-denied.md` files. Do NOT read approved plans,
annotation files, or diff files. Each denied file contains the full plan text
followed by a `---` separator and the reviewer's feedback — everything needed
for analysis is in one file.
### Batching Strategy
All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning. Haiku is faster and cheaper
for this work.
The approach depends on dataset size:
**Tiny datasets (≤ 10 total files):** Read all files directly in the main agent —
no need for sub-agents. Just read them sequentially and proceed to Phase 3.
**Small datasets (11-30 files):** Launch 2-3 parallel Haiku agents, splitting
files roughly evenly.
**Medium datasets (31-80 files):** Launch 4-6 parallel Haiku agents (~10-15 files
each). Split by file type and/or time period.
**Large datasets (81+ files):** Launch as many parallel Haiku agents as needed to
keep each batch around 10-15 files. Split by the natural time boundaries in the
data (months, quarters, or whatever groupings produce balanced batches). If one
time period dominates (e.g., the most recent month has 3x the files), split that
period into multiple batches.
Launch all extraction agents in parallel using the Agent tool with
`run_in_background: true` and `model: "haiku"`.
### Output Files
Each extraction agent must write its results to a clean output file rather than
relying on the agent task output (which contains interleaved JSONL framework
logs that are difficult to parse). Instruct each agent to write to:
```
/tmp/compound-planning/extraction-{batch-name}.md
```
Create the `/tmp/compound-planning/` directory before launching agents. The
reduce agent in Phase 3 will read these clean files directly.
### Extraction Prompt
Each agent receives this instruction (adapt the time period, file list, and
output path):
```
You are extracting structured data from denied plan files for a pattern analysis.
Directory: [PLANS DIRECTORY]
Files to read: [LIST OF SPECIFIC *-denied.md FILES]
Output: Write your complete results to [OUTPUT FILE PATH]
Each denied file contains two parts separated by a --- line:
1. The plan text (above the ---)
2. The reviewer's feedback and annotations (below the ---)
Read EVERY file in your list. For EACH file, extract:
- The plan name/topic (from the plan text above the ---)
- The denial reason or feedback given (from below the --- — capture the actual
words used)
- What was specifically asked to change
- The type of feedback (let the content determine the category — don't force-fit
into predefined types. Common types include things like: scope concerns,
approach disagreements, missing information, process requirements, quality
concerns, UX/design issues, naming disputes, clarification requests,
testing/procedural denials — but the user's actual patterns may differ)
- Any specific phrases or recurring language from the reviewer
- Individual annotations if present (numbered feedback items with quoted text
and reviewer comments)
- The date (extracted from the filename)
Do NOT skip any files. One entry per file.
Format each entry as:
**[filename]**
- Date: ...
- Topic: ...
- Denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Annotations: [count, with brief summary of each]
---
After processing all files, write the complete results to [OUTPUT FILE PATH].
State the total file count at the end of the file.
```
### While Agents Run
Track completion. As each agent finishes, note the count of files it processed.
Verify the total matches the inventory from Phase 1. If any agent's count is
short, flag it and consider re-launching for the missing files.
If an agent times out (possible with large batches — a batch of 128 files can
take 8+ minutes), re-launch it for just the unprocessed files. Check the output
file to see how far it got before timing out.
## Phase 3: Reduce — Pattern Analysis
Once ALL extraction agents have completed (or all files have been read for tiny
datasets), proceed with the reduction. Reduction agents should use `model: "sonnet"`
— this phase requires real analytical reasoning, not just file reading.
### Reduction Strategy
The approach depends on how many extraction files were produced:
**Standard (≤ 20 extraction files):** Launch a single Sonnet agent to read all
extraction files and produce the full analysis. This covers most datasets.
**Large (21+ extraction files):** Use a two-stage reduce:
1. **Stage 1 — Partial reduces:** Split the extraction files into groups of 4-6.
Launch parallel Sonnet agents, each reading one group and producing a partial
analysis with the same sections listed below. Each writes to
`/tmp/compound-planning/partial-reduce-{N}.md`.
2. **Stage 2 — Final reduce:** A single Sonnet agent reads all partial reduce
files and synthesizes them into the final comprehensive analysis. This agent
merges taxonomies, combines counts, deduplicates patterns, and reconciles any
conflicting categorizations across partials.
**Claude Code fallback mode:** The reduction phase is the same. The only upstream
difference is that extraction files were derived from normalized denial-reason JSON
instead of Plannotator markdown files.
### Reduction Prompt
Give each reduction agent this prompt (adapt file paths for single vs multi-stage):
```
You are a data scientist conducting the reduction phase of a map-reduce analysis
across a user's denied plan archive.
Read ALL extraction files at [FILE PATHS]
These files contain structured extractions from every denied plan file. Each
extraction includes the plan topic, denial feedback, annotations, and reviewer
language. Your job: aggregate everything, find patterns, cluster into a taxonomy,
and produce a comprehensive analysis.
Be exhaustive. Use real counts. Quote real phrases from the data. This is
research — no hand-waving, no fabrication.
Write your complete results to [OUTPUT FILE PATH].
Produce the following sections:
[... sections listed below ...]
```
The reduction agent's job is to let the data speak. Do not impose a predetermined
framework — discover what's actually there. The analysis must produce:
### 1. Denial Reason Taxonomy
Categorize every denial into a finite set of types that emerge from the data. Count
occurrences. Show percentages. Include real example quotes for each type. Aim for
8-15 categories — enough to be specific, few enough to be scannable. Let the user's
actual feedback determine what the categories are.
### 2. Top Feedback Patterns (ranked by frequency)
The 5-10 most recurring patterns. For each: what the reviewer consistently asks for,
3+ example quotes from different files, and whether the pattern changed over time.
### 3. Recurring Phrases
Exact phrases the reviewer uses repeatedly, with counts and what they signal. These
are the reviewer's vocabulary — their shorthand for what they care about.
### 4. What the Reviewer Values (implicit preferences)
Derived from patterns — what does this specific person care about most? Quality?
Speed? Narrative? Architecture? Process? Simplicity? Rank by evidence strength.
This section should feel like a personality profile of the reviewer's standards.
### 5. What Agents Consistently Get Wrong
The flip side — what recurring mistakes trigger denials? What should agents stop
doing for this reviewer?
### 6. Structural Requests
What plan structure does the reviewer consistently demand? Required sections,
ordering, format preferences, level of detail expected.
### 7. Evolution Over Time
How feedback patterns changed across the time span. Group by whatever natural time
boundaries exist in the data (weeks for short spans, months for longer ones). Did
expectations mature? Did new patterns emerge? What shifted? If the dataset spans
less than a month, note that evolution analysis is limited but still look for any
progression from early to late files.
### 8. Actionable Prompt Instructions
The most important output. Based on all patterns: specific numbered instructions
that could be embedded in a planning prompt to prevent the most common denial
reasons. Write these as actual directives an agent could follow. Be specific to
this user's patterns — generic advice like "write good plans" is worthless. Each
instruction should trace back to a real, frequent denial pattern.
After writing the instructions, calculate what percentage of denials they would
address (count how many denials fall into categories covered by the instructions
vs total denials). Report this percentage — it will be different for every user.
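The arithmetic itself is simple; the numbers below are purely illustrative
placeholders, not expected values:
```bash
# e.g., if the covered categories account for 52 + 34 + 28 of 202 total denials
awk 'BEGIN { covered = 52 + 34 + 28; total = 202; printf "%.0f%% of denials addressed\n", 100 * covered / total }'
```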
## Phase 4: Generate the HTML Dashboard
Build a single, self-contained HTML file as the final deliverable. Save it to
the user's plans directory with a versioned filename:
- First ever report: `compound-planning-report.html`
- Second report: `compound-planning-report-v2.html`
- Third report: `compound-planning-report-v3.html`
- And so on.
The version number was determined in Phase 0 based on existing reports found.
**If this is an incremental report**, the header should indicate the analysis
period (e.g., "March 15 to March 31, 2026") and include a subtitle noting
"Incremental analysis — see v{N-1} for earlier findings." The narrative in
section 1 should frame findings as what's new or changed since the last report,
not as a complete picture. Overall stats in the header (file counts, revision
rate) should still reflect the full archive for context.
Read the template at `assets/report-template.html` for the **design language
only**. The template contains example data from a previous analysis — ignore all
data values, quotes, and percentages in the template. Use only its visual design:
colors, typography, spacing, component styles, and layout patterns.
### Design Language (from template)
- **Palette:** Light mode, warm off-white (#FDFCFB), text in slate scale, amber
for highlights/accents, emerald for positive, rose for negative, indigo for
action elements
- **Typography:** Playfair Display (serif, for narrative headings), Inter (sans,
for body/data), JetBrains Mono (mono, for code/phrases) — Google Fonts CDN
- **Layout:** Single-column, max-width 1024px, generous vertical whitespace (128px
between major sections), editorial/narrative-first aesthetic
- **Tone:** Calm, reflective, authoritative. Like a personal retrospective journal,
not a monitoring dashboard.
### Page Frame (header + footer)
Before the 7 sections, the page has:
- **Header:** Report title on the left (Playfair Display, ~36px), project name +
date range below it in light meta text. On the right: file counts in mono
(e.g., "223 denials · 71 days"). Separated from content by
a bottom border. Generous bottom padding before section 1.
- **Footer:** After section 7. Top border, centered italic Playfair Display tagline
summarizing the corpus (e.g., "Analysis of X denied plans from the Plannotator
archive.").
### Dashboard Section Order (7 sections)
The report follows this exact section order. Each section builds on the previous
one — the flow moves from "what happened" through "why" to "what to do about it":
1. **The story in the data** — An editorial narrative paragraph (Playfair Display
serif, ~26px) that tells the headline finding in prose. Not bullet points — a
real paragraph that reads like the opening of an article. Alongside it, a KPI
sidebar with 3 key metrics (the top denial percentage, the overall revision
rate, and the number of distinct denial categories found). Use an amber inline
highlight on the most striking number in the narrative.
2. **Why plans get denied** — The taxonomy as a ranked list. Each row: rank number
(mono), category label, a thin 4px progress bar (top item in amber-500, rest
in slate-300), percentage (mono), and for the top entries, a real italic quote
from the data below the label. Show the top 10 categories or however many the
data supports (minimum 5).
3. **How expectations evolved** — One card per natural time period. Each card has:
the period name in serif, a theme phrase in colored uppercase (different color
per period to show progression), a description paragraph, and a stat line at
the bottom (e.g., "X denials · Y narrative requests"). If the data spans less
than 3 distinct periods, use 2 cards or even a single card with internal
progression noted.
4. **What works vs what doesn't** — Two side-by-side cards. Left: green-tinted
(emerald-50/50 bg, emerald-100 border) with traits of plans that succeed for
this reviewer. Right: red-tinted (rose-50/50 bg, rose-100 border) with what
agents keep getting wrong. Both derived from the reduction analysis. Bulleted
with small colored dots. 5-8 items per card.
5. **The actionable output** — The diagnostic payoff. Opens with a Playfair
Display narrative sentence stating how many prompt instructions were derived
and what estimated percentage of denials they address (use the real calculated
percentage from Phase 3, not a generic number). Then the top 3 most impactful
improvements as numbered items, each with an amber number, bold title, and
one-line description. This section bridges the analysis and the full prompt
that follows.
6. **Your most-used phrases** — Grid of chips (2-col mobile, 3-col desktop). Each
chip: monospace quoted phrase on the left, frequency count on the right. White
bg, slate-200 border, rounded-12px. Show 9-12 of the most recurring phrases
found. These should be the reviewer's actual words — their verbal fingerprint.
7. **The corrective prompt** — Dark panel (slate-900 bg, white text, rounded-3xl,
shadow-xl). Opens with a Playfair intro sentence about the instructions. Then
a dark code block (slate-800/80 bg, amber-200 monospace text) containing the
full numbered prompt instructions from Phase 3. Include a copy-to-clipboard
button that works (JS included). Below the code block: a gradient glow card
(indigo-to-purple blurred halo behind a white card) with a closing message
that these instructions are personal — derived from the user's own feedback,
their own language, their own standards.
### Adaptation Rules
- If the user has < 3 months of data, reduce the evolution section to fewer cards
- If most denied files lack feedback below the `---` (bare denials with no
annotations), note this in the narrative — the analysis will be thinner
- **Claude Code fallback mode:** Explicitly label the report source as Claude Code
`ExitPlanMode` denial reasons. Do not fabricate Plannotator-only fields such as
annotation counts or approved-plan line counts. See the fallback reference for
KPI substitutes and footer/provenance guidance.
- If fewer than 5 denial categories emerge, combine the taxonomy and patterns
sections into one
- If the dataset is very small (< 20 files), the narrative should acknowledge the
limited sample size and frame findings as preliminary
- The number of prompt instructions will vary per user — could be 8 or 20. Don't
force exactly 17. Let the data determine the count.
- The top 3 actionable items in section 5 must be the 3 that cover the largest
share of denials, not the 3 that sound most impressive
### Key Rules
1. Every number must come from the real analysis — no fabricated data
2. Every quote must be a real quote from a real file
3. The taxonomy percentages must be calculated from real counts
4. The prompt instructions must trace back to actual denial patterns
5. The copy button on the prompt block must work (include the JS)
After generating, open the file in the user's browser.
## Phase 5: Summary
Tell the user:
- How many denied files were analyzed
- If incremental: how many were new since the last report
- The top 3 denial patterns found
- The estimated percentage of denials the prompt instructions would address
- The single most impactful prompt improvement
- Where the report was saved (including version number)
- If incremental: remind the user that earlier findings are in the previous report
**Claude Code fallback mode:** Adapt the summary per the fallback reference —
report human denial reasons analyzed and total `ExitPlanMode` attempts scanned
instead of Plannotator file counts.
## Phase 6: Improvement Hook
After presenting the summary, ask the user if they want to enable an **improvement
hook** — this takes the corrective prompt instructions from section 7 of the report
and writes them to a file that Plannotator's `EnterPlanMode` hook can inject into
every future planning session automatically.
> "Would you like to enable the improvement hook? This will save the corrective
> prompt instructions to a file that gets automatically injected into all future
> planning sessions — so Claude sees your feedback patterns before writing any plan."
**If yes:**
The hook file lives at:
```
~/.plannotator/hooks/compound/enterplanmode-improve-hook.txt
```
Create the `~/.plannotator/hooks/compound/` directory if it doesn't exist.
The file contents should be the corrective prompt instructions from Phase 3 —
the same numbered list that appears in section 7 of the HTML report. Write them
as plain text, one instruction per line, prefixed with their number. No HTML, no
markdown fences, no preamble — just the instructions themselves. The hook system
will inject this file's contents as-is into the planning context.
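A minimal sketch of the write (the two instruction lines stand in for the real
Phase 3 output):
```bash
mkdir -p ~/.plannotator/hooks/compound
cat > ~/.plannotator/hooks/compound/enterplanmode-improve-hook.txt <<'EOF'
1. STRUCTURE: Every plan MUST begin with a "Solution Overview" section.
2. CONFIDENCE: End every plan with a confidence assessment.
EOF
```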
**If the file already exists:**
Read the existing file and present the user with a choice:
> "An improvement hook already exists from a previous analysis. I can:
>
> 1. **Replace** — Overwrite with the new instructions (the old ones are gone)
> 2. **Merge** — Combine both, deduplicating overlapping instructions and
> keeping the best version of each
> 3. **Keep existing** — Leave the current hook as-is, skip this step
>
> Which would you prefer?"
- **Replace:** Overwrite the file with the new instructions.
- **Merge:** Read the existing instructions, compare with the new ones, and
produce a merged set. Remove duplicates (same intent even if worded differently).
When two instructions cover the same pattern, keep the more specific or
actionable version. Re-number the final list sequentially. Write the merged
result to the file. Show the user what changed (added N new, removed N
redundant, kept N existing).
- **Keep existing:** Do nothing, move on.
**If no:** Skip this phase entirely.
## Important Notes
- **Data source priority:** Plannotator is the first-class path. Claude Code log
analysis is the secondary path for users without Plannotator archives.
- **Research integrity:** Every file must be read. The value of this analysis comes
from completeness. Sampling or skipping undermines the findings.
- **Real data only:** Never fabricate quotes, percentages, or patterns. If the data
doesn't show a clear pattern, say so honestly rather than inventing one.
- **Let the data lead:** The taxonomy, patterns, and instructions should emerge from
what's actually in the files. Different users will have completely different
denial patterns. A user building mobile apps will have different feedback than
one building APIs. Don't assume what the patterns will be.
- **Agent parallelization:** For large datasets, maximize parallel agents to reduce
wall-clock time. The bottleneck is the largest batch — split it.
- **Structured extraction format:** Ask extraction agents to return structured text
with consistent delimiters so the reduce agent can parse reliably.
- **The report is the artifact:** The HTML dashboard is what the user keeps. It
should be beautiful, honest, and useful. Every section should feel like it was
written about them specifically, because it was.


@@ -0,0 +1,795 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Compound Planning — What 370 Files Reveal</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,500;1,400&family=Inter:wght@300;400;500;600&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
<style>
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
:root {
--bg: #FDFCFB;
--slate-900: #0f172a;
--slate-800: #1e293b;
--slate-700: #334155;
--slate-600: #475569;
--slate-500: #64748b;
--slate-400: #94a3b8;
--slate-300: #cbd5e1;
--slate-200: #e2e8f0;
--slate-100: #f1f5f9;
--slate-50: #f8fafc;
--amber-500: #f59e0b;
--amber-600: #d97706;
--amber-700: #b45309;
--amber-200: #fde68a;
--amber-50: #fffbeb;
--emerald-500: #10b981;
--emerald-600: #059669;
--emerald-400: #34d399;
--emerald-900: #064e3b;
--emerald-800: #065f46;
--emerald-100: #d1fae5;
--emerald-50: #ecfdf5;
--rose-500: #f43f5e;
--rose-600: #e11d48;
--rose-400: #fb7185;
--rose-900: #881337;
--rose-800: #9f1239;
--rose-100: #ffe4e6;
--rose-50: #fff1f2;
--indigo-500: #6366f1;
--indigo-600: #4f46e5;
--purple-600: #9333ea;
}
body {
font-family: 'Inter', ui-sans-serif, system-ui, sans-serif;
background: var(--bg);
color: var(--slate-800);
-webkit-font-smoothing: antialiased;
}
.container {
max-width: 1024px;
margin: 0 auto;
padding: 48px 24px 64px;
}
@media (min-width: 768px) { .container { padding: 96px 24px 80px; } }
/* Typography */
.font-serif { font-family: 'Playfair Display', ui-serif, Georgia, serif; }
.font-mono { font-family: 'JetBrains Mono', ui-monospace, monospace; }
/* Header */
header {
border-bottom: 1px solid var(--slate-200);
padding-bottom: 40px;
margin-bottom: 96px;
display: flex;
justify-content: space-between;
align-items: flex-end;
flex-wrap: wrap;
gap: 16px;
}
header h1 {
font-family: 'Playfair Display', serif;
font-size: 36px;
font-weight: 400;
color: var(--slate-900);
line-height: 1.2;
}
header .meta {
font-size: 15px;
font-weight: 300;
color: var(--slate-500);
letter-spacing: 0.04em;
}
/* Sections */
.section { margin-bottom: 128px; }
.section-label {
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.2em;
color: var(--slate-400);
margin-bottom: 24px;
}
/* Narrative + KPIs */
.summary {
display: grid;
grid-template-columns: 1fr;
gap: 48px;
align-items: start;
}
@media (min-width: 768px) {
.summary { grid-template-columns: 1fr 240px; }
}
.narrative {
font-family: 'Playfair Display', serif;
font-size: 26px;
line-height: 1.45;
color: var(--slate-900);
}
.narrative .highlight {
background: var(--amber-50);
color: var(--amber-700);
padding: 1px 6px;
border-radius: 3px;
}
.kpi-stack {
display: flex;
flex-direction: column;
gap: 32px;
}
@media (min-width: 768px) {
.kpi-stack { border-left: 1px solid var(--slate-200); padding-left: 32px; }
}
.kpi-item .kpi-value {
font-size: 36px;
font-weight: 300;
color: var(--slate-900);
letter-spacing: -0.02em;
}
.kpi-item .kpi-label {
font-size: 10px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.15em;
color: var(--slate-500);
margin-top: 2px;
}
/* Taxonomy bars */
.taxonomy-list { display: flex; flex-direction: column; gap: 20px; }
.tax-row { display: grid; grid-template-columns: 24px 1fr 52px; gap: 12px; align-items: center; }
.tax-rank {
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-400);
text-align: right;
}
.tax-body { display: flex; flex-direction: column; gap: 6px; }
.tax-label { font-size: 14px; font-weight: 500; color: var(--slate-800); }
.tax-bar-track { height: 4px; background: var(--slate-100); border-radius: 100px; overflow: hidden; }
.tax-bar-fill { height: 100%; border-radius: 100px; transition: width 0.6s ease; }
.tax-bar-fill.top { background: var(--amber-500); }
.tax-bar-fill.rest { background: var(--slate-300); }
.tax-pct {
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-500);
text-align: right;
}
.tax-quote {
font-size: 12px;
font-style: italic;
color: var(--slate-500);
margin-top: 2px;
}
/* Evolution timeline */
.evolution-grid {
display: grid;
grid-template-columns: 1fr;
gap: 24px;
}
@media (min-width: 768px) { .evolution-grid { grid-template-columns: repeat(3, 1fr); } }
.evo-card {
background: white;
border: 1px solid var(--slate-200);
border-radius: 16px;
padding: 28px;
}
.evo-card .evo-month {
font-family: 'Playfair Display', serif;
font-size: 20px;
color: var(--slate-900);
margin-bottom: 4px;
}
.evo-card .evo-theme {
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.12em;
margin-bottom: 16px;
}
.evo-card .evo-desc {
font-size: 14px;
color: var(--slate-600);
line-height: 1.6;
}
.evo-card .evo-stat {
margin-top: 16px;
padding-top: 16px;
border-top: 1px solid var(--slate-100);
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-500);
}
.evo-jan .evo-theme { color: var(--slate-600); }
.evo-feb .evo-theme { color: var(--amber-600); }
.evo-mar .evo-theme { color: var(--indigo-600); }
/* Quality comparison */
.quality-grid {
display: grid;
grid-template-columns: 1fr;
gap: 24px;
}
@media (min-width: 768px) { .quality-grid { grid-template-columns: 1fr 1fr; } }
.q-card {
border-radius: 24px;
padding: 36px;
}
.q-card.good {
background: color-mix(in srgb, var(--emerald-50) 50%, transparent);
border: 1px solid var(--emerald-100);
}
.q-card.bad {
background: color-mix(in srgb, var(--rose-50) 50%, transparent);
border: 1px solid var(--rose-100);
}
.q-card .q-icon { font-size: 20px; margin-bottom: 12px; }
.q-card .q-title {
font-family: 'Playfair Display', serif;
font-size: 22px;
margin-bottom: 20px;
}
.q-card.good .q-title { color: var(--emerald-900); }
.q-card.bad .q-title { color: var(--rose-900); }
.q-list { list-style: none; display: flex; flex-direction: column; gap: 14px; }
.q-list li {
display: flex;
align-items: flex-start;
gap: 10px;
font-size: 14px;
line-height: 1.6;
}
.q-card.good .q-list li { color: color-mix(in srgb, var(--emerald-800) 90%, transparent); }
.q-card.bad .q-list li { color: color-mix(in srgb, var(--rose-800) 90%, transparent); }
.q-dot {
width: 6px;
height: 6px;
border-radius: 50%;
flex-shrink: 0;
margin-top: 7px;
}
.q-card.good .q-dot { background: var(--emerald-400); }
.q-card.bad .q-dot { background: var(--rose-400); }
/* Phrases */
.phrases-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 12px;
}
@media (min-width: 768px) { .phrases-grid { grid-template-columns: repeat(3, 1fr); } }
.phrase-chip {
background: white;
border: 1px solid var(--slate-200);
border-radius: 12px;
padding: 14px 16px;
display: flex;
justify-content: space-between;
align-items: center;
gap: 8px;
}
.phrase-text {
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-700);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.phrase-count {
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
color: var(--slate-400);
flex-shrink: 0;
}
/* Dark action panel */
.action-panel {
background: var(--slate-900);
color: white;
border-radius: 24px;
padding: 40px;
box-shadow: 0 20px 25px -5px rgb(0 0 0 / 0.1), 0 8px 10px -6px rgb(0 0 0 / 0.1);
}
@media (min-width: 768px) { .action-panel { padding: 56px; } }
.action-panel .section-label { color: var(--slate-500); }
.action-panel .ap-intro {
font-family: 'Playfair Display', serif;
font-size: 22px;
color: white;
line-height: 1.4;
margin-bottom: 32px;
max-width: 640px;
}
.prompt-block {
background: color-mix(in srgb, var(--slate-800) 80%, transparent);
border: 1px solid color-mix(in srgb, var(--slate-700) 50%, transparent);
border-radius: 16px;
overflow: hidden;
}
.prompt-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 20px;
border-bottom: 1px solid color-mix(in srgb, var(--slate-700) 30%, transparent);
}
.prompt-header-label {
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-400);
display: flex;
align-items: center;
gap: 8px;
}
.prompt-header-label svg { width: 14px; height: 14px; }
.copy-btn {
background: none;
border: none;
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: var(--slate-400);
cursor: pointer;
display: flex;
align-items: center;
gap: 6px;
transition: color 0.2s;
}
.copy-btn:hover { color: white; }
.copy-btn.copied { color: var(--emerald-400); }
.prompt-body {
padding: 20px;
max-height: 480px;
overflow-y: auto;
}
.prompt-body pre {
font-family: 'JetBrains Mono', monospace;
font-size: 13px;
line-height: 1.7;
color: color-mix(in srgb, var(--amber-200) 90%, transparent);
white-space: pre-wrap;
word-break: break-word;
}
.prompt-body pre .comment {
color: var(--slate-500);
}
/* Glow card */
.glow-wrap {
position: relative;
margin-top: 48px;
}
.glow-bg {
position: absolute;
inset: -2px;
background: linear-gradient(135deg, var(--indigo-500), var(--purple-600));
border-radius: 26px;
opacity: 0.15;
filter: blur(16px);
transition: opacity 0.5s;
}
.glow-wrap:hover .glow-bg { opacity: 0.25; }
.glow-card {
position: relative;
background: white;
border: 1px solid var(--slate-200);
border-radius: 24px;
padding: 32px 36px;
display: flex;
justify-content: space-between;
align-items: center;
flex-wrap: wrap;
gap: 20px;
}
.glow-card .gc-text {
font-family: 'Playfair Display', serif;
font-size: 18px;
font-weight: 500;
color: var(--slate-900);
line-height: 1.5;
max-width: 640px;
}
.glow-card .gc-text em {
font-style: italic;
color: var(--indigo-600);
}
/* Footer */
footer {
border-top: 1px solid var(--slate-200);
padding-top: 48px;
margin-top: 0;
text-align: center;
}
footer p {
font-family: 'Playfair Display', serif;
font-style: italic;
font-size: 15px;
color: var(--slate-400);
}
/* Scrollbar in dark code block */
.prompt-body::-webkit-scrollbar { width: 6px; }
.prompt-body::-webkit-scrollbar-track { background: transparent; }
.prompt-body::-webkit-scrollbar-thumb { background: var(--slate-700); border-radius: 3px; }
</style>
</head>
<body>
<div class="container">
<header>
<div>
<h1>What 370 Files Reveal About<br>How You Plan</h1>
<div class="meta" style="margin-top: 8px;">backnotprop/plannotator &middot; Jan 7 &ndash; Mar 18, 2026</div>
</div>
<div class="meta" style="text-align: right;">
<span class="font-mono" style="font-size: 12px;">202 denials &middot; 168 annotations &middot; 71 days</span>
</div>
</header>
<!-- 1. Narrative + KPIs -->
<div class="section">
<div class="section-label">1. The story in the data</div>
<div class="summary">
<div class="narrative">
Across 71 days you denied or revised <span class="highlight">202 plans</span> before any code was written. The single most common reason&mdash;appearing in 1 out of 4 denials&mdash;was the same: the agent jumped to implementation without telling you <em>what</em> it was building, <em>why</em>, or <em>how</em>. Missing narrative. Missing context. Missing the story. Your expectations evolved from &ldquo;does it work?&rdquo; in January to &ldquo;tell me the story and be confident&rdquo; by March.
</div>
<div class="kpi-stack">
<div class="kpi-item">
<div class="kpi-value">25.7%</div>
<div class="kpi-label">Denials for missing narrative</div>
</div>
<div class="kpi-item">
<div class="kpi-value">50%</div>
<div class="kpi-label">Plans revised before coding</div>
</div>
<div class="kpi-item">
<div class="kpi-value">12</div>
<div class="kpi-label">Distinct denial categories</div>
</div>
</div>
</div>
</div>
<!-- 2. Denial Taxonomy -->
<div class="section">
<div class="section-label">2. Why plans get denied</div>
<div class="taxonomy-list">
<div class="tax-row">
<span class="tax-rank">1</span>
<div class="tax-body">
<span class="tax-label">Missing Narrative / Overview</span>
<div class="tax-bar-track"><div class="tax-bar-fill top" style="width: 100%"></div></div>
<span class="tax-quote">"This plan is denied without narrative detail and rationales."</span>
</div>
<span class="tax-pct">25.7%</span>
</div>
<div class="tax-row">
<span class="tax-rank">2</span>
<div class="tax-body">
<span class="tax-label">Clarification Needed</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 65%"></div></div>
<span class="tax-quote">"What does this Mean???"</span>
</div>
<span class="tax-pct">16.8%</span>
</div>
<div class="tax-row">
<span class="tax-rank">3</span>
<div class="tax-body">
<span class="tax-label">Testing / Procedural</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 54%"></div></div>
<span class="tax-quote">"I'm denying so you can create a diff."</span>
</div>
<span class="tax-pct">13.9%</span>
</div>
<div class="tax-row">
<span class="tax-rank">4</span>
<div class="tax-body">
<span class="tax-label">Wrong Approach / Over-Engineered</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 37%"></div></div>
<span class="tax-quote">"Why are we doing difficult shit here? I want a hover experience."</span>
</div>
<span class="tax-pct">9.4%</span>
</div>
<div class="tax-row">
<span class="tax-rank">5</span>
<div class="tax-body">
<span class="tax-label">Process Requirement</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 31%"></div></div>
<span class="tax-quote">"Make sure you feature branch."</span>
</div>
<span class="tax-pct">7.9%</span>
</div>
<div class="tax-row">
<span class="tax-rank">6</span>
<div class="tax-body">
<span class="tax-label">Confidence / Risk Check</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 29%"></div></div>
<span class="tax-quote">"Take a step back, breathe, make sure we're not being irrational."</span>
</div>
<span class="tax-pct">7.4%</span>
</div>
<div class="tax-row">
<span class="tax-rank">7</span>
<div class="tax-body">
<span class="tax-label">Content Removal</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 27%"></div></div>
<span class="tax-quote">"I don't want this in the plan."</span>
</div>
<span class="tax-pct">6.9%</span>
</div>
<div class="tax-row">
<span class="tax-rank">8</span>
<div class="tax-body">
<span class="tax-label">Implementation Bug Found</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 23%"></div></div>
</div>
<span class="tax-pct">5.9%</span>
</div>
<div class="tax-row">
<span class="tax-rank">9</span>
<div class="tax-body">
<span class="tax-label">Design / UX Issue</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 21%"></div></div>
</div>
<span class="tax-pct">5.4%</span>
</div>
<div class="tax-row">
<span class="tax-rank">10</span>
<div class="tax-body">
<span class="tax-label">Naming / Terminology</span>
<div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 16%"></div></div>
<span class="tax-quote">"Why do you keep calling it Simplified????"</span>
</div>
<span class="tax-pct">4.0%</span>
</div>
</div>
</div>
<!-- 3. Evolution -->
<div class="section">
<div class="section-label">3. How your expectations evolved</div>
<div class="evolution-grid">
<div class="evo-card evo-jan">
<div class="evo-month">January</div>
<div class="evo-theme">"Does it work?"</div>
<div class="evo-desc">Bug-hunting phase. You were hands-on testing View Logs, iterating on session scoping heuristics. 60% of denials were implementation bugs and verification failures. No mention of &ldquo;narrative&rdquo; or &ldquo;overview&rdquo; yet.</div>
<div class="evo-stat">26 denials &middot; 0 narrative requests</div>
</div>
<div class="evo-card evo-feb">
<div class="evo-month">February</div>
<div class="evo-theme">"Follow the process"</div>
<div class="evo-desc">Process gates emerged: feature branches, Linear tickets, pull main. 40% of denials were procedural (diff testing). UX polish intensified. The first narrative demands appeared: &ldquo;I want a narrative under each section.&rdquo;</div>
<div class="evo-stat">48 denials &middot; 6 narrative requests</div>
</div>
<div class="evo-card evo-mar">
<div class="evo-month">March</div>
<div class="evo-theme">"Tell me the story"</div>
<div class="evo-desc">Narrative became the #1 gate. You created a &ldquo;Missing overview&rdquo; label and applied it systematically. Confidence checks became standard. You began telling agents to &ldquo;take a step back, breathe, and analyze.&rdquo;</div>
<div class="evo-stat">128 denials &middot; 25+ narrative requests</div>
</div>
</div>
</div>
<!-- 4. Quality comparison -->
<div class="section">
<div class="section-label">4. What works vs. what doesn't</div>
<div class="quality-grid">
<div class="q-card good">
<div class="q-icon">&#10003;</div>
<div class="q-title">What approved plans do</div>
<ul class="q-list">
<li><span class="q-dot"></span>Lead with a narrative overview: what exists, what changes, why</li>
<li><span class="q-dot"></span>State confidence and identify risks proactively</li>
<li><span class="q-dot"></span>Reference existing codebase patterns before proposing new code</li>
<li><span class="q-dot"></span>Use explicit, transparent naming (not euphemisms)</li>
<li><span class="q-dot"></span>Break large work into phases with evaluation gates</li>
<li><span class="q-dot"></span>Include example output for user-facing changes</li>
<li><span class="q-dot"></span>Specify feature branch and ticket creation steps</li>
</ul>
</div>
<div class="q-card bad">
<div class="q-icon">&#10007;</div>
<div class="q-title">What agents keep getting wrong</div>
<ul class="q-list">
<li><span class="q-dot"></span>Jump to implementation steps without narrative context</li>
<li><span class="q-dot"></span>Over-engineer: Shift+Click when hover works, MCP tool when a README suffices</li>
<li><span class="q-dot"></span>Introduce new code for things the codebase already solves</li>
<li><span class="q-dot"></span>Propose work on top of failing lint/type checks</li>
<li><span class="q-dot"></span>Use vague or euphemistic naming (&ldquo;Accept&rdquo; instead of &ldquo;Git Add&rdquo;)</li>
<li><span class="q-dot"></span>Wait to be asked for confidence instead of stating it</li>
<li><span class="q-dot"></span>Rush to modify instead of reporting what they see</li>
</ul>
</div>
</div>
</div>
<!-- 5. The actionable output -->
<div class="section">
<div class="section-label">5. The actionable output</div>
<div class="narrative" style="margin-bottom: 32px;">
The analysis produced <span class="highlight">17 specific prompt instructions</span> that, if embedded in a planning prompt, would address ~70% of all denial reasons. The biggest three:
</div>
<div style="display: flex; flex-direction: column; gap: 20px;">
<div style="display: flex; gap: 16px; align-items: flex-start;">
<span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">1</span>
<div>
<div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Every plan MUST start with a Solution Overview</div>
<div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">What exists, what changes, why, how. This alone addresses 1 in 4 denials.</div>
</div>
</div>
<div style="display: flex; gap: 16px; align-items: flex-start;">
<span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">2</span>
<div>
<div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">End every plan with a Confidence Assessment</div>
<div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Don&rsquo;t wait to be asked. State your confidence, identify risks, flag uncertainties.</div>
</div>
</div>
<div style="display: flex; gap: 16px; align-items: flex-start;">
<span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">3</span>
<div>
<div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Search for existing patterns before proposing new code</div>
<div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Explicitly state what you found in the codebase. Prefer reuse over new implementation.</div>
</div>
</div>
</div>
</div>
<!-- 6. Recurring phrases -->
<div class="section">
<div class="section-label">6. Your most-used phrases</div>
<div class="phrases-grid">
<div class="phrase-chip"><span class="phrase-text">"narrative"</span><span class="phrase-count">50+</span></div>
<div class="phrase-chip"><span class="phrase-text">"I don't want this in the plan"</span><span class="phrase-count">10</span></div>
<div class="phrase-chip"><span class="phrase-text">"feature branch"</span><span class="phrase-count">8+</span></div>
<div class="phrase-chip"><span class="phrase-text">"confidence"</span><span class="phrase-count">8+</span></div>
<div class="phrase-chip"><span class="phrase-text">"Missing overview"</span><span class="phrase-count">14</span></div>
<div class="phrase-chip"><span class="phrase-text">"front-end design skill"</span><span class="phrase-count">16</span></div>
<div class="phrase-chip"><span class="phrase-text">"separation of concerns"</span><span class="phrase-count">6</span></div>
<div class="phrase-chip"><span class="phrase-text">"Take a step back, breathe"</span><span class="phrase-count">6</span></div>
<div class="phrase-chip"><span class="phrase-text">"how does this work"</span><span class="phrase-count">5+</span></div>
<div class="phrase-chip"><span class="phrase-text">"what the fuck"</span><span class="phrase-count">4</span></div>
<div class="phrase-chip"><span class="phrase-text">"create a ticket"</span><span class="phrase-count">4+</span></div>
<div class="phrase-chip"><span class="phrase-text">"reusable"</span><span class="phrase-count">19+</span></div>
</div>
</div>
<!-- 7. Corrective Prompt -->
<div class="section" style="margin-bottom: 64px;">
<div class="action-panel">
<div class="section-label">7. The corrective prompt</div>
<div class="ap-intro">
These 17 instructions were extracted directly from your denial patterns. Embedding them in a planning prompt would address approximately 70% of all denial reasons.
</div>
<div class="prompt-block">
<div class="prompt-header">
<span class="prompt-header-label">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="4 17 10 11 4 5"></polyline><line x1="12" y1="19" x2="20" y2="19"></line></svg>
planning-instructions.md
</span>
<button class="copy-btn" onclick="copyPrompt(this)">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg>
Copy
</button>
</div>
<div class="prompt-body">
<pre id="prompt-content"><span class="comment"># Planning Instructions
# Derived from 370 files of denial & annotation analysis</span>
1. STRUCTURE: Every plan MUST begin with a "Solution Overview"
containing 2-3 paragraphs of narrative prose explaining:
- What exists today (current state)
- What will change and why
- How it will be built (approach summary)
Do NOT skip this. Do NOT replace it with bullet points.
2. NARRATIVE: Every major section must include a rationale
paragraph — not just what will be done, but WHY this
approach was chosen over alternatives.
3. FEATURE BRANCH: Always specify implementation will occur
on a feature branch. State the branch name. Never plan
to work directly on main.
4. EXISTING PATTERNS: Before proposing any new implementation,
search the codebase for existing patterns that solve the
same problem. Explicitly state what you found and whether
you will reuse it. Prefer reuse over new code.
5. CONFIDENCE STATEMENT: End the plan with a "Confidence
Assessment" section. State your confidence level, identify
risks or edge cases, and note uncertainties. Do not wait
to be asked.
6. PHASING: For plans with more than 3 steps, break them into
numbered phases. After each phase, note "Pause for
evaluation" so the reviewer can assess before proceeding.
7. ISSUE TRACKING: If the project uses Linear or GitHub Issues,
include a step to create relevant tickets BEFORE
implementation. Backlog items should be separate tickets.
8. SIMPLICITY: Choose the simplest approach that meets
requirements. Do not introduce modifier keys when hover
works. Do not build a framework when a README suffices.
9. NAMING: Use explicit, transparent names for user-facing
features. Do not euphemize Git operations ("Git Add"
not "Accept"). Match existing product naming conventions.
10. CODE QUALITY: State that implementation will follow clean
code principles: modular architecture, separation of
concerns, no circumventing lint or type checks.
11. CLEAN FOUNDATION: If the codebase has failing lint or type
checks, address these BEFORE proposing new features. State
the current CI/CD state.
12. PRIVACY: For features involving data storage or sharing,
explicitly state privacy guarantees. Require user
confirmation before storing data.
13. EXAMPLES: When the plan involves user-facing output or UI,
include an example of what it will look like.
14. FOCUSED SCOPE: Do not include sections that are obvious,
boilerplate, or previously asked to be removed. Keep the
plan focused rather than comprehensive.
15. DESIGN SKILL: For any frontend/UI work, invoke the
front-end design skill to validate the approach. Note
this invocation explicitly in the plan.
16. VERIFICATION STEP: For refactors or multi-file changes,
include a verification step with line-by-line comparison
of affected code paths.
17. DELIBERATION: If the plan involves a dramatic shift, state
that you have re-evaluated the approach, traced through
affected files mentally, and are confident in the plan.
Do not rush.</pre>
</div>
</div>
<div class="glow-wrap">
<div class="glow-bg"></div>
<div class="glow-card">
<div class="gc-text">
These instructions are yours &mdash; derived from <em>your feedback, your language, your standards</em>. Copy them into your planning prompt and watch the deny rate drop.
</div>
</div>
</div>
</div>
</div>
<footer>
<p>Analysis of 202 denied plans and 168 annotation files from the Plannotator archive.</p>
</footer>
</div>
<script>
function copyPrompt(btn) {
const text = document.getElementById('prompt-content').textContent;
navigator.clipboard.writeText(text).then(() => {
btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M22 11.08V12a10 10 0 1 1-5.93-9.14"></path><polyline points="22 4 12 14.01 9 11.01"></polyline></svg> Copied';
btn.classList.add('copied');
setTimeout(() => {
btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg> Copy';
btn.classList.remove('copied');
}, 2000);
});
}
</script>
</body>
</html>


@@ -0,0 +1,282 @@
# Claude Code Fallback
Read this file only when the user does **not** have a usable Plannotator archive.
This is the secondary path for ordinary Claude Code users whose denial history
exists in `~/.claude/projects/` rather than `~/.plannotator/plans/`.
The goal is the same as the main skill:
- extract the user's real denial reasons
- reduce them into a taxonomy and prompt corrections
- produce the same HTML report design and section flow
## Source of Truth
Use the bundled parser at:
- [scripts/extract_exit_plan_mode_outcomes.py](../scripts/extract_exit_plan_mode_outcomes.py)
Resolve that script path relative to this skill directory before running it.
This script normalizes `ExitPlanMode` outcomes from Claude Code JSONL transcripts
and emits clean JSON parts containing only human-authored denial reasons by default.
Do **not** read raw `~/.claude/projects/**/*.jsonl` directly unless:
- the parser fails
- the user asks for audit-level verification
- you need to inspect one or two suspicious records by hand
The parser exists specifically to strip transcript noise such as generic native
reject strings and wrapper boilerplate.
## Run the Parser
Create the working directory first:
```bash
mkdir -p /tmp/compound-planning
```
Then run the bundled parser. Prefer `python3`; if unavailable, use `python`.
Use a resolved absolute script path, not a repo-local copy.
```bash
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
--projects-dir ~/.claude/projects \
--json-out /tmp/compound-planning/claude-code-human-reasons.json \
--show-samples 0
```
Expected output:
- manifest:
`/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.manifest.json`
- part files:
`/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.part-XXXX-of-XXXX.json`
The script prints how many records were detected and how many JSON part files were emitted.
## What To Read First
Read the manifest before reading any part file.
The manifest gives you:
- total filtered record count
- total `ExitPlanMode` attempts
- native approval / denial counts
- non-native denial counts
- part file list
Use the part files only after you understand the overall dataset shape.
## Inventory In Fallback Mode
In Claude Code fallback mode, report this dataset instead of the Plannotator file counts:
- human denial reasons found
- total `ExitPlanMode` attempts scanned
- native approvals
- native denials with extractable inline reason
- native denials without recoverable reason
- non-native denials with recoverable payload
- number of emitted JSON parts
- date range from the records
- total days spanned
- distinct sessions
- distinct project roots / `cwd` values
Also calculate (a `jq` sketch follows this list):
- average `plan_length_chars` where present
- percentage of all denials that contain a recoverable human reason
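A sketch of the average, assuming `jq` is available and each part file is a JSON
array of records (both assumptions; verify against the manifest and one part file
first):
```bash
# Average plan_length_chars across all parts, skipping records without the field
jq -s 'flatten | map(.plan_length_chars // empty) | if length > 0 then add / length else "none present" end' \
  /tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.part-*.json
```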
Do **not** fabricate Plannotator-only inventory fields in fallback mode:
- no `*-approved.md` counts
- no `*.annotations.md` counts
- no `*.diff.md` counts
- no approved-plan line-count analysis
If the user asks for those specifically, state that Claude Code log fallback mode
does not contain those artifacts.
### Previous Report Detection In Fallback Mode
Previous report detection still applies. Check the user's home directory or
`~/.plannotator/plans/` for existing `compound-planning-report*.html` files. If
found, offer the same incremental vs full choice as Plannotator mode. In
incremental mode, filter the parser output by timestamp rather than by filename
date — use the `timestamp` field in each JSON record.
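A sketch of that timestamp filter; the cutoff constant below is a hypothetical value taken from the previous report's modification date:
```python
import json
from pathlib import Path

CUTOFF = "2026-05-01"  # hypothetical: the previous report's modification date

out_dir = Path("/tmp/compound-planning/claude-code-human-reasons")
manifest = json.loads(
    (out_dir / "claude-code-human-reasons.manifest.json").read_text(encoding="utf-8")
)

fresh = []
for part in manifest["parts"]:
    part_payload = json.loads((out_dir / part["file_name"]).read_text(encoding="utf-8"))
    for record in part_payload["records"]:
        ts = record.get("timestamp") or record.get("result_timestamp") or ""
        # ISO-8601 timestamps compare correctly as strings at day granularity.
        if ts[:10] > CUTOFF:
            fresh.append(record)

print(f"{len(fresh)} records dated after {CUTOFF}")
```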
If no previous report exists, use the first-report naming convention
(`compound-planning-report.html`). Otherwise use the next version number.
## Extraction In Fallback Mode
Treat the emitted JSON part files as the clean source dataset.
### Batching
- **Small datasets (< 200 records):** read the part files directly without extra agents
- **Medium datasets (200-800 records):** split by part file or time range into 2-4 agents
- **Large datasets (800+ records):** split by part file groups or balanced time ranges
All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning.
Each extraction agent should read every record in its assigned part files and write
clean markdown output to:
```text
/tmp/compound-planning/extraction-{batch-name}.md
```
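A sketch of the split for medium and large datasets, assuming the manifest paths above; the agent count is a hypothetical choice:
```python
import json
from pathlib import Path

N_AGENTS = 3  # hypothetical: pick 2-4 based on total record count

out_dir = Path("/tmp/compound-planning/claude-code-human-reasons")
manifest = json.loads(
    (out_dir / "claude-code-human-reasons.manifest.json").read_text(encoding="utf-8")
)

batches = [[] for _ in range(N_AGENTS)]
counts = [0] * N_AGENTS
# Assign each part file to the currently lightest batch so record counts stay balanced.
for part in sorted(manifest["parts"], key=lambda p: -p["record_count"]):
    lightest = counts.index(min(counts))
    batches[lightest].append(str(out_dir / part["file_name"]))
    counts[lightest] += part["record_count"]

for i, (files, total) in enumerate(zip(batches, counts), start=1):
    print(f"extraction-batch-{i}: {total} records -> {files}")
```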
### Extraction Prompt For Claude Code Denial Records
Use this prompt for each fallback extraction batch (substitute the part file list and the output path):
```text
You are extracting structured data from Claude Code ExitPlanMode denial records.
Files to read: [JSON PART FILES]
Output: Write your complete results to [OUTPUT FILE PATH]
Read EVERY record in the assigned files. Each record already contains a cleaned
human_reason field. Use that as the primary source text.
For EACH record, extract:
- Date
- Session ID
- Project / cwd
- Topic (only if inferable from the reason or plan path; otherwise say "Unknown from logs")
- Human denial reason
- What was specifically asked to change
- Feedback type (let the content determine the category)
- Notable phrases
- Reason source (`native_inline_reason`, `non_native_freeform_payload`, or `structured_quote_extraction`)
- Plan path if present
- Plan length in chars if present
Do NOT skip any records. One entry per record.
Format each entry as:
**[session_id :: tool_use_id]**
- Date: ...
- Project: ...
- Topic: ...
- Human denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Reason source: ...
- Plan path: ...
- Plan length chars: ...
---
After processing all records, write the complete results to [OUTPUT FILE PATH].
State the total record count at the end of the file.
```
## Reduction In Fallback Mode
The reduction step stays conceptually the same:
- taxonomy
- top patterns
- recurring phrases
- reviewer values
- recurring agent mistakes
- structural requests
- evolution over time
- corrective prompt instructions
Use `model: "sonnet"` for reduction agents, just as in Plannotator mode. The
two-stage reduce (partial reduces once there are 21+ extraction files) also
applies when there are many part files.
But interpret the dataset correctly:
- this is denial-reason evidence from Claude Code logs
- not every denial has a recoverable human reason
- annotations may be absent entirely
- success traits are often inferred from the inverse of repeated denial feedback
If the evidence for "what works" is weaker than the evidence for "what fails",
say that explicitly.
## HTML Report Adaptation
Use the same template and the same section order as the main skill.
In fallback mode:
- explicitly state in the header/meta that the source is Claude Code `ExitPlanMode`
denial reasons
- keep the same narrative-first editorial style
- keep the same 7 major sections
- use real denial-reason counts, dates, phrases, and percentages only
### KPI Sidebar Substitutes
The Plannotator version uses a revision-rate KPI that may not exist here.
In fallback mode, prefer this KPI trio:
1. top denial category percentage
2. total human denial reasons recovered
3. number of distinct denial categories
If a better third metric emerges from the data, use it, but do not invent one.
### Footer / Provenance
The footer tagline should mention that the report was derived from Claude Code
denial reasons rather than Plannotator markdown archives.
### Important Limitation To State
If `human_reasons_total < total denials`, mention in the narrative or footer note
that some denials in the transcript did not contain recoverable human-authored
feedback and therefore could not contribute to the pattern analysis.
### Versioned Report Naming
Versioned naming (`v2`, `v3`, etc.) applies to fallback mode too. Save reports
to `~/.plannotator/plans/` (create the directory if it doesn't exist) so that
all compound planning reports live in the same location regardless of data source.
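A sketch of the version bump, following the naming convention above:
```python
import re
from pathlib import Path

plans_dir = Path.home() / ".plannotator" / "plans"
plans_dir.mkdir(parents=True, exist_ok=True)

existing = list(plans_dir.glob("compound-planning-report*.html"))
if not existing:
    next_name = "compound-planning-report.html"
else:
    latest = 1  # the unsuffixed first report counts as v1
    for path in existing:
        match = re.fullmatch(r"compound-planning-report-v(\d+)\.html", path.name)
        if match:
            latest = max(latest, int(match.group(1)))
    next_name = f"compound-planning-report-v{latest + 1}.html"

print(plans_dir / next_name)
```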
## Summary In Fallback Mode
At the end, tell the user:
- how many human denial reasons were analyzed
- how many total `ExitPlanMode` attempts were scanned
- the top 3 denial patterns found
- the estimated percentage of denial reasons the corrective instructions address
- the single most impactful prompt improvement
- where the report was saved (including version number)
- if incremental: note that earlier findings are in the previous report
## Improvement Hook In Fallback Mode
The Phase 6 improvement hook applies to fallback mode too. The corrective prompt
instructions derived from Claude Code denial reasons are just as useful for
injection into future planning sessions. Follow the same flow as the main skill.
## Audit Mode
Use this only when the user asks for raw denial records or transcript noise:
```bash
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
--projects-dir ~/.claude/projects \
--records-filter denials \
--json-out /tmp/compound-planning/claude-code-all-denials.json \
--show-samples 0
```
Do not use this audit-mode output for the normal report unless the user asks for it.


@@ -0,0 +1,820 @@
#!/usr/bin/env python3
"""Extract ExitPlanMode outcomes from Claude Code JSONL session logs.
This parser keeps three views of the same data:
1. Strict native Claude Code classification
- native approval:
"User has approved your plan."
- native denial:
"The user doesn't want to proceed with this tool use. The tool use was rejected"
2. General denial capture
- any matching ExitPlanMode tool_result with is_error=true and non-empty text
is captured as a denial/error payload, even when it is custom hook output
or some other non-native integration.
3. Human-reason extraction
- native inline reasons are preserved as-is
- freeform non-native error payloads are treated as human reasons
- structured non-native payloads are reduced to quoted feedback where possible
This means the script does not depend on hook-specific strings to capture custom
denials, but it also does not dump wrapper boilerplate into the human-reason
output.
The script streams JSONL line-by-line and uses only the Python standard library.
"""
from __future__ import annotations
import argparse
import json
import os
import sys
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional, Tuple
APPROVE_PREFIX = "User has approved your plan."
REJECT_PREFIX = (
"The user doesn't want to proceed with this tool use. "
"The tool use was rejected"
)
REASON_MARKER = "To tell you how to proceed, the user said:\n"
NOTE_MARKER = (
"\n\nNote: The user's next message may contain a correction or preference."
)
@dataclass
class AttemptRecord:
session_id: str
tool_use_id: str
file_path: str
line_number: int
timestamp: Optional[str]
cwd: Optional[str]
plan_file_path: Optional[str]
plan_length_chars: Optional[int]
outcome: str = "pending"
native_reason: Optional[str] = None
native_reason_style: Optional[str] = None
captured_reason: Optional[str] = None
captured_reason_style: Optional[str] = None
captured_reason_source: Optional[str] = None
human_reason: Optional[str] = None
human_reason_style: Optional[str] = None
human_reason_source: Optional[str] = None
result_is_error: Optional[bool] = None
result_file_path: Optional[str] = None
result_line_number: Optional[int] = None
result_timestamp: Optional[str] = None
result_preview: Optional[str] = None
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Extract ExitPlanMode approvals/denials from Claude Code logs."
)
parser.add_argument(
"--projects-dir",
default="~/.claude/projects",
help="Root Claude projects directory. Default: %(default)s",
)
parser.add_argument(
"--include-subagents",
action="store_true",
help="Include /subagents/ JSONL files. Default is to skip them.",
)
parser.add_argument(
"--records-filter",
choices=("all", "native", "native-denials", "denials", "human-reasons"),
default="human-reasons",
help=(
"Which records to write to JSON/CSV outputs. "
"Default: %(default)s"
),
)
parser.add_argument(
"--include-non-native-denials",
action="store_true",
help=(
"Include non-native denial/error payloads in sample output. "
"Default sample output shows only native denials."
),
)
parser.add_argument(
"--show-samples",
type=int,
default=5,
help="How many denial samples to print in the text summary.",
)
parser.add_argument(
"--json-out",
help="Optional path to write a JSON report.",
)
parser.add_argument(
"--max-output-tokens-per-file",
type=int,
default=50000,
help=(
"Approximate max token budget per JSON file when writing --json-out. "
"Default: %(default)s"
),
)
return parser.parse_args()
def iter_jsonl_files(root: Path, include_subagents: bool) -> Iterator[Path]:
for dirpath, dirnames, filenames in os.walk(root):
if not include_subagents and "subagents" in dirnames:
dirnames.remove("subagents")
dirnames.sort()
for filename in sorted(filenames):
if filename.endswith(".jsonl"):
yield Path(dirpath) / filename
def make_attempt_key(session_id: str, tool_use_id: str) -> str:
return session_id + "::" + tool_use_id
def preview(text: str, limit: int = 220) -> str:
compact = " ".join(text.split())
if len(compact) <= limit:
return compact
return compact[: limit - 3] + "..."
def estimate_tokens(text: str) -> int:
# Rough enough for output chunking. We intentionally bias slightly high.
return max(1, (len(text) + 3) // 4)
def iter_blocks(message_content: object) -> Iterator[dict]:
if not isinstance(message_content, list):
return
for block in message_content:
if isinstance(block, dict):
yield block
def extract_text(content: object) -> str:
if isinstance(content, str):
return content
if not isinstance(content, list):
return ""
parts: List[str] = []
for item in content:
if isinstance(item, str):
parts.append(item)
continue
if not isinstance(item, dict):
continue
if isinstance(item.get("text"), str):
parts.append(item["text"])
elif isinstance(item.get("content"), str):
parts.append(item["content"])
return "\n".join(part for part in parts if part)
def classify_reason_style(reason: Optional[str]) -> Optional[str]:
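    # Heuristic: markdown headings, horizontal rules, or the templated
    # "YOUR PLAN WAS NOT APPROVED." banner mark a payload as structured
    # (likely tool-assembled); anything else is treated as freeform human text.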
if not reason:
return None
stripped = reason.lstrip()
if (
stripped.startswith("#")
or stripped.startswith("YOUR PLAN WAS NOT APPROVED.")
or "\n## " in reason
or "\n---" in reason
):
return "structured"
return "freeform"
def extract_blockquote_feedback(text: str) -> List[str]:
quotes: List[str] = []
current: List[str] = []
for raw_line in text.splitlines():
stripped = raw_line.strip()
if stripped.startswith(">"):
current.append(stripped[1:].lstrip())
continue
if current:
if not stripped or stripped.startswith("## ") or stripped == "---":
quote = "\n".join(line for line in current if line).strip()
if quote:
quotes.append(quote)
current = []
continue
# Preserve wrapped continuation lines that belong to the same quote.
current.append(stripped)
if current:
quote = "\n".join(line for line in current if line).strip()
if quote:
quotes.append(quote)
return quotes
def extract_human_reason(
native_reason: Optional[str],
captured_reason: Optional[str],
captured_reason_style: Optional[str],
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
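    # Preference order: a native inline reason wins outright; a freeform
    # non-native payload passes through as-is; a structured payload contributes
    # only its blockquoted feedback. Returns (reason, style, source).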
if native_reason:
return (
native_reason,
classify_reason_style(native_reason),
"native_inline_reason",
)
if not captured_reason:
return (None, None, None)
if captured_reason_style == "freeform":
return (
captured_reason,
classify_reason_style(captured_reason),
"non_native_freeform_payload",
)
quote_feedback = extract_blockquote_feedback(captured_reason)
if quote_feedback:
reason = "\n\n".join(quote_feedback)
return (
reason,
classify_reason_style(reason),
"structured_quote_extraction",
)
return (None, None, None)
def classify_result(
text: str,
is_error: bool,
) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
stripped = text.strip()
if not stripped:
if is_error:
return (
"denied_non_native_no_payload",
None,
None,
None,
None,
)
return ("pending", None, None, None, None)
if stripped.startswith(APPROVE_PREFIX):
return ("approved_native", None, None, None, None)
if stripped.startswith(REJECT_PREFIX):
marker_index = stripped.find(REASON_MARKER)
if marker_index < 0:
return ("denied_native_no_reason", None, None, None, None)
reason = stripped[marker_index + len(REASON_MARKER) :]
note_index = reason.find(NOTE_MARKER)
if note_index >= 0:
reason = reason[:note_index]
reason = reason.strip()
if reason:
style = classify_reason_style(reason)
return (
"denied_native_with_reason",
reason,
reason,
"native_inline_reason",
style,
)
return ("denied_native_no_reason", None, None, None, None)
if is_error:
style = classify_reason_style(stripped)
return (
"denied_non_native_with_payload",
None,
stripped,
"non_native_error_payload",
style,
)
return ("non_native_other", None, None, None, None)
def outcome_rank(outcome: str) -> int:
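    # Rank outcomes by information content so that, when several tool_results
    # reference the same tool_use, update_attempt_from_result keeps the most
    # informative classification rather than the first one seen.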
ranks = {
"pending": 0,
"non_native_other": 1,
"approved_native": 2,
"denied_native_no_reason": 3,
"denied_native_with_reason": 4,
"denied_non_native_no_payload": 5,
"denied_non_native_with_payload": 6,
}
return ranks.get(outcome, 0)
def update_attempt_from_result(
attempt: AttemptRecord,
file_path: Path,
line_number: int,
timestamp: Optional[str],
text: str,
is_error: bool,
) -> None:
(
outcome,
native_reason,
captured_reason,
captured_reason_source,
captured_reason_style,
) = classify_result(text=text, is_error=is_error)
if outcome_rank(outcome) < outcome_rank(attempt.outcome):
return
attempt.outcome = outcome
attempt.native_reason = native_reason
attempt.native_reason_style = classify_reason_style(native_reason)
attempt.captured_reason = captured_reason
attempt.captured_reason_source = captured_reason_source
attempt.captured_reason_style = captured_reason_style
(
attempt.human_reason,
attempt.human_reason_style,
attempt.human_reason_source,
) = extract_human_reason(
native_reason=native_reason,
captured_reason=captured_reason,
captured_reason_style=captured_reason_style,
)
attempt.result_is_error = is_error
attempt.result_file_path = str(file_path)
attempt.result_line_number = line_number
attempt.result_timestamp = timestamp
attempt.result_preview = preview(text)
def scan_projects(
projects_dir: Path,
include_subagents: bool,
) -> Tuple[Dict[str, int], List[AttemptRecord]]:
stats = {
"files_scanned": 0,
"lines_scanned": 0,
"json_errors": 0,
}
attempts: Dict[str, AttemptRecord] = {}
for file_path in iter_jsonl_files(projects_dir, include_subagents):
stats["files_scanned"] += 1
try:
handle = file_path.open("r", encoding="utf-8", errors="replace")
except OSError:
continue
with handle:
for line_number, raw_line in enumerate(handle, start=1):
if not raw_line.strip():
continue
stats["lines_scanned"] += 1
try:
obj = json.loads(raw_line)
except json.JSONDecodeError:
stats["json_errors"] += 1
continue
session_id = str(obj.get("sessionId") or str(file_path))
timestamp = obj.get("timestamp")
cwd = obj.get("cwd")
message = obj.get("message")
if not isinstance(message, dict):
continue
content = message.get("content")
for block in iter_blocks(content):
if (
block.get("type") == "tool_use"
and block.get("name") == "ExitPlanMode"
and isinstance(block.get("id"), str)
):
tool_use_id = block["id"]
key = make_attempt_key(session_id, tool_use_id)
if key in attempts:
continue
input_data = block.get("input")
plan = None
plan_file_path = None
if isinstance(input_data, dict):
if isinstance(input_data.get("plan"), str):
plan = input_data["plan"]
if isinstance(input_data.get("planFilePath"), str):
plan_file_path = input_data["planFilePath"]
attempts[key] = AttemptRecord(
session_id=session_id,
tool_use_id=tool_use_id,
file_path=str(file_path),
line_number=line_number,
timestamp=timestamp if isinstance(timestamp, str) else None,
cwd=cwd if isinstance(cwd, str) else None,
plan_file_path=plan_file_path,
plan_length_chars=len(plan) if isinstance(plan, str) else None,
)
if message.get("role") != "user":
continue
for block in iter_blocks(content):
if (
block.get("type") != "tool_result"
or not isinstance(block.get("tool_use_id"), str)
):
continue
key = make_attempt_key(session_id, block["tool_use_id"])
attempt = attempts.get(key)
if attempt is None:
continue
text = extract_text(block.get("content"))
update_attempt_from_result(
attempt=attempt,
file_path=file_path,
line_number=line_number,
timestamp=timestamp if isinstance(timestamp, str) else None,
text=text,
is_error=bool(block.get("is_error")),
)
return stats, list(attempts.values())
def summarize(attempts: Iterable[AttemptRecord]) -> Dict[str, int]:
summary = {
"total_exit_plan_attempts": 0,
"approved_native": 0,
"denied_native_with_reason": 0,
"denied_native_no_reason": 0,
"denied_native_with_freeform_reason": 0,
"denied_native_with_structured_reason": 0,
"denied_non_native_with_payload": 0,
"denied_non_native_no_payload": 0,
"captured_denial_reasons_total": 0,
"captured_freeform_reasons": 0,
"captured_structured_reasons": 0,
"human_reasons_total": 0,
"human_reasons_native": 0,
"human_reasons_non_native": 0,
"human_reasons_freeform": 0,
"human_reasons_structured": 0,
"non_native_other": 0,
"pending": 0,
}
for attempt in attempts:
summary["total_exit_plan_attempts"] += 1
summary[attempt.outcome] = summary.get(attempt.outcome, 0) + 1
if attempt.outcome == "denied_native_with_reason":
if attempt.native_reason_style == "freeform":
summary["denied_native_with_freeform_reason"] += 1
elif attempt.native_reason_style == "structured":
summary["denied_native_with_structured_reason"] += 1
if attempt.captured_reason:
summary["captured_denial_reasons_total"] += 1
if attempt.captured_reason_style == "freeform":
summary["captured_freeform_reasons"] += 1
elif attempt.captured_reason_style == "structured":
summary["captured_structured_reasons"] += 1
if attempt.human_reason:
summary["human_reasons_total"] += 1
if attempt.human_reason_source == "native_inline_reason":
summary["human_reasons_native"] += 1
else:
summary["human_reasons_non_native"] += 1
if attempt.human_reason_style == "freeform":
summary["human_reasons_freeform"] += 1
elif attempt.human_reason_style == "structured":
summary["human_reasons_structured"] += 1
return summary
def filter_records(
attempts: List[AttemptRecord],
records_filter: str,
) -> List[AttemptRecord]:
if records_filter == "all":
return attempts
if records_filter == "native":
return [
attempt
for attempt in attempts
if attempt.outcome.startswith("approved_native")
or attempt.outcome.startswith("denied_native")
]
if records_filter == "native-denials":
return [
attempt
for attempt in attempts
if attempt.outcome.startswith("denied_native")
]
if records_filter == "human-reasons":
return [attempt for attempt in attempts if attempt.human_reason]
return [
attempt
for attempt in attempts
if attempt.outcome.startswith("denied_native")
or attempt.outcome.startswith("denied_non_native")
]
def build_json_chunks(
records: List[AttemptRecord],
max_output_tokens_per_file: int,
) -> List[List[AttemptRecord]]:
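    # Greedy chunking: append records in order until the estimated token budget
    # for the current part file would be exceeded, then start a new chunk.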
if not records:
return [[]]
chunks: List[List[AttemptRecord]] = []
current_chunk: List[AttemptRecord] = []
current_tokens = 0
for record in records:
record_dict = asdict(record)
record_json = json.dumps(record_dict, ensure_ascii=False)
record_tokens = estimate_tokens(record_json)
if current_chunk and current_tokens + record_tokens > max_output_tokens_per_file:
chunks.append(current_chunk)
current_chunk = []
current_tokens = 0
current_chunk.append(record)
current_tokens += record_tokens
if current_chunk:
chunks.append(current_chunk)
return chunks
def print_summary(
projects_dir: Path,
include_subagents: bool,
stats: Dict[str, int],
attempts: List[AttemptRecord],
summary: Dict[str, int],
show_samples: int,
include_non_native_denials: bool,
) -> None:
native_denials = (
summary["denied_native_with_reason"] + summary["denied_native_no_reason"]
)
total_denials = (
native_denials
+ summary["denied_non_native_with_payload"]
+ summary["denied_non_native_no_payload"]
)
native_extractable_ratio = (
(summary["denied_native_with_reason"] / native_denials) * 100.0
if native_denials
else 0.0
)
all_capture_ratio = (
(summary["captured_denial_reasons_total"] / total_denials) * 100.0
if total_denials
else 0.0
)
print(f"Projects dir: {projects_dir}")
print(f"Included subagents: {'yes' if include_subagents else 'no'}")
print(f"JSONL files scanned: {stats['files_scanned']}")
print(f"JSON lines scanned: {stats['lines_scanned']}")
print(f"JSON parse errors: {stats['json_errors']}")
print()
print(f"ExitPlanMode attempts: {summary['total_exit_plan_attempts']}")
print(f"Native approvals: {summary['approved_native']}")
print(
"Native denials with extractable reason: "
f"{summary['denied_native_with_reason']}"
)
print(
"Native denials without reason: "
f"{summary['denied_native_no_reason']}"
)
print(
"Freeform native reasons: "
f"{summary['denied_native_with_freeform_reason']}"
)
print(
"Structured native reasons: "
f"{summary['denied_native_with_structured_reason']}"
)
print(
"Non-native denials with payload: "
f"{summary['denied_non_native_with_payload']}"
)
print(
"Non-native denials without payload: "
f"{summary['denied_non_native_no_payload']}"
)
print(
"Captured denial reasons total: "
f"{summary['captured_denial_reasons_total']}"
)
print(
"Captured freeform reasons: "
f"{summary['captured_freeform_reasons']}"
)
print(
"Captured structured reasons: "
f"{summary['captured_structured_reasons']}"
)
print(f"Human reasons total: {summary['human_reasons_total']}")
print(f"Human reasons from native denials: {summary['human_reasons_native']}")
print(
"Human reasons from non-native denials: "
f"{summary['human_reasons_non_native']}"
)
print(
"Non-native / non-denial outcomes: "
f"{summary['non_native_other']}"
)
print(f"Pending / unmatched attempts: {summary['pending']}")
print()
print(
"Extractable native denial reasons: "
f"{summary['denied_native_with_reason']}/{native_denials} "
f"({native_extractable_ratio:.1f}%)"
)
print(
"Captured denial payloads across all denial types: "
f"{summary['captured_denial_reasons_total']}/{total_denials} "
f"({all_capture_ratio:.1f}%)"
)
print(
"Human reasons across all denial types: "
f"{summary['human_reasons_total']}/{total_denials} "
f"({((summary['human_reasons_total'] / total_denials) * 100.0 if total_denials else 0.0):.1f}%)"
)
if include_non_native_denials:
samples = [attempt for attempt in attempts if attempt.human_reason]
else:
samples = [
attempt
for attempt in attempts
if attempt.outcome == "denied_native_with_reason" and attempt.human_reason
]
samples = samples[: max(show_samples, 0)]
if not samples:
return
print()
print(
"Sample denial reasons:"
if include_non_native_denials
else "Sample native denial reasons:"
)
for attempt in samples:
style = attempt.human_reason_style or "unknown"
source = attempt.human_reason_source or "unknown"
reason = attempt.human_reason or ""
print(
"- "
f"[{attempt.outcome} / {source} / {style}] "
f"{reason!r} "
f"({attempt.file_path}:{attempt.result_line_number})"
)
def write_json_report(
output_path: Path,
projects_dir: Path,
include_subagents: bool,
stats: Dict[str, int],
summary: Dict[str, int],
records: List[AttemptRecord],
max_output_tokens_per_file: int,
) -> List[Path]:
output_path.parent.mkdir(parents=True, exist_ok=True)
chunks = build_json_chunks(records, max_output_tokens_per_file)
base_name = output_path.stem
output_dir = output_path.with_suffix("")
output_dir.mkdir(parents=True, exist_ok=True)
written_files: List[Path] = []
part_summaries = []
for index, chunk in enumerate(chunks, start=1):
chunk_records = [asdict(record) for record in chunk]
chunk_payload = {
"projects_dir": str(projects_dir),
"include_subagents": include_subagents,
"stats": stats,
"summary": summary,
"part_index": index,
"part_count": len(chunks),
"record_count": len(chunk_records),
"records": chunk_records,
}
part_name = f"{base_name}.part-{index:04d}-of-{len(chunks):04d}.json"
part_path = output_dir / part_name
part_path.write_text(
json.dumps(chunk_payload, indent=2, ensure_ascii=False),
encoding="utf-8",
)
written_files.append(part_path)
part_summaries.append(
{
"part_index": index,
"file_name": part_name,
"record_count": len(chunk_records),
}
)
manifest_payload = {
"projects_dir": str(projects_dir),
"include_subagents": include_subagents,
"stats": stats,
"summary": summary,
"records_filter_record_count": len(records),
"part_count": len(chunks),
"max_output_tokens_per_file": max_output_tokens_per_file,
"parts": part_summaries,
}
manifest_path = output_dir / f"{base_name}.manifest.json"
manifest_path.write_text(
json.dumps(manifest_payload, indent=2, ensure_ascii=False),
encoding="utf-8",
)
written_files.insert(0, manifest_path)
return written_files
def main() -> int:
args = parse_args()
projects_dir = Path(args.projects_dir).expanduser()
if not projects_dir.exists():
print(f"Projects dir does not exist: {projects_dir}", file=sys.stderr)
return 1
stats, attempts = scan_projects(
projects_dir=projects_dir,
include_subagents=args.include_subagents,
)
attempts.sort(
key=lambda attempt: (
attempt.file_path,
attempt.line_number,
attempt.tool_use_id,
)
)
summary = summarize(attempts)
records = filter_records(attempts, args.records_filter)
print_summary(
projects_dir=projects_dir,
include_subagents=args.include_subagents,
stats=stats,
attempts=attempts,
summary=summary,
show_samples=args.show_samples,
include_non_native_denials=args.include_non_native_denials,
)
if args.json_out:
written_files = write_json_report(
output_path=Path(args.json_out).expanduser(),
projects_dir=projects_dir,
include_subagents=args.include_subagents,
stats=stats,
summary=summary,
records=records,
max_output_tokens_per_file=args.max_output_tokens_per_file,
)
part_count = max(len(written_files) - 1, 0)
print()
print(
"Wrote JSON output: "
f"detected {len(records)} records for filter '{args.records_filter}' "
f"and emitted {part_count} part file(s) plus a manifest."
)
return 0
if __name__ == "__main__":
sys.exit(main())