Add plannotator extension v0.19.10

23  extensions/plannotator/skills/plannotator-annotate/SKILL.md  Normal file
@@ -0,0 +1,23 @@
---
name: plannotator-annotate
description: Open Plannotator's annotation UI for a markdown file, converted HTML file, URL, or folder and then respond to the returned annotations.
---

# Plannotator Annotate

Use this skill when the user wants to annotate a document in Plannotator instead of reviewing it inline in chat.

Run:

```bash
plannotator annotate <path-or-url>
```

Behavior:

1. Launch the command with Bash.
2. Wait for the browser review to finish.
3. If annotations are returned, address them directly.
4. If the session closes without feedback, say so briefly and continue.

Do not ask the user to paste a shell command into the chat. Run the command yourself.

@@ -0,0 +1,5 @@
interface:
  display_name: "Plannotator Annotate"
  short_description: "Annotate a markdown file, URL, or folder in Plannotator."
policy:
  allow_implicit_invocation: false

574  extensions/plannotator/skills/plannotator-compound/SKILL.md  Normal file
@@ -0,0 +1,574 @@
---
name: plannotator-compound
disable-model-invocation: true
description: >
  Analyze a user's Plannotator plan archive to extract denial patterns, feedback
  taxonomy, evolution over time, and actionable prompt improvements — then produce
  a polished HTML dashboard report. Falls back to Claude Code ExitPlanMode denial
  reasons when Plannotator data is unavailable.
---

# Compound Planning Analysis

You are conducting a comprehensive research analysis of a user's Plannotator plan
archive. The goal: extract patterns from their denied plans, reduce them into
actionable insights, and produce an elegant HTML dashboard report.

This is a multi-phase process. Each phase must complete fully before the next begins.
Research integrity is paramount — every file must be read, no skipping.

## Source Selection

Before starting the analysis, determine which data source is available.

1. **Plannotator mode (first-class)** — Check `~/.plannotator/plans/`. If it
   exists and contains `*-denied.md` files, use this mode. The entire workflow
   below is written for Plannotator data.

2. **Claude Code fallback mode** — If the Plannotator archive is absent or
   contains no denied plans, check `~/.claude/projects/`. If present, read
   [references/claude-code-fallback.md](references/claude-code-fallback.md)
   before continuing. That reference explains how to use the bundled parser at
   [scripts/extract_exit_plan_mode_outcomes.py](scripts/extract_exit_plan_mode_outcomes.py)
   to extract denial reasons from Claude Code JSONL transcripts. Every phase
   below has a short note explaining what changes in fallback mode — the
   reference file has the details.

3. **Neither available** — Ask the user for their Plannotator plans directory or
   Claude Code projects directory. Do not guess.
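
A minimal availability check, sketched in shell (the echoed labels are just
illustrative):

```bash
# Sketch: pick the data source in the priority order above.
if ls ~/.plannotator/plans/*-denied.md >/dev/null 2>&1; then
  echo "plannotator mode"
elif [ -d ~/.claude/projects ]; then
  echo "claude code fallback mode"
else
  echo "neither available: ask the user"
fi
```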

## Phase 0: Locate Plans & Check for Previous Reports

Use the mode chosen in Source Selection above.

**Plannotator mode:** Verify the plans directory contains `*-denied.md` files. If
none exist, fall back to Claude Code mode before stopping.

**Claude Code fallback mode:** Run the bundled parser per the fallback reference to
build the denial-reason dataset. Create `/tmp/compound-planning/` if needed.

In either mode, proceed to Previous Report Detection below.

### Previous Report Detection

After locating the plans directory, check for existing reports:

```
ls ~/.plannotator/plans/compound-planning-report*.html
```

Reports follow a versioned naming scheme:
- First report: `compound-planning-report.html`
- Subsequent reports: `compound-planning-report-v2.html`, `compound-planning-report-v3.html`, etc.

If one or more reports exist, determine the **latest** one (highest version number).
Get its filesystem modification date using `stat` (macOS: `stat -f %Sm -t %Y-%m-%d`,
Linux: `stat -c %y | cut -d' ' -f1`). This is the **cutoff date**.
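
A minimal sketch of the cutoff computation (Linux `stat` shown, macOS form in the
comment; the sort relies on the naming scheme above):

```bash
# Sketch: find the latest report and take its modification date as the cutoff.
cd ~/.plannotator/plans
latest=$(ls compound-planning-report*.html 2>/dev/null \
  | sort -t v -k 2 -n | tail -n 1)   # unversioned file sorts first, then v2, v3, ... v10
if [ -n "$latest" ]; then
  cutoff=$(stat -c %y "$latest" | cut -d' ' -f1)   # macOS: stat -f %Sm -t %Y-%m-%d "$latest"
  echo "latest=$latest cutoff=$cutoff"
fi
```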

Present the user with a choice:

> "I found a previous report (`compound-planning-report-v{N}.html`) last updated
> on {CUTOFF_DATE}. I can either:
>
> 1. **Incremental** — Only analyze files dated after {CUTOFF_DATE}, saving tokens
>    and building on previous findings
> 2. **Full** — Re-analyze the entire archive from scratch
>
> Which would you prefer?"

Wait for the user's response before proceeding.

**If incremental:** Filter all subsequent phases to only process files with dates
after the cutoff date. The new report version will note in its header narrative that
it covers the period from {CUTOFF_DATE} to present, and reference the previous
report for earlier findings. The inventory (Phase 1) should still count ALL files
for overall stats, but clearly separate "new since last report" counts.

**If full:** Proceed normally with all files, but still use the next version number
for the output filename.

**If no previous report exists:** Proceed normally. The output filename will be
`compound-planning-report.html` (no version suffix for the first report).

## Phase 1: Inventory

Count and report the dataset. **Always count ALL files** for overall stats,
regardless of whether this is an incremental or full run:

```
- *-approved.md files (count)
- *-denied.md files (count)
- Date range (earliest to latest date found in filenames)
- Total days spanned
- Revision rate: denied / (approved + denied) — this is the "X% of plans
  revised before coding" stat used in dashboard section 1
```
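
For instance, a quick sketch of the counts and revision rate (guarding against an
empty archive):

```bash
# Sketch: inventory counts and revision rate.
cd ~/.plannotator/plans
approved=$(ls *-approved.md 2>/dev/null | wc -l)
denied=$(ls *-denied.md 2>/dev/null | wc -l)
if [ $((approved + denied)) -gt 0 ]; then
  awk -v a="$approved" -v d="$denied" \
    'BEGIN { printf "approved=%d denied=%d revision rate=%.1f%%\n", a, d, 100*d/(a+d) }'
fi
```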

**Note:** Ignore `*.annotations.md` files entirely. Denied files already contain
the full plan text plus all reviewer feedback appended after a `---` separator.
Annotation files are redundant subsets of this content — reading both would
double-count feedback.

**If incremental mode:** After the total counts, separately report the counts for
files dated after the cutoff date only:

```
New since {CUTOFF_DATE}:
- *-denied.md files: X (of Y total)
- New date range: {CUTOFF_DATE} to {LATEST_DATE}
- New days spanned: N
```

If fewer than 3 new denied files exist since the cutoff, warn the user:
> "Only {N} new denied plans since the last report. The incremental analysis may
> be thin. Would you like to proceed or switch to a full analysis?"

Also run `wc -l` across all `*-approved.md` files to get average lines per
approved plan. This tells the user whether their plans are staying lightweight
or bloating over time. You do not need to read approved plan contents — just
their line counts. If possible, break this down by time period (e.g., monthly)
to show whether plan size changed.
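
A sketch of the line-count pass (no plan contents are read, only sizes):

```bash
# Sketch: average line count per approved plan.
wc -l ~/.plannotator/plans/*-approved.md \
  | awk '$2 != "total" { n++; s += $1 }
         END { if (n) printf "%d approved plans, %.0f lines on average\n", n, s/n }'
```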

Dates appear in filenames in YYYY-MM-DD format, sometimes as a prefix
(2026-01-07-name-approved.md) and sometimes embedded (name-2026-03-15-approved.md).
Extract dates from all filenames.
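
One regex covers both placements; a minimal sketch:

```bash
# Sketch: pull the YYYY-MM-DD date out of every plan filename.
for f in ~/.plannotator/plans/*-approved.md ~/.plannotator/plans/*-denied.md; do
  basename "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}'
done | sort | sed -n '1p;$p'   # earliest and latest dates in the archive
```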

Tell the user what you found and that you're beginning the extraction.

**Claude Code fallback mode:** The Plannotator inventory fields above do not apply.
Follow the inventory instructions in
[references/claude-code-fallback.md](references/claude-code-fallback.md) instead —
report the denial-reason dataset assembled by the parser.

## Phase 2: Map — Parallel Extraction

This is the most time-intensive phase. You must read EVERY `*-denied.md` file
**in scope**. Do not skip files. Do not summarize early.

**In scope** means: all denied files if running a full analysis, or only denied
files dated after the cutoff date if running incrementally. In incremental mode,
only process files whose embedded YYYY-MM-DD date is strictly after the cutoff.
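
Because YYYY-MM-DD strings sort lexically in date order, a plain string comparison
is enough. A minimal sketch (the cutoff value is hypothetical):

```bash
# Sketch: list denied files strictly after the cutoff.
cutoff="2026-03-15"   # hypothetical; carried over from Phase 0
for f in ~/.plannotator/plans/*-denied.md; do
  d=$(basename "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
  [ -n "$d" ] && [ "$d" \> "$cutoff" ] && echo "$f"
done
```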

**Claude Code fallback mode:** The parser output is the clean source dataset. Read
the fallback reference for the extraction prompt and batching strategy specific to
JSON part files. Do not go back to raw `.jsonl` logs unless the parser fails or the
user asks for audit-level verification.

**Important:** Only read `*-denied.md` files. Do NOT read approved plans,
annotation files, or diff files. Each denied file contains the full plan text
followed by a `---` separator and the reviewer's feedback — everything needed
for analysis is in one file.

### Batching Strategy

All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning. Haiku is faster and cheaper
for this work.

The approach depends on dataset size:

**Tiny datasets (≤ 10 total files):** Read all files directly in the main agent —
no need for sub-agents. Just read them sequentially and proceed to Phase 3.

**Small datasets (11-30 files):** Launch 2-3 parallel Haiku agents, splitting
files roughly evenly.

**Medium datasets (31-80 files):** Launch 4-6 parallel Haiku agents (~10-15 files
each). Split by file type and/or time period.

**Large datasets (81+ files):** Launch as many parallel Haiku agents as needed to
keep each batch around 10-15 files. Split by the natural time boundaries in the
data (months, quarters, or whatever groupings produce balanced batches). If one
time period dominates (e.g., the most recent month has 3x the files), split that
period into multiple batches.

Launch all extraction agents in parallel using the Agent tool with
`run_in_background: true` and `model: "haiku"`.

### Output Files

Each extraction agent must write its results to a clean output file rather than
relying on the agent task output (which contains interleaved JSONL framework
logs that are difficult to parse). Instruct each agent to write to:

```
/tmp/compound-planning/extraction-{batch-name}.md
```

Create the `/tmp/compound-planning/` directory before launching agents. The
reduce agent in Phase 3 will read these clean files directly.
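
For example:

```bash
# Create the shared output directory once, before any extraction agent starts.
mkdir -p /tmp/compound-planning
```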

### Extraction Prompt

Each agent receives this instruction (adapt the time period, file list, and
output path):

```
You are extracting structured data from denied plan files for a pattern analysis.

Directory: [PLANS DIRECTORY]
Files to read: [LIST OF SPECIFIC *-denied.md FILES]
Output: Write your complete results to [OUTPUT FILE PATH]

Each denied file contains two parts separated by a --- line:
1. The plan text (above the ---)
2. The reviewer's feedback and annotations (below the ---)

Read EVERY file in your list. For EACH file, extract:
- The plan name/topic (from the plan text above the ---)
- The denial reason or feedback given (from below the --- — capture the actual
  words used)
- What was specifically asked to change
- The type of feedback (let the content determine the category — don't force-fit
  into predefined types. Common types include things like: scope concerns,
  approach disagreements, missing information, process requirements, quality
  concerns, UX/design issues, naming disputes, clarification requests,
  testing/procedural denials — but the user's actual patterns may differ)
- Any specific phrases or recurring language from the reviewer
- Individual annotations if present (numbered feedback items with quoted text
  and reviewer comments)
- The date (extracted from the filename)

Do NOT skip any files. One entry per file.

Format each entry as:
**[filename]**
- Date: ...
- Topic: ...
- Denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Annotations: [count, with brief summary of each]
---

After processing all files, write the complete results to [OUTPUT FILE PATH].
State the total file count at the end of the file.
```

### While Agents Run

Track completion. As each agent finishes, note the count of files it processed.
Verify the total matches the inventory from Phase 1. If any agent's count is
short, flag it and consider re-launching for the missing files.
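
A quick verification sketch (assumes each entry starts with the bold
`**[filename]**` header from the format above):

```bash
# Sketch: count extracted entries per batch, then sum for the grand total.
grep -c '^\*\*' /tmp/compound-planning/extraction-*.md
cat /tmp/compound-planning/extraction-*.md | grep -c '^\*\*'
```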

If an agent times out (possible with large batches — a batch of 128 files can
take 8+ minutes), re-launch it for just the unprocessed files. Check the output
file to see how far it got before timing out.

## Phase 3: Reduce — Pattern Analysis

Once ALL extraction agents have completed (or all files have been read for tiny
datasets), proceed with the reduction. Reduction agents should use `model: "sonnet"`
— this phase requires real analytical reasoning, not just file reading.

### Reduction Strategy

The approach depends on how many extraction files were produced:

**Standard (≤ 20 extraction files):** Launch a single Sonnet agent to read all
extraction files and produce the full analysis. This covers most datasets.

**Large (21+ extraction files):** Use a two-stage reduce:

1. **Stage 1 — Partial reduces:** Split the extraction files into groups of 4-6.
   Launch parallel Sonnet agents, each reading one group and producing a partial
   analysis with the same sections listed below. Each writes to
   `/tmp/compound-planning/partial-reduce-{N}.md`.

2. **Stage 2 — Final reduce:** A single Sonnet agent reads all partial reduce
   files and synthesizes them into the final comprehensive analysis. This agent
   merges taxonomies, combines counts, deduplicates patterns, and reconciles any
   conflicting categorizations across partials.

**Claude Code fallback mode:** The reduction phase is the same. The only upstream
difference is that extraction files were derived from normalized denial-reason JSON
instead of Plannotator markdown files.

### Reduction Prompt

Give each reduction agent this prompt (adapt file paths for single- vs. multi-stage):

```
You are a data scientist conducting the reduction phase of a map-reduce analysis
across a user's denied plan archive.

Read ALL extraction files at [FILE PATHS]

These files contain structured extractions from every denied plan file. Each
extraction includes the plan topic, denial feedback, annotations, and reviewer
language. Your job: aggregate everything, find patterns, cluster into a taxonomy,
and produce a comprehensive analysis.

Be exhaustive. Use real counts. Quote real phrases from the data. This is
research — no hand-waving, no fabrication.

Write your complete results to [OUTPUT FILE PATH].

Produce the following sections:
[... sections listed below ...]
```

The reduction agent's job is to let the data speak. Do not impose a predetermined
framework — discover what's actually there. The analysis must produce:

### 1. Denial Reason Taxonomy

Categorize every denial into a finite set of types that emerge from the data. Count
occurrences. Show percentages. Include real example quotes for each type. Aim for
8-15 categories — enough to be specific, few enough to be scannable. Let the user's
actual feedback determine what the categories are.

### 2. Top Feedback Patterns (ranked by frequency)

The 5-10 most recurring patterns. For each: what the reviewer consistently asks for,
3+ example quotes from different files, and whether the pattern changed over time.

### 3. Recurring Phrases

Exact phrases the reviewer uses repeatedly, with counts and what they signal. These
are the reviewer's vocabulary — their shorthand for what they care about.

### 4. What the Reviewer Values (implicit preferences)

Derived from patterns — what does this specific person care about most? Quality?
Speed? Narrative? Architecture? Process? Simplicity? Rank by evidence strength.
This section should feel like a personality profile of the reviewer's standards.

### 5. What Agents Consistently Get Wrong

The flip side — what recurring mistakes trigger denials? What should agents stop
doing for this reviewer?

### 6. Structural Requests

What plan structure does the reviewer consistently demand? Required sections,
ordering, format preferences, level of detail expected.

### 7. Evolution Over Time

How feedback patterns changed across the time span. Group by whatever natural time
boundaries exist in the data (weeks for short spans, months for longer ones). Did
expectations mature? Did new patterns emerge? What shifted? If the dataset spans
less than a month, note that evolution analysis is limited but still look for any
progression from early to late files.

### 8. Actionable Prompt Instructions

The most important output. Based on all patterns: specific numbered instructions
that could be embedded in a planning prompt to prevent the most common denial
reasons. Write these as actual directives an agent could follow. Be specific to
this user's patterns — generic advice like "write good plans" is worthless. Each
instruction should trace back to a real, frequent denial pattern.

After writing the instructions, calculate what percentage of denials they would
address (count how many denials fall into categories covered by the instructions
vs total denials). Report this percentage — it will be different for every user.

## Phase 4: Generate the HTML Dashboard

Build a single, self-contained HTML file as the final deliverable. Save it to
the user's plans directory with a versioned filename:

- First ever report: `compound-planning-report.html`
- Second report: `compound-planning-report-v2.html`
- Third report: `compound-planning-report-v3.html`
- And so on.

The version number was determined in Phase 0 based on existing reports found.

**If this is an incremental report**, the header should indicate the analysis
period (e.g., "March 15 – March 31, 2026") and include a subtitle noting
"Incremental analysis — see v{N-1} for earlier findings." The narrative in
section 1 should frame findings as what's new or changed since the last report,
not as a complete picture. Overall stats in the header (file counts, revision
rate) should still reflect the full archive for context.

Read the template at `assets/report-template.html` for the **design language
only**. The template contains example data from a previous analysis — ignore all
data values, quotes, and percentages in the template. Use only its visual design:
colors, typography, spacing, component styles, and layout patterns.

### Design Language (from template)

- **Palette:** Light mode, warm off-white (#FDFCFB), text in slate scale, amber
  for highlights/accents, emerald for positive, rose for negative, indigo for
  action elements
- **Typography:** Playfair Display (serif, for narrative headings), Inter (sans,
  for body/data), JetBrains Mono (mono, for code/phrases) — Google Fonts CDN
- **Layout:** Single-column, max-width 1024px, generous vertical whitespace (128px
  between major sections), editorial/narrative-first aesthetic
- **Tone:** Calm, reflective, authoritative. Like a personal retrospective journal,
  not a monitoring dashboard.

### Page Frame (header + footer)

In addition to the 7 sections, the page has:

- **Header:** Report title on the left (Playfair Display, ~36px), project name +
  date range below it in light meta text. On the right: file counts in mono
  (e.g., "223 denials · 71 days"). Separated from content by a bottom border.
  Generous bottom padding before section 1.

- **Footer:** After section 7. Top border, centered italic Playfair Display tagline
  summarizing the corpus (e.g., "Analysis of X denied plans from the Plannotator
  archive.").

### Dashboard Section Order (7 sections)

The report follows this exact section order. Each section builds on the previous
one — the flow moves from "what happened" through "why" to "what to do about it":

1. **The story in the data** — An editorial narrative paragraph (Playfair Display
   serif, ~26px) that tells the headline finding in prose. Not bullet points — a
   real paragraph that reads like the opening of an article. Alongside it, a KPI
   sidebar with 3 key metrics (the top denial percentage, the overall revision
   rate, and the number of distinct denial categories found). Use an amber inline
   highlight on the most striking number in the narrative.

2. **Why plans get denied** — The taxonomy as a ranked list. Each row: rank number
   (mono), category label, a thin 4px progress bar (top item in amber-500, rest
   in slate-300), percentage (mono), and for the top entries, a real italic quote
   from the data below the label. Show the top 10 categories or however many the
   data supports (minimum 5).

3. **How expectations evolved** — One card per natural time period. Each card has:
   the period name in serif, a theme phrase in colored uppercase (different color
   per period to show progression), a description paragraph, and a stat line at
   the bottom (e.g., "X denials · Y narrative requests"). If the data spans less
   than 3 distinct periods, use 2 cards or even a single card with internal
   progression noted.

4. **What works vs what doesn't** — Two side-by-side cards. Left: green-tinted
   (emerald-50/50 bg, emerald-100 border) with traits of plans that succeed for
   this reviewer. Right: red-tinted (rose-50/50 bg, rose-100 border) with what
   agents keep getting wrong. Both derived from the reduction analysis. Bulleted
   with small colored dots. 5-8 items per card.

5. **The actionable output** — The diagnostic payoff. Opens with a Playfair
   Display narrative sentence stating how many prompt instructions were derived
   and what estimated percentage of denials they address (use the real calculated
   percentage from Phase 3, not a generic number). Then the top 3 most impactful
   improvements as numbered items, each with an amber number, bold title, and
   one-line description. This section bridges the analysis and the full prompt
   that follows.

6. **Your most-used phrases** — Grid of chips (2-col mobile, 3-col desktop). Each
   chip: monospace quoted phrase on the left, frequency count on the right. White
   bg, slate-200 border, rounded-12px. Show 9-12 of the most recurring phrases
   found. These should be the reviewer's actual words — their verbal fingerprint.

7. **The corrective prompt** — Dark panel (slate-900 bg, white text, rounded-3xl,
   shadow-xl). Opens with a Playfair intro sentence about the instructions. Then
   a dark code block (slate-800/80 bg, amber-200 monospace text) containing the
   full numbered prompt instructions from Phase 3. Include a copy-to-clipboard
   button that works (JS included). Below the code block: a gradient glow card
   (indigo-to-purple blurred halo behind a white card) with a closing message
   that these instructions are personal — derived from the user's own feedback,
   their own language, their own standards.

### Adaptation Rules

- If the user has < 3 months of data, reduce the evolution section to fewer cards
- If most denied files lack feedback below the `---` (bare denials with no
  annotations), note this in the narrative — the analysis will be thinner
- **Claude Code fallback mode:** Explicitly label the report source as Claude Code
  `ExitPlanMode` denial reasons. Do not fabricate Plannotator-only fields such as
  annotation counts or approved-plan line counts. See the fallback reference for
  KPI substitutes and footer/provenance guidance.
- If fewer than 5 denial categories emerge, combine the taxonomy and patterns
  sections into one
- If the dataset is very small (< 20 files), the narrative should acknowledge the
  limited sample size and frame findings as preliminary
- The number of prompt instructions will vary per user — could be 8 or 20. Don't
  force exactly 17. Let the data determine the count.
- The top 3 actionable items in section 5 must be the 3 that cover the largest
  share of denials, not the 3 that sound most impressive

### Key Rules

1. Every number must come from the real analysis — no fabricated data
2. Every quote must be a real quote from a real file
3. The taxonomy percentages must be calculated from real counts
4. The prompt instructions must trace back to actual denial patterns
5. The copy button on the prompt block must work (include the JS)

After generating, open the file in the user's browser.
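
For example (filename per the versioning above):

```bash
# Sketch: open the finished report (macOS has `open`; most Linux desktops have xdg-open).
report=~/.plannotator/plans/compound-planning-report.html
open "$report" 2>/dev/null || xdg-open "$report"
```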

## Phase 5: Summary

Tell the user:
- How many denied files were analyzed
- If incremental: how many were new since the last report
- The top 3 denial patterns found
- The estimated percentage of denials the prompt instructions would address
- The single most impactful prompt improvement
- Where the report was saved (including version number)
- If incremental: remind the user that earlier findings are in the previous report

**Claude Code fallback mode:** Adapt the summary per the fallback reference —
report human denial reasons analyzed and total `ExitPlanMode` attempts scanned
instead of Plannotator file counts.

## Phase 6: Improvement Hook

After presenting the summary, ask the user if they want to enable an **improvement
hook** — this takes the corrective prompt instructions from section 7 of the report
and writes them to a file that Plannotator's `EnterPlanMode` hook can inject into
every future planning session automatically.

> "Would you like to enable the improvement hook? This will save the corrective
> prompt instructions to a file that gets automatically injected into all future
> planning sessions — so Claude sees your feedback patterns before writing any plan."

**If yes:**

The hook file lives at:

```
~/.plannotator/hooks/compound/enterplanmode-improve-hook.txt
```

Create the `~/.plannotator/hooks/compound/` directory if it doesn't exist.

The file contents should be the corrective prompt instructions from Phase 3 —
the same numbered list that appears in section 7 of the HTML report. Write them
as plain text, one instruction per line, prefixed with their number. No HTML, no
markdown fences, no preamble — just the instructions themselves. The hook system
will inject this file's contents as-is into the planning context.
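
A minimal sketch (the source path is hypothetical; use wherever the Phase 3
instruction list was saved):

```bash
# Sketch: install the improvement hook.
# The source path below is hypothetical.
mkdir -p ~/.plannotator/hooks/compound
cp /tmp/compound-planning/prompt-instructions.txt \
   ~/.plannotator/hooks/compound/enterplanmode-improve-hook.txt
```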

**If the file already exists:**

Read the existing file and present the user with a choice:

> "An improvement hook already exists from a previous analysis. I can:
>
> 1. **Replace** — Overwrite with the new instructions (the old ones are gone)
> 2. **Merge** — Combine both, deduplicating overlapping instructions and
>    keeping the best version of each
> 3. **Keep existing** — Leave the current hook as-is, skip this step
>
> Which would you prefer?"

- **Replace:** Overwrite the file with the new instructions.
- **Merge:** Read the existing instructions, compare with the new ones, and
  produce a merged set. Remove duplicates (same intent even if worded differently).
  When two instructions cover the same pattern, keep the more specific or
  actionable version. Re-number the final list sequentially. Write the merged
  result to the file. Show the user what changed (added N new, removed N
  redundant, kept N existing).
- **Keep existing:** Do nothing, move on.

**If no:** Skip this phase entirely.

## Important Notes

- **Data source priority:** Plannotator is the first-class path. Claude Code log
  analysis is the secondary path for users without Plannotator archives.
- **Research integrity:** Every file must be read. The value of this analysis comes
  from completeness. Sampling or skipping undermines the findings.
- **Real data only:** Never fabricate quotes, percentages, or patterns. If the data
  doesn't show a clear pattern, say so honestly rather than inventing one.
- **Let the data lead:** The taxonomy, patterns, and instructions should emerge from
  what's actually in the files. Different users will have completely different
  denial patterns. A user building mobile apps will have different feedback than
  one building APIs. Don't assume what the patterns will be.
- **Agent parallelization:** For large datasets, maximize parallel agents to reduce
  wall-clock time. The bottleneck is the largest batch — split it.
- **Structured extraction format:** Ask extraction agents to return structured text
  with consistent delimiters so the reduce agent can parse reliably.
- **The report is the artifact:** The HTML dashboard is what the user keeps. It
  should be beautiful, honest, and useful. Every section should feel like it was
  written about them specifically, because it was.

@@ -0,0 +1,795 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Compound Planning — What 370 Files Reveal</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,500;1,400&family=Inter:wght@300;400;500;600&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
<style>
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }

:root {
  --bg: #FDFCFB;
  --slate-900: #0f172a;
  --slate-800: #1e293b;
  --slate-700: #334155;
  --slate-600: #475569;
  --slate-500: #64748b;
  --slate-400: #94a3b8;
  --slate-300: #cbd5e1;
  --slate-200: #e2e8f0;
  --slate-100: #f1f5f9;
  --slate-50: #f8fafc;
  --amber-500: #f59e0b;
  --amber-600: #d97706;
  --amber-700: #b45309;
  --amber-200: #fde68a; /* used by .prompt-body pre */
  --amber-50: #fffbeb;
  --emerald-500: #10b981;
  --emerald-600: #059669;
  --emerald-400: #34d399;
  --emerald-900: #064e3b;
  --emerald-800: #065f46;
  --emerald-100: #d1fae5;
  --emerald-50: #ecfdf5;
  --rose-500: #f43f5e;
  --rose-600: #e11d48;
  --rose-400: #fb7185;
  --rose-900: #881337;
  --rose-800: #9f1239;
  --rose-100: #ffe4e6;
  --rose-50: #fff1f2;
  --indigo-500: #6366f1;
  --indigo-600: #4f46e5;
  --purple-600: #9333ea;
}

body {
  font-family: 'Inter', ui-sans-serif, system-ui, sans-serif;
  background: var(--bg);
  color: var(--slate-800);
  -webkit-font-smoothing: antialiased;
}

.container {
  max-width: 1024px;
  margin: 0 auto;
  padding: 48px 24px 64px;
}
@media (min-width: 768px) { .container { padding: 96px 24px 80px; } }

/* Typography */
.font-serif { font-family: 'Playfair Display', ui-serif, Georgia, serif; }
.font-mono { font-family: 'JetBrains Mono', ui-monospace, monospace; }

/* Header */
header {
  border-bottom: 1px solid var(--slate-200);
  padding-bottom: 40px;
  margin-bottom: 96px;
  display: flex;
  justify-content: space-between;
  align-items: flex-end;
  flex-wrap: wrap;
  gap: 16px;
}
header h1 {
  font-family: 'Playfair Display', serif;
  font-size: 36px;
  font-weight: 400;
  color: var(--slate-900);
  line-height: 1.2;
}
header .meta {
  font-size: 15px;
  font-weight: 300;
  color: var(--slate-500);
  letter-spacing: 0.04em;
}

/* Sections */
.section { margin-bottom: 128px; }
.section-label {
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.2em;
  color: var(--slate-400);
  margin-bottom: 24px;
}

/* Narrative + KPIs */
.summary {
  display: grid;
  grid-template-columns: 1fr;
  gap: 48px;
  align-items: start;
}
@media (min-width: 768px) {
  .summary { grid-template-columns: 1fr 240px; }
}
.narrative {
  font-family: 'Playfair Display', serif;
  font-size: 26px;
  line-height: 1.45;
  color: var(--slate-900);
}
.narrative .highlight {
  background: var(--amber-50);
  color: var(--amber-700);
  padding: 1px 6px;
  border-radius: 3px;
}
.kpi-stack {
  display: flex;
  flex-direction: column;
  gap: 32px;
}
@media (min-width: 768px) {
  .kpi-stack { border-left: 1px solid var(--slate-200); padding-left: 32px; }
}
.kpi-item .kpi-value {
  font-size: 36px;
  font-weight: 300;
  color: var(--slate-900);
  letter-spacing: -0.02em;
}
.kpi-item .kpi-label {
  font-size: 10px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.15em;
  color: var(--slate-500);
  margin-top: 2px;
}

/* Taxonomy bars */
.taxonomy-list { display: flex; flex-direction: column; gap: 20px; }
.tax-row { display: grid; grid-template-columns: 24px 1fr 52px; gap: 12px; align-items: center; }
.tax-rank {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  text-align: right;
}
.tax-body { display: flex; flex-direction: column; gap: 6px; }
.tax-label { font-size: 14px; font-weight: 500; color: var(--slate-800); }
.tax-bar-track { height: 4px; background: var(--slate-100); border-radius: 100px; overflow: hidden; }
.tax-bar-fill { height: 100%; border-radius: 100px; transition: width 0.6s ease; }
.tax-bar-fill.top { background: var(--amber-500); }
.tax-bar-fill.rest { background: var(--slate-300); }
.tax-pct {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-500);
  text-align: right;
}
.tax-quote {
  font-size: 12px;
  font-style: italic;
  color: var(--slate-500);
  margin-top: 2px;
}

/* Evolution timeline */
.evolution-grid {
  display: grid;
  grid-template-columns: 1fr;
  gap: 24px;
}
@media (min-width: 768px) { .evolution-grid { grid-template-columns: repeat(3, 1fr); } }
.evo-card {
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 16px;
  padding: 28px;
}
.evo-card .evo-month {
  font-family: 'Playfair Display', serif;
  font-size: 20px;
  color: var(--slate-900);
  margin-bottom: 4px;
}
.evo-card .evo-theme {
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.12em;
  margin-bottom: 16px;
}
.evo-card .evo-desc {
  font-size: 14px;
  color: var(--slate-600);
  line-height: 1.6;
}
.evo-card .evo-stat {
  margin-top: 16px;
  padding-top: 16px;
  border-top: 1px solid var(--slate-100);
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-500);
}
.evo-jan .evo-theme { color: var(--slate-600); }
.evo-feb .evo-theme { color: var(--amber-600); }
.evo-mar .evo-theme { color: var(--indigo-600); }

/* Quality comparison */
.quality-grid {
  display: grid;
  grid-template-columns: 1fr;
  gap: 24px;
}
@media (min-width: 768px) { .quality-grid { grid-template-columns: 1fr 1fr; } }
.q-card {
  border-radius: 24px;
  padding: 36px;
}
.q-card.good {
  background: color-mix(in srgb, var(--emerald-50) 50%, transparent);
  border: 1px solid var(--emerald-100);
}
.q-card.bad {
  background: color-mix(in srgb, var(--rose-50) 50%, transparent);
  border: 1px solid var(--rose-100);
}
.q-card .q-icon { font-size: 20px; margin-bottom: 12px; }
.q-card .q-title {
  font-family: 'Playfair Display', serif;
  font-size: 22px;
  margin-bottom: 20px;
}
.q-card.good .q-title { color: var(--emerald-900); }
.q-card.bad .q-title { color: var(--rose-900); }
.q-list { list-style: none; display: flex; flex-direction: column; gap: 14px; }
.q-list li {
  display: flex;
  align-items: flex-start;
  gap: 10px;
  font-size: 14px;
  line-height: 1.6;
}
.q-card.good .q-list li { color: color-mix(in srgb, var(--emerald-800) 90%, transparent); }
.q-card.bad .q-list li { color: color-mix(in srgb, var(--rose-800) 90%, transparent); }
.q-dot {
  width: 6px;
  height: 6px;
  border-radius: 50%;
  flex-shrink: 0;
  margin-top: 7px;
}
.q-card.good .q-dot { background: var(--emerald-400); }
.q-card.bad .q-dot { background: var(--rose-400); }

/* Phrases */
.phrases-grid {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 12px;
}
@media (min-width: 768px) { .phrases-grid { grid-template-columns: repeat(3, 1fr); } }
.phrase-chip {
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 12px;
  padding: 14px 16px;
  display: flex;
  justify-content: space-between;
  align-items: center;
  gap: 8px;
}
.phrase-text {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-700);
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}
.phrase-count {
  font-family: 'JetBrains Mono', monospace;
  font-size: 11px;
  color: var(--slate-400);
  flex-shrink: 0;
}

/* Dark action panel */
.action-panel {
  background: var(--slate-900);
  color: white;
  border-radius: 24px;
  padding: 40px;
  box-shadow: 0 20px 25px -5px rgb(0 0 0 / 0.1), 0 8px 10px -6px rgb(0 0 0 / 0.1);
}
@media (min-width: 768px) { .action-panel { padding: 56px; } }
.action-panel .section-label { color: var(--slate-500); }
.action-panel .ap-intro {
  font-family: 'Playfair Display', serif;
  font-size: 22px;
  color: white;
  line-height: 1.4;
  margin-bottom: 32px;
  max-width: 640px;
}
.prompt-block {
  background: color-mix(in srgb, var(--slate-800) 80%, transparent);
  border: 1px solid color-mix(in srgb, var(--slate-700) 50%, transparent);
  border-radius: 16px;
  overflow: hidden;
}
.prompt-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 12px 20px;
  border-bottom: 1px solid color-mix(in srgb, var(--slate-700) 30%, transparent);
}
.prompt-header-label {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  display: flex;
  align-items: center;
  gap: 8px;
}
.prompt-header-label svg { width: 14px; height: 14px; }
.copy-btn {
  background: none;
  border: none;
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  cursor: pointer;
  display: flex;
  align-items: center;
  gap: 6px;
  transition: color 0.2s;
}
.copy-btn:hover { color: white; }
.copy-btn.copied { color: var(--emerald-400); }
.prompt-body {
  padding: 20px;
  max-height: 480px;
  overflow-y: auto;
}
.prompt-body pre {
  font-family: 'JetBrains Mono', monospace;
  font-size: 13px;
  line-height: 1.7;
  color: color-mix(in srgb, var(--amber-200) 90%, transparent);
  white-space: pre-wrap;
  word-break: break-word;
}
.prompt-body pre .comment {
  color: var(--slate-500);
}

/* Glow card */
.glow-wrap {
  position: relative;
  margin-top: 48px;
}
.glow-bg {
  position: absolute;
  inset: -2px;
  background: linear-gradient(135deg, var(--indigo-500), var(--purple-600));
  border-radius: 26px;
  opacity: 0.15;
  filter: blur(16px);
  transition: opacity 0.5s;
}
.glow-wrap:hover .glow-bg { opacity: 0.25; }
.glow-card {
  position: relative;
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 24px;
  padding: 32px 36px;
  display: flex;
  justify-content: space-between;
  align-items: center;
  flex-wrap: wrap;
  gap: 20px;
}
.glow-card .gc-text {
  font-family: 'Playfair Display', serif;
  font-size: 18px;
  font-weight: 500;
  color: var(--slate-900);
  line-height: 1.5;
  max-width: 640px;
}
.glow-card .gc-text em {
  font-style: italic;
  color: var(--indigo-600);
}

/* Footer */
footer {
  border-top: 1px solid var(--slate-200);
  padding-top: 48px;
  margin-top: 0;
  text-align: center;
}
footer p {
  font-family: 'Playfair Display', serif;
  font-style: italic;
  font-size: 15px;
  color: var(--slate-400);
}

/* Scrollbar in dark code block */
.prompt-body::-webkit-scrollbar { width: 6px; }
.prompt-body::-webkit-scrollbar-track { background: transparent; }
.prompt-body::-webkit-scrollbar-thumb { background: var(--slate-700); border-radius: 3px; }
</style>
</head>
<body>
<div class="container">

  <header>
    <div>
      <h1>What 370 Files Reveal About<br>How You Plan</h1>
      <div class="meta" style="margin-top: 8px;">backnotprop/plannotator · Jan 7 – Mar 18, 2026</div>
    </div>
    <div class="meta" style="text-align: right;">
      <span class="font-mono" style="font-size: 12px;">202 denials · 168 annotations · 71 days</span>
    </div>
  </header>

  <!-- 1. Narrative + KPIs -->
  <div class="section">
    <div class="section-label">1. The story in the data</div>
    <div class="summary">
      <div class="narrative">
        Across 71 days you denied or revised <span class="highlight">202 plans</span> before any code was written. The single most common reason—appearing in 1 out of 4 denials—was the same: the agent jumped to implementation without telling you <em>what</em> it was building, <em>why</em>, or <em>how</em>. Missing narrative. Missing context. Missing the story. Your expectations evolved from “does it work?” in January to “tell me the story and be confident” by March.
      </div>
      <div class="kpi-stack">
        <div class="kpi-item">
          <div class="kpi-value">25.7%</div>
          <div class="kpi-label">Denials for missing narrative</div>
        </div>
        <div class="kpi-item">
          <div class="kpi-value">50%</div>
          <div class="kpi-label">Plans revised before coding</div>
        </div>
        <div class="kpi-item">
          <div class="kpi-value">12</div>
          <div class="kpi-label">Distinct denial categories</div>
        </div>
      </div>
    </div>
  </div>

  <!-- 2. Denial Taxonomy -->
  <div class="section">
    <div class="section-label">2. Why plans get denied</div>
    <div class="taxonomy-list">
      <div class="tax-row">
        <span class="tax-rank">1</span>
        <div class="tax-body">
          <span class="tax-label">Missing Narrative / Overview</span>
          <div class="tax-bar-track"><div class="tax-bar-fill top" style="width: 100%"></div></div>
          <span class="tax-quote">"This plan is denied without narrative detail and rationales."</span>
        </div>
        <span class="tax-pct">25.7%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">2</span>
        <div class="tax-body">
          <span class="tax-label">Clarification Needed</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 65%"></div></div>
          <span class="tax-quote">"What does this Mean???"</span>
        </div>
        <span class="tax-pct">16.8%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">3</span>
        <div class="tax-body">
          <span class="tax-label">Testing / Procedural</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 54%"></div></div>
          <span class="tax-quote">"I'm denying so you can create a diff."</span>
        </div>
        <span class="tax-pct">13.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">4</span>
        <div class="tax-body">
          <span class="tax-label">Wrong Approach / Over-Engineered</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 37%"></div></div>
          <span class="tax-quote">"Why are we doing difficult shit here? I want a hover experience."</span>
        </div>
        <span class="tax-pct">9.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">5</span>
        <div class="tax-body">
          <span class="tax-label">Process Requirement</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 31%"></div></div>
          <span class="tax-quote">"Make sure you feature branch."</span>
        </div>
        <span class="tax-pct">7.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">6</span>
        <div class="tax-body">
          <span class="tax-label">Confidence / Risk Check</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 29%"></div></div>
          <span class="tax-quote">"Take a step back, breathe, make sure we're not being irrational."</span>
        </div>
        <span class="tax-pct">7.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">7</span>
        <div class="tax-body">
          <span class="tax-label">Content Removal</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 27%"></div></div>
          <span class="tax-quote">"I don't want this in the plan."</span>
        </div>
        <span class="tax-pct">6.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">8</span>
        <div class="tax-body">
          <span class="tax-label">Implementation Bug Found</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 23%"></div></div>
        </div>
        <span class="tax-pct">5.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">9</span>
        <div class="tax-body">
          <span class="tax-label">Design / UX Issue</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 21%"></div></div>
        </div>
        <span class="tax-pct">5.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">10</span>
        <div class="tax-body">
          <span class="tax-label">Naming / Terminology</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 16%"></div></div>
          <span class="tax-quote">"Why do you keep calling it Simplified????"</span>
        </div>
        <span class="tax-pct">4.0%</span>
      </div>
    </div>
  </div>

  <!-- 3. Evolution -->
  <div class="section">
    <div class="section-label">3. How your expectations evolved</div>
    <div class="evolution-grid">
      <div class="evo-card evo-jan">
        <div class="evo-month">January</div>
        <div class="evo-theme">"Does it work?"</div>
        <div class="evo-desc">Bug-hunting phase. You were hands-on testing View Logs, iterating on session scoping heuristics. 60% of denials were implementation bugs and verification failures. No mention of “narrative” or “overview” yet.</div>
        <div class="evo-stat">26 denials · 0 narrative requests</div>
      </div>
      <div class="evo-card evo-feb">
        <div class="evo-month">February</div>
        <div class="evo-theme">"Follow the process"</div>
        <div class="evo-desc">Process gates emerged: feature branches, Linear tickets, pull main. 40% of denials were procedural (diff testing). UX polish intensified. The first narrative demands appeared: “I want a narrative under each section.”</div>
        <div class="evo-stat">48 denials · 6 narrative requests</div>
      </div>
      <div class="evo-card evo-mar">
        <div class="evo-month">March</div>
        <div class="evo-theme">"Tell me the story"</div>
        <div class="evo-desc">Narrative became the #1 gate. You created a “Missing overview” label and applied it systematically. Confidence checks became standard. You began telling agents to “take a step back, breathe, and analyze.”</div>
        <div class="evo-stat">128 denials · 25+ narrative requests</div>
      </div>
    </div>
  </div>

  <!-- 4. Quality comparison -->
  <div class="section">
    <div class="section-label">4. What works vs. what doesn't</div>
    <div class="quality-grid">
      <div class="q-card good">
        <div class="q-icon">✓</div>
        <div class="q-title">What approved plans do</div>
        <ul class="q-list">
          <li><span class="q-dot"></span>Lead with a narrative overview: what exists, what changes, why</li>
          <li><span class="q-dot"></span>State confidence and identify risks proactively</li>
          <li><span class="q-dot"></span>Reference existing codebase patterns before proposing new code</li>
          <li><span class="q-dot"></span>Use explicit, transparent naming (not euphemisms)</li>
          <li><span class="q-dot"></span>Break large work into phases with evaluation gates</li>
          <li><span class="q-dot"></span>Include example output for user-facing changes</li>
          <li><span class="q-dot"></span>Specify feature branch and ticket creation steps</li>
        </ul>
      </div>
      <div class="q-card bad">
        <div class="q-icon">✗</div>
        <div class="q-title">What agents keep getting wrong</div>
        <ul class="q-list">
          <li><span class="q-dot"></span>Jump to implementation steps without narrative context</li>
          <li><span class="q-dot"></span>Over-engineer: Shift+Click when hover works, MCP tool when a README suffices</li>
          <li><span class="q-dot"></span>Introduce new code for things the codebase already solves</li>
          <li><span class="q-dot"></span>Propose work on top of failing lint/type checks</li>
          <li><span class="q-dot"></span>Use vague or euphemistic naming (“Accept” instead of “Git Add”)</li>
          <li><span class="q-dot"></span>Wait to be asked for confidence instead of stating it</li>
          <li><span class="q-dot"></span>Rush to modify instead of reporting what they see</li>
        </ul>
      </div>
    </div>
  </div>

  <!-- 5. The actionable output -->
  <div class="section">
    <div class="section-label">5. The actionable output</div>
    <div class="narrative" style="margin-bottom: 32px;">
      The analysis produced <span class="highlight">17 specific prompt instructions</span> that, if embedded in a planning prompt, would address ~70% of all denial reasons. The biggest three:
    </div>
    <div style="display: flex; flex-direction: column; gap: 20px;">
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">1</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Every plan MUST start with a Solution Overview</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">What exists, what changes, why, how. This alone addresses 1 in 4 denials.</div>
        </div>
      </div>
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">2</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">End every plan with a Confidence Assessment</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Don’t wait to be asked. State your confidence, identify risks, flag uncertainties.</div>
        </div>
      </div>
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">3</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Search for existing patterns before proposing new code</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Explicitly state what you found in the codebase. Prefer reuse over new implementation.</div>
        </div>
      </div>
    </div>
  </div>

  <!-- 6. Recurring phrases -->
  <div class="section">
    <div class="section-label">6. Your most-used phrases</div>
    <div class="phrases-grid">
      <div class="phrase-chip"><span class="phrase-text">"narrative"</span><span class="phrase-count">50+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"I don't want this in the plan"</span><span class="phrase-count">10</span></div>
      <div class="phrase-chip"><span class="phrase-text">"feature branch"</span><span class="phrase-count">8+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"confidence"</span><span class="phrase-count">8+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"Missing overview"</span><span class="phrase-count">14</span></div>
      <div class="phrase-chip"><span class="phrase-text">"front-end design skill"</span><span class="phrase-count">16</span></div>
      <div class="phrase-chip"><span class="phrase-text">"separation of concerns"</span><span class="phrase-count">6</span></div>
      <div class="phrase-chip"><span class="phrase-text">"Take a step back, breathe"</span><span class="phrase-count">6</span></div>
      <div class="phrase-chip"><span class="phrase-text">"how does this work"</span><span class="phrase-count">5+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"what the fuck"</span><span class="phrase-count">4</span></div>
      <div class="phrase-chip"><span class="phrase-text">"create a ticket"</span><span class="phrase-count">4+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"reusable"</span><span class="phrase-count">19+</span></div>
    </div>
  </div>

  <!-- 7. Corrective Prompt -->
  <div class="section" style="margin-bottom: 64px;">
    <div class="action-panel">
      <div class="section-label">7. The corrective prompt</div>
      <div class="ap-intro">
        These 17 instructions were extracted directly from your denial patterns. Embedding them in a planning prompt would address approximately 70% of all denial reasons.
      </div>
      <div class="prompt-block">
        <div class="prompt-header">
          <span class="prompt-header-label">
            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="4 17 10 11 4 5"></polyline><line x1="12" y1="19" x2="20" y2="19"></line></svg>
            planning-instructions.md
          </span>
          <button class="copy-btn" onclick="copyPrompt(this)">
            <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg>
            Copy
          </button>
        </div>
        <div class="prompt-body">
          <pre id="prompt-content"><span class="comment"># Planning Instructions
# Derived from 370 files of denial & annotation analysis</span>

1. STRUCTURE: Every plan MUST begin with a "Solution Overview"
   containing 2-3 paragraphs of narrative prose explaining:
   - What exists today (current state)
   - What will change and why
   - How it will be built (approach summary)
   Do NOT skip this. Do NOT replace it with bullet points.

2. NARRATIVE: Every major section must include a rationale
   paragraph — not just what will be done, but WHY this
   approach was chosen over alternatives.

3. FEATURE BRANCH: Always specify implementation will occur
   on a feature branch. State the branch name. Never plan
   to work directly on main.

4. EXISTING PATTERNS: Before proposing any new implementation,
   search the codebase for existing patterns that solve the
   same problem. Explicitly state what you found and whether
   you will reuse it. Prefer reuse over new code.

5. CONFIDENCE STATEMENT: End the plan with a "Confidence
   Assessment" section. State your confidence level, identify
|
||||
risks or edge cases, and note uncertainties. Do not wait
|
||||
to be asked.
|
||||
|
||||
6. PHASING: For plans with more than 3 steps, break them into
|
||||
numbered phases. After each phase, note "Pause for
|
||||
evaluation" so the reviewer can assess before proceeding.
|
||||
|
||||
7. ISSUE TRACKING: If the project uses Linear or GitHub Issues,
|
||||
include a step to create relevant tickets BEFORE
|
||||
implementation. Backlog items should be separate tickets.
|
||||
|
||||
8. SIMPLICITY: Choose the simplest approach that meets
|
||||
requirements. Do not introduce modifier keys when hover
|
||||
works. Do not build a framework when a README suffices.
|
||||
|
||||
9. NAMING: Use explicit, transparent names for user-facing
|
||||
features. Do not euphemize Git operations ("Git Add"
|
||||
not "Accept"). Match existing product naming conventions.
|
||||
|
||||
10. CODE QUALITY: State that implementation will follow clean
|
||||
code principles: modular architecture, separation of
|
||||
concerns, no circumventing lint or type checks.
|
||||
|
||||
11. CLEAN FOUNDATION: If the codebase has failing lint or type
|
||||
checks, address these BEFORE proposing new features. State
|
||||
the current CI/CD state.
|
||||
|
||||
12. PRIVACY: For features involving data storage or sharing,
|
||||
explicitly state privacy guarantees. Require user
|
||||
confirmation before storing data.
|
||||
|
||||
13. EXAMPLES: When the plan involves user-facing output or UI,
|
||||
include an example of what it will look like.
|
||||
|
||||
14. FOCUSED SCOPE: Do not include sections that are obvious,
|
||||
boilerplate, or previously asked to be removed. Keep the
|
||||
plan focused rather than comprehensive.
|
||||
|
||||
15. DESIGN SKILL: For any frontend/UI work, invoke the
|
||||
front-end design skill to validate the approach. Note
|
||||
this invocation explicitly in the plan.
|
||||
|
||||
16. VERIFICATION STEP: For refactors or multi-file changes,
|
||||
include a verification step with line-by-line comparison
|
||||
of affected code paths.
|
||||
|
||||
17. DELIBERATION: If the plan involves a dramatic shift, state
|
||||
that you have re-evaluated the approach, traced through
|
||||
affected files mentally, and are confident in the plan.
|
||||
Do not rush.</pre>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="glow-wrap">
|
||||
<div class="glow-bg"></div>
|
||||
<div class="glow-card">
|
||||
<div class="gc-text">
|
||||
These instructions are yours — derived from <em>your feedback, your language, your standards</em>. Copy them into your planning prompt and watch the deny rate drop.
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<footer>
|
||||
<p>Analysis of 202 denied plans and 168 annotation files from the Plannotator archive.</p>
|
||||
</footer>
|
||||
|
||||
</div>
|
||||
|
||||
<script>
|
||||
function copyPrompt(btn) {
|
||||
const text = document.getElementById('prompt-content').textContent;
|
||||
navigator.clipboard.writeText(text).then(() => {
|
||||
btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M22 11.08V12a10 10 0 1 1-5.93-9.14"></path><polyline points="22 4 12 14.01 9 11.01"></polyline></svg> Copied';
|
||||
btn.classList.add('copied');
|
||||
setTimeout(() => {
|
||||
btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg> Copy';
|
||||
btn.classList.remove('copied');
|
||||
}, 2000);
|
||||
});
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,282 @@
# Claude Code Fallback

Read this file only when the user does **not** have a usable Plannotator archive.

This is the secondary path for ordinary Claude Code users whose denial history
exists in `~/.claude/projects/` rather than `~/.plannotator/plans/`.

The goal is the same as the main skill:

- extract the user's real denial reasons
- reduce them into a taxonomy and prompt corrections
- produce the same HTML report design and section flow

## Source of Truth

Use the bundled parser at:

- [scripts/extract_exit_plan_mode_outcomes.py](../scripts/extract_exit_plan_mode_outcomes.py)

Resolve that script path relative to this skill directory before running it.

This script normalizes `ExitPlanMode` outcomes from Claude Code JSONL transcripts
and emits clean JSON parts containing only human-authored denial reasons by default.

Do **not** read raw `~/.claude/projects/**/*.jsonl` directly unless:

- the parser fails
- the user asks for audit-level verification
- you need to inspect one or two suspicious records by hand

The parser exists specifically to strip transcript noise such as generic native
reject strings and wrapper boilerplate.

## Run the Parser

Create the working directory first:

```bash
mkdir -p /tmp/compound-planning
```

Then run the bundled parser. Prefer `python3`; if unavailable, use `python`.

Use a resolved absolute script path, not a repo-local copy.

```bash
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
  --projects-dir ~/.claude/projects \
  --json-out /tmp/compound-planning/claude-code-human-reasons.json \
  --show-samples 0
```

Expected output:

- manifest:
  `/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.manifest.json`
- part files:
  `/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.part-XXXX-of-XXXX.json`

The script prints how many records were detected and how many JSON part files were emitted.

## What To Read First

Read the manifest before reading any part file.

The manifest gives you:

- total filtered record count
- total `ExitPlanMode` attempts
- native approval / denial counts
- non-native denial counts
- part file list

Use the part files only after you understand the overall dataset shape.
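
A small throwaway script is enough for this orientation pass. A minimal sketch, assuming the default `--json-out` path from the command above (the key names match the parser's manifest payload):

```python
# Minimal sketch: print the dataset shape from the manifest before touching parts.
# Assumes the default output path used in the parser invocation above.
import json
from pathlib import Path

manifest_path = Path(
    "/tmp/compound-planning/claude-code-human-reasons/"
    "claude-code-human-reasons.manifest.json"
)
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

summary = manifest["summary"]
print("attempts:", summary["total_exit_plan_attempts"])
print("native approvals:", summary["approved_native"])
print("native denials:", summary["denied_native_with_reason"] + summary["denied_native_no_reason"])
print("non-native denials:", summary["denied_non_native_with_payload"] + summary["denied_non_native_no_payload"])
print("human reasons:", summary["human_reasons_total"])
print("part files:", manifest["part_count"])
for part in manifest["parts"]:
    print(" ", part["file_name"], "->", part["record_count"], "records")
```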

## Inventory In Fallback Mode

In Claude Code fallback mode, report this dataset instead of the Plannotator file counts:

- human denial reasons found
- total `ExitPlanMode` attempts scanned
- native approvals
- native denials with extractable inline reason
- native denials without recoverable reason
- non-native denials with recoverable payload
- number of emitted JSON parts
- date range from the records
- total days spanned
- distinct sessions
- distinct project roots / `cwd` values

Also calculate (a minimal sketch follows this list):

- average `plan_length_chars` where present
- percentage of all denials that contain a recoverable human reason
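
A minimal sketch for those two metrics, assuming the default output directory above; the field and summary names come from the parser's record schema:

```python
# Minimal sketch: average plan length and human-reason coverage across all parts.
import json
from pathlib import Path

parts_dir = Path("/tmp/compound-planning/claude-code-human-reasons")
records = []
for part_path in sorted(parts_dir.glob("*.part-*.json")):
    records.extend(json.loads(part_path.read_text(encoding="utf-8"))["records"])

lengths = [r["plan_length_chars"] for r in records if r.get("plan_length_chars")]
if lengths:
    print("avg plan_length_chars:", sum(lengths) / len(lengths))

# The denial totals live in the summary, which is repeated in the manifest.
manifest = json.loads(next(parts_dir.glob("*.manifest.json")).read_text(encoding="utf-8"))
summary = manifest["summary"]
total_denials = (
    summary["denied_native_with_reason"]
    + summary["denied_native_no_reason"]
    + summary["denied_non_native_with_payload"]
    + summary["denied_non_native_no_payload"]
)
if total_denials:
    print("recoverable human reasons:", f"{summary['human_reasons_total'] / total_denials:.1%}")
```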

Do **not** fabricate Plannotator-only inventory fields in fallback mode:

- no `*-approved.md` counts
- no `*.annotations.md` counts
- no `*.diff.md` counts
- no approved-plan line-count analysis

If the user asks for those specifically, state that Claude Code log fallback mode
does not contain those artifacts.

### Previous Report Detection In Fallback Mode

Previous report detection still applies. Check the user's home directory or
`~/.plannotator/plans/` for existing `compound-planning-report*.html` files. If
found, offer the same incremental vs full choice as Plannotator mode. In
incremental mode, filter the parser output by timestamp rather than by filename
date — use the `timestamp` field in each JSON record.
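
For the incremental path, filtering can look like this sketch. The cutoff value is a placeholder to derive from the previous report's date; ISO-8601 timestamps with a common timezone suffix compare safely as strings:

```python
# Minimal sketch: keep only records newer than the previous report's date.
import json
from pathlib import Path

cutoff = "2025-01-01T00:00:00Z"  # placeholder cutoff; derive from the previous report
parts_dir = Path("/tmp/compound-planning/claude-code-human-reasons")

fresh = []
for part_path in sorted(parts_dir.glob("*.part-*.json")):
    for record in json.loads(part_path.read_text(encoding="utf-8"))["records"]:
        # `timestamp` is an ISO-8601 string, so lexicographic comparison works here.
        if record.get("timestamp") and record["timestamp"] >= cutoff:
            fresh.append(record)

print(len(fresh), "records after the cutoff")
```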

If no previous report exists, use the first-report naming convention
(`compound-planning-report.html`). Otherwise use the next version number.

## Extraction In Fallback Mode

Treat the emitted JSON part files as the clean source dataset.

### Batching

- **Small datasets (< 200 records):** read the part files directly without extra agents
- **Medium datasets (200-800 records):** split by part file or time range into 2-4 agents
- **Large datasets (800+ records):** split by part file groups or balanced time ranges (see the sketch below)
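
One reasonable way to balance batches is to assign part files greedily by record count. A minimal sketch, assuming the manifest from the default output directory:

```python
# Minimal sketch: split part files into roughly balanced extraction batches.
import json
from pathlib import Path

manifest_path = Path(
    "/tmp/compound-planning/claude-code-human-reasons/"
    "claude-code-human-reasons.manifest.json"
)
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

num_batches = 4  # assumed batch count; pick 2-4 for medium datasets
batches = [{"files": [], "records": 0} for _ in range(num_batches)]

# Greedy assignment: largest parts first, each into the currently lightest batch.
for part in sorted(manifest["parts"], key=lambda p: p["record_count"], reverse=True):
    lightest = min(batches, key=lambda b: b["records"])
    lightest["files"].append(part["file_name"])
    lightest["records"] += part["record_count"]

for index, batch in enumerate(batches, start=1):
    print(f"batch {index}: {batch['records']} records -> {batch['files']}")
```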

All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning.

Each extraction agent should read every record in its assigned part files and write
clean markdown output to:

```text
/tmp/compound-planning/extraction-{batch-name}.md
```

### Extraction Prompt For Claude Code Denial Records

Use this prompt for each fallback extraction batch (adapt the part files and output path):

```text
You are extracting structured data from Claude Code ExitPlanMode denial records.

Files to read: [JSON PART FILES]
Output: Write your complete results to [OUTPUT FILE PATH]

Read EVERY record in the assigned files. Each record already contains a cleaned
human_reason field. Use that as the primary source text.

For EACH record, extract:
- Date
- Session ID
- Project / cwd
- Topic (only if inferable from the reason or plan path; otherwise say "Unknown from logs")
- Human denial reason
- What was specifically asked to change
- Feedback type (let the content determine the category)
- Notable phrases
- Reason source (`native_inline_reason`, `non_native_freeform_payload`, or `structured_quote_extraction`)
- Plan path if present
- Plan length in chars if present

Do NOT skip any records. One entry per record.

Format each entry as:
**[session_id :: tool_use_id]**
- Date: ...
- Project: ...
- Topic: ...
- Human denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Reason source: ...
- Plan path: ...
- Plan length chars: ...
---

After processing all records, write the complete results to [OUTPUT FILE PATH].
State the total record count at the end of the file.
```

## Reduction In Fallback Mode

The reduction step stays conceptually the same:

- taxonomy
- top patterns
- recurring phrases
- reviewer values
- recurring agent mistakes
- structural requests
- evolution over time
- corrective prompt instructions

Use `model: "sonnet"` for reduction agents, same as Plannotator mode. The
two-stage reduce (partial reduces for 21+ extraction files) also applies when
there are many part files.

But interpret the dataset correctly:

- this is denial-reason evidence from Claude Code logs
- not every denial has a recoverable human reason
- annotations may be absent entirely
- success traits are often inferred from the inverse of repeated denial feedback

If the evidence for "what works" is weaker than the evidence for "what fails",
say that explicitly.

## HTML Report Adaptation

Use the same template and the same section order as the main skill.

In fallback mode:

- explicitly state in the header/meta that the source is Claude Code `ExitPlanMode`
  denial reasons
- keep the same narrative-first editorial style
- keep the same 7 major sections
- use real denial-reason counts, dates, phrases, and percentages only

### KPI Sidebar Substitutes

The Plannotator version uses a revision-rate KPI that may not exist here.

In fallback mode, prefer this KPI trio:

1. top denial category percentage
2. total human denial reasons recovered
3. number of distinct denial categories

If a better third metric emerges from the data, use it, but do not invent one.

### Footer / Provenance

The footer tagline should mention that the report was derived from Claude Code
denial reasons rather than Plannotator markdown archives.

### Important Limitation To State

If `human_reasons_total < total denials`, mention in the narrative or footer note
that some denials in the transcript did not contain recoverable human-authored
feedback and therefore could not contribute to the pattern analysis.

### Versioned Report Naming

Versioned naming (`v2`, `v3`, etc.) applies to fallback mode too. Save reports
to `~/.plannotator/plans/` (create the directory if it doesn't exist) so that
all compound planning reports live in the same location regardless of data source.
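
Picking the next filename can be scripted. A minimal sketch, assuming versioned reports are suffixed `-v2`, `-v3`, and so on (the exact suffix scheme is not fixed by this skill):

```python
# Minimal sketch: pick the next versioned report filename.
import re
from pathlib import Path

plans_dir = Path.home() / ".plannotator" / "plans"
plans_dir.mkdir(parents=True, exist_ok=True)

existing = list(plans_dir.glob("compound-planning-report*.html"))
if not existing:
    next_name = "compound-planning-report.html"
else:
    # The unsuffixed first report counts as version 1.
    versions = [1]
    for path in existing:
        match = re.search(r"-v(\d+)\.html$", path.name)
        if match:
            versions.append(int(match.group(1)))
    next_name = f"compound-planning-report-v{max(versions) + 1}.html"

print(plans_dir / next_name)
```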

## Summary In Fallback Mode

At the end, tell the user:

- how many human denial reasons were analyzed
- how many total `ExitPlanMode` attempts were scanned
- the top 3 denial patterns found
- the estimated percentage of denial reasons the corrective instructions address
- the single most impactful prompt improvement
- where the report was saved (including version number)
- if incremental: note that earlier findings are in the previous report

## Improvement Hook In Fallback Mode

The Phase 6 improvement hook applies to fallback mode too. The corrective prompt
instructions derived from Claude Code denial reasons are just as useful for
injection into future planning sessions. Follow the same flow as the main skill.

## Audit Mode

Only if the user asks for raw denial records or transcript noise:

```bash
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
  --projects-dir ~/.claude/projects \
  --records-filter denials \
  --json-out /tmp/compound-planning/claude-code-all-denials.json \
  --show-samples 0
```

Do not use this audit-mode output for the normal report unless the user asks for it.
@@ -0,0 +1,820 @@
#!/usr/bin/env python3
"""Extract ExitPlanMode outcomes from Claude Code JSONL session logs.

This parser keeps three views of the same data:

1. Strict native Claude Code classification
   - native approval:
       "User has approved your plan."
   - native denial:
       "The user doesn't want to proceed with this tool use. The tool use was rejected"

2. General denial capture
   - any matching ExitPlanMode tool_result with is_error=true and non-empty text
     is captured as a denial/error payload, even when it is custom hook output
     or some other non-native integration.

3. Human-reason extraction
   - native inline reasons are preserved as-is
   - freeform non-native error payloads are treated as human reasons
   - structured non-native payloads are reduced to quoted feedback where possible

This means the script does not depend on hook-specific strings to capture custom
denials, but it also does not dump wrapper boilerplate into the human-reason
output.

The script streams JSONL line-by-line and uses only the Python standard library.
"""

from __future__ import annotations

import argparse
import json
import os
import sys
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


APPROVE_PREFIX = "User has approved your plan."
REJECT_PREFIX = (
    "The user doesn't want to proceed with this tool use. "
    "The tool use was rejected"
)
REASON_MARKER = "To tell you how to proceed, the user said:\n"
NOTE_MARKER = (
    "\n\nNote: The user's next message may contain a correction or preference."
)


@dataclass
class AttemptRecord:
    session_id: str
    tool_use_id: str
    file_path: str
    line_number: int
    timestamp: Optional[str]
    cwd: Optional[str]
    plan_file_path: Optional[str]
    plan_length_chars: Optional[int]
    outcome: str = "pending"
    native_reason: Optional[str] = None
    native_reason_style: Optional[str] = None
    captured_reason: Optional[str] = None
    captured_reason_style: Optional[str] = None
    captured_reason_source: Optional[str] = None
    human_reason: Optional[str] = None
    human_reason_style: Optional[str] = None
    human_reason_source: Optional[str] = None
    result_is_error: Optional[bool] = None
    result_file_path: Optional[str] = None
    result_line_number: Optional[int] = None
    result_timestamp: Optional[str] = None
    result_preview: Optional[str] = None


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Extract ExitPlanMode approvals/denials from Claude Code logs."
    )
    parser.add_argument(
        "--projects-dir",
        default="~/.claude/projects",
        help="Root Claude projects directory. Default: %(default)s",
    )
    parser.add_argument(
        "--include-subagents",
        action="store_true",
        help="Include /subagents/ JSONL files. Default is to skip them.",
    )
    parser.add_argument(
        "--records-filter",
        choices=("all", "native", "native-denials", "denials", "human-reasons"),
        default="human-reasons",
        help=(
"Which records to write to JSON/CSV outputs. "
            "Default: %(default)s"
        ),
    )
    parser.add_argument(
        "--include-non-native-denials",
        action="store_true",
        help=(
            "Include non-native denial/error payloads in sample output. "
            "Default sample output shows only native denials."
        ),
    )
    parser.add_argument(
        "--show-samples",
        type=int,
        default=5,
        help="How many denial samples to print in the text summary.",
    )
    parser.add_argument(
        "--json-out",
        help="Optional path to write a JSON report.",
    )
    parser.add_argument(
        "--max-output-tokens-per-file",
        type=int,
        default=50000,
        help=(
            "Approximate max token budget per JSON file when writing --json-out. "
            "Default: %(default)s"
        ),
    )
    return parser.parse_args()


def iter_jsonl_files(root: Path, include_subagents: bool) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_subagents and "subagents" in dirnames:
            dirnames.remove("subagents")
        dirnames.sort()
        for filename in sorted(filenames):
            if filename.endswith(".jsonl"):
                yield Path(dirpath) / filename


def make_attempt_key(session_id: str, tool_use_id: str) -> str:
    return session_id + "::" + tool_use_id


def preview(text: str, limit: int = 220) -> str:
    compact = " ".join(text.split())
    if len(compact) <= limit:
        return compact
    return compact[: limit - 3] + "..."


def estimate_tokens(text: str) -> int:
    # Rough enough for output chunking. We intentionally bias slightly high.
    return max(1, (len(text) + 3) // 4)


def iter_blocks(message_content: object) -> Iterator[dict]:
    if not isinstance(message_content, list):
        return
    for block in message_content:
        if isinstance(block, dict):
            yield block


def extract_text(content: object) -> str:
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return ""

    parts: List[str] = []
    for item in content:
        if isinstance(item, str):
            parts.append(item)
            continue
        if not isinstance(item, dict):
            continue
        if isinstance(item.get("text"), str):
            parts.append(item["text"])
        elif isinstance(item.get("content"), str):
            parts.append(item["content"])
    return "\n".join(part for part in parts if part)


def classify_reason_style(reason: Optional[str]) -> Optional[str]:
    if not reason:
        return None

    stripped = reason.lstrip()
    if (
        stripped.startswith("#")
        or stripped.startswith("YOUR PLAN WAS NOT APPROVED.")
        or "\n## " in reason
        or "\n---" in reason
    ):
        return "structured"
    return "freeform"


def extract_blockquote_feedback(text: str) -> List[str]:
    quotes: List[str] = []
    current: List[str] = []

    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if stripped.startswith(">"):
            current.append(stripped[1:].lstrip())
            continue

        if current:
            if not stripped or stripped.startswith("## ") or stripped == "---":
                quote = "\n".join(line for line in current if line).strip()
                if quote:
                    quotes.append(quote)
                current = []
                continue

            # Preserve wrapped continuation lines that belong to the same quote.
            current.append(stripped)

    if current:
        quote = "\n".join(line for line in current if line).strip()
        if quote:
            quotes.append(quote)

    return quotes


def extract_human_reason(
    native_reason: Optional[str],
    captured_reason: Optional[str],
    captured_reason_style: Optional[str],
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    if native_reason:
        return (
            native_reason,
            classify_reason_style(native_reason),
            "native_inline_reason",
        )

    if not captured_reason:
        return (None, None, None)

    if captured_reason_style == "freeform":
        return (
            captured_reason,
            classify_reason_style(captured_reason),
            "non_native_freeform_payload",
        )

    quote_feedback = extract_blockquote_feedback(captured_reason)
    if quote_feedback:
        reason = "\n\n".join(quote_feedback)
        return (
            reason,
            classify_reason_style(reason),
            "structured_quote_extraction",
        )

    return (None, None, None)


def classify_result(
    text: str,
    is_error: bool,
) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
    stripped = text.strip()
    if not stripped:
        if is_error:
            return (
                "denied_non_native_no_payload",
                None,
                None,
                None,
                None,
            )
        return ("pending", None, None, None, None)

    if stripped.startswith(APPROVE_PREFIX):
        return ("approved_native", None, None, None, None)

    if stripped.startswith(REJECT_PREFIX):
        marker_index = stripped.find(REASON_MARKER)
        if marker_index < 0:
            return ("denied_native_no_reason", None, None, None, None)

        reason = stripped[marker_index + len(REASON_MARKER) :]
        note_index = reason.find(NOTE_MARKER)
        if note_index >= 0:
            reason = reason[:note_index]
        reason = reason.strip()
        if reason:
            style = classify_reason_style(reason)
            return (
                "denied_native_with_reason",
                reason,
                reason,
                "native_inline_reason",
                style,
            )
        return ("denied_native_no_reason", None, None, None, None)

    if is_error:
        style = classify_reason_style(stripped)
        return (
            "denied_non_native_with_payload",
            None,
            stripped,
            "non_native_error_payload",
            style,
        )

    return ("non_native_other", None, None, None, None)


def outcome_rank(outcome: str) -> int:
    ranks = {
        "pending": 0,
        "non_native_other": 1,
        "approved_native": 2,
        "denied_native_no_reason": 3,
        "denied_native_with_reason": 4,
        "denied_non_native_no_payload": 5,
        "denied_non_native_with_payload": 6,
    }
    return ranks.get(outcome, 0)


def update_attempt_from_result(
    attempt: AttemptRecord,
    file_path: Path,
    line_number: int,
    timestamp: Optional[str],
    text: str,
    is_error: bool,
) -> None:
    (
        outcome,
        native_reason,
        captured_reason,
        captured_reason_source,
        captured_reason_style,
    ) = classify_result(text=text, is_error=is_error)
    if outcome_rank(outcome) < outcome_rank(attempt.outcome):
        return

    attempt.outcome = outcome
    attempt.native_reason = native_reason
    attempt.native_reason_style = classify_reason_style(native_reason)
    attempt.captured_reason = captured_reason
    attempt.captured_reason_source = captured_reason_source
    attempt.captured_reason_style = captured_reason_style
    (
        attempt.human_reason,
        attempt.human_reason_style,
        attempt.human_reason_source,
    ) = extract_human_reason(
        native_reason=native_reason,
        captured_reason=captured_reason,
        captured_reason_style=captured_reason_style,
    )
    attempt.result_is_error = is_error
    attempt.result_file_path = str(file_path)
    attempt.result_line_number = line_number
    attempt.result_timestamp = timestamp
    attempt.result_preview = preview(text)


def scan_projects(
    projects_dir: Path,
    include_subagents: bool,
) -> Tuple[Dict[str, int], List[AttemptRecord]]:
    stats = {
        "files_scanned": 0,
        "lines_scanned": 0,
        "json_errors": 0,
    }
    attempts: Dict[str, AttemptRecord] = {}

    for file_path in iter_jsonl_files(projects_dir, include_subagents):
        stats["files_scanned"] += 1
        try:
            handle = file_path.open("r", encoding="utf-8", errors="replace")
        except OSError:
            continue

        with handle:
            for line_number, raw_line in enumerate(handle, start=1):
                if not raw_line.strip():
                    continue
                stats["lines_scanned"] += 1
                try:
                    obj = json.loads(raw_line)
                except json.JSONDecodeError:
                    stats["json_errors"] += 1
                    continue

                session_id = str(obj.get("sessionId") or str(file_path))
                timestamp = obj.get("timestamp")
                cwd = obj.get("cwd")
                message = obj.get("message")
                if not isinstance(message, dict):
                    continue

                content = message.get("content")

                for block in iter_blocks(content):
                    if (
                        block.get("type") == "tool_use"
                        and block.get("name") == "ExitPlanMode"
                        and isinstance(block.get("id"), str)
                    ):
                        tool_use_id = block["id"]
                        key = make_attempt_key(session_id, tool_use_id)
                        if key in attempts:
                            continue
                        input_data = block.get("input")
                        plan = None
                        plan_file_path = None
                        if isinstance(input_data, dict):
                            if isinstance(input_data.get("plan"), str):
                                plan = input_data["plan"]
                            if isinstance(input_data.get("planFilePath"), str):
                                plan_file_path = input_data["planFilePath"]

                        attempts[key] = AttemptRecord(
                            session_id=session_id,
                            tool_use_id=tool_use_id,
                            file_path=str(file_path),
                            line_number=line_number,
                            timestamp=timestamp if isinstance(timestamp, str) else None,
                            cwd=cwd if isinstance(cwd, str) else None,
                            plan_file_path=plan_file_path,
                            plan_length_chars=len(plan) if isinstance(plan, str) else None,
                        )

                if message.get("role") != "user":
                    continue

                for block in iter_blocks(content):
                    if (
                        block.get("type") != "tool_result"
                        or not isinstance(block.get("tool_use_id"), str)
                    ):
                        continue

                    key = make_attempt_key(session_id, block["tool_use_id"])
                    attempt = attempts.get(key)
                    if attempt is None:
                        continue

                    text = extract_text(block.get("content"))
                    update_attempt_from_result(
                        attempt=attempt,
                        file_path=file_path,
                        line_number=line_number,
                        timestamp=timestamp if isinstance(timestamp, str) else None,
                        text=text,
                        is_error=bool(block.get("is_error")),
                    )

    return stats, list(attempts.values())


def summarize(attempts: Iterable[AttemptRecord]) -> Dict[str, int]:
    summary = {
        "total_exit_plan_attempts": 0,
        "approved_native": 0,
        "denied_native_with_reason": 0,
        "denied_native_no_reason": 0,
        "denied_native_with_freeform_reason": 0,
        "denied_native_with_structured_reason": 0,
        "denied_non_native_with_payload": 0,
        "denied_non_native_no_payload": 0,
        "captured_denial_reasons_total": 0,
        "captured_freeform_reasons": 0,
        "captured_structured_reasons": 0,
        "human_reasons_total": 0,
        "human_reasons_native": 0,
        "human_reasons_non_native": 0,
        "human_reasons_freeform": 0,
        "human_reasons_structured": 0,
        "non_native_other": 0,
        "pending": 0,
    }
    for attempt in attempts:
        summary["total_exit_plan_attempts"] += 1
        summary[attempt.outcome] = summary.get(attempt.outcome, 0) + 1
        if attempt.outcome == "denied_native_with_reason":
            if attempt.native_reason_style == "freeform":
                summary["denied_native_with_freeform_reason"] += 1
            elif attempt.native_reason_style == "structured":
                summary["denied_native_with_structured_reason"] += 1
        if attempt.captured_reason:
            summary["captured_denial_reasons_total"] += 1
            if attempt.captured_reason_style == "freeform":
                summary["captured_freeform_reasons"] += 1
            elif attempt.captured_reason_style == "structured":
                summary["captured_structured_reasons"] += 1
        if attempt.human_reason:
            summary["human_reasons_total"] += 1
            if attempt.human_reason_source == "native_inline_reason":
                summary["human_reasons_native"] += 1
            else:
                summary["human_reasons_non_native"] += 1
            if attempt.human_reason_style == "freeform":
                summary["human_reasons_freeform"] += 1
            elif attempt.human_reason_style == "structured":
                summary["human_reasons_structured"] += 1
    return summary


def filter_records(
    attempts: List[AttemptRecord],
    records_filter: str,
) -> List[AttemptRecord]:
    if records_filter == "all":
        return attempts
    if records_filter == "native":
        return [
            attempt
            for attempt in attempts
            if attempt.outcome.startswith("approved_native")
            or attempt.outcome.startswith("denied_native")
        ]
    if records_filter == "native-denials":
        return [
            attempt
            for attempt in attempts
            if attempt.outcome.startswith("denied_native")
        ]
    if records_filter == "human-reasons":
        return [attempt for attempt in attempts if attempt.human_reason]
    return [
        attempt
        for attempt in attempts
        if attempt.outcome.startswith("denied_native")
        or attempt.outcome.startswith("denied_non_native")
    ]


def build_json_chunks(
    records: List[AttemptRecord],
    max_output_tokens_per_file: int,
) -> List[List[AttemptRecord]]:
    if not records:
        return [[]]

    chunks: List[List[AttemptRecord]] = []
    current_chunk: List[AttemptRecord] = []
    current_tokens = 0

    for record in records:
        record_dict = asdict(record)
        record_json = json.dumps(record_dict, ensure_ascii=False)
        record_tokens = estimate_tokens(record_json)

        if current_chunk and current_tokens + record_tokens > max_output_tokens_per_file:
            chunks.append(current_chunk)
            current_chunk = []
            current_tokens = 0

        current_chunk.append(record)
        current_tokens += record_tokens

    if current_chunk:
        chunks.append(current_chunk)

    return chunks


def print_summary(
    projects_dir: Path,
    include_subagents: bool,
    stats: Dict[str, int],
    attempts: List[AttemptRecord],
    summary: Dict[str, int],
    show_samples: int,
    include_non_native_denials: bool,
) -> None:
    native_denials = (
        summary["denied_native_with_reason"] + summary["denied_native_no_reason"]
    )
    total_denials = (
        native_denials
        + summary["denied_non_native_with_payload"]
        + summary["denied_non_native_no_payload"]
    )
    native_extractable_ratio = (
        (summary["denied_native_with_reason"] / native_denials) * 100.0
        if native_denials
        else 0.0
    )
    all_capture_ratio = (
        (summary["captured_denial_reasons_total"] / total_denials) * 100.0
        if total_denials
        else 0.0
    )

    print(f"Projects dir: {projects_dir}")
    print(f"Included subagents: {'yes' if include_subagents else 'no'}")
    print(f"JSONL files scanned: {stats['files_scanned']}")
    print(f"JSON lines scanned: {stats['lines_scanned']}")
    print(f"JSON parse errors: {stats['json_errors']}")
    print()
    print(f"ExitPlanMode attempts: {summary['total_exit_plan_attempts']}")
    print(f"Native approvals: {summary['approved_native']}")
    print(
        "Native denials with extractable reason: "
        f"{summary['denied_native_with_reason']}"
    )
    print(
        "Native denials without reason: "
        f"{summary['denied_native_no_reason']}"
    )
    print(
        "Freeform native reasons: "
        f"{summary['denied_native_with_freeform_reason']}"
    )
    print(
        "Structured native reasons: "
        f"{summary['denied_native_with_structured_reason']}"
    )
    print(
        "Non-native denials with payload: "
        f"{summary['denied_non_native_with_payload']}"
    )
    print(
        "Non-native denials without payload: "
        f"{summary['denied_non_native_no_payload']}"
    )
    print(
        "Captured denial reasons total: "
        f"{summary['captured_denial_reasons_total']}"
    )
    print(
        "Captured freeform reasons: "
        f"{summary['captured_freeform_reasons']}"
    )
    print(
        "Captured structured reasons: "
        f"{summary['captured_structured_reasons']}"
    )
    print(f"Human reasons total: {summary['human_reasons_total']}")
    print(f"Human reasons from native denials: {summary['human_reasons_native']}")
    print(
        "Human reasons from non-native denials: "
        f"{summary['human_reasons_non_native']}"
    )
    print(
        "Non-native / non-denial outcomes: "
        f"{summary['non_native_other']}"
    )
    print(f"Pending / unmatched attempts: {summary['pending']}")
    print()
    print(
        "Extractable native denial reasons: "
        f"{summary['denied_native_with_reason']}/{native_denials} "
        f"({native_extractable_ratio:.1f}%)"
    )
    print(
        "Captured denial payloads across all denial types: "
        f"{summary['captured_denial_reasons_total']}/{total_denials} "
        f"({all_capture_ratio:.1f}%)"
    )
    print(
        "Human reasons across all denial types: "
        f"{summary['human_reasons_total']}/{total_denials} "
        f"({((summary['human_reasons_total'] / total_denials) * 100.0 if total_denials else 0.0):.1f}%)"
    )

    if include_non_native_denials:
        samples = [attempt for attempt in attempts if attempt.human_reason]
    else:
        samples = [
            attempt
            for attempt in attempts
            if attempt.outcome == "denied_native_with_reason" and attempt.human_reason
        ]
    samples = samples[: max(show_samples, 0)]
    if not samples:
        return

    print()
    print(
        "Sample denial reasons:"
        if include_non_native_denials
        else "Sample native denial reasons:"
    )
    for attempt in samples:
        style = attempt.human_reason_style or "unknown"
        source = attempt.human_reason_source or "unknown"
        reason = attempt.human_reason or ""
        print(
            "- "
            f"[{attempt.outcome} / {source} / {style}] "
            f"{reason!r} "
            f"({attempt.file_path}:{attempt.result_line_number})"
        )


def write_json_report(
    output_path: Path,
    projects_dir: Path,
    include_subagents: bool,
    stats: Dict[str, int],
    summary: Dict[str, int],
    records: List[AttemptRecord],
    max_output_tokens_per_file: int,
) -> List[Path]:
    output_path.parent.mkdir(parents=True, exist_ok=True)

    chunks = build_json_chunks(records, max_output_tokens_per_file)
    base_name = output_path.stem
    output_dir = output_path.with_suffix("")
    output_dir.mkdir(parents=True, exist_ok=True)

    written_files: List[Path] = []
    part_summaries = []

    for index, chunk in enumerate(chunks, start=1):
        chunk_records = [asdict(record) for record in chunk]
        chunk_payload = {
            "projects_dir": str(projects_dir),
            "include_subagents": include_subagents,
            "stats": stats,
            "summary": summary,
            "part_index": index,
            "part_count": len(chunks),
            "record_count": len(chunk_records),
            "records": chunk_records,
        }
        part_name = f"{base_name}.part-{index:04d}-of-{len(chunks):04d}.json"
        part_path = output_dir / part_name
        part_path.write_text(
            json.dumps(chunk_payload, indent=2, ensure_ascii=False),
            encoding="utf-8",
        )
        written_files.append(part_path)
        part_summaries.append(
            {
                "part_index": index,
                "file_name": part_name,
                "record_count": len(chunk_records),
            }
        )

    manifest_payload = {
        "projects_dir": str(projects_dir),
        "include_subagents": include_subagents,
        "stats": stats,
        "summary": summary,
        "records_filter_record_count": len(records),
        "part_count": len(chunks),
        "max_output_tokens_per_file": max_output_tokens_per_file,
        "parts": part_summaries,
    }
    manifest_path = output_dir / f"{base_name}.manifest.json"
    manifest_path.write_text(
        json.dumps(manifest_payload, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    written_files.insert(0, manifest_path)

    return written_files


def main() -> int:
    args = parse_args()
    projects_dir = Path(args.projects_dir).expanduser()
    if not projects_dir.exists():
        print(f"Projects dir does not exist: {projects_dir}", file=sys.stderr)
        return 1

    stats, attempts = scan_projects(
        projects_dir=projects_dir,
        include_subagents=args.include_subagents,
    )
    attempts.sort(
        key=lambda attempt: (
            attempt.file_path,
            attempt.line_number,
            attempt.tool_use_id,
        )
    )
    summary = summarize(attempts)
    records = filter_records(attempts, args.records_filter)

    print_summary(
        projects_dir=projects_dir,
        include_subagents=args.include_subagents,
        stats=stats,
        attempts=attempts,
        summary=summary,
        show_samples=args.show_samples,
        include_non_native_denials=args.include_non_native_denials,
    )

    if args.json_out:
        written_files = write_json_report(
            output_path=Path(args.json_out).expanduser(),
            projects_dir=projects_dir,
            include_subagents=args.include_subagents,
            stats=stats,
            summary=summary,
            records=records,
            max_output_tokens_per_file=args.max_output_tokens_per_file,
        )
        part_count = max(len(written_files) - 1, 0)
        print()
        print(
            "Wrote JSON output: "
            f"detected {len(records)} records for filter '{args.records_filter}' "
            f"and emitted {part_count} part file(s) plus a manifest."
        )

    return 0


if __name__ == "__main__":
    sys.exit(main())
23
extensions/plannotator/skills/plannotator-last/SKILL.md
Normal file
23
extensions/plannotator/skills/plannotator-last/SKILL.md
Normal file
@@ -0,0 +1,23 @@
---
name: plannotator-last
description: Open Plannotator on the latest rendered assistant message and use the returned annotations to revise that message or continue.
---

# Plannotator Last

Use this skill when the user wants to annotate the latest assistant response in Plannotator.

Run:

```bash
plannotator last
```

Behavior:

1. Launch the command with Bash.
2. Wait for the annotation session to finish.
3. If feedback is returned, incorporate it into the follow-up response.
4. If the session closes without feedback, mention that briefly and continue.

Run the command yourself rather than telling the user to invoke shell syntax manually.
@@ -0,0 +1,5 @@
interface:
  display_name: "Plannotator Last"
  short_description: "Annotate the latest assistant message in Plannotator."
policy:
  allow_implicit_invocation: false
23
extensions/plannotator/skills/plannotator-review/SKILL.md
Normal file
23
extensions/plannotator/skills/plannotator-review/SKILL.md
Normal file
@@ -0,0 +1,23 @@
---
name: plannotator-review
description: Open Plannotator's browser-based code review UI for the current worktree or a pull request URL, then act on the feedback that comes back.
---

# Plannotator Review

Use this skill when the user wants to review current code changes in Plannotator instead of reading a diff inline.

Run:

```bash
plannotator review [optional-pr-url]
```

Behavior:

1. Launch the command with Bash.
2. Wait for it to finish.
3. If it returns feedback or annotations, address them in the same conversation.
4. If it returns an approval/LGTM-style message, acknowledge that review passed and continue.

Do not ask the user to copy shell commands into chat. Run the command yourself.
@@ -0,0 +1,5 @@
interface:
  display_name: "Plannotator Review"
  short_description: "Open Plannotator code review for local changes or a PR."
policy:
  allow_implicit_invocation: false
@@ -0,0 +1,89 @@
---
name: plannotator-setup-goal
description: Create reviewed Codex goal setup packages for long-running /goal work. Use when the user wants to turn an idea, backlog, project mission, or vague objective into durable goal files under a project goals slug folder, with Plannotator review gates for the brief, the narrative plan with acceptance criteria, the verification and blockers documents, and the final /goal prompt.
---

# Plannotator Setup Goal

## Overview

Create a durable goal package in the current project at `goals/<slug>/` so Codex `/goal` has a clear mission, guardrails, proof of done, and external memory. Use Plannotator as the user review UI: every critical document must be gated with `plannotator annotate <document.md> --gate` and revised until approved.

## Workflow

1. Confirm the working directory is the project root, or use the user-provided project directory.
2. Gather enough context to name the goal, define the intended outcome, identify constraints, find likely project docs, and determine proof of done.
3. Ask focused questions whenever the goal is vague, risky, too broad, missing a finish line, or missing verification. Do not proceed with guessed critical requirements.
4. Create a slug from the goal name and scaffold `goals/<slug>/` with:

   ```bash
   python3 <skill_dir>/scripts/scaffold_goal.py --root . --slug <slug> --title "<goal title>" --objective "<one sentence outcome>"
   ```

5. Draft and refine the critical documents in this order:
   - `brief.md`
   - `plan.md`
   - `verification.md`
   - `blockers.md`
   - `goal-prompt.md`
6. Gate each critical document with Plannotator before moving on:

   ```bash
   plannotator annotate goals/<slug>/<document.md> --gate
   ```

7. If Plannotator returns denial, comments, or markup, treat that as user feedback. Revise the document, then run the same gate again. Continue until approved.
8. After all gates pass, present the final path and the exact `/goal` prompt from `goal-prompt.md`.

## Document Standards

`brief.md` must state the mission, context, constraints, non-goals, ask-before rules, and a concise done condition.

`plan.md` is the central reviewed planning artifact. It must read like a clear solution narrative, not just a technical checklist. Include what is being built, why this approach is appropriate, how the solution will work, the main implementation slices, risks, phase boundaries, and acceptance criteria. Every important acceptance item needs observable evidence. For large missions, prefer several sequential goals over one endless goal.

`verification.md` must list exact verification commands and manual checks. Include expected pass conditions and where evidence should be recorded.

`blockers.md` must capture open questions, user-decision points, dangerous operations that require approval, and conditions that should pause the goal.

`goal-prompt.md` must contain the final command the user can paste into Codex. It should reference the goal package files as the durable source of truth, tell Codex to append evidence to `progress.jsonl`, and define when to stop or ask.

`progress.jsonl` is append-only evidence. Do not gate it. During execution, append concrete progress and proof, not summaries of intent.
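
For illustration only, an appended evidence entry might look like the sketch below. The field names mirror the scaffold's initial `goal_package_created` entry; the goal path and type label are assumptions, since the skill does not fix a schema:

```python
# Hypothetical sketch: appending one evidence entry to progress.jsonl.
import datetime as dt
import json

entry = {
    "type": "verification_run",  # assumed label; the skill mandates no type list
    "timestamp": dt.datetime.now(dt.timezone.utc).replace(microsecond=0).isoformat(),
    "title": "Slice 1 verification",
    "evidence": "pytest passed (12 tests); log saved to artifacts/slice-1.log",
}
# "goals/example-goal/" stands in for the real goals/<slug>/ folder.
with open("goals/example-goal/progress.jsonl", "a", encoding="utf-8") as handle:
    handle.write(json.dumps(entry, ensure_ascii=True) + "\n")
```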

## Plannotator Rules

Use Plannotator as the review surface, not as a passive preview. The command `plannotator annotate <document.md> --gate` presents the document to the user and captures approval or denial feedback.

Do not skip gates for critical documents. Do not mark a document ready because it seems reasonable. The user must approve it through the gate.

If a document is denied, update the document from the captured feedback and rerun the gate. Keep the loop tight: one document, one review, one revision cycle.

## Goal Prompt Rules

Write the final `/goal` prompt as a compact product brief, not a raw todo dump.

Include:
- outcome
- relevant files
- constraints and non-goals
- plan acceptance criteria and evidence
- verification commands
- ask-before rules
- instruction to use `goals/<slug>/` as the durable plan and append evidence to `progress.jsonl`

Avoid:
- open-ended improvement loops
- mixed unrelated missions
- vague words like "improve" without measurable proof
- instructions to keep working forever
- hidden assumptions that are not written into the files

## Quality Checks

Before finalizing, verify:
- The goal has one clear finish line.
- The plan explains what, why, and how before listing work slices.
- The plan acceptance criteria can be audited from real artifacts.
- Verification commands are concrete.
- Risky actions have ask-before rules.
- The final `/goal` prompt tells Codex where the goal files live.
- All critical documents have passed Plannotator gates.
@@ -0,0 +1,4 @@
interface:
  display_name: "Plannotator Goal Setup"
  short_description: "Build reviewed Codex goal packages"
  default_prompt: "Use $plannotator-setup-goal to create a reviewed goal package for this project."
219
extensions/plannotator/skills/plannotator-setup-goal/scripts/scaffold_goal.py
Executable file
219
extensions/plannotator/skills/plannotator-setup-goal/scripts/scaffold_goal.py
Executable file
@@ -0,0 +1,219 @@
#!/usr/bin/env python3
"""Scaffold a reviewed Codex goal package under goals/<slug>/."""

from __future__ import annotations

import argparse
import datetime as dt
import json
import re
import sys
from pathlib import Path


def slugify(value: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    slug = re.sub(r"-{2,}", "-", slug)
    return slug or "goal"


def write_file(path: Path, content: str, force: bool) -> None:
    if path.exists() and not force:
        return
    path.write_text(content, encoding="utf-8")


def brief(title: str, objective: str) -> str:
    return f"""# {title}

## Outcome

{objective or "TODO: State the concrete outcome in one or two sentences."}

## Context

- TODO: List the project facts, files, docs, user needs, and constraints Codex must know.

## Constraints

- TODO: List behavior, APIs, data, UX, performance, compatibility, or process rules that must not regress.

## Non-Goals

- TODO: List work that is out of scope for this goal.

## Ask Before

- TODO: List decisions, risky operations, external dependencies, product calls, and destructive changes that require user approval.

## Done Means

- TODO: Summarize the finish line. Detailed acceptance evidence belongs in `plan.md`.
|
||||
"""
|
||||
|
||||
|
||||
def plan(title: str) -> str:
|
||||
return f"""# Plan: {title}
|
||||
|
||||
## Solution Overview
|
||||
|
||||
TODO: Describe what is being built in plain language. Explain the shape of the solution before diving into tasks.
|
||||
|
||||
## Why This Approach
|
||||
|
||||
TODO: Explain why this direction is appropriate for the project, user goal, constraints, and risk level.
|
||||
|
||||
## How It Will Work
|
||||
|
||||
TODO: Describe the main moving parts, data flow, user flow, files, APIs, or systems involved. Keep this narrative enough that a reviewer can understand the intended solution.
|
||||
|
||||
## Slices
|
||||
|
||||
| Slice | Purpose | Main files or systems | Done when | Risks |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| 1 | TODO | TODO | TODO | TODO |
|
||||
| 2 | TODO | TODO | TODO | TODO |
|
||||
|
||||
## Sequencing
|
||||
|
||||
- TODO: Explain the order of execution and which slices block later slices.
|
||||
|
||||
## Phase Boundaries
|
||||
|
||||
- TODO: State when this goal should end and a new goal should be created instead of stretching this one.
|
||||
|
||||
## Steering Notes
|
||||
|
||||
- TODO: Capture taste calls, product preferences, or review checkpoints the user should steer during execution.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] TODO: Requirement with concrete observable evidence.
|
||||
- [ ] TODO: Requirement with concrete observable evidence.
|
||||
|
||||
## Required Evidence
|
||||
|
||||
| Requirement | Evidence to inspect | Where evidence is recorded |
|
||||
| --- | --- | --- |
|
||||
| TODO | TODO | TODO |
|
||||
|
||||
## Completion Audit
|
||||
|
||||
Before marking the goal complete, Codex must map every explicit requirement, file, command, check, and deliverable to real evidence. If any item is missing, incomplete, weakly verified, or uncertain, the goal is not complete.
|
||||
"""
|
||||
|
||||
|
||||
def verification(title: str) -> str:
    return f"""# Verification: {title}

## Commands

| Command | Purpose | Expected pass condition | Evidence location |
| --- | --- | --- | --- |
| TODO | TODO | TODO | TODO |

## Manual Checks

- TODO: Add browser checks, screenshots, release checks, PR checks, or human review steps.

## Evidence Rules

- Record verification results in `progress.jsonl`.
- Include command, status, timestamp, and artifact path when available.
- Do not rely on passing tests unless they cover the requirement being claimed.
"""


def blockers(title: str) -> str:
    return f"""# Blockers: {title}

## Open Questions

- TODO: Questions that must be answered before or during execution.

## Stop And Ask

- TODO: Conditions that should pause the goal and ask the user.

## Dangerous Or High-Risk Actions

- TODO: Destructive changes, migrations, dependency changes, security-sensitive work, billing/auth changes, or external operations requiring approval.

## Known Blockers

- TODO: Current blockers, owners, and next action.
"""


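# goal_prompt renders the copy-paste /goal prompt that points Codex at the
# scaffolded goals/<slug>/ package once its documents are approved in Plannotator.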
def goal_prompt(slug: str, title: str, objective: str) -> str:
    prompt_objective = objective or f"Complete the reviewed goal package for {title}."
    return f"""# Codex Goal Prompt: {title}

After every critical document in this folder is approved with Plannotator, paste or set this goal:

```text
/goal {prompt_objective}

Use `goals/{slug}/` as the durable source of truth:
- Read `brief.md` for the mission, context, constraints, non-goals, and ask-before rules.
- Follow `plan.md` for the solution overview, implementation slices, risks, and acceptance criteria.
- Run the checks in `verification.md` and record evidence.
- Append concrete progress and proof to `progress.jsonl`.
- Pause and ask the user for anything listed in `blockers.md` or any similarly risky unresolved decision.

Do not mark the goal complete until every acceptance item is backed by real evidence and the required verification has passed, or the remaining blocker is explicitly documented for the user.
```
"""


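# Seeds progress.jsonl with a single JSON line, e.g. (timestamp illustrative):
# {"type": "goal_package_created", "timestamp": "2026-02-10T12:00:00+00:00",
#  "title": "...", "objective": "...", "evidence": "Initial scaffold created; ..."}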
def progress_entry(title: str, objective: str) -> str:
    now = dt.datetime.now(dt.timezone.utc).replace(microsecond=0).isoformat()
    entry = {
        "type": "goal_package_created",
        "timestamp": now,
        "title": title,
        "objective": objective,
        "evidence": "Initial scaffold created; critical documents still require Plannotator gate approval.",
    }
    return json.dumps(entry, ensure_ascii=True) + "\n"


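# Typical invocation (script name and values are illustrative):
#   python scaffold_goal.py --title "Ship v2 API" --objective "Expose the v2 endpoints" --root .
# --slug and --force are optional; the slug defaults to a slugified title.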
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--root", default=".", help="Project root where goals/ should be created.")
    parser.add_argument("--slug", help="Goal folder name. Defaults to a slugified title.")
    parser.add_argument("--title", required=True, help="Human-readable goal title.")
    parser.add_argument("--objective", default="", help="One-sentence goal outcome.")
    parser.add_argument("--force", action="store_true", help="Overwrite existing scaffold files.")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    root = Path(args.root).resolve()
    slug = slugify(args.slug or args.title)
    goal_dir = root / "goals" / slug
    goal_dir.mkdir(parents=True, exist_ok=True)

    # Render every template; write_file leaves existing files untouched
    # unless --force was passed.
    files = {
        "brief.md": brief(args.title, args.objective),
        "plan.md": plan(args.title),
        "verification.md": verification(args.title),
        "blockers.md": blockers(args.title),
        "goal-prompt.md": goal_prompt(slug, args.title, args.objective),
    }
    for name, content in files.items():
        write_file(goal_dir / name, content, args.force)

    # progress.jsonl is append-only history, so seed it only when absent
    # (or when --force explicitly resets the scaffold).
    progress_path = goal_dir / "progress.jsonl"
    if not progress_path.exists() or args.force:
        write_file(progress_path, progress_entry(args.title, args.objective), args.force)

    # Print the goal directory, then every scaffolded file path.
    print(goal_dir)
    for name in sorted([*files.keys(), "progress.jsonl"]):
        print(goal_dir / name)
    return 0


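# Typical output (illustrative slug): goals/ship-v2-api followed by the six
# file paths in sorted order: blockers.md, brief.md, goal-prompt.md, plan.md,
# progress.jsonl, verification.md.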
if __name__ == "__main__":
    sys.exit(main())