Add plannotator extension v0.19.10

574	extensions/plannotator/skills/plannotator-compound/SKILL.md	Normal file
@@ -0,0 +1,574 @@
---
name: plannotator-compound
disable-model-invocation: true
description: >
  Analyze a user's Plannotator plan archive to extract denial patterns, feedback
  taxonomy, evolution over time, and actionable prompt improvements — then produce
  a polished HTML dashboard report. Falls back to Claude Code ExitPlanMode denial
  reasons when Plannotator data is unavailable.
---

# Compound Planning Analysis

You are conducting a comprehensive research analysis of a user's Plannotator plan
archive. The goal: extract patterns from their denied plans, reduce
them into actionable insights, and produce an elegant HTML dashboard report.

This is a multi-phase process. Each phase must complete fully before the next begins.
Research integrity is paramount — every file must be read, no skipping.

## Source Selection

Before starting the analysis, determine which data source is available.

1. **Plannotator mode (first-class)** — Check `~/.plannotator/plans/`. If it
   exists and contains `*-denied.md` files, use this mode. The entire workflow
   below is written for Plannotator data.

2. **Claude Code fallback mode** — If the Plannotator archive is absent or
   contains no denied plans, check `~/.claude/projects/`. If present, read
   [references/claude-code-fallback.md](references/claude-code-fallback.md)
   before continuing. That reference explains how to use the bundled parser at
   [scripts/extract_exit_plan_mode_outcomes.py](scripts/extract_exit_plan_mode_outcomes.py)
   to extract denial reasons from Claude Code JSONL transcripts. Every phase
   below has a short note explaining what changes in fallback mode — the
   reference file has the details.

3. **Neither available** — Ask the user for their Plannotator plans directory or
   Claude Code projects directory. Do not guess.

## Phase 0: Locate Plans & Check for Previous Reports

Use the mode chosen in Source Selection above.

**Plannotator mode:** Verify the plans directory contains `*-denied.md` files. If
none exist, fall back to Claude Code mode before stopping.

**Claude Code fallback mode:** Run the bundled parser per the fallback reference to
build the denial-reason dataset. Create `/tmp/compound-planning/` if needed.

In either mode, proceed to Previous Report Detection below.

### Previous Report Detection

After locating the plans directory, check for existing reports:

```
ls ~/.plannotator/plans/compound-planning-report*.html
```

Reports follow a versioned naming scheme:
- First report: `compound-planning-report.html`
- Subsequent reports: `compound-planning-report-v2.html`, `compound-planning-report-v3.html`, etc.

If one or more reports exist, determine the **latest** one (highest version number).
Get its filesystem modification date using `stat` (macOS: `stat -f %Sm -t %Y-%m-%d`,
Linux: `stat -c %y | cut -d' ' -f1`). This is the **cutoff date**.

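The detection steps above can be sketched in Python. This is an illustrative sketch, not part of the skill itself: `latest_report` is a hypothetical helper name, and `os.path.getmtime` stands in for the platform-specific `stat` invocations.

```python
import os
import re
from glob import glob

def latest_report(plans_dir):
    """Return (path, version) of the newest compound-planning report, or (None, 0)."""
    best_path, best_version = None, 0
    for path in glob(os.path.join(plans_dir, "compound-planning-report*.html")):
        m = re.fullmatch(r"compound-planning-report(?:-v(\d+))?\.html",
                         os.path.basename(path))
        if m:
            version = int(m.group(1) or 1)  # the unsuffixed first report counts as v1
            if version > best_version:
                best_path, best_version = path, version
    return best_path, best_version

# Cutoff date from the latest report's modification time:
# path, version = latest_report(os.path.expanduser("~/.plannotator/plans"))
# if path:
#     import datetime
#     cutoff = datetime.date.fromtimestamp(os.path.getmtime(path)).isoformat()
```

The same logic also yields the next output version number (`best_version + 1`) used in Phase 4.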
Present the user with a choice:

> "I found a previous report (`compound-planning-report-v{N}.html`) last updated
> on {CUTOFF_DATE}. I can either:
>
> 1. **Incremental** — Only analyze files dated after {CUTOFF_DATE}, saving tokens
>    and building on previous findings
> 2. **Full** — Re-analyze the entire archive from scratch
>
> Which would you prefer?"

Wait for the user's response before proceeding.

**If incremental:** Filter all subsequent phases to only process files with dates
after the cutoff date. The new report version will note in its header narrative that
it covers the period from {CUTOFF_DATE} to present, and reference the previous
report for earlier findings. The inventory (Phase 1) should still count ALL files
for overall stats, but clearly separate "new since last report" counts.

**If full:** Proceed normally with all files, but still use the next version number
for the output filename.

**If no previous report exists:** Proceed normally. The output filename will be
`compound-planning-report.html` (no version suffix for the first report).

## Phase 1: Inventory

Count and report the dataset. **Always count ALL files** for overall stats,
regardless of whether this is an incremental or full run:

```
- *-approved.md files (count)
- *-denied.md files (count)
- Date range (earliest to latest date found in filenames)
- Total days spanned
- Revision rate: denied / (approved + denied) — this is the "X% of plans
  revised before coding" stat used in dashboard section 1
```

**Note:** Ignore `*.annotations.md` files entirely. Denied files already contain
the full plan text plus all reviewer feedback appended after a `---` separator.
Annotation files are redundant subsets of this content — reading both would
double-count feedback.

**If incremental mode:** After the total counts, separately report the counts for
files dated after the cutoff date only:

```
New since {CUTOFF_DATE}:
- *-denied.md files: X (of Y total)
- New date range: {CUTOFF_DATE} to {LATEST_DATE}
- New days spanned: N
```

If fewer than 3 new denied files exist since the cutoff, warn the user:
> "Only {N} new denied plans since the last report. The incremental analysis may
> be thin. Would you like to proceed or switch to a full analysis?"

Also run `wc -l` across all `*-approved.md` files to get average lines per
approved plan. This tells the user whether their plans are staying lightweight
or bloating over time. You do not need to read approved plan contents — just
their line counts. If possible, break this down by time period (e.g., monthly)
to show whether plan size changed.

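The monthly breakdown can be sketched with a hypothetical helper that takes `(filename, line_count)` pairs (the counts coming from `wc -l`) and averages them per month:

```python
import re
from collections import defaultdict

def plan_size_by_month(files):
    """files: iterable of (filename, line_count) pairs for *-approved.md files.
    Returns {"YYYY-MM": average line count} for months with a parseable date."""
    totals = defaultdict(lambda: [0, 0])  # month -> [line total, file count]
    for name, lines in files:
        m = re.search(r"\d{4}-\d{2}-\d{2}", name)
        if not m:
            continue  # filename without a date: excluded from the breakdown
        month = m.group(0)[:7]
        totals[month][0] += lines
        totals[month][1] += 1
    return {month: round(total / count, 1)
            for month, (total, count) in sorted(totals.items())}
```

A rising average across months is the "bloating over time" signal this step looks for.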
Dates appear in filenames in YYYY-MM-DD format, sometimes as a prefix
(2026-01-07-name-approved.md) and sometimes embedded (name-2026-03-15-approved.md).
Extract dates from all filenames.

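A single date regex handles both filename shapes, so the headline stats can be sketched as follows (`inventory` is a hypothetical helper, not part of the skill):

```python
import re

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def inventory(filenames):
    """Overall archive stats; *.annotations.md files are ignored per the note above."""
    approved = [f for f in filenames if f.endswith("-approved.md")]
    denied = [f for f in filenames if f.endswith("-denied.md")]
    dates = sorted(m.group(0) for f in approved + denied
                   if (m := DATE.search(f)))
    reviewed = len(approved) + len(denied)
    return {
        "approved": len(approved),
        "denied": len(denied),
        "date_range": (dates[0], dates[-1]) if dates else None,
        "revision_rate": round(len(denied) / reviewed, 3) if reviewed else 0.0,
    }
```

The `revision_rate` value is the "X% of plans revised before coding" stat used in dashboard section 1.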
Tell the user what you found and that you're beginning the extraction.

**Claude Code fallback mode:** The Plannotator inventory fields above do not apply.
Follow the inventory instructions in
[references/claude-code-fallback.md](references/claude-code-fallback.md) instead —
report the denial-reason dataset assembled by the parser.

## Phase 2: Map — Parallel Extraction

This is the most time-intensive phase. You must read EVERY `*-denied.md` file
**in scope**. Do not skip files. Do not summarize early.

**In scope** means: all denied files if running a full analysis, or only denied
files dated after the cutoff date if running incrementally. In incremental mode,
only process files whose embedded YYYY-MM-DD date is strictly after the cutoff.

**Claude Code fallback mode:** The parser output is the clean source dataset. Read
the fallback reference for the extraction prompt and batching strategy specific to
JSON part files. Do not go back to raw `.jsonl` logs unless the parser fails or the
user asks for audit-level verification.

**Important:** Only read `*-denied.md` files. Do NOT read approved plans,
annotation files, or diff files. Each denied file contains the full plan text
followed by a `---` separator and the reviewer's feedback — everything needed
for analysis is in one file.

### Batching Strategy

All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning. Haiku is faster and cheaper
for this work.

The approach depends on dataset size:

**Tiny datasets (≤ 10 total files):** Read all files directly in the main agent —
no need for sub-agents. Just read them sequentially and proceed to Phase 3.

**Small datasets (11-30 files):** Launch 2-3 parallel Haiku agents, splitting
files roughly evenly.

**Medium datasets (31-80 files):** Launch 4-6 parallel Haiku agents (~10-15 files
each). Split by file type and/or time period.

**Large datasets (81+ files):** Launch as many parallel Haiku agents as needed to
keep each batch around 10-15 files. Split by the natural time boundaries in the
data (months, quarters, or whatever groupings produce balanced batches). If one
time period dominates (e.g., the most recent month has 3x the files), split that
period into multiple batches.

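The tier rules above reduce to "ceil the file count into batches of roughly 12". A minimal sketch, assuming the file list is already sorted by date so each batch covers a contiguous time slice (`make_batches` is a hypothetical helper):

```python
import math

def make_batches(files, target_size=12):
    """Split a date-sorted file list into balanced batches of roughly target_size."""
    if not files:
        return []
    n_batches = max(1, math.ceil(len(files) / target_size))
    size = math.ceil(len(files) / n_batches)  # even batch sizes across agents
    return [files[i:i + size] for i in range(0, len(files), size)]
```

For 128 files this yields 11 batches of at most 12 files each, one per parallel Haiku agent.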
Launch all extraction agents in parallel using the Agent tool with
`run_in_background: true` and `model: "haiku"`.

### Output Files

Each extraction agent must write its results to a clean output file rather than
relying on the agent task output (which contains interleaved JSONL framework
logs that are difficult to parse). Instruct each agent to write to:

```
/tmp/compound-planning/extraction-{batch-name}.md
```

Create the `/tmp/compound-planning/` directory before launching agents. The
reduce agent in Phase 3 will read these clean files directly.

### Extraction Prompt

Each agent receives this instruction (adapt the time period, file list, and
output path):

```
You are extracting structured data from denied plan files for a pattern analysis.

Directory: [PLANS DIRECTORY]
Files to read: [LIST OF SPECIFIC *-denied.md FILES]
Output: Write your complete results to [OUTPUT FILE PATH]

Each denied file contains two parts separated by a --- line:
1. The plan text (above the ---)
2. The reviewer's feedback and annotations (below the ---)

Read EVERY file in your list. For EACH file, extract:
- The plan name/topic (from the plan text above the ---)
- The denial reason or feedback given (from below the --- — capture the actual
  words used)
- What was specifically asked to change
- The type of feedback (let the content determine the category — don't force-fit
  into predefined types. Common types include things like: scope concerns,
  approach disagreements, missing information, process requirements, quality
  concerns, UX/design issues, naming disputes, clarification requests,
  testing/procedural denials — but the user's actual patterns may differ)
- Any specific phrases or recurring language from the reviewer
- Individual annotations if present (numbered feedback items with quoted text
  and reviewer comments)
- The date (extracted from the filename)

Do NOT skip any files. One entry per file.

Format each entry as:
**[filename]**
- Date: ...
- Topic: ...
- Denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Annotations: [count, with brief summary of each]
---

After processing all files, write the complete results to [OUTPUT FILE PATH].
State the total file count at the end of the file.
```

### While Agents Run

Track completion. As each agent finishes, note the count of files it processed.
Verify the total matches the inventory from Phase 1. If any agent's count is
short, flag it and consider re-launching for the missing files.

If an agent times out (possible with large batches — a batch of 128 files can
take 8+ minutes), re-launch it for just the unprocessed files. Check the output
file to see how far it got before timing out.

## Phase 3: Reduce — Pattern Analysis

Once ALL extraction agents have completed (or all files have been read for tiny
datasets), proceed with the reduction. Reduction agents should use `model: "sonnet"`
— this phase requires real analytical reasoning, not just file reading.

### Reduction Strategy

The approach depends on how many extraction files were produced:

**Standard (≤ 20 extraction files):** Launch a single Sonnet agent to read all
extraction files and produce the full analysis. This covers most datasets.

**Large (21+ extraction files):** Use a two-stage reduce:

1. **Stage 1 — Partial reduces:** Split the extraction files into groups of 4-6.
   Launch parallel Sonnet agents, each reading one group and producing a partial
   analysis with the same sections listed below. Each writes to
   `/tmp/compound-planning/partial-reduce-{N}.md`.

2. **Stage 2 — Final reduce:** A single Sonnet agent reads all partial reduce
   files and synthesizes them into the final comprehensive analysis. This agent
   merges taxonomies, combines counts, deduplicates patterns, and reconciles any
   conflicting categorizations across partials.

**Claude Code fallback mode:** The reduction phase is the same. The only upstream
difference is that extraction files were derived from normalized denial-reason JSON
instead of Plannotator markdown files.

### Reduction Prompt

Give each reduction agent this prompt (adapt file paths for single vs multi-stage):

```
You are a data scientist conducting the reduction phase of a map-reduce analysis
across a user's denied plan archive.

Read ALL extraction files at [FILE PATHS]

These files contain structured extractions from every denied plan file. Each
extraction includes the plan topic, denial feedback, annotations, and reviewer
language. Your job: aggregate everything, find patterns, cluster into a taxonomy,
and produce a comprehensive analysis.

Be exhaustive. Use real counts. Quote real phrases from the data. This is
research — no hand-waving, no fabrication.

Write your complete results to [OUTPUT FILE PATH].

Produce the following sections:
[... sections listed below ...]
```

The reduction agent's job is to let the data speak. Do not impose a predetermined
framework — discover what's actually there. The analysis must produce:

### 1. Denial Reason Taxonomy
Categorize every denial into a finite set of types that emerge from the data. Count
occurrences. Show percentages. Include real example quotes for each type. Aim for
8-15 categories — enough to be specific, few enough to be scannable. Let the user's
actual feedback determine what the categories are.

### 2. Top Feedback Patterns (ranked by frequency)
The 5-10 most recurring patterns. For each: what the reviewer consistently asks for,
3+ example quotes from different files, and whether the pattern changed over time.

### 3. Recurring Phrases
Exact phrases the reviewer uses repeatedly, with counts and what they signal. These
are the reviewer's vocabulary — their shorthand for what they care about.

### 4. What the Reviewer Values (implicit preferences)
Derived from patterns — what does this specific person care about most? Quality?
Speed? Narrative? Architecture? Process? Simplicity? Rank by evidence strength.
This section should feel like a personality profile of the reviewer's standards.

### 5. What Agents Consistently Get Wrong
The flip side — what recurring mistakes trigger denials? What should agents stop
doing for this reviewer?

### 6. Structural Requests
What plan structure does the reviewer consistently demand? Required sections,
ordering, format preferences, level of detail expected.

### 7. Evolution Over Time
How feedback patterns changed across the time span. Group by whatever natural time
boundaries exist in the data (weeks for short spans, months for longer ones). Did
expectations mature? Did new patterns emerge? What shifted? If the dataset spans
less than a month, note that evolution analysis is limited but still look for any
progression from early to late files.

### 8. Actionable Prompt Instructions
The most important output. Based on all patterns: specific numbered instructions
that could be embedded in a planning prompt to prevent the most common denial
reasons. Write these as actual directives an agent could follow. Be specific to
this user's patterns — generic advice like "write good plans" is worthless. Each
instruction should trace back to a real, frequent denial pattern.

After writing the instructions, calculate what percentage of denials they would
address (count how many denials fall into categories covered by the instructions
vs total denials). Report this percentage — it will be different for every user.

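The coverage calculation is plain arithmetic over the taxonomy counts. A minimal sketch (the function name and category labels are illustrative, not from the skill):

```python
def denial_coverage(denials_by_category, covered_categories):
    """Percentage of all denials falling in categories the instructions address."""
    total = sum(denials_by_category.values())
    covered = sum(n for cat, n in denials_by_category.items()
                  if cat in covered_categories)
    return round(100.0 * covered / total, 1) if total else 0.0
```

This is the real calculated percentage that dashboard section 5 must use instead of a generic number.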
## Phase 4: Generate the HTML Dashboard

Build a single, self-contained HTML file as the final deliverable. Save it to
the user's plans directory with a versioned filename:

- First ever report: `compound-planning-report.html`
- Second report: `compound-planning-report-v2.html`
- Third report: `compound-planning-report-v3.html`
- And so on.

The version number was determined in Phase 0 based on existing reports found.

**If this is an incremental report**, the header should indicate the analysis
period (e.g., "March 15 – March 31, 2026") and include a subtitle noting
"Incremental analysis — see v{N-1} for earlier findings." The narrative in
section 1 should frame findings as what's new or changed since the last report,
not as a complete picture. Overall stats in the header (file counts, revision
rate) should still reflect the full archive for context.

Read the template at `assets/report-template.html` for the **design language
only**. The template contains example data from a previous analysis — ignore all
data values, quotes, and percentages in the template. Use only its visual design:
colors, typography, spacing, component styles, and layout patterns.

### Design Language (from template)

- **Palette:** Light mode, warm off-white (#FDFCFB), text in slate scale, amber
  for highlights/accents, emerald for positive, rose for negative, indigo for
  action elements
- **Typography:** Playfair Display (serif, for narrative headings), Inter (sans,
  for body/data), JetBrains Mono (mono, for code/phrases) — Google Fonts CDN
- **Layout:** Single-column, max-width 1024px, generous vertical whitespace (128px
  between major sections), editorial/narrative-first aesthetic
- **Tone:** Calm, reflective, authoritative. Like a personal retrospective journal,
  not a monitoring dashboard.

### Page Frame (header + footer)

Before the 7 sections, the page has:

- **Header:** Report title on the left (Playfair Display, ~36px), project name +
  date range below it in light meta text. On the right: file counts in mono
  (e.g., "223 denials · 71 days"). Separated from content by
  a bottom border. Generous bottom padding before section 1.

- **Footer:** After section 7. Top border, centered italic Playfair Display tagline
  summarizing the corpus (e.g., "Analysis of X denied plans from the Plannotator
  archive.").

### Dashboard Section Order (7 sections)

The report follows this exact section order. Each section builds on the previous
one — the flow moves from "what happened" through "why" to "what to do about it":

1. **The story in the data** — An editorial narrative paragraph (Playfair Display
   serif, ~26px) that tells the headline finding in prose. Not bullet points — a
   real paragraph that reads like the opening of an article. Alongside it, a KPI
   sidebar with 3 key metrics (the top denial percentage, the overall revision
   rate, and the number of distinct denial categories found). Use an amber inline
   highlight on the most striking number in the narrative.

2. **Why plans get denied** — The taxonomy as a ranked list. Each row: rank number
   (mono), category label, a thin 4px progress bar (top item in amber-500, rest
   in slate-300), percentage (mono), and for the top entries, a real italic quote
   from the data below the label. Show the top 10 categories or however many the
   data supports (minimum 5).

3. **How expectations evolved** — One card per natural time period. Each card has:
   the period name in serif, a theme phrase in colored uppercase (different color
   per period to show progression), a description paragraph, and a stat line at
   the bottom (e.g., "X denials · Y narrative requests"). If the data spans less
   than 3 distinct periods, use 2 cards or even a single card with internal
   progression noted.

4. **What works vs what doesn't** — Two side-by-side cards. Left: green-tinted
   (emerald-50/50 bg, emerald-100 border) with traits of plans that succeed for
   this reviewer. Right: red-tinted (rose-50/50 bg, rose-100 border) with what
   agents keep getting wrong. Both derived from the reduction analysis. Bulleted
   with small colored dots. 5-8 items per card.

5. **The actionable output** — The diagnostic payoff. Opens with a Playfair
   Display narrative sentence stating how many prompt instructions were derived
   and what estimated percentage of denials they address (use the real calculated
   percentage from Phase 3, not a generic number). Then the top 3 most impactful
   improvements as numbered items, each with an amber number, bold title, and
   one-line description. This section bridges the analysis and the full prompt
   that follows.

6. **Your most-used phrases** — Grid of chips (2-col mobile, 3-col desktop). Each
   chip: monospace quoted phrase on the left, frequency count on the right. White
   bg, slate-200 border, rounded-12px. Show 9-12 of the most recurring phrases
   found. These should be the reviewer's actual words — their verbal fingerprint.

7. **The corrective prompt** — Dark panel (slate-900 bg, white text, rounded-3xl,
   shadow-xl). Opens with a Playfair intro sentence about the instructions. Then
   a dark code block (slate-800/80 bg, amber-200 monospace text) containing the
   full numbered prompt instructions from Phase 3. Include a copy-to-clipboard
   button that works (JS included). Below the code block: a gradient glow card
   (indigo-to-purple blurred halo behind a white card) with a closing message
   that these instructions are personal — derived from the user's own feedback,
   their own language, their own standards.

### Adaptation Rules

- If the user has < 3 months of data, reduce the evolution section to fewer cards
- If most denied files lack feedback below the `---` (bare denials with no
  annotations), note this in the narrative — the analysis will be thinner
- **Claude Code fallback mode:** Explicitly label the report source as Claude Code
  `ExitPlanMode` denial reasons. Do not fabricate Plannotator-only fields such as
  annotation counts or approved-plan line counts. See the fallback reference for
  KPI substitutes and footer/provenance guidance.
- If fewer than 5 denial categories emerge, combine the taxonomy and patterns
  sections into one
- If the dataset is very small (< 20 files), the narrative should acknowledge the
  limited sample size and frame findings as preliminary
- The number of prompt instructions will vary per user — could be 8 or 20. Don't
  force a specific count. Let the data determine it.
- The top 3 actionable items in section 5 must be the 3 that cover the largest
  share of denials, not the 3 that sound most impressive

### Key Rules

1. Every number must come from the real analysis — no fabricated data
2. Every quote must be a real quote from a real file
3. The taxonomy percentages must be calculated from real counts
4. The prompt instructions must trace back to actual denial patterns
5. The copy button on the prompt block must work (include the JS)

After generating, open the file in the user's browser.

## Phase 5: Summary

Tell the user:
- How many denied files were analyzed
- If incremental: how many were new since the last report
- The top 3 denial patterns found
- The estimated percentage of denials the prompt instructions would address
- The single most impactful prompt improvement
- Where the report was saved (including version number)
- If incremental: remind the user that earlier findings are in the previous report

**Claude Code fallback mode:** Adapt the summary per the fallback reference —
report human denial reasons analyzed and total `ExitPlanMode` attempts scanned
instead of Plannotator file counts.

## Phase 6: Improvement Hook

After presenting the summary, ask the user if they want to enable an **improvement
hook** — this takes the corrective prompt instructions from section 7 of the report
and writes them to a file that Plannotator's `EnterPlanMode` hook can inject into
every future planning session automatically.

> "Would you like to enable the improvement hook? This will save the corrective
> prompt instructions to a file that gets automatically injected into all future
> planning sessions — so Claude sees your feedback patterns before writing any plan."

**If yes:**

The hook file lives at:

```
~/.plannotator/hooks/compound/enterplanmode-improve-hook.txt
```

Create the `~/.plannotator/hooks/compound/` directory if it doesn't exist.

The file contents should be the corrective prompt instructions from Phase 3 —
the same numbered list that appears in section 7 of the HTML report. Write them
as plain text, one instruction per line, prefixed with their number. No HTML, no
markdown fences, no preamble — just the instructions themselves. The hook system
will inject this file's contents as-is into the planning context.

**If the file already exists:**

Read the existing file and present the user with a choice:

> "An improvement hook already exists from a previous analysis. I can:
>
> 1. **Replace** — Overwrite with the new instructions (the old ones are gone)
> 2. **Merge** — Combine both, deduplicating overlapping instructions and
>    keeping the best version of each
> 3. **Keep existing** — Leave the current hook as-is, skip this step
>
> Which would you prefer?"

- **Replace:** Overwrite the file with the new instructions.
- **Merge:** Read the existing instructions, compare with the new ones, and
  produce a merged set. Remove duplicates (same intent even if worded differently).
  When two instructions cover the same pattern, keep the more specific or
  actionable version. Re-number the final list sequentially. Write the merged
  result to the file. Show the user what changed (added N new, removed N
  redundant, kept N existing).
- **Keep existing:** Do nothing, move on.

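The mechanical half of a merge (stripping old numbers and renumbering the final list) can be sketched as below; the deduplication judgment itself stays with the model. `renumber` is a hypothetical helper name:

```python
import re

def renumber(instructions):
    """Strip any existing '1.' / '2)' prefixes, drop blanks, renumber sequentially."""
    cleaned = (re.sub(r"^\s*\d+[.)]\s*", "", line).strip() for line in instructions)
    return [f"{i}. {text}" for i, text in enumerate((t for t in cleaned if t), 1)]
```

Writing the result back as plain numbered lines matches the hook-file format described above.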
**If no:** Skip this phase entirely.

## Important Notes

- **Data source priority:** Plannotator is the first-class path. Claude Code log
  analysis is the secondary path for users without Plannotator archives.
- **Research integrity:** Every file must be read. The value of this analysis comes
  from completeness. Sampling or skipping undermines the findings.
- **Real data only:** Never fabricate quotes, percentages, or patterns. If the data
  doesn't show a clear pattern, say so honestly rather than inventing one.
- **Let the data lead:** The taxonomy, patterns, and instructions should emerge from
  what's actually in the files. Different users will have completely different
  denial patterns. A user building mobile apps will have different feedback than
  one building APIs. Don't assume what the patterns will be.
- **Agent parallelization:** For large datasets, maximize parallel agents to reduce
  wall-clock time. The bottleneck is the largest batch — split it.
- **Structured extraction format:** Ask extraction agents to return structured text
  with consistent delimiters so the reduce agent can parse reliably.
- **The report is the artifact:** The HTML dashboard is what the user keeps. It
  should be beautiful, honest, and useful. Every section should feel like it was
  written about them specifically, because it was.

@@ -0,0 +1,795 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Compound Planning — What 370 Files Reveal</title>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,500;1,400&family=Inter:wght@300;400;500;600&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
|
||||
<style>
|
||||
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
|
||||
|
||||
:root {
|
||||
--bg: #FDFCFB;
|
||||
--slate-900: #0f172a;
|
||||
--slate-800: #1e293b;
|
||||
--slate-700: #334155;
|
||||
--slate-600: #475569;
|
||||
--slate-500: #64748b;
|
||||
--slate-400: #94a3b8;
|
||||
--slate-300: #cbd5e1;
|
||||
--slate-200: #e2e8f0;
|
||||
--slate-100: #f1f5f9;
|
||||
--slate-50: #f8fafc;
|
||||
--amber-500: #f59e0b;
|
||||
--amber-600: #d97706;
|
||||
--amber-700: #b45309;
|
||||
--amber-50: #fffbeb;
|
||||
--emerald-500: #10b981;
|
||||
--emerald-600: #059669;
|
||||
--emerald-400: #34d399;
|
||||
--emerald-900: #064e3b;
|
||||
--emerald-800: #065f46;
|
||||
--emerald-100: #d1fae5;
|
||||
--emerald-50: #ecfdf5;
|
||||
--rose-500: #f43f5e;
|
||||
--rose-600: #e11d48;
|
||||
--rose-400: #fb7185;
|
||||
--rose-900: #881337;
|
||||
--rose-800: #9f1239;
|
||||
--rose-100: #ffe4e6;
|
||||
--rose-50: #fff1f2;
|
||||
--indigo-500: #6366f1;
|
||||
--indigo-600: #4f46e5;
|
||||
--purple-600: #9333ea;
|
||||
}

body {
  font-family: 'Inter', ui-sans-serif, system-ui, sans-serif;
  background: var(--bg);
  color: var(--slate-800);
  -webkit-font-smoothing: antialiased;
}

.container {
  max-width: 1024px;
  margin: 0 auto;
  padding: 48px 24px 64px;
}
@media (min-width: 768px) { .container { padding: 96px 24px 80px; } }

/* Typography */
.font-serif { font-family: 'Playfair Display', ui-serif, Georgia, serif; }
.font-mono { font-family: 'JetBrains Mono', ui-monospace, monospace; }

/* Header */
header {
  border-bottom: 1px solid var(--slate-200);
  padding-bottom: 40px;
  margin-bottom: 96px;
  display: flex;
  justify-content: space-between;
  align-items: flex-end;
  flex-wrap: wrap;
  gap: 16px;
}
header h1 {
  font-family: 'Playfair Display', serif;
  font-size: 36px;
  font-weight: 400;
  color: var(--slate-900);
  line-height: 1.2;
}
header .meta {
  font-size: 15px;
  font-weight: 300;
  color: var(--slate-500);
  letter-spacing: 0.04em;
}

/* Sections */
.section { margin-bottom: 128px; }
.section-label {
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.2em;
  color: var(--slate-400);
  margin-bottom: 24px;
}

/* Narrative + KPIs */
.summary {
  display: grid;
  grid-template-columns: 1fr;
  gap: 48px;
  align-items: start;
}
@media (min-width: 768px) {
  .summary { grid-template-columns: 1fr 240px; }
}
.narrative {
  font-family: 'Playfair Display', serif;
  font-size: 26px;
  line-height: 1.45;
  color: var(--slate-900);
}
.narrative .highlight {
  background: var(--amber-50);
  color: var(--amber-700);
  padding: 1px 6px;
  border-radius: 3px;
}
.kpi-stack {
  display: flex;
  flex-direction: column;
  gap: 32px;
}
@media (min-width: 768px) {
  .kpi-stack { border-left: 1px solid var(--slate-200); padding-left: 32px; }
}
.kpi-item .kpi-value {
  font-size: 36px;
  font-weight: 300;
  color: var(--slate-900);
  letter-spacing: -0.02em;
}
.kpi-item .kpi-label {
  font-size: 10px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.15em;
  color: var(--slate-500);
  margin-top: 2px;
}

/* Taxonomy bars */
.taxonomy-list { display: flex; flex-direction: column; gap: 20px; }
.tax-row { display: grid; grid-template-columns: 24px 1fr 52px; gap: 12px; align-items: center; }
.tax-rank {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  text-align: right;
}
.tax-body { display: flex; flex-direction: column; gap: 6px; }
.tax-label { font-size: 14px; font-weight: 500; color: var(--slate-800); }
.tax-bar-track { height: 4px; background: var(--slate-100); border-radius: 100px; overflow: hidden; }
.tax-bar-fill { height: 100%; border-radius: 100px; transition: width 0.6s ease; }
.tax-bar-fill.top { background: var(--amber-500); }
.tax-bar-fill.rest { background: var(--slate-300); }
.tax-pct {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-500);
  text-align: right;
}
.tax-quote {
  font-size: 12px;
  font-style: italic;
  color: var(--slate-500);
  margin-top: 2px;
}

/* Evolution timeline */
.evolution-grid {
  display: grid;
  grid-template-columns: 1fr;
  gap: 24px;
}
@media (min-width: 768px) { .evolution-grid { grid-template-columns: repeat(3, 1fr); } }
.evo-card {
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 16px;
  padding: 28px;
}
.evo-card .evo-month {
  font-family: 'Playfair Display', serif;
  font-size: 20px;
  color: var(--slate-900);
  margin-bottom: 4px;
}
.evo-card .evo-theme {
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.12em;
  margin-bottom: 16px;
}
.evo-card .evo-desc {
  font-size: 14px;
  color: var(--slate-600);
  line-height: 1.6;
}
.evo-card .evo-stat {
  margin-top: 16px;
  padding-top: 16px;
  border-top: 1px solid var(--slate-100);
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-500);
}
.evo-jan .evo-theme { color: var(--slate-600); }
.evo-feb .evo-theme { color: var(--amber-600); }
.evo-mar .evo-theme { color: var(--indigo-600); }

/* Quality comparison */
.quality-grid {
  display: grid;
  grid-template-columns: 1fr;
  gap: 24px;
}
@media (min-width: 768px) { .quality-grid { grid-template-columns: 1fr 1fr; } }
.q-card {
  border-radius: 24px;
  padding: 36px;
}
.q-card.good {
  background: color-mix(in srgb, var(--emerald-50) 50%, transparent);
  border: 1px solid var(--emerald-100);
}
.q-card.bad {
  background: color-mix(in srgb, var(--rose-50) 50%, transparent);
  border: 1px solid var(--rose-100);
}
.q-card .q-icon { font-size: 20px; margin-bottom: 12px; }
.q-card .q-title {
  font-family: 'Playfair Display', serif;
  font-size: 22px;
  margin-bottom: 20px;
}
.q-card.good .q-title { color: var(--emerald-900); }
.q-card.bad .q-title { color: var(--rose-900); }
.q-list { list-style: none; display: flex; flex-direction: column; gap: 14px; }
.q-list li {
  display: flex;
  align-items: flex-start;
  gap: 10px;
  font-size: 14px;
  line-height: 1.6;
}
.q-card.good .q-list li { color: color-mix(in srgb, var(--emerald-800) 90%, transparent); }
.q-card.bad .q-list li { color: color-mix(in srgb, var(--rose-800) 90%, transparent); }
.q-dot {
  width: 6px;
  height: 6px;
  border-radius: 50%;
  flex-shrink: 0;
  margin-top: 7px;
}
.q-card.good .q-dot { background: var(--emerald-400); }
.q-card.bad .q-dot { background: var(--rose-400); }

/* Phrases */
.phrases-grid {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 12px;
}
@media (min-width: 768px) { .phrases-grid { grid-template-columns: repeat(3, 1fr); } }
.phrase-chip {
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 12px;
  padding: 14px 16px;
  display: flex;
  justify-content: space-between;
  align-items: center;
  gap: 8px;
}
.phrase-text {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-700);
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}
.phrase-count {
  font-family: 'JetBrains Mono', monospace;
  font-size: 11px;
  color: var(--slate-400);
  flex-shrink: 0;
}

/* Dark action panel */
.action-panel {
  background: var(--slate-900);
  color: white;
  border-radius: 24px;
  padding: 40px;
  box-shadow: 0 20px 25px -5px rgb(0 0 0 / 0.1), 0 8px 10px -6px rgb(0 0 0 / 0.1);
}
@media (min-width: 768px) { .action-panel { padding: 56px; } }
.action-panel .section-label { color: var(--slate-500); }
.action-panel .ap-intro {
  font-family: 'Playfair Display', serif;
  font-size: 22px;
  color: white;
  line-height: 1.4;
  margin-bottom: 32px;
  max-width: 640px;
}
.prompt-block {
  background: color-mix(in srgb, var(--slate-800) 80%, transparent);
  border: 1px solid color-mix(in srgb, var(--slate-700) 50%, transparent);
  border-radius: 16px;
  overflow: hidden;
}
.prompt-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 12px 20px;
  border-bottom: 1px solid color-mix(in srgb, var(--slate-700) 30%, transparent);
}
.prompt-header-label {
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  display: flex;
  align-items: center;
  gap: 8px;
}
.prompt-header-label svg { width: 14px; height: 14px; }
.copy-btn {
  background: none;
  border: none;
  font-family: 'JetBrains Mono', monospace;
  font-size: 12px;
  color: var(--slate-400);
  cursor: pointer;
  display: flex;
  align-items: center;
  gap: 6px;
  transition: color 0.2s;
}
.copy-btn:hover { color: white; }
.copy-btn.copied { color: var(--emerald-400); }
.prompt-body {
  padding: 20px;
  max-height: 480px;
  overflow-y: auto;
}
.prompt-body pre {
  font-family: 'JetBrains Mono', monospace;
  font-size: 13px;
  line-height: 1.7;
  color: color-mix(in srgb, var(--amber-200) 90%, transparent);
  white-space: pre-wrap;
  word-break: break-word;
}
.prompt-body pre .comment {
  color: var(--slate-500);
}

/* Glow card */
.glow-wrap {
  position: relative;
  margin-top: 48px;
}
.glow-bg {
  position: absolute;
  inset: -2px;
  background: linear-gradient(135deg, var(--indigo-500), var(--purple-600));
  border-radius: 26px;
  opacity: 0.15;
  filter: blur(16px);
  transition: opacity 0.5s;
}
.glow-wrap:hover .glow-bg { opacity: 0.25; }
.glow-card {
  position: relative;
  background: white;
  border: 1px solid var(--slate-200);
  border-radius: 24px;
  padding: 32px 36px;
  display: flex;
  justify-content: space-between;
  align-items: center;
  flex-wrap: wrap;
  gap: 20px;
}
.glow-card .gc-text {
  font-family: 'Playfair Display', serif;
  font-size: 18px;
  font-weight: 500;
  color: var(--slate-900);
  line-height: 1.5;
  max-width: 640px;
}
.glow-card .gc-text em {
  font-style: italic;
  color: var(--indigo-600);
}

/* Footer */
footer {
  border-top: 1px solid var(--slate-200);
  padding-top: 48px;
  margin-top: 0;
  text-align: center;
}
footer p {
  font-family: 'Playfair Display', serif;
  font-style: italic;
  font-size: 15px;
  color: var(--slate-400);
}

/* Scrollbar in dark code block */
.prompt-body::-webkit-scrollbar { width: 6px; }
.prompt-body::-webkit-scrollbar-track { background: transparent; }
.prompt-body::-webkit-scrollbar-thumb { background: var(--slate-700); border-radius: 3px; }
</style>
</head>
<body>
<div class="container">

  <header>
    <div>
      <h1>What 370 Files Reveal About<br>How You Plan</h1>
      <div class="meta" style="margin-top: 8px;">backnotprop/plannotator · Jan 7 – Mar 18, 2026</div>
    </div>
    <div class="meta" style="text-align: right;">
      <span class="font-mono" style="font-size: 12px;">202 denials · 168 annotations · 71 days</span>
    </div>
  </header>

  <!-- 1. Narrative + KPIs -->
  <div class="section">
    <div class="section-label">1. The story in the data</div>
    <div class="summary">
      <div class="narrative">
        Across 71 days you denied or revised <span class="highlight">202 plans</span> before any code was written. The single most common reason—appearing in 1 out of 4 denials—was the same: the agent jumped to implementation without telling you <em>what</em> it was building, <em>why</em>, or <em>how</em>. Missing narrative. Missing context. Missing the story. Your expectations evolved from “does it work?” in January to “tell me the story and be confident” by March.
      </div>
      <div class="kpi-stack">
        <div class="kpi-item">
          <div class="kpi-value">25.7%</div>
          <div class="kpi-label">Denials for missing narrative</div>
        </div>
        <div class="kpi-item">
          <div class="kpi-value">50%</div>
          <div class="kpi-label">Plans revised before coding</div>
        </div>
        <div class="kpi-item">
          <div class="kpi-value">12</div>
          <div class="kpi-label">Distinct denial categories</div>
        </div>
      </div>
    </div>
  </div>

  <!-- 2. Denial Taxonomy -->
  <div class="section">
    <div class="section-label">2. Why plans get denied</div>
    <div class="taxonomy-list">
      <div class="tax-row">
        <span class="tax-rank">1</span>
        <div class="tax-body">
          <span class="tax-label">Missing Narrative / Overview</span>
          <div class="tax-bar-track"><div class="tax-bar-fill top" style="width: 100%"></div></div>
          <span class="tax-quote">"This plan is denied without narrative detail and rationales."</span>
        </div>
        <span class="tax-pct">25.7%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">2</span>
        <div class="tax-body">
          <span class="tax-label">Clarification Needed</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 65%"></div></div>
          <span class="tax-quote">"What does this Mean???"</span>
        </div>
        <span class="tax-pct">16.8%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">3</span>
        <div class="tax-body">
          <span class="tax-label">Testing / Procedural</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 54%"></div></div>
          <span class="tax-quote">"I'm denying so you can create a diff."</span>
        </div>
        <span class="tax-pct">13.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">4</span>
        <div class="tax-body">
          <span class="tax-label">Wrong Approach / Over-Engineered</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 37%"></div></div>
          <span class="tax-quote">"Why are we doing difficult shit here? I want a hover experience."</span>
        </div>
        <span class="tax-pct">9.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">5</span>
        <div class="tax-body">
          <span class="tax-label">Process Requirement</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 31%"></div></div>
          <span class="tax-quote">"Make sure you feature branch."</span>
        </div>
        <span class="tax-pct">7.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">6</span>
        <div class="tax-body">
          <span class="tax-label">Confidence / Risk Check</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 29%"></div></div>
          <span class="tax-quote">"Take a step back, breathe, make sure we're not being irrational."</span>
        </div>
        <span class="tax-pct">7.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">7</span>
        <div class="tax-body">
          <span class="tax-label">Content Removal</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 27%"></div></div>
          <span class="tax-quote">"I don't want this in the plan."</span>
        </div>
        <span class="tax-pct">6.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">8</span>
        <div class="tax-body">
          <span class="tax-label">Implementation Bug Found</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 23%"></div></div>
        </div>
        <span class="tax-pct">5.9%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">9</span>
        <div class="tax-body">
          <span class="tax-label">Design / UX Issue</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 21%"></div></div>
        </div>
        <span class="tax-pct">5.4%</span>
      </div>
      <div class="tax-row">
        <span class="tax-rank">10</span>
        <div class="tax-body">
          <span class="tax-label">Naming / Terminology</span>
          <div class="tax-bar-track"><div class="tax-bar-fill rest" style="width: 16%"></div></div>
          <span class="tax-quote">"Why do you keep calling it Simplified????"</span>
        </div>
        <span class="tax-pct">4.0%</span>
      </div>
    </div>
  </div>

  <!-- 3. Evolution -->
  <div class="section">
    <div class="section-label">3. How your expectations evolved</div>
    <div class="evolution-grid">
      <div class="evo-card evo-jan">
        <div class="evo-month">January</div>
        <div class="evo-theme">"Does it work?"</div>
        <div class="evo-desc">Bug-hunting phase. You were hands-on testing View Logs, iterating on session scoping heuristics. 60% of denials were implementation bugs and verification failures. No mention of “narrative” or “overview” yet.</div>
        <div class="evo-stat">26 denials · 0 narrative requests</div>
      </div>
      <div class="evo-card evo-feb">
        <div class="evo-month">February</div>
        <div class="evo-theme">"Follow the process"</div>
        <div class="evo-desc">Process gates emerged: feature branches, Linear tickets, pull main. 40% of denials were procedural (diff testing). UX polish intensified. The first narrative demands appeared: “I want a narrative under each section.”</div>
        <div class="evo-stat">48 denials · 6 narrative requests</div>
      </div>
      <div class="evo-card evo-mar">
        <div class="evo-month">March</div>
        <div class="evo-theme">"Tell me the story"</div>
        <div class="evo-desc">Narrative became the #1 gate. You created a “Missing overview” label and applied it systematically. Confidence checks became standard. You began telling agents to “take a step back, breathe, and analyze.”</div>
        <div class="evo-stat">128 denials · 25+ narrative requests</div>
      </div>
    </div>
  </div>

  <!-- 4. Quality comparison -->
  <div class="section">
    <div class="section-label">4. What works vs. what doesn't</div>
    <div class="quality-grid">
      <div class="q-card good">
        <div class="q-icon">✓</div>
        <div class="q-title">What approved plans do</div>
        <ul class="q-list">
          <li><span class="q-dot"></span>Lead with a narrative overview: what exists, what changes, why</li>
          <li><span class="q-dot"></span>State confidence and identify risks proactively</li>
          <li><span class="q-dot"></span>Reference existing codebase patterns before proposing new code</li>
          <li><span class="q-dot"></span>Use explicit, transparent naming (not euphemisms)</li>
          <li><span class="q-dot"></span>Break large work into phases with evaluation gates</li>
          <li><span class="q-dot"></span>Include example output for user-facing changes</li>
          <li><span class="q-dot"></span>Specify feature branch and ticket creation steps</li>
        </ul>
      </div>
      <div class="q-card bad">
        <div class="q-icon">✗</div>
        <div class="q-title">What agents keep getting wrong</div>
        <ul class="q-list">
          <li><span class="q-dot"></span>Jump to implementation steps without narrative context</li>
          <li><span class="q-dot"></span>Over-engineer: Shift+Click when hover works, MCP tool when a README suffices</li>
          <li><span class="q-dot"></span>Introduce new code for things the codebase already solves</li>
          <li><span class="q-dot"></span>Propose work on top of failing lint/type checks</li>
          <li><span class="q-dot"></span>Use vague or euphemistic naming (“Accept” instead of “Git Add”)</li>
          <li><span class="q-dot"></span>Wait to be asked for confidence instead of stating it</li>
          <li><span class="q-dot"></span>Rush to modify instead of reporting what they see</li>
        </ul>
      </div>
    </div>
  </div>

  <!-- 5. The actionable output -->
  <div class="section">
    <div class="section-label">5. The actionable output</div>
    <div class="narrative" style="margin-bottom: 32px;">
      The analysis produced <span class="highlight">17 specific prompt instructions</span> that, if embedded in a planning prompt, would address ~70% of all denial reasons. The biggest three:
    </div>
    <div style="display: flex; flex-direction: column; gap: 20px;">
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">1</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Every plan MUST start with a Solution Overview</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">What exists, what changes, why, how. This alone addresses 1 in 4 denials.</div>
        </div>
      </div>
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">2</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">End every plan with a Confidence Assessment</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Don’t wait to be asked. State your confidence, identify risks, flag uncertainties.</div>
        </div>
      </div>
      <div style="display: flex; gap: 16px; align-items: flex-start;">
        <span class="font-mono" style="font-size: 24px; font-weight: 300; color: var(--amber-500); flex-shrink: 0; width: 32px; text-align: right;">3</span>
        <div>
          <div style="font-size: 17px; font-weight: 500; color: var(--slate-900); margin-bottom: 4px;">Search for existing patterns before proposing new code</div>
          <div style="font-size: 14px; color: var(--slate-600); line-height: 1.5;">Explicitly state what you found in the codebase. Prefer reuse over new implementation.</div>
        </div>
      </div>
    </div>
  </div>

  <!-- 6. Recurring phrases -->
  <div class="section">
    <div class="section-label">6. Your most-used phrases</div>
    <div class="phrases-grid">
      <div class="phrase-chip"><span class="phrase-text">"narrative"</span><span class="phrase-count">50+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"I don't want this in the plan"</span><span class="phrase-count">10</span></div>
      <div class="phrase-chip"><span class="phrase-text">"feature branch"</span><span class="phrase-count">8+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"confidence"</span><span class="phrase-count">8+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"Missing overview"</span><span class="phrase-count">14</span></div>
      <div class="phrase-chip"><span class="phrase-text">"front-end design skill"</span><span class="phrase-count">16</span></div>
      <div class="phrase-chip"><span class="phrase-text">"separation of concerns"</span><span class="phrase-count">6</span></div>
      <div class="phrase-chip"><span class="phrase-text">"Take a step back, breathe"</span><span class="phrase-count">6</span></div>
      <div class="phrase-chip"><span class="phrase-text">"how does this work"</span><span class="phrase-count">5+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"what the fuck"</span><span class="phrase-count">4</span></div>
      <div class="phrase-chip"><span class="phrase-text">"create a ticket"</span><span class="phrase-count">4+</span></div>
      <div class="phrase-chip"><span class="phrase-text">"reusable"</span><span class="phrase-count">19+</span></div>
    </div>
  </div>

  <!-- 7. Corrective Prompt -->
  <div class="section" style="margin-bottom: 64px;">
    <div class="action-panel">
      <div class="section-label">7. The corrective prompt</div>
      <div class="ap-intro">
        These 17 instructions were extracted directly from your denial patterns. Embedding them in a planning prompt would address approximately 70% of all denial reasons.
      </div>
      <div class="prompt-block">
        <div class="prompt-header">
          <span class="prompt-header-label">
            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="4 17 10 11 4 5"></polyline><line x1="12" y1="19" x2="20" y2="19"></line></svg>
            planning-instructions.md
          </span>
          <button class="copy-btn" onclick="copyPrompt(this)">
            <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg>
            Copy
          </button>
        </div>
        <div class="prompt-body">
<pre id="prompt-content"><span class="comment"># Planning Instructions
# Derived from 370 files of denial & annotation analysis</span>

1. STRUCTURE: Every plan MUST begin with a "Solution Overview"
   containing 2-3 paragraphs of narrative prose explaining:
   - What exists today (current state)
   - What will change and why
   - How it will be built (approach summary)
   Do NOT skip this. Do NOT replace it with bullet points.

2. NARRATIVE: Every major section must include a rationale
   paragraph — not just what will be done, but WHY this
   approach was chosen over alternatives.

3. FEATURE BRANCH: Always specify implementation will occur
   on a feature branch. State the branch name. Never plan
   to work directly on main.

4. EXISTING PATTERNS: Before proposing any new implementation,
   search the codebase for existing patterns that solve the
   same problem. Explicitly state what you found and whether
   you will reuse it. Prefer reuse over new code.

5. CONFIDENCE STATEMENT: End the plan with a "Confidence
   Assessment" section. State your confidence level, identify
   risks or edge cases, and note uncertainties. Do not wait
   to be asked.

6. PHASING: For plans with more than 3 steps, break them into
   numbered phases. After each phase, note "Pause for
   evaluation" so the reviewer can assess before proceeding.

7. ISSUE TRACKING: If the project uses Linear or GitHub Issues,
   include a step to create relevant tickets BEFORE
   implementation. Backlog items should be separate tickets.

8. SIMPLICITY: Choose the simplest approach that meets
   requirements. Do not introduce modifier keys when hover
   works. Do not build a framework when a README suffices.

9. NAMING: Use explicit, transparent names for user-facing
   features. Do not euphemize Git operations ("Git Add"
   not "Accept"). Match existing product naming conventions.

10. CODE QUALITY: State that implementation will follow clean
    code principles: modular architecture, separation of
    concerns, no circumventing lint or type checks.

11. CLEAN FOUNDATION: If the codebase has failing lint or type
    checks, address these BEFORE proposing new features. State
    the current CI/CD state.

12. PRIVACY: For features involving data storage or sharing,
    explicitly state privacy guarantees. Require user
    confirmation before storing data.

13. EXAMPLES: When the plan involves user-facing output or UI,
    include an example of what it will look like.

14. FOCUSED SCOPE: Do not include sections that are obvious,
    boilerplate, or previously asked to be removed. Keep the
    plan focused rather than comprehensive.

15. DESIGN SKILL: For any frontend/UI work, invoke the
    front-end design skill to validate the approach. Note
    this invocation explicitly in the plan.

16. VERIFICATION STEP: For refactors or multi-file changes,
    include a verification step with line-by-line comparison
    of affected code paths.

17. DELIBERATION: If the plan involves a dramatic shift, state
    that you have re-evaluated the approach, traced through
    affected files mentally, and are confident in the plan.
    Do not rush.</pre>
        </div>
      </div>

      <div class="glow-wrap">
        <div class="glow-bg"></div>
        <div class="glow-card">
          <div class="gc-text">
            These instructions are yours — derived from <em>your feedback, your language, your standards</em>. Copy them into your planning prompt and watch the deny rate drop.
          </div>
        </div>
      </div>
    </div>
  </div>

  <footer>
    <p>Analysis of 202 denied plans and 168 annotation files from the Plannotator archive.</p>
  </footer>

</div>

<script>
function copyPrompt(btn) {
  const text = document.getElementById('prompt-content').textContent;
  navigator.clipboard.writeText(text).then(() => {
    btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M22 11.08V12a10 10 0 1 1-5.93-9.14"></path><polyline points="22 4 12 14.01 9 11.01"></polyline></svg> Copied';
    btn.classList.add('copied');
    setTimeout(() => {
      btn.innerHTML = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg> Copy';
      btn.classList.remove('copied');
    }, 2000);
  });
}
</script>
</body>
</html>
@@ -0,0 +1,282 @@
# Claude Code Fallback

Read this file only when the user does **not** have a usable Plannotator archive.

This is the secondary path for ordinary Claude Code users whose denial history
exists in `~/.claude/projects/` rather than `~/.plannotator/plans/`.

The goal is the same as the main skill:

- extract the user's real denial reasons
- reduce them into a taxonomy and prompt corrections
- produce the same HTML report design and section flow

## Source of Truth

Use the bundled parser at:

- [scripts/extract_exit_plan_mode_outcomes.py](../scripts/extract_exit_plan_mode_outcomes.py)

Resolve that script path relative to this skill directory before running it.

This script normalizes `ExitPlanMode` outcomes from Claude Code JSONL transcripts
and emits clean JSON parts containing only human-authored denial reasons by default.

Do **not** read raw `~/.claude/projects/**/*.jsonl` directly unless:

- the parser fails
- the user asks for audit-level verification
- you need to inspect one or two suspicious records by hand

The parser exists specifically to strip transcript noise such as generic native
reject strings and wrapper boilerplate.
|
||||
|
||||
## Run the Parser
|
||||
|
||||
Create the working directory first:
|
||||
|
||||
```bash
|
||||
mkdir -p /tmp/compound-planning
|
||||
```
|
||||
|
||||
Then run the bundled parser. Prefer `python3`; if unavailable, use `python`.
|
||||
|
||||
Use a resolved absolute script path, not a repo-local copy.
|
||||
|
||||
```bash
|
||||
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
|
||||
--projects-dir ~/.claude/projects \
|
||||
--json-out /tmp/compound-planning/claude-code-human-reasons.json \
|
||||
--show-samples 0
|
||||
```
|
||||
|
||||
Expected output:
|
||||
|
||||
- manifest:
|
||||
`/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.manifest.json`
|
||||
- part files:
|
||||
`/tmp/compound-planning/claude-code-human-reasons/claude-code-human-reasons.part-XXXX-of-XXXX.json`
|
||||
|
||||
The script prints how many records were detected and how many JSON part files were emitted.
|
||||
|
||||
## What To Read First
|
||||
|
||||
Read the manifest before reading any part file.
|
||||
|
||||
The manifest gives you:
|
||||
|
||||
- total filtered record count
|
||||
- total `ExitPlanMode` attempts
|
||||
- native approval / denial counts
|
||||
- non-native denial counts
|
||||
- part file list
|
||||
|
||||
Use the part files only after you understand the overall dataset shape.
|
||||
|
||||
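The manifest check above can be sketched in a few lines of Python. The path follows the naming convention shown earlier, but the field names (`record_count`, `summary`, `parts`) are assumptions; verify them against a real manifest before relying on them:

```python
import json
from pathlib import Path

# Hypothetical manifest location and field names -- check a real manifest first.
MANIFEST_PATH = Path(
    "/tmp/compound-planning/claude-code-human-reasons/"
    "claude-code-human-reasons.manifest.json"
)


def load_manifest(path: Path) -> dict:
    """Read the manifest and report the dataset shape before any part file."""
    manifest = json.loads(path.read_text(encoding="utf-8"))
    summary = manifest.get("summary", {})
    print("filtered records:", manifest.get("record_count"))
    print("ExitPlanMode attempts:", summary.get("total_exit_plan_attempts"))
    print("part files:", len(manifest.get("parts", [])))
    return manifest
```
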
## Inventory In Fallback Mode

In Claude Code fallback mode, report this dataset instead of the Plannotator file counts:

- human denial reasons found
- total `ExitPlanMode` attempts scanned
- native approvals
- native denials with extractable inline reason
- native denials without recoverable reason
- non-native denials with recoverable payload
- number of emitted JSON parts
- date range from the records
- total days spanned
- distinct sessions
- distinct project roots / `cwd` values

Also calculate:

- average `plan_length_chars` where present
- percentage of all denials that contain a recoverable human reason

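Those two derived metrics can be sketched as follows, assuming the records have already been loaded from the JSON part files. The `plan_length_chars`, `outcome`, and `human_reason` field names match the bundled parser's record schema; the loading step itself is omitted:

```python
from typing import List


def derived_metrics(records: List[dict]) -> dict:
    """Compute the two extra inventory numbers from parsed part-file records."""
    lengths = [
        r["plan_length_chars"]
        for r in records
        if isinstance(r.get("plan_length_chars"), int)
    ]
    denials = [r for r in records if str(r.get("outcome", "")).startswith("denied")]
    with_reason = [r for r in denials if r.get("human_reason")]
    return {
        "avg_plan_length_chars": (sum(lengths) / len(lengths)) if lengths else None,
        "pct_denials_with_human_reason": (
            100.0 * len(with_reason) / len(denials) if denials else 0.0
        ),
    }
```
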
Do **not** fabricate Plannotator-only inventory fields in fallback mode:

- no `*-approved.md` counts
- no `*.annotations.md` counts
- no `*.diff.md` counts
- no approved-plan line-count analysis

If the user asks for those specifically, state that Claude Code log fallback mode
does not contain those artifacts.

### Previous Report Detection In Fallback Mode

Previous report detection still applies. Check the user's home directory or
`~/.plannotator/plans/` for existing `compound-planning-report*.html` files. If
found, offer the same incremental vs full choice as Plannotator mode. In
incremental mode, filter the parser output by timestamp rather than by filename
date — use the `timestamp` field in each JSON record.

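The timestamp filter for incremental mode can be sketched like this. ISO-8601 timestamps in a single consistent format (such as the parser's `timestamp` field) sort lexicographically, so a plain string comparison is enough; the cutoff value is a placeholder:

```python
from typing import List, Optional


def newer_than(records: List[dict], cutoff_iso: Optional[str]) -> List[dict]:
    """Keep only records strictly newer than the previous report's cutoff."""
    if not cutoff_iso:
        return records  # first report: analyze everything
    return [
        r for r in records
        if isinstance(r.get("timestamp"), str) and r["timestamp"] > cutoff_iso
    ]
```

Records with a missing or non-string `timestamp` are dropped in incremental mode rather than guessed at.
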
If no previous report exists, use the first-report naming convention
(`compound-planning-report.html`). Otherwise use the next version number.

## Extraction In Fallback Mode

Treat the emitted JSON part files as the clean source dataset.

### Batching

- **Small datasets (< 200 records):** read the part files directly without extra agents
- **Medium datasets (200-800 records):** split by part file or time range into 2-4 agents
- **Large datasets (800+ records):** split by part file groups or balanced time ranges

All extraction agents should use `model: "haiku"` — they're doing straightforward
file reading and structured extraction, not reasoning.

Each extraction agent should read every record in its assigned part files and write
clean markdown output to:

```text
/tmp/compound-planning/extraction-{batch-name}.md
```

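The batching thresholds above can be sketched as a simple grouping of part files. This is a sketch under the assumption that part files are roughly balanced, so round-robin assignment spreads the load; real splits should weight by the per-part record counts from the manifest:

```python
from typing import List


def plan_batches(part_files: List[str], total_records: int) -> List[List[str]]:
    """Group part files into extraction batches per the dataset-size thresholds."""
    if total_records < 200:
        n_batches = 1  # small: read directly, no extra agents
    elif total_records <= 800:
        n_batches = min(4, max(2, len(part_files)))  # medium: 2-4 agents
    else:
        n_batches = max(4, len(part_files) // 2)  # large: more, balanced
    n_batches = min(n_batches, len(part_files)) or 1
    # Round-robin so each batch sees a spread of time ranges.
    batches = [part_files[i::n_batches] for i in range(n_batches)]
    return [b for b in batches if b]
```
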
### Extraction Prompt For Claude Code Denial Records

Use this prompt for each fallback extraction batch (adapt the part files and output path):

```text
You are extracting structured data from Claude Code ExitPlanMode denial records.

Files to read: [JSON PART FILES]
Output: Write your complete results to [OUTPUT FILE PATH]

Read EVERY record in the assigned files. Each record already contains a cleaned
human_reason field. Use that as the primary source text.

For EACH record, extract:
- Date
- Session ID
- Project / cwd
- Topic (only if inferable from the reason or plan path; otherwise say "Unknown from logs")
- Human denial reason
- What was specifically asked to change
- Feedback type (let the content determine the category)
- Notable phrases
- Reason source (`native_inline_reason`, `non_native_freeform_payload`, or `structured_quote_extraction`)
- Plan path if present
- Plan length in chars if present

Do NOT skip any records. One entry per record.

Format each entry as:
**[session_id :: tool_use_id]**
- Date: ...
- Project: ...
- Topic: ...
- Human denial reason: ...
- Feedback type: ...
- Specific asks: ...
- Notable phrases: ...
- Reason source: ...
- Plan path: ...
- Plan length chars: ...
---

After processing all records, write the complete results to [OUTPUT FILE PATH].
State the total record count at the end of the file.
```

## Reduction In Fallback Mode

The reduction step stays conceptually the same:

- taxonomy
- top patterns
- recurring phrases
- reviewer values
- recurring agent mistakes
- structural requests
- evolution over time
- corrective prompt instructions

Use `model: "sonnet"` for reduction agents, same as Plannotator mode. The
two-stage reduce (partial reduces for 21+ extraction files) also applies when
there are many part files.

But interpret the dataset correctly:

- this is denial-reason evidence from Claude Code logs
- not every denial has a recoverable human reason
- annotations may be absent entirely
- success traits are often inferred from the inverse of repeated denial feedback

If the evidence for "what works" is weaker than the evidence for "what fails",
say that explicitly.

## HTML Report Adaptation

Use the same template and the same section order as the main skill.

In fallback mode:

- explicitly state in the header/meta that the source is Claude Code `ExitPlanMode`
  denial reasons
- keep the same narrative-first editorial style
- keep the same 7 major sections
- use real denial-reason counts, dates, phrases, and percentages only

### KPI Sidebar Substitutes

The Plannotator version uses a revision-rate KPI that may not exist here.

In fallback mode, prefer this KPI trio:

1. top denial category percentage
2. total human denial reasons recovered
3. number of distinct denial categories

If a better third metric emerges from the data, use it, but do not invent one.

### Footer / Provenance

The footer tagline should mention that the report was derived from Claude Code
denial reasons rather than Plannotator markdown archives.

### Important Limitation To State

If `human_reasons_total < total denials`, mention in the narrative or footer note
that some denials in the transcript did not contain recoverable human-authored
feedback and therefore could not contribute to the pattern analysis.

### Versioned Report Naming

Versioned naming (`v2`, `v3`, etc.) applies to fallback mode too. Save reports
to `~/.plannotator/plans/` (create the directory if it doesn't exist) so that
all compound planning reports live in the same location regardless of data source.

## Summary In Fallback Mode

At the end, tell the user:

- how many human denial reasons were analyzed
- how many total `ExitPlanMode` attempts were scanned
- the top 3 denial patterns found
- the estimated percentage of denial reasons the corrective instructions address
- the single most impactful prompt improvement
- where the report was saved (including version number)
- if incremental: note that earlier findings are in the previous report

## Improvement Hook In Fallback Mode

The Phase 6 improvement hook applies to fallback mode too. The corrective prompt
instructions derived from Claude Code denial reasons are just as useful for
injection into future planning sessions. Follow the same flow as the main skill.

## Audit Mode

Only if the user asks for raw denial records or transcript noise:

```bash
python3 [RESOLVED SKILL PATH]/scripts/extract_exit_plan_mode_outcomes.py \
  --projects-dir ~/.claude/projects \
  --records-filter denials \
  --json-out /tmp/compound-planning/claude-code-all-denials.json \
  --show-samples 0
```

Do not use this audit-mode output for the normal report unless the user asks for it.
@@ -0,0 +1,820 @@
#!/usr/bin/env python3
"""Extract ExitPlanMode outcomes from Claude Code JSONL session logs.

This parser keeps three views of the same data:

1. Strict native Claude Code classification
   - native approval:
     "User has approved your plan."
   - native denial:
     "The user doesn't want to proceed with this tool use. The tool use was rejected"

2. General denial capture
   - any matching ExitPlanMode tool_result with is_error=true and non-empty text
     is captured as a denial/error payload, even when it is custom hook output
     or some other non-native integration.

3. Human-reason extraction
   - native inline reasons are preserved as-is
   - freeform non-native error payloads are treated as human reasons
   - structured non-native payloads are reduced to quoted feedback where possible

This means the script does not depend on hook-specific strings to capture custom
denials, but it also does not dump wrapper boilerplate into the human-reason
output.

The script streams JSONL line-by-line and uses only the Python standard library.
"""

from __future__ import annotations

import argparse
import json
import os
import sys
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


APPROVE_PREFIX = "User has approved your plan."
REJECT_PREFIX = (
    "The user doesn't want to proceed with this tool use. "
    "The tool use was rejected"
)
REASON_MARKER = "To tell you how to proceed, the user said:\n"
NOTE_MARKER = (
    "\n\nNote: The user's next message may contain a correction or preference."
)


@dataclass
class AttemptRecord:
    session_id: str
    tool_use_id: str
    file_path: str
    line_number: int
    timestamp: Optional[str]
    cwd: Optional[str]
    plan_file_path: Optional[str]
    plan_length_chars: Optional[int]
    outcome: str = "pending"
    native_reason: Optional[str] = None
    native_reason_style: Optional[str] = None
    captured_reason: Optional[str] = None
    captured_reason_style: Optional[str] = None
    captured_reason_source: Optional[str] = None
    human_reason: Optional[str] = None
    human_reason_style: Optional[str] = None
    human_reason_source: Optional[str] = None
    result_is_error: Optional[bool] = None
    result_file_path: Optional[str] = None
    result_line_number: Optional[int] = None
    result_timestamp: Optional[str] = None
    result_preview: Optional[str] = None


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Extract ExitPlanMode approvals/denials from Claude Code logs."
    )
    parser.add_argument(
        "--projects-dir",
        default="~/.claude/projects",
        help="Root Claude projects directory. Default: %(default)s",
    )
    parser.add_argument(
        "--include-subagents",
        action="store_true",
        help="Include /subagents/ JSONL files. Default is to skip them.",
    )
    parser.add_argument(
        "--records-filter",
        choices=("all", "native", "native-denials", "denials", "human-reasons"),
        default="human-reasons",
        help=(
            "Which records to write to JSON/CSV outputs. "
            "Default: %(default)s"
        ),
    )
    parser.add_argument(
        "--include-non-native-denials",
        action="store_true",
        help=(
            "Include non-native denial/error payloads in sample output. "
            "Default sample output shows only native denials."
        ),
    )
    parser.add_argument(
        "--show-samples",
        type=int,
        default=5,
        help="How many denial samples to print in the text summary.",
    )
    parser.add_argument(
        "--json-out",
        help="Optional path to write a JSON report.",
    )
    parser.add_argument(
        "--max-output-tokens-per-file",
        type=int,
        default=50000,
        help=(
            "Approximate max token budget per JSON file when writing --json-out. "
            "Default: %(default)s"
        ),
    )
    return parser.parse_args()


def iter_jsonl_files(root: Path, include_subagents: bool) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_subagents and "subagents" in dirnames:
            dirnames.remove("subagents")
        dirnames.sort()
        for filename in sorted(filenames):
            if filename.endswith(".jsonl"):
                yield Path(dirpath) / filename


def make_attempt_key(session_id: str, tool_use_id: str) -> str:
    return session_id + "::" + tool_use_id


def preview(text: str, limit: int = 220) -> str:
    compact = " ".join(text.split())
    if len(compact) <= limit:
        return compact
    return compact[: limit - 3] + "..."


def estimate_tokens(text: str) -> int:
    # Rough enough for output chunking. We intentionally bias slightly high.
    return max(1, (len(text) + 3) // 4)


def iter_blocks(message_content: object) -> Iterator[dict]:
    if not isinstance(message_content, list):
        return
    for block in message_content:
        if isinstance(block, dict):
            yield block


def extract_text(content: object) -> str:
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return ""

    parts: List[str] = []
    for item in content:
        if isinstance(item, str):
            parts.append(item)
            continue
        if not isinstance(item, dict):
            continue
        if isinstance(item.get("text"), str):
            parts.append(item["text"])
        elif isinstance(item.get("content"), str):
            parts.append(item["content"])
    return "\n".join(part for part in parts if part)


def classify_reason_style(reason: Optional[str]) -> Optional[str]:
    if not reason:
        return None

    stripped = reason.lstrip()
    if (
        stripped.startswith("#")
        or stripped.startswith("YOUR PLAN WAS NOT APPROVED.")
        or "\n## " in reason
        or "\n---" in reason
    ):
        return "structured"
    return "freeform"


def extract_blockquote_feedback(text: str) -> List[str]:
    quotes: List[str] = []
    current: List[str] = []

    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if stripped.startswith(">"):
            current.append(stripped[1:].lstrip())
            continue

        if current:
            if not stripped or stripped.startswith("## ") or stripped == "---":
                quote = "\n".join(line for line in current if line).strip()
                if quote:
                    quotes.append(quote)
                current = []
                continue

            # Preserve wrapped continuation lines that belong to the same quote.
            current.append(stripped)

    if current:
        quote = "\n".join(line for line in current if line).strip()
        if quote:
            quotes.append(quote)

    return quotes


def extract_human_reason(
    native_reason: Optional[str],
    captured_reason: Optional[str],
    captured_reason_style: Optional[str],
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    if native_reason:
        return (
            native_reason,
            classify_reason_style(native_reason),
            "native_inline_reason",
        )

    if not captured_reason:
        return (None, None, None)

    if captured_reason_style == "freeform":
        return (
            captured_reason,
            classify_reason_style(captured_reason),
            "non_native_freeform_payload",
        )

    quote_feedback = extract_blockquote_feedback(captured_reason)
    if quote_feedback:
        reason = "\n\n".join(quote_feedback)
        return (
            reason,
            classify_reason_style(reason),
            "structured_quote_extraction",
        )

    return (None, None, None)


def classify_result(
    text: str,
    is_error: bool,
) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
    stripped = text.strip()
    if not stripped:
        if is_error:
            return (
                "denied_non_native_no_payload",
                None,
                None,
                None,
                None,
            )
        return ("pending", None, None, None, None)

    if stripped.startswith(APPROVE_PREFIX):
        return ("approved_native", None, None, None, None)

    if stripped.startswith(REJECT_PREFIX):
        marker_index = stripped.find(REASON_MARKER)
        if marker_index < 0:
            return ("denied_native_no_reason", None, None, None, None)

        reason = stripped[marker_index + len(REASON_MARKER) :]
        note_index = reason.find(NOTE_MARKER)
        if note_index >= 0:
            reason = reason[:note_index]
        reason = reason.strip()
        if reason:
            style = classify_reason_style(reason)
            return (
                "denied_native_with_reason",
                reason,
                reason,
                "native_inline_reason",
                style,
            )
        return ("denied_native_no_reason", None, None, None, None)

    if is_error:
        style = classify_reason_style(stripped)
        return (
            "denied_non_native_with_payload",
            None,
            stripped,
            "non_native_error_payload",
            style,
        )

    return ("non_native_other", None, None, None, None)


def outcome_rank(outcome: str) -> int:
    ranks = {
        "pending": 0,
        "non_native_other": 1,
        "approved_native": 2,
        "denied_native_no_reason": 3,
        "denied_native_with_reason": 4,
        "denied_non_native_no_payload": 5,
        "denied_non_native_with_payload": 6,
    }
    return ranks.get(outcome, 0)


def update_attempt_from_result(
    attempt: AttemptRecord,
    file_path: Path,
    line_number: int,
    timestamp: Optional[str],
    text: str,
    is_error: bool,
) -> None:
    (
        outcome,
        native_reason,
        captured_reason,
        captured_reason_source,
        captured_reason_style,
    ) = classify_result(text=text, is_error=is_error)
    if outcome_rank(outcome) < outcome_rank(attempt.outcome):
        return

    attempt.outcome = outcome
    attempt.native_reason = native_reason
    attempt.native_reason_style = classify_reason_style(native_reason)
    attempt.captured_reason = captured_reason
    attempt.captured_reason_source = captured_reason_source
    attempt.captured_reason_style = captured_reason_style
    (
        attempt.human_reason,
        attempt.human_reason_style,
        attempt.human_reason_source,
    ) = extract_human_reason(
        native_reason=native_reason,
        captured_reason=captured_reason,
        captured_reason_style=captured_reason_style,
    )
    attempt.result_is_error = is_error
    attempt.result_file_path = str(file_path)
    attempt.result_line_number = line_number
    attempt.result_timestamp = timestamp
    attempt.result_preview = preview(text)


def scan_projects(
    projects_dir: Path,
    include_subagents: bool,
) -> Tuple[Dict[str, int], List[AttemptRecord]]:
    stats = {
        "files_scanned": 0,
        "lines_scanned": 0,
        "json_errors": 0,
    }
    attempts: Dict[str, AttemptRecord] = {}

    for file_path in iter_jsonl_files(projects_dir, include_subagents):
        stats["files_scanned"] += 1
        try:
            handle = file_path.open("r", encoding="utf-8", errors="replace")
        except OSError:
            continue

        with handle:
            for line_number, raw_line in enumerate(handle, start=1):
                if not raw_line.strip():
                    continue
                stats["lines_scanned"] += 1
                try:
                    obj = json.loads(raw_line)
                except json.JSONDecodeError:
                    stats["json_errors"] += 1
                    continue

                session_id = str(obj.get("sessionId") or str(file_path))
                timestamp = obj.get("timestamp")
                cwd = obj.get("cwd")
                message = obj.get("message")
                if not isinstance(message, dict):
                    continue

                content = message.get("content")

                for block in iter_blocks(content):
                    if (
                        block.get("type") == "tool_use"
                        and block.get("name") == "ExitPlanMode"
                        and isinstance(block.get("id"), str)
                    ):
                        tool_use_id = block["id"]
                        key = make_attempt_key(session_id, tool_use_id)
                        if key in attempts:
                            continue
                        input_data = block.get("input")
                        plan = None
                        plan_file_path = None
                        if isinstance(input_data, dict):
                            if isinstance(input_data.get("plan"), str):
                                plan = input_data["plan"]
                            if isinstance(input_data.get("planFilePath"), str):
                                plan_file_path = input_data["planFilePath"]

                        attempts[key] = AttemptRecord(
                            session_id=session_id,
                            tool_use_id=tool_use_id,
                            file_path=str(file_path),
                            line_number=line_number,
                            timestamp=timestamp if isinstance(timestamp, str) else None,
                            cwd=cwd if isinstance(cwd, str) else None,
                            plan_file_path=plan_file_path,
                            plan_length_chars=len(plan) if isinstance(plan, str) else None,
                        )

                if message.get("role") != "user":
                    continue

                for block in iter_blocks(content):
                    if (
                        block.get("type") != "tool_result"
                        or not isinstance(block.get("tool_use_id"), str)
                    ):
                        continue

                    key = make_attempt_key(session_id, block["tool_use_id"])
                    attempt = attempts.get(key)
                    if attempt is None:
                        continue

                    text = extract_text(block.get("content"))
                    update_attempt_from_result(
                        attempt=attempt,
                        file_path=file_path,
                        line_number=line_number,
                        timestamp=timestamp if isinstance(timestamp, str) else None,
                        text=text,
                        is_error=bool(block.get("is_error")),
                    )

    return stats, list(attempts.values())


def summarize(attempts: Iterable[AttemptRecord]) -> Dict[str, int]:
    summary = {
        "total_exit_plan_attempts": 0,
        "approved_native": 0,
        "denied_native_with_reason": 0,
        "denied_native_no_reason": 0,
        "denied_native_with_freeform_reason": 0,
        "denied_native_with_structured_reason": 0,
        "denied_non_native_with_payload": 0,
        "denied_non_native_no_payload": 0,
        "captured_denial_reasons_total": 0,
        "captured_freeform_reasons": 0,
        "captured_structured_reasons": 0,
        "human_reasons_total": 0,
        "human_reasons_native": 0,
        "human_reasons_non_native": 0,
        "human_reasons_freeform": 0,
        "human_reasons_structured": 0,
        "non_native_other": 0,
        "pending": 0,
    }
    for attempt in attempts:
        summary["total_exit_plan_attempts"] += 1
        summary[attempt.outcome] = summary.get(attempt.outcome, 0) + 1
        if attempt.outcome == "denied_native_with_reason":
            if attempt.native_reason_style == "freeform":
                summary["denied_native_with_freeform_reason"] += 1
            elif attempt.native_reason_style == "structured":
                summary["denied_native_with_structured_reason"] += 1
        if attempt.captured_reason:
            summary["captured_denial_reasons_total"] += 1
            if attempt.captured_reason_style == "freeform":
                summary["captured_freeform_reasons"] += 1
            elif attempt.captured_reason_style == "structured":
                summary["captured_structured_reasons"] += 1
        if attempt.human_reason:
            summary["human_reasons_total"] += 1
            if attempt.human_reason_source == "native_inline_reason":
                summary["human_reasons_native"] += 1
            else:
                summary["human_reasons_non_native"] += 1
            if attempt.human_reason_style == "freeform":
                summary["human_reasons_freeform"] += 1
            elif attempt.human_reason_style == "structured":
                summary["human_reasons_structured"] += 1
    return summary


def filter_records(
    attempts: List[AttemptRecord],
    records_filter: str,
) -> List[AttemptRecord]:
    if records_filter == "all":
        return attempts
    if records_filter == "native":
        return [
            attempt
            for attempt in attempts
            if attempt.outcome.startswith("approved_native")
            or attempt.outcome.startswith("denied_native")
        ]
    if records_filter == "native-denials":
        return [
            attempt
            for attempt in attempts
            if attempt.outcome.startswith("denied_native")
        ]
    if records_filter == "human-reasons":
        return [attempt for attempt in attempts if attempt.human_reason]
    return [
        attempt
        for attempt in attempts
        if attempt.outcome.startswith("denied_native")
        or attempt.outcome.startswith("denied_non_native")
    ]


def build_json_chunks(
    records: List[AttemptRecord],
    max_output_tokens_per_file: int,
) -> List[List[AttemptRecord]]:
    if not records:
        return [[]]

    chunks: List[List[AttemptRecord]] = []
    current_chunk: List[AttemptRecord] = []
    current_tokens = 0

    for record in records:
        record_dict = asdict(record)
        record_json = json.dumps(record_dict, ensure_ascii=False)
        record_tokens = estimate_tokens(record_json)

        if current_chunk and current_tokens + record_tokens > max_output_tokens_per_file:
            chunks.append(current_chunk)
            current_chunk = []
            current_tokens = 0

        current_chunk.append(record)
        current_tokens += record_tokens

    if current_chunk:
        chunks.append(current_chunk)

    return chunks


def print_summary(
    projects_dir: Path,
    include_subagents: bool,
    stats: Dict[str, int],
    attempts: List[AttemptRecord],
    summary: Dict[str, int],
    show_samples: int,
    include_non_native_denials: bool,
) -> None:
    native_denials = (
        summary["denied_native_with_reason"] + summary["denied_native_no_reason"]
    )
    total_denials = (
        native_denials
        + summary["denied_non_native_with_payload"]
        + summary["denied_non_native_no_payload"]
    )
    native_extractable_ratio = (
        (summary["denied_native_with_reason"] / native_denials) * 100.0
        if native_denials
        else 0.0
    )
    all_capture_ratio = (
        (summary["captured_denial_reasons_total"] / total_denials) * 100.0
        if total_denials
        else 0.0
    )

    print(f"Projects dir: {projects_dir}")
    print(f"Included subagents: {'yes' if include_subagents else 'no'}")
    print(f"JSONL files scanned: {stats['files_scanned']}")
    print(f"JSON lines scanned: {stats['lines_scanned']}")
    print(f"JSON parse errors: {stats['json_errors']}")
    print()
    print(f"ExitPlanMode attempts: {summary['total_exit_plan_attempts']}")
    print(f"Native approvals: {summary['approved_native']}")
    print(
        "Native denials with extractable reason: "
        f"{summary['denied_native_with_reason']}"
    )
    print(
        "Native denials without reason: "
        f"{summary['denied_native_no_reason']}"
    )
    print(
        "Freeform native reasons: "
        f"{summary['denied_native_with_freeform_reason']}"
    )
    print(
        "Structured native reasons: "
        f"{summary['denied_native_with_structured_reason']}"
    )
    print(
        "Non-native denials with payload: "
        f"{summary['denied_non_native_with_payload']}"
    )
    print(
        "Non-native denials without payload: "
        f"{summary['denied_non_native_no_payload']}"
    )
    print(
        "Captured denial reasons total: "
        f"{summary['captured_denial_reasons_total']}"
    )
    print(
        "Captured freeform reasons: "
        f"{summary['captured_freeform_reasons']}"
    )
    print(
        "Captured structured reasons: "
        f"{summary['captured_structured_reasons']}"
    )
    print(f"Human reasons total: {summary['human_reasons_total']}")
    print(f"Human reasons from native denials: {summary['human_reasons_native']}")
    print(
        "Human reasons from non-native denials: "
        f"{summary['human_reasons_non_native']}"
    )
    print(
        "Non-native / non-denial outcomes: "
        f"{summary['non_native_other']}"
    )
    print(f"Pending / unmatched attempts: {summary['pending']}")
    print()
    print(
        "Extractable native denial reasons: "
        f"{summary['denied_native_with_reason']}/{native_denials} "
        f"({native_extractable_ratio:.1f}%)"
    )
    print(
        "Captured denial payloads across all denial types: "
        f"{summary['captured_denial_reasons_total']}/{total_denials} "
        f"({all_capture_ratio:.1f}%)"
    )
    print(
        "Human reasons across all denial types: "
        f"{summary['human_reasons_total']}/{total_denials} "
        f"({((summary['human_reasons_total'] / total_denials) * 100.0 if total_denials else 0.0):.1f}%)"
    )

    if include_non_native_denials:
        samples = [attempt for attempt in attempts if attempt.human_reason]
    else:
        samples = [
            attempt
            for attempt in attempts
            if attempt.outcome == "denied_native_with_reason" and attempt.human_reason
        ]
    samples = samples[: max(show_samples, 0)]
    if not samples:
        return

    print()
    print(
        "Sample denial reasons:"
        if include_non_native_denials
        else "Sample native denial reasons:"
    )
    for attempt in samples:
        style = attempt.human_reason_style or "unknown"
        source = attempt.human_reason_source or "unknown"
        reason = attempt.human_reason or ""
        print(
            "- "
            f"[{attempt.outcome} / {source} / {style}] "
            f"{reason!r} "
            f"({attempt.file_path}:{attempt.result_line_number})"
        )


def write_json_report(
    output_path: Path,
    projects_dir: Path,
    include_subagents: bool,
    stats: Dict[str, int],
    summary: Dict[str, int],
    records: List[AttemptRecord],
    max_output_tokens_per_file: int,
) -> List[Path]:
    output_path.parent.mkdir(parents=True, exist_ok=True)

    chunks = build_json_chunks(records, max_output_tokens_per_file)
    base_name = output_path.stem
    output_dir = output_path.with_suffix("")
    output_dir.mkdir(parents=True, exist_ok=True)

    written_files: List[Path] = []
    part_summaries = []

    for index, chunk in enumerate(chunks, start=1):
        chunk_records = [asdict(record) for record in chunk]
        chunk_payload = {
            "projects_dir": str(projects_dir),
            "include_subagents": include_subagents,
            "stats": stats,
            "summary": summary,
            "part_index": index,
            "part_count": len(chunks),
            "record_count": len(chunk_records),
            "records": chunk_records,
        }
        part_name = f"{base_name}.part-{index:04d}-of-{len(chunks):04d}.json"
        part_path = output_dir / part_name
        part_path.write_text(
            json.dumps(chunk_payload, indent=2, ensure_ascii=False),
|
||||
encoding="utf-8",
|
||||
)
|
||||
written_files.append(part_path)
|
||||
part_summaries.append(
|
||||
{
|
||||
"part_index": index,
|
||||
"file_name": part_name,
|
||||
"record_count": len(chunk_records),
|
||||
}
|
||||
)
|
||||
|
||||
manifest_payload = {
|
||||
"projects_dir": str(projects_dir),
|
||||
"include_subagents": include_subagents,
|
||||
"stats": stats,
|
||||
"summary": summary,
|
||||
"records_filter_record_count": len(records),
|
||||
"part_count": len(chunks),
|
||||
"max_output_tokens_per_file": max_output_tokens_per_file,
|
||||
"parts": part_summaries,
|
||||
}
|
||||
manifest_path = output_dir / f"{base_name}.manifest.json"
|
||||
manifest_path.write_text(
|
||||
json.dumps(manifest_payload, indent=2, ensure_ascii=False),
|
||||
encoding="utf-8",
|
||||
)
|
||||
written_files.insert(0, manifest_path)
|
||||
|
||||
return written_files
|
||||
|
||||
|
||||
def main() -> int:
|
||||
args = parse_args()
|
||||
projects_dir = Path(args.projects_dir).expanduser()
|
||||
if not projects_dir.exists():
|
||||
print(f"Projects dir does not exist: {projects_dir}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
stats, attempts = scan_projects(
|
||||
projects_dir=projects_dir,
|
||||
include_subagents=args.include_subagents,
|
||||
)
|
||||
attempts.sort(
|
||||
key=lambda attempt: (
|
||||
attempt.file_path,
|
||||
attempt.line_number,
|
||||
attempt.tool_use_id,
|
||||
)
|
||||
)
|
||||
summary = summarize(attempts)
|
||||
records = filter_records(attempts, args.records_filter)
|
||||
|
||||
print_summary(
|
||||
projects_dir=projects_dir,
|
||||
include_subagents=args.include_subagents,
|
||||
stats=stats,
|
||||
attempts=attempts,
|
||||
summary=summary,
|
||||
show_samples=args.show_samples,
|
||||
include_non_native_denials=args.include_non_native_denials,
|
||||
)
|
||||
|
||||
if args.json_out:
|
||||
written_files = write_json_report(
|
||||
output_path=Path(args.json_out).expanduser(),
|
||||
projects_dir=projects_dir,
|
||||
include_subagents=args.include_subagents,
|
||||
stats=stats,
|
||||
summary=summary,
|
||||
records=records,
|
||||
max_output_tokens_per_file=args.max_output_tokens_per_file,
|
||||
)
|
||||
part_count = max(len(written_files) - 1, 0)
|
||||
print()
|
||||
print(
|
||||
"Wrote JSON output: "
|
||||
f"detected {len(records)} records for filter '{args.records_filter}' "
|
||||
f"and emitted {part_count} part file(s) plus a manifest."
|
||||
)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())