Add 5 pi extensions: pi-subagents, pi-crew, rpiv-pi, pi-interactive-shell, pi-intercom
@@ -0,0 +1,34 @@
---
description: Launch Codex CLI in overlay to fully implement an existing plan/spec document
---
Determine which prompting skill to load based on model:
- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)

Also load the `codex-cli` skill. Then read the plan at `$1`.

Analyze the plan to understand: how many files are created vs modified, whether there's a prescribed implementation order or prerequisites, what existing code is referenced, and roughly how large the implementation is.

Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:

1. Read and internalize the full plan document. Identify every file to be created, every file to be modified, and any prerequisites or ordering constraints.
2. Before writing any code, read all existing files that will be modified — in full, not just the sections mentioned in the plan. Also read key files they import from or that import them, to absorb the surrounding patterns, naming conventions, and architecture.
3. If the plan specifies an implementation order or prerequisites (e.g., "extract module X before building Y"), follow that order exactly. Otherwise, implement bottom-up: shared utilities and types first, then the modules that depend on them, then integration/registration code last.
4. Implement each piece completely. No stubs, no TODOs, no placeholder comments, no "implement this later" shortcuts. Every function body, every edge case handler, every error path described in the plan must be real code.
5. Match existing code patterns exactly — same formatting, same import style, same error handling conventions, same naming. Read the surrounding codebase to absorb these patterns before writing. If the plan references patterns from specific files (e.g., "same pattern as X"), read those files and replicate the pattern faithfully.
6. Stay within scope. Do not refactor, rename, or restructure adjacent code that the plan does not mention. No "while I'm here" improvements. If something adjacent looks wrong, note it in the summary but do not touch it.
7. Keep files reasonably sized. If a file grows beyond ~500 lines, split it as the plan describes or refactor into logical sub-modules.
8. After implementing all files, do a self-review pass: re-read the plan from top to bottom and verify every requirement, every edge case, every design decision is addressed in the code. Check for: missing imports, type mismatches, unreachable code paths, inconsistent field names between modules, and any plan requirement that was overlooked.
9. Do NOT commit or push. Write a summary listing every file created or modified, what was implemented in each, and any plan ambiguities that required judgment calls.

The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline and verification requirements per the prompting skill.

Determine the model flag:
- Default: `-m gpt-5.4`
- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`

Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.

Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
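Putting the pieces together, the launch might look like this (a minimal sketch -- `MODEL` and `META_PROMPT` are hypothetical placeholders for the chosen flag and the generated meta prompt, and the command is echoed rather than run):

```shell
# Default model; swap in gpt-5.3-codex only if the user asked for Codex 5.3.
MODEL="gpt-5.4"
# Stand-in for the generated meta prompt (hypothetical content).
META_PROMPT="Read the plan document, then implement every step fully."
# The command string dispatched inside the interactive shell overlay.
echo codex -m "$MODEL" -a never "$META_PROMPT"
```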
$@
@@ -0,0 +1,35 @@
---
description: Launch Codex CLI in overlay to review implemented code changes (optionally against a plan)
---
Determine which prompting skill to load based on model:
- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)

Also load the `codex-cli` skill. Then determine the review scope:

- If `$1` looks like a file path (contains `/` or ends in `.md`): read it as the plan/spec these changes were based on. The diff scope is uncommitted changes vs HEAD, or if clean, the current branch vs main.
- Otherwise: no plan file. Diff scope is the same. Treat all of `$@` as additional review context or focus areas.

Run the appropriate git diff to identify which files changed and how many lines are involved. This context helps you generate a better-calibrated meta prompt.
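The scope decision above can be sketched as a small shell fragment (illustrative only -- `PLAN_ARG` is a hypothetical stand-in for `$1`):

```shell
PLAN_ARG="docs/feature-plan.md"   # hypothetical first argument
case "$PLAN_ARG" in
  */* | *.md) SCOPE="plan" ;;     # looks like a file path: read it as the plan
  *)          SCOPE="context" ;;  # otherwise: treat $@ as review context
esac
echo "$SCOPE"
# Either way, the diff scope is uncommitted work vs HEAD, or if the tree is
# clean, the current branch vs main:
#   git diff HEAD --stat
#   git diff main...HEAD --stat
```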
Based on the prompting skill's best practices, the diff scope, and the optional plan, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:

1. Identify all changed files via git diff, then read every changed file in full — not just the diff hunks. For each changed file, also read the files it imports from and key files that depend on it, to understand integration points and downstream effects.
2. If a plan/spec was provided, read it and verify the implementation is complete — every requirement addressed, no steps skipped, nothing invented beyond scope, no partial stubs left behind.
3. Review each changed file for: bugs, logic errors, race conditions, resource leaks (timers, event listeners, file handles, unclosed connections), null/undefined hazards, off-by-one errors, error handling gaps, type mismatches, dead code, unused imports/variables/parameters, unnecessary complexity, and inconsistency with surrounding code patterns and naming conventions.
4. Trace key code paths end-to-end across function and file boundaries — verify data flows, state transitions, error propagation, and cleanup ordering. Don't evaluate functions in isolation.
5. Check for missing or inadequate tests, stale documentation, and missing changelog entries.
6. Fix every issue found with direct code edits. Keep fixes scoped to the actual issues identified — do not expand into refactoring or restructuring code that wasn't flagged in the review. If adjacent code looks problematic, note it in the summary but don't touch it.
7. After all fixes, write a clear summary listing what was found, what was fixed, and any remaining concerns that require human judgment.

The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. Emphasize scope discipline and verification requirements per the prompting skill.

Determine the model flag:
- Default: `-m gpt-5.4`
- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`

Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.

Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.

$@
@@ -0,0 +1,29 @@
---
description: Launch Codex CLI in overlay to review an implementation plan against the codebase
---
Determine which prompting skill to load based on model:
- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)

Also load the `codex-cli` skill. Then read the plan at `$1`.

Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:

1. Read and internalize the full plan. Then read every codebase file the plan references — in full, not just the sections mentioned. Also read key files adjacent to those (imports, dependents) to understand the real state of the code the plan targets.
2. Systematically review the plan against what the code actually looks like, not what the plan assumes it looks like.
3. Verify every assumption, file path, API shape, data flow, and integration point mentioned in the plan against the actual codebase.
4. Check that the plan's approach is logically sound, complete, and accounts for edge cases.
5. Identify any gaps, contradictions, incorrect assumptions, or missing steps.
6. Make targeted edits to the plan file to fix issues found, adding inline notes where changes were made. Fix what's wrong — do not restructure or rewrite sections that are correct.

The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions — read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. Emphasize scope discipline and verification requirements per the prompting skill.

Determine the model flag:
- Default: `-m gpt-5.4`
- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`

Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.

Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.

$@
@@ -0,0 +1,161 @@
---
name: codex-5-3-prompting
description: How to write system prompts and instructions for GPT-5.3-Codex. Use when constructing or tuning prompts targeting Codex 5.3.
---

# GPT-5.3-Codex Prompting Guide

GPT-5.3-Codex is fast, capable, and eager. It moves quickly and will skip reading, over-refactor, and drift scope if prompts aren't tight. Explicit constraints matter more than with GPT-5.2-Codex. Include the following blocks as needed when constructing system prompts.

## Output shape

Always include. Controls verbosity and response structure.

```
<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- Simple yes/no questions: <=2 sentences.
- Complex multi-step or multi-file tasks:
  - 1 short overview paragraph
  - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
- Avoid long narrative paragraphs; prefer compact bullets and short sections.
- Do not rephrase the user's request unless it changes semantics.
</output_verbosity_spec>
```
## Scope constraints

Always include. GPT-5.3-Codex will add features, refactor adjacent code, and invent UI elements if you don't fence it in.

```
<design_and_scope_constraints>
- Explore any existing design systems and understand them deeply.
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no added components, no UX embellishments.
- Style aligned to the design system at hand.
- Do NOT invent colors, shadows, tokens, animations, or new UI elements unless requested or necessary.
- If any instruction is ambiguous, choose the simplest valid interpretation.
</design_and_scope_constraints>
```

## Context loading

Always include. GPT-5.3-Codex skips reading and starts writing if you don't force it.

```
<context_loading>
- Read ALL files that will be modified -- in full, not just the sections mentioned in the task.
- Also read key files they import from or that depend on them.
- Absorb surrounding patterns, naming conventions, error handling style, and architecture before writing any code.
- Do not ask clarifying questions about things that are answerable by reading the codebase.
</context_loading>
```
## Plan-first mode

Include for multi-file work, large refactors, or any task with ordering dependencies.

```
<plan_first>
- Before writing any code, produce a brief implementation plan:
  - Files to create vs. modify
  - Implementation order and prerequisites
  - Key design decisions and edge cases
  - Acceptance criteria for "done"
- Get the plan right first. Then implement step by step following the plan.
- If the plan is provided externally, follow it faithfully -- the job is execution, not second-guessing the design.
</plan_first>
```

## Long-context handling

Include when inputs exceed ~10k tokens (multi-chapter docs, long threads, multiple PDFs).

```
<long_context_handling>
- For inputs longer than ~10k tokens:
  - First, produce a short internal outline of the key sections relevant to the task.
  - Re-state the constraints explicitly before answering.
  - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
  - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
</long_context_handling>
```
## Uncertainty and ambiguity

Include when the task involves underspecified requirements or hallucination-prone domains.

```
<uncertainty_and_ambiguity>
- If the question is ambiguous or underspecified:
  - Ask up to 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with clearly labeled assumptions.
- Never fabricate exact figures, line numbers, or external references when uncertain.
- When unsure, prefer "Based on the provided context..." over absolute claims.
</uncertainty_and_ambiguity>
```

## User updates

Include for agentic / long-running tasks.

```
<user_updates_spec>
- Send brief updates (1-2 sentences) only when:
  - You start a new major phase of work, or
  - You discover something that changes the plan.
- Avoid narrating routine tool calls ("reading file...", "running tests...").
- Each update must include at least one concrete outcome ("Found X", "Confirmed Y", "Updated Z").
- Do not expand the task beyond what was asked; if you notice new work, call it out as optional.
</user_updates_spec>
```
## Tool usage

Include when the prompt involves tool-calling agents.

```
<tool_usage_rules>
- Prefer tools over internal knowledge whenever:
  - You need fresh or user-specific data (tickets, orders, configs, logs).
  - You reference specific IDs, URLs, or document titles.
- Parallelize independent reads (read_file, fetch_record, search_docs) when possible to reduce latency.
- After any write/update tool call, briefly restate:
  - What changed
  - Where (ID or path)
  - Any follow-up validation performed
</tool_usage_rules>
```
## Reasoning effort

Set `model_reasoning_effort` via Codex CLI: `-c model_reasoning_effort="high"`

| Task type | Effort |
|---|---|
| Simple code generation, formatting | `low` or `medium` |
| Standard implementation from clear specs | `high` |
| Complex refactors, plan review, architecture | `xhigh` |
| Code review (thorough) | `high` or `xhigh` |
## Backwards compatibility hedging

GPT-5.3-Codex has a strong tendency to preserve old patterns, add compatibility shims, and provide fallback code "just in case" -- even when explicitly told not to worry about backwards compatibility. Vague instructions like "don't worry about backwards compatibility" get interpreted weakly; the model may still hedge.

Use **"cutover"** to signal a clean, irreversible break. It's a precise industry term that conveys finality and intentional deprecation -- no dual-support phase, no gradual migration, no preserving old behavior.

Instead of:
> "Rewrite this and don't worry about backwards compatibility"

Say:
> "This is a cutover. No backwards compatibility. Rewrite using only Python 3.12+ features and current best practices. Do not preserve legacy code, polyfills, or deprecated patterns."
## Quick reference

- **Force reading first.** "Read all necessary files before you ask any dumb question."
- **Use plan mode.** Draft the full task with acceptance criteria before implementing.
- **Steer aggressively mid-task.** GPT-5.3-Codex handles redirects without losing context. Be direct: "Stop. Fix the actual cause." / "Simplest valid implementation only."
- **Constrain scope hard.** GPT-5.3-Codex will refactor aggressively if you don't fence it in.
- **Watch context burn.** Faster model = faster context consumption. Start fresh at ~40%.
- **Use domain jargon.** "Cutover," "golden-path," "no fallbacks," "domain split" get cleaner, faster responses.
- **Download libraries locally.** Tell it to read them for better context than relying on training data.
@@ -0,0 +1,130 @@
---
name: codex-cli
description: OpenAI Codex CLI reference. Use when running codex in interactive_shell overlay or when user asks about codex CLI options.
---

# Codex CLI (OpenAI)

## Commands

| Command | Description |
|---------|-------------|
| `codex` | Start interactive TUI |
| `codex "prompt"` | TUI with initial prompt |
| `codex exec "prompt"` | Non-interactive (headless), streams to stdout. Supports `--output-schema <file>` for structured JSON output |
| `codex e "prompt"` | Shorthand for exec |
| `codex login` | Authenticate (OAuth, device auth, or API key) |
| `codex login status` | Show auth mode |
| `codex logout` | Remove credentials |
| `codex mcp` | Manage MCP servers |
| `codex completion` | Generate shell completions |
## Key Flags

| Flag | Description |
|------|-------------|
| `-m, --model <model>` | Switch model (prefer `gpt-5.5`) |
| `-c <key=value>` | Override config.toml values (dotted paths, parsed as TOML) |
| `-p, --profile <name>` | Use config profile from config.toml |
| `-s, --sandbox <mode>` | Sandbox policy: `read-only`, `workspace-write`, `danger-full-access` |
| `-a, --ask-for-approval <policy>` | `untrusted`, `on-failure`, `on-request`, `never` |
| `--full-auto` | Alias for `-a on-request --sandbox workspace-write` |
| `--search` | Enable live web search tool |
| `-i, --image <file>` | Attach image(s) to initial prompt |
| `--add-dir <dir>` | Additional writable directories |
| `-C, --cd <dir>` | Set working root directory |
| `--no-alt-screen` | Inline mode (preserve terminal scrollback) |
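These flags compose. A couple of illustrative combinations (echoed here rather than executed; the prompts and paths are made up):

```shell
# Headless run rooted at a project directory, with web search enabled:
echo codex exec --search -C ~/project "summarize recent changes"
# TUI with an attached image and full-auto approvals:
echo codex --full-auto -i mockup.png "implement this design"
```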
## Sandbox Modes

- `read-only` - Can only read files
- `workspace-write` - Can write to workspace
- `danger-full-access` - Full system access (use with caution)
## Features

- **Image inputs** - Accepts screenshots and design specs
- **Image generation (gpt-image-2)** - Generate images via natural language or explicit invocation
- **Code review** - Reviews changes before commit
- **Web search** - Can search for information
- **MCP integration** - Third-party tool support
## Image Generation (gpt-image-2)

Codex CLI can generate images using OpenAI's **gpt-image-2** - the latest cutting-edge image model with superior realism, prompt adherence, and accurate text rendering in images. It can produce full high-fidelity design mockups for web pages and apps with unprecedented accuracy and control.

### How to Invoke

#### Natural Language (Recommended)

Just describe what you want naturally:

```bash
codex "Generate a clean app icon for a fitness tracker, flat design, 512x512"
codex "Create a hero banner for a SaaS landing page showing a dashboard with dark mode"
codex -i screenshot.png "Edit this screenshot to make the button green and add a tooltip"
```
#### Explicit Skill Invocation

Include `$imagegen` anywhere in your prompt to force the image-generation tool. This is a Codex keyword, not a shell variable, so shell examples use single quotes to keep it literal.

```bash
codex 'Make a pixel-art sprite sheet for a platformer game $imagegen'
codex 'Generate a logo for my coffee shop $imagegen'
```
Codex will generate the image(s), display them inline in the terminal (or save them locally). You can iterate on them, attach them to future prompts, or use them in your codebase.

### Tips

- **Image editing / iteration**: Attach a reference image (screenshot, wireframe, mockup) to your prompt. Codex handles multimodal input natively.
  ```bash
  codex -i wireframe.png "Turn this wireframe into a polished UI mockup"
  codex -i design.png "Generate code for this design"
  ```

- **Usage & limits**: Images count against your regular Codex usage quota and consume it 3-5x faster than text-only turns (depending on size/quality).

- **Heavy/batch work**: For production pipelines, set `OPENAI_API_KEY` in your shell and tell Codex to call the OpenAI Images API directly. It will then use `gpt-image-2` with full API pricing and options.

- **No config needed**: Image generation is enabled by default. Older experimental flags like `codex features enable image_generation` are no longer required.
## Config

Config file: `~/.codex/config.toml`

Key config values (set in file or override with `-c`):
- `model` -- model name (prefer `gpt-5.5`)
- `model_reasoning_effort` -- `low`, `medium`, `high`, `xhigh`
- `model_reasoning_summary` -- `detailed`, `concise`, `none`
- `model_verbosity` -- `low`, `medium`, `high`
- `profile` -- default profile name
- `tool_output_token_limit` -- max tokens per tool output

Define profiles for different projects/modes with `[profiles.<name>]` sections. Override at runtime with `-p <name>` or `-c model_reasoning_effort="high"`.
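For example, a config with a hypothetical `deep-review` profile might look like this (key names are from the list above; the profile name and values are illustrative, not defaults):

```toml
# ~/.codex/config.toml (illustrative)
model = "gpt-5.5"
model_reasoning_effort = "medium"

[profiles.deep-review]
model_reasoning_effort = "xhigh"
model_verbosity = "low"
```

`codex -p deep-review` would then apply those overrides for a single run.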
## In interactive_shell

Do NOT pass `-s` / `--sandbox` flags. Codex's `read-only` and `workspace-write` sandbox modes apply OS-level filesystem restrictions that break basic shell operations inside the PTY -- zsh can't even create temp files for here-documents, so every write attempt fails with "operation not permitted." The interactive shell overlay already provides supervision (user watches in real-time, Ctrl+Q to kill, Ctrl+T to transfer output), making Codex's sandbox redundant.

Prefer `gpt-5.5` for Codex CLI work. For users with a default profile configured to `gpt-5.5`, just run `codex "prompt"` to use those defaults -- no model or profile flags needed.

For delegated fire-and-forget runs, prefer `mode: "dispatch"` so the agent is notified automatically when Codex completes.

```typescript
// Delegated run with completion notification (recommended default)
interactive_shell({
  command: 'codex "Review this codebase for security issues"',
  mode: "dispatch"
})

// Override reasoning effort for a single delegated run
interactive_shell({
  command: 'codex -c model_reasoning_effort="xhigh" "Complex refactor task"',
  mode: "dispatch"
})

// Headless - use bash instead
bash({ command: 'codex exec "summarize the repo"' })
```
@@ -0,0 +1,53 @@
|
||||
---
|
||||
name: cursor-cli
|
||||
description: Cursor CLI reference. Use when running Cursor in interactive_shell overlay or when user asks about Cursor CLI options.
|
||||
---
|
||||
|
||||
# Cursor CLI
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `agent` | Start interactive Cursor session |
|
||||
| `agent "prompt"` | Interactive session with initial prompt |
|
||||
| `agent -p "prompt"` | Non-interactive print mode |
|
||||
| `agent ls` | List previous chats |
|
||||
| `agent resume` | Resume latest chat |
|
||||
| `agent --continue` | Continue previous session |
|
||||
| `agent --resume "chat-id"` | Resume a specific chat |
|
||||
|
||||
## Key Flags

| Flag | Description |
|------|-------------|
| `--mode plan` / `--plan` | Plan mode (clarify before coding) |
| `--mode ask` | Ask mode (read-only exploration) |
| `--model <model>` | Model override |
| `--sandbox <enabled|disabled>` | Toggle sandbox behavior |
| `--output-format text` | Output format for print mode workflows |
## Mode Notes

- **Interactive mode** (`agent`, `agent "prompt"`) is the right fit for `interactive_shell` overlays.
- **Print mode** (`agent -p`) is non-interactive and better suited to direct shell/batch usage.

## In interactive_shell

Use structured spawn when you want the extension's shared spawn resolver/defaults/worktree support:

```typescript
interactive_shell({ spawn: { agent: "cursor" }, mode: "interactive" })
interactive_shell({ spawn: { agent: "cursor", prompt: "Review the diffs" }, mode: "dispatch" })
interactive_shell({ spawn: { agent: "cursor", worktree: true }, mode: "hands-free" })
```

Structured spawn launches Cursor via the configured `spawn.commands.cursor` executable (default: `agent`) and passes any prompt text in Cursor's native interactive startup form (`agent "prompt"`). By default, spawn args include `--model composer-2-fast`, which selects Cursor's Composer 2 Fast model explicitly.

Cursor remains **fresh/worktree only** in structured spawn. `fork` is Pi-only.

For non-interactive print-mode tasks, prefer direct shell usage:

```typescript
bash({ command: 'agent -p "review these changes for security issues" --output-format text' })
```
@@ -0,0 +1,202 @@
|
||||
---
|
||||
name: gpt-5-4-prompting
|
||||
description: How to write system prompts and instructions for GPT-5.4. Use when constructing or tuning prompts targeting GPT-5.4.
|
||||
---
|
||||
|
||||
# GPT-5.4 Prompting Guide
|
||||
|
||||
GPT-5.4 unifies reasoning, coding, and agentic capabilities into a single frontier model. It's extremely persistent, highly token-efficient, and delivers more human-like outputs than its predecessors. However, it has new failure modes: it moves fast without solid plans, expands scope aggressively, and can prematurely declare tasks complete—sometimes falsely claiming success. Prompts must account for these behaviors.
|
||||
|
||||
## Output shape
|
||||
|
||||
Always include.
|
||||
|
||||
```
|
||||
<output_verbosity_spec>
|
||||
- Default: 3-6 sentences or <=5 bullets for typical answers.
|
||||
- Simple yes/no questions: <=2 sentences.
|
||||
- Complex multi-step or multi-file tasks:
|
||||
- 1 short overview paragraph
|
||||
- then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
|
||||
- Avoid long narrative paragraphs; prefer compact bullets and short sections.
|
||||
- Do not rephrase the user's request unless it changes semantics.
|
||||
</output_verbosity_spec>
|
||||
```
|
||||
|
||||
## Scope constraints

Critical. GPT-5.4's primary failure mode is scope expansion—it adds features, refactors beyond the ask, and "helpfully" extends tasks. Fence it in hard.

```
<design_and_scope_constraints>
- Implement EXACTLY and ONLY what the user requests. Nothing more.
- No extra features, no "while I'm here" improvements, no UX embellishments.
- Do NOT expand the task scope under any circumstances.
- If you notice adjacent issues or opportunities, note them in your summary but DO NOT act on them.
- If any instruction is ambiguous, choose the simplest valid interpretation.
- Style aligned to the existing design system. Do not invent new patterns.
- Do NOT invent colors, shadows, tokens, animations, or new UI elements unless explicitly requested.
</design_and_scope_constraints>
```
## Verification requirements

Critical. GPT-5.4 can declare tasks complete prematurely or claim success when the implementation is incorrect. Force explicit verification.

```
<verification_requirements>
- Before declaring any task complete, perform explicit verification:
  - Re-read the original requirements
  - Check that every requirement is addressed in the actual code
  - Run tests or validation steps if available
  - Confirm the implementation actually works, don't assume
- Do NOT claim success based on intent—verify actual outcomes.
- If you cannot verify (no tests, can't run code), say so explicitly.
- When reporting completion, include concrete evidence: test results, verified file contents, or explicit acknowledgment of what couldn't be verified.
- If something failed or was skipped, say so clearly. Do not obscure failures.
</verification_requirements>
```
## Context loading

Always include. GPT-5.4 is faster and may skip reading in favor of acting. Force thoroughness.

```
<context_loading>
- Read ALL files that will be modified—in full, not just the sections mentioned in the task.
- Also read key files they import from or that depend on them.
- Absorb surrounding patterns, naming conventions, error handling style, and architecture before writing any code.
- Do not ask clarifying questions about things that are answerable by reading the codebase.
- If modifying existing code, understand the full context before making changes.
</context_loading>
```
## Plan-first mode
|
||||
|
||||
Include for multi-file work, refactors, or tasks with ordering dependencies. GPT-5.4 produces good natural-language plans but may skip validation steps.
|
||||
|
||||
```
<plan_first>
- Before writing any code, produce a brief implementation plan:
  - Files to create vs. modify
  - Implementation order and prerequisites
  - Key design decisions and edge cases
  - Acceptance criteria for "done"
  - How you will verify each step
- Execute the plan step by step. After each step, verify it worked before proceeding.
- If the plan is provided externally, follow it faithfully—the job is execution, not second-guessing.
- Do NOT skip verification steps even if you're confident.
</plan_first>
```
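
The execute-then-verify loop above can also live in the harness. A sketch, with each step as a (name, run, verify) triple; the shape of the triple is an assumption for illustration:

```python
from typing import Callable

# (step name, action to run, verification check)
Step = tuple[str, Callable[[], None], Callable[[], bool]]

def execute_plan(steps: list[Step]) -> list[str]:
    """Run each plan step, then verify it before moving on.
    Verification is never skipped, even for 'obvious' steps."""
    completed: list[str] = []
    for name, run, verify in steps:
        run()
        if not verify():
            raise RuntimeError(f"step failed verification: {name}")
        completed.append(name)
    return completed
```

Failing a verification halts execution immediately, which matches the "verify it worked before proceeding" rule above.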

## Long-context handling

GPT-5.4 supports up to 1M tokens, but accuracy degrades beyond ~512K. Handle long inputs carefully.

```
<long_context_handling>
- For inputs longer than ~10k tokens:
  - First, produce a short internal outline of the key sections relevant to the task.
  - Re-state the constraints explicitly before answering.
  - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
  - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
- For very long contexts (200K+ tokens):
  - Be extra vigilant about accuracy—retrieval quality degrades.
  - Cross-reference claims against multiple sections.
  - Prefer citing specific locations over making sweeping statements.
</long_context_handling>
```
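
If the harness picks which block to include, it needs a cheap size estimate first. A sketch using the rough ~4-characters-per-token heuristic for English text (an approximation; use a real tokenizer such as tiktoken when precision matters), with the thresholds taken from this section:

```python
def estimated_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def context_guidance(text: str) -> str:
    tokens = estimated_tokens(text)
    if tokens > 512_000:
        return "degraded"   # past the accuracy cliff: cite and cross-reference
    if tokens > 200_000:
        return "very-long"  # extra vigilance, prefer specific citations
    if tokens > 10_000:
        return "long"       # outline sections, restate constraints first
    return "normal"
```

The returned label can select which `<long_context_handling>` rules to prepend to the prompt.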

## Tool usage

```
<tool_usage_rules>
- Prefer tools over internal knowledge whenever:
  - You need fresh or user-specific data (tickets, orders, configs, logs).
  - You reference specific IDs, URLs, or document titles.
- Parallelize independent tool calls when possible to reduce latency.
- After any write/update tool call, verify the outcome—do not assume success.
- After any write/update tool call, briefly restate:
  - What changed
  - Where (ID or path)
  - Verification performed or why verification was skipped
</tool_usage_rules>
```
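
On the harness side, "parallelize independent tool calls" typically means issuing them concurrently rather than awaiting each in turn. A sketch with two stand-in tools (the functions and their payloads are hypothetical, not a real API):

```python
import asyncio

async def fetch_ticket(ticket_id: str) -> dict:
    # Stand-in for a real tool call; the payload is illustrative.
    await asyncio.sleep(0.01)
    return {"id": ticket_id, "status": "open"}

async def fetch_config(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"service": service, "region": "us-east-1"}

async def gather_context(ticket_id: str, service: str) -> list[dict]:
    # The two calls are independent, so issue them concurrently
    # instead of awaiting one before starting the other.
    return await asyncio.gather(fetch_ticket(ticket_id), fetch_config(service))
```

With real network-bound tools, the wall-clock time approaches the slowest call rather than the sum of all calls.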

## Backwards compatibility hedging

GPT-5.4 tends to preserve old patterns and add compatibility shims. Use **"cutover"** to signal a clean break.

Instead of:
> "Rewrite this and don't worry about backwards compatibility"

Say:
> "This is a cutover. No backwards compatibility. Rewrite using only Python 3.12+ features and current best practices. Do not preserve legacy code, polyfills, or deprecated patterns."

## Quick reference

- **Constrain scope aggressively.** GPT-5.4 expands tasks beyond the ask. "ONLY what is requested, nothing more."
- **Force verification.** Don't trust "done"—require evidence. "Verify before claiming complete."
- **Use cutover language.** "Cutover," "no fallbacks," "exactly as specified" get cleaner results.
- **Plan mode helps.** Explicit plan-first prompts ensure verification steps.
- **Watch for false success claims.** In agent harnesses, add explicit validation steps. Don't let it self-report completion.
- **Steer mid-task.** GPT-5.4 handles redirects well. Be direct: "Stop. That's out of scope." / "Verify that actually worked."
- **Use domain jargon.** "Cutover," "golden-path," "no fallbacks," "domain split," "exactly as specified" trigger precise behavior.
- **Long context degrades.** Above ~512K tokens, cross-reference claims and cite specific sections.
- **Token efficiency is real.** 5.4 uses fewer tokens per problem—but verify it didn't skip steps to get there.

## Example: implementation task prompt

```
<system>
You are implementing a feature in an existing codebase. Follow these rules strictly.

<design_and_scope_constraints>
- Implement EXACTLY and ONLY what the user requests. Nothing more.
- No extra features, no "while I'm here" improvements.
- If you notice adjacent issues, note them in your summary but DO NOT act on them.
</design_and_scope_constraints>

<context_loading>
- Read ALL files that will be modified—in full.
- Also read key files they import from or depend on.
- Absorb patterns before writing any code.
</context_loading>

<verification_requirements>
- Before declaring complete, verify each requirement is addressed in actual code.
- Run tests if available. If not, state what couldn't be verified.
- Include concrete evidence of completion in your summary.
</verification_requirements>

<output_verbosity_spec>
- Brief updates only on major phases or blockers.
- Final summary: What changed, Where, Risks, Next steps.
</output_verbosity_spec>
</system>
```
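
Prompts like the one above are easier to maintain when assembled from named sections rather than kept as one string. A small sketch of that assembly; the section names mirror this guide, but the function and dict layout are illustrative:

```python
SECTIONS = {
    "design_and_scope_constraints": [
        "Implement EXACTLY and ONLY what the user requests. Nothing more.",
        'No extra features, no "while I\'m here" improvements.',
    ],
    "verification_requirements": [
        "Before declaring complete, verify each requirement is addressed in actual code.",
        "Include concrete evidence of completion in your summary.",
    ],
}

def build_system_prompt(preamble: str, sections: dict[str, list[str]]) -> str:
    """Wrap each rule list in its XML-style tag and join into one prompt."""
    blocks = [preamble]
    for tag, rules in sections.items():
        body = "\n".join(f"- {rule}" for rule in rules)
        blocks.append(f"<{tag}>\n{body}\n</{tag}>")
    return "<system>\n" + "\n\n".join(blocks) + "\n</system>"
```

Swapping sections in and out (e.g. adding `<plan_first>` only for multi-file work) then becomes a dict edit instead of string surgery.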

## Example: code review prompt

```
<system>
You are reviewing code changes. Be thorough but stay in scope.

<context_loading>
- Read every changed file in full, not just the diff hunks.
- Also read files they import from and key dependents.
</context_loading>

<review_scope>
- Review for: bugs, logic errors, race conditions, resource leaks, null hazards, error handling gaps, type mismatches, dead code, unused imports, pattern inconsistencies.
- Fix issues you find with direct code edits.
- Do NOT refactor or restructure code that wasn't flagged in the review.
- If adjacent code looks problematic, note it but don't touch it.
</review_scope>

<verification_requirements>
- After fixes, verify the code still works. Run tests if available.
- In your summary, list what was found, what was fixed, and what couldn't be verified.
</verification_requirements>
</system>
```