diff --git a/300 areas/350 AI/Pi Agent Extensions & Skills.md b/300 areas/350 AI/Pi Agent Extensions & Skills.md
index 1cb11e7..ed65bbb 100644
--- a/300 areas/350 AI/Pi Agent Extensions & Skills.md
+++ b/300 areas/350 AI/Pi Agent Extensions & Skills.md
@@ -36,6 +36,7 @@ aliases: []
 | **pi-graphify** | `~/.agents` | Knowledge graph tools: build, query, path tracing, explain, watch, add, update |
 | **plannotator** | `~/.agents` | Interactive plan review with browser UI, annotations, code review |
 | **caveman** | `~/.agents` | Ultra-compressed communication mode |
+| **markitdown** | `~/.agents` | Convert files (PDF, Word, Excel, PPTX, images, HTML, etc.) to Markdown. Image analysis via Qwen 2.5 VL 72B on OpenRouter. |
 
 ---
 
@@ -53,6 +54,59 @@ aliases: []
 | **openspec-archive-change** | Archive completed changes |
 | **openspec-explore** | Explore ideas and clarify requirements |
 | **npm-security** | Scan packages with SafeDep Vet, check typosquatting with npq, wrap installs with Socket Firewall |
+| **markitdown** | Convert files (PDF, Word, Excel, PowerPoint, images, HTML, CSV, JSON, XML, ZIP, EPubs, YouTube) to Markdown for LLM consumption. Image analysis via Qwen 2.5 VL 72B on OpenRouter. |
+
+---
+
+## markitdown
+
+Convert various file formats to Markdown. Useful for feeding documents and images into LLMs.
+
+### What it converts
+
+| Format | Input | Notes |
+|--------|-------|-------|
+| PDF | `.pdf` | Preserves structure (headings, lists, tables) |
+| Word | `.docx` | mammoth + lxml |
+| PowerPoint | `.pptx` | python-pptx |
+| Excel | `.xlsx`, `.xls` | openpyxl + pandas |
+| Images | `.jpg`, `.png`, etc. | EXIF metadata (free) + LLM vision description (via OpenRouter) |
+| HTML | `.html` | beautifulsoup4 |
+| CSV / JSON / XML | `.csv`, `.json`, `.xml` | Structured data → Markdown tables |
+| ZIP | `.zip` | Iterates contents, converts each file |
+| EPubs | `.epub` | |
+| YouTube | URLs | Transcript extraction |
+
+### CLI usage
+
+```bash
+# Convert file to Markdown (stdout)
+markitdown document.pdf
+
+# Write to file
+markitdown document.pdf -o document.md
+
+# Image with LLM vision description
+markitdown-vision photo.jpg
+```
+
+### Image analysis
+
+Two levels:
+
+1. **EXIF metadata only** (free, no API key): `markitdown photo.jpg`
+2. **LLM vision description** (via OpenRouter, requires API key): `markitdown-vision photo.jpg`
+
+The `markitdown-vision` wrapper auto-sources `OPENROUTER_API_KEY` from `~/.config/environment.d/10-secrets.conf` and uses `qwen/qwen2.5-vl-72b-instruct`.
+
+### Missing / can be added
+
+| Feature | What's needed |
+|---------|--------------|
+| Audio transcription | `pip install markitdown[audio-transcription]` (pydub + speechrecognition) |
+| Azure AI Document Intelligence | `pip install markitdown[az-doc-intel]` + Azure credentials |
+| Azure Content Understanding | `pip install markitdown[az-content-understanding]` + Azure credentials |
+| markitdown-ocr plugin | Installed but needs OpenRouter key enabled to activate |
 
 ---
 
@@ -242,6 +296,8 @@ pi-mcp-adapter connects Pi to external services via the Model Context Protocol.
 | `/filechanges` | Review changed files, diffs, accept/decline |
 | `/filechanges-accept` | Accept all changes |
 | `/filechanges-decline` | Revert all changes |
+| `markitdown ` | Convert file to Markdown (PDF, Word, Excel, PPTX, images, HTML, etc.) |
+| `markitdown-vision ` | Describe image using Qwen 2.5 VL 72B via OpenRouter |
 
 ---
 
@@ -266,4 +322,5 @@ pi-mcp-adapter connects Pi to external services via the Model Context Protocol.
 - [ ] Run `/reload` in Pi to activate filechanges and new templates
 - [ ] Add more prompt templates to `~/.pi/agent/prompts/` as needed
 - [ ] Verify video-extract works with Gemini
+- [x] Add markitdown skill to Obsidian skills page
 - [ ] Clean up workspace-map.json entries for any stale memory packs
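
Review note: the change describes the `markitdown-vision` wrapper only by its behavior (it reads `OPENROUTER_API_KEY` from `~/.config/environment.d/10-secrets.conf` and uses `qwen/qwen2.5-vl-72b-instruct`); the wrapper itself is not part of this diff. The following is a minimal sketch of how such a wrapper could work, not the actual script. It assumes the secrets file holds plain `KEY=VALUE` lines and that the `markitdown` and `openai` Python packages are installed. The helper names `read_env_value` and `describe_image` are hypothetical.

```python
import os
from pathlib import Path

# Path and model name are taken from the note; everything else is an assumption.
SECRETS_FILE = "~/.config/environment.d/10-secrets.conf"
VISION_MODEL = "qwen/qwen2.5-vl-72b-instruct"


def read_env_value(conf_path, name):
    """Return the value of `name` from a KEY=VALUE file, or None if absent."""
    path = Path(conf_path).expanduser()
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


def describe_image(image_path, conf_path=SECRETS_FILE, model=VISION_MODEL):
    """Convert an image to Markdown with an LLM-generated description."""
    # Imported lazily so the key-loading helper works without these packages.
    from markitdown import MarkItDown
    from openai import OpenAI

    # Prefer an already-exported key, then fall back to the secrets file.
    api_key = os.environ.get("OPENROUTER_API_KEY") or read_env_value(
        conf_path, "OPENROUTER_API_KEY"
    )
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY not found")

    # OpenRouter exposes an OpenAI-compatible endpoint.
    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=api_key)
    converter = MarkItDown(llm_client=client, llm_model=model)
    return converter.convert(image_path).text_content
```

Under these assumptions, `print(describe_image("photo.jpg"))` would approximate `markitdown-vision photo.jpg`. Keeping the key lookup in its own function means the secrets-file parsing can be checked without network access or an API key.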