name: markitdown
description: Convert various file formats to Markdown for use with LLMs and text analysis. Supports PDF, Word, Excel, PowerPoint, images, HTML, CSV, JSON, XML, ZIP, EPubs, and YouTube URLs.

MarkItDown

Convert files to Markdown for LLM consumption and text analysis. A lightweight Python utility by Microsoft.

Installation

MarkItDown is installed in a Python venv at /tmp/markitdown-env/, with a wrapper script at ~/.local/bin/markitdown.

The wrapper handles LD_LIBRARY_PATH for numpy's C extensions on NixOS.

If the venv is missing (e.g., after a rebuild), recreate it:

nix-shell -p python3 python3.pkgs.pip python3.pkgs.virtualenv gcc stdenv.cc.cc.lib --run "
  python3 -m venv /tmp/markitdown-env
  source /tmp/markitdown-env/bin/activate
  pip install 'markitdown[pdf,docx,pptx,xlsx]'
"

Then recreate the wrapper at ~/.local/bin/markitdown.
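
Before recreating anything, it can help to check which parts of the installation are actually missing. The following is a minimal sketch in Python, assuming the two paths named above and a Linux-style venv layout (bin/python); the function name install_status is hypothetical and not part of markitdown.

```python
import shutil
from pathlib import Path

# Paths taken from the Installation section above.
VENV_DIR = Path("/tmp/markitdown-env")
WRAPPER = Path.home() / ".local" / "bin" / "markitdown"


def install_status(venv_dir: Path = VENV_DIR, wrapper: Path = WRAPPER) -> dict:
    """Report which parts of the installation are present."""
    return {
        # A venv created with `python3 -m venv` has bin/python on Linux.
        "venv": (venv_dir / "bin" / "python").exists(),
        # The wrapper script itself (its contents are not checked here).
        "wrapper": wrapper.is_file(),
        # True when some `markitdown` executable is reachable via PATH.
        "on_path": shutil.which("markitdown") is not None,
    }


if __name__ == "__main__":
    for part, ok in install_status().items():
        print(f"{part}: {'ok' if ok else 'missing'}")
```

If "venv" is reported missing, rerun the nix-shell command above; if only "wrapper" is missing, the venv can be kept as it is.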

Supported Formats

| Format | Extension | Dependencies |
|---|---|---|
| PDF | .pdf | pdfminer-six, pdfplumber |
| Word | .docx | lxml, mammoth |
| PowerPoint | .pptx | python-pptx |
| Excel | .xlsx, .xls | openpyxl, pandas, xlrd |
| Images | .jpg, .png, etc. | EXIF metadata (core); LLM vision via llm_client/llm_model; OCR via markitdown-ocr plugin (installed) |
| HTML | .html, .htm | beautifulsoup4 (core) |
| CSV | .csv | (core) |
| JSON | .json | (core) |
| XML | .xml | (core) |
| ZIP | .zip | (core, iterates contents) |
| EPubs | .epub | (core) |
| YouTube | URLs | youtube-transcript-api (core) |
| Text | .txt, .md, etc. | (core) |
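
To check from a script whether a file falls under one of the formats listed above, a small lookup by extension may be enough. This sketch copies the extensions from the table; the table's "etc." entries for images and text are not expanded, and the names FORMATS and format_for are hypothetical helpers, not part of markitdown.

```python
from pathlib import Path
from typing import Optional

# Extension -> format name, copied from the table above (not exhaustive).
FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".pptx": "PowerPoint",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".jpg": "Images",
    ".png": "Images",
    ".html": "HTML",
    ".htm": "HTML",
    ".csv": "CSV",
    ".json": "JSON",
    ".xml": "XML",
    ".zip": "ZIP",
    ".epub": "EPubs",
    ".txt": "Text",
    ".md": "Text",
}


def format_for(path: str) -> Optional[str]:
    """Return the table's format name for a path, or None if it is not listed."""
    # Only the final suffix is considered, compared case-insensitively.
    return FORMATS.get(Path(path).suffix.lower())
```

A result of None only means the extension is not in this lookup; markitdown itself may still handle the file.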

CLI Usage

# Convert a file to Markdown (stdout)
markitdown path/to/file.pdf

# Write to file
markitdown path/to/file.pdf -o output.md

# Pipe content
cat file.pdf | markitdown
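
The same CLI calls can be driven from Python when a script needs to shell out to the wrapper. This is a minimal sketch that assumes markitdown is on PATH and uses only the -o flag shown above; build_command and run_cli are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def build_command(src: str, out: Optional[str] = None) -> List[str]:
    """Build the argument list for the CLI calls shown above."""
    cmd = ["markitdown", src]
    if out is not None:
        cmd += ["-o", out]
    return cmd


def run_cli(src: str, out: Optional[str] = None) -> str:
    """Run the CLI and return whatever it printed to stdout."""
    done = subprocess.run(
        build_command(src, out),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
    )
    return done.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.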

Python API

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
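
As a rough Python counterpart to the CLI's -o option, the converted text can be written to a file. The sketch below uses only convert() and text_content from the example above; convert_to_file is a hypothetical helper, and the converter parameter exists so that a stand-in object can be passed in tests.

```python
from pathlib import Path


def convert_to_file(src, dest, converter=None) -> Path:
    """Convert src and write the Markdown text to dest; return dest as a Path."""
    if converter is None:
        # Imported lazily so the helper can be exercised without markitdown.
        from markitdown import MarkItDown
        converter = MarkItDown()
    result = converter.convert(str(src))
    dest = Path(dest)
    # Create the output directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(result.text_content, encoding="utf-8")
    return dest


if __name__ == "__main__":
    print(convert_to_file("document.pdf", "/tmp/document.md"))
```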

Integration with Pi

Use markitdown to convert files before reading them with the read tool:

# Convert to a temporary Markdown file
markitdown report.pdf -o /tmp/report.md

# Then open /tmp/report.md with the read tool (Pi's tool, not the shell's read builtin)

This is especially useful for:

  • PDFs that need structure preserved (headings, lists, tables)
  • Office documents (Word, Excel, PowerPoint)
  • Images with EXIF metadata
  • Any file format not directly readable by the read tool
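
When the same file is read repeatedly, the conversion step can be cached. The following is a sketch under stated assumptions: prepare_for_read is a hypothetical helper, the cache is keyed by file name only (two sources with the same name would collide), and the conversion itself is passed in as a function so that any converter can be used.

```python
from pathlib import Path
from typing import Callable


def prepare_for_read(src, to_markdown: Callable[[str], str], cache_dir) -> Path:
    """Return a Markdown copy of src, reconverting only when src has changed."""
    src = Path(src)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    out = cache_dir / (src.name + ".md")
    # Reuse the cached copy if it is at least as new as the source file.
    if out.exists() and out.stat().st_mtime >= src.stat().st_mtime:
        return out
    out.write_text(to_markdown(str(src)), encoding="utf-8")
    return out
```

With markitdown installed, the conversion function could be `lambda p: MarkItDown().convert(p).text_content`; the returned path is then what gets passed to the read tool.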

Image Analysis (LLM Vision)

For images, markitdown can extract EXIF metadata (free, no API key needed) and can also describe the image content using an LLM vision model.

EXIF only (already works):

markitdown photo.jpg

With LLM vision (requires an OpenRouter API key):

Set the environment variable:

export OPENROUTER_API_KEY=sk-or-v1-...

Then use the wrapper:

markitdown-vision photo.jpg

Or use the Python API directly:

from markitdown import MarkItDown
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

md = MarkItDown(
    llm_client=client,
    llm_model="qwen/qwen2.5-vl-72b-instruct",
)

result = md.convert("photo.jpg")
print(result.text_content)
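
The example above raises a KeyError when OPENROUTER_API_KEY is not set. One way to fall back to EXIF-only conversion is sketched below; vision_kwargs is a hypothetical helper, and the make_client parameter is there only so the logic can be tested without the openai package.

```python
import os
from typing import Callable, Mapping, Optional

MODEL = "qwen/qwen2.5-vl-72b-instruct"  # model named in this document
BASE_URL = "https://openrouter.ai/api/v1"


def vision_kwargs(
    env: Mapping[str, str],
    make_client: Optional[Callable[..., object]] = None,
) -> dict:
    """Keyword arguments for MarkItDown(); empty when no API key is set."""
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        return {}  # EXIF-only conversion, no LLM calls
    if make_client is None:
        from openai import OpenAI  # imported lazily; only needed with a key
        make_client = OpenAI
    client = make_client(base_url=BASE_URL, api_key=key)
    return {"llm_client": client, "llm_model": MODEL}


if __name__ == "__main__":
    from markitdown import MarkItDown

    md = MarkItDown(**vision_kwargs(os.environ))
    print(md.convert("photo.jpg").text_content)
```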

Why Qwen 2.5 VL 72B?

  • Excellent vision understanding
  • Affordable: ~$0.25/M input, ~$0.75/M output tokens
  • 131K context window
  • Available on OpenRouter

OCR inside documents: the markitdown-ocr plugin is installed. Enable it with:

md = MarkItDown(enable_plugins=True, llm_client=client, llm_model="qwen/qwen2.5-vl-72b-instruct")

This extracts text from images embedded in PDFs, Word, PowerPoint, and Excel files.

Security Notes

  • MarkItDown performs I/O with current process privileges
  • Sanitize inputs in untrusted environments
  • Only convert files from trusted sources
  • The [pdf,docx,pptx,xlsx] extras are installed; audio transcription and Azure AI are NOT installed
  • LLM image analysis requires an OpenRouter API key and costs tokens per image (EXIF extraction does not)
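
As one possible reading of "sanitize inputs", a path can be checked before it is handed to markitdown. This is a sketch, not a complete defense: check_input is a hypothetical helper, the 50 MiB cap is an arbitrary assumption, and the checks cover only location, file type, and size.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # hypothetical 50 MiB cap; adjust as needed


def check_input(path, allowed_root, max_bytes: int = MAX_BYTES) -> Path:
    """Return the resolved path, or raise ValueError if it fails a check."""
    root = Path(allowed_root).resolve()
    resolved = Path(path).resolve()  # follows symlinks and collapses ".."
    # The file must live somewhere below the allowed directory.
    if root not in resolved.parents:
        raise ValueError(f"{resolved} is outside {root}")
    if not resolved.is_file():
        raise ValueError(f"{resolved} is not a regular file")
    if resolved.stat().st_size > max_bytes:
        raise ValueError(f"{resolved} is larger than {max_bytes} bytes")
    return resolved
```

Resolving the path before comparing it is what catches ".." segments and symlinks that point outside the allowed directory.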