Sam Rolfe d3ce7f12de skill: add image analysis with Qwen 2.5 VL via OpenRouter

2026-06-08 11:37:31 +10:00

name	description
markitdown	Convert various file formats to Markdown for use with LLMs and text analysis. Supports PDF, Word, Excel, PowerPoint, images, HTML, CSV, JSON, XML, ZIP, EPubs, and YouTube URLs.

MarkItDown

Convert files to Markdown for LLM consumption and text analysis. A lightweight Python utility by Microsoft.

Installation

Installed in a Python venv at /tmp/markitdown-env/ with a wrapper at ~/.local/bin/markitdown.

The wrapper handles LD_LIBRARY_PATH for numpy's C extensions on NixOS.

If the venv is missing (e.g., after a rebuild), recreate it:

nix-shell -p python3 python3.pkgs.pip python3.pkgs.virtualenv gcc stdenv.cc.cc.lib --run "
  python3 -m venv /tmp/markitdown-env
  source /tmp/markitdown-env/bin/activate
  pip install 'markitdown[pdf,docx,pptx,xlsx]'
"

Then recreate the wrapper at ~/.local/bin/markitdown.
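The wrapper's contents are not reproduced in this document, so the following is only a sketch of what such a wrapper might do, written in Python rather than shell. It assumes the venv's console script lives at /tmp/markitdown-env/bin/markitdown, and it reads the library directory from `MARKITDOWN_LIB_DIR`, a hypothetical environment variable introduced here for illustration; the real wrapper may locate the library path differently.

```python
import os

# Assumption: pip placed the console script here when the venv was created.
VENV_BIN = "/tmp/markitdown-env/bin/markitdown"


def with_library_path(env, lib_dir):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    updated = dict(env)
    existing = updated.get("LD_LIBRARY_PATH", "")
    updated["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return updated


def main(argv):
    """Replace the current process with the venv's markitdown."""
    # MARKITDOWN_LIB_DIR is hypothetical; it stands in for however the
    # real wrapper finds the directory holding the C runtime libraries.
    lib_dir = os.environ.get("MARKITDOWN_LIB_DIR")
    env = with_library_path(os.environ, lib_dir) if lib_dir else dict(os.environ)
    os.execve(VENV_BIN, [VENV_BIN, *argv], env)
```

A script at ~/.local/bin/markitdown could call `main(sys.argv[1:])` after importing `sys`.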

Supported Formats

Format	Extension	Dependencies
PDF	`.pdf`	pdfminer-six, pdfplumber
Word	`.docx`	lxml, mammoth
PowerPoint	`.pptx`	python-pptx
Excel	`.xlsx`, `.xls`	openpyxl, pandas, xlrd
Images	`.jpg`, `.png`, etc.	EXIF metadata (core); LLM vision via `llm_client`/`llm_model`; OCR via `markitdown-ocr` plugin (installed)
HTML	`.html`, `.htm`	beautifulsoup4 (core)
CSV	`.csv`	(core)
JSON	`.json`	(core)
XML	`.xml`	(core)
ZIP	`.zip`	(core, iterates contents)
EPubs	`.epub`	(core)
YouTube	URLs	youtube-transcript-api (core)
Text	`.txt`, `.md`, etc.	(core)
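To check up front whether a file's format is covered, the table above can be transcribed into a small lookup. This is a sketch only: the mapping mirrors the table as written here, the function name `dependencies_for` is made up for illustration, and image extensions beyond `.jpg` and `.png` are left out because the table does not enumerate them.

```python
from pathlib import Path

# Extension -> dependencies, transcribed from the table above.
# An empty tuple means the format is handled by the core package.
FORMAT_DEPENDENCIES = {
    ".pdf": ("pdfminer-six", "pdfplumber"),
    ".docx": ("lxml", "mammoth"),
    ".pptx": ("python-pptx",),
    ".xlsx": ("openpyxl", "pandas", "xlrd"),
    ".xls": ("openpyxl", "pandas", "xlrd"),
    ".jpg": (),
    ".png": (),
    ".html": ("beautifulsoup4",),
    ".htm": ("beautifulsoup4",),
    ".csv": (),
    ".json": (),
    ".xml": (),
    ".zip": (),
    ".epub": (),
    ".txt": (),
    ".md": (),
}


def dependencies_for(path):
    """Return the listed dependencies for a file, or None if its
    extension does not appear in the table."""
    return FORMAT_DEPENDENCIES.get(Path(path).suffix.lower())
```

A return value of `None` signals an extension the table does not list, which is a reasonable point to stop before invoking the converter.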

CLI Usage

# Convert a file to Markdown (stdout)
markitdown path/to/file.pdf

# Write to file
markitdown path/to/file.pdf -o output.md

# Pipe content
cat file.pdf | markitdown
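When the CLI needs to be driven from a script, the same commands can be assembled with `subprocess`. The sketch below assumes the `markitdown` wrapper is on `PATH`; the helper names `markitdown_command` and `convert_with_cli` are illustrative and not part of MarkItDown.

```python
import subprocess


def markitdown_command(source, output=None):
    """Build the argument list for the markitdown CLI shown above."""
    cmd = ["markitdown", str(source)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def convert_with_cli(source, output=None):
    """Run the CLI and return its stdout, which holds the Markdown
    when no output file is given. Raises CalledProcessError on failure."""
    completed = subprocess.run(
        markitdown_command(source, output),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.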

Python API

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
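The same API extends naturally to a folder of files. The following sketch converts every matching file in one directory and writes a `.md` file per input. The function `convert_tree` and its parameters are hypothetical names; the converter is passed in so that any object with a `convert(path)` method returning something with `text_content` will work, such as `MarkItDown()`.

```python
from pathlib import Path


def convert_tree(converter, src_dir, out_dir, pattern="*.pdf"):
    """Convert each file in src_dir matching pattern and write the
    Markdown to out_dir, one .md file per input (non-recursive)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(src_dir).glob(pattern)):
        result = converter.convert(str(source))
        target = out_dir / (source.stem + ".md")
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written
```

Typical use would be `convert_tree(MarkItDown(), "reports", "/tmp/reports-md")`, where both directory names are placeholders.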

Integration with Pi

Use markitdown to convert files before reading them with the read tool:

# Convert in the shell, then open /tmp/report.md with the read tool
markitdown report.pdf -o /tmp/report.md

This is especially useful for:

PDFs that need structure preserved (headings, lists, tables)
Office documents (Word, Excel, PowerPoint)
Images with EXIF metadata
Any file format not directly readable by the read tool

Image Analysis (LLM Vision)

For images, markitdown can extract EXIF metadata (free, no API key needed) and describe the image content using an LLM vision model.

EXIF only (already works):

markitdown photo.jpg

With LLM vision (requires an OpenRouter API key), set the environment variable:

export OPENROUTER_API_KEY=sk-or-v1-...

Then use the wrapper:

markitdown-vision photo.jpg

Or use the Python API directly:

from markitdown import MarkItDown
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

md = MarkItDown(
    llm_client=client,
    llm_model="qwen/qwen2.5-vl-72b-instruct",
)

result = md.convert("photo.jpg")
print(result.text_content)
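The snippet above raises `KeyError` when `OPENROUTER_API_KEY` is unset. One possible way to degrade gracefully to EXIF-only output is sketched below. The helper names `vision_settings` and `build_converter` are illustrative; the base URL and model ID are the ones used above, and the `markitdown` and `openai` imports are deferred so the settings check can run without either package.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
VISION_MODEL = "qwen/qwen2.5-vl-72b-instruct"


def vision_settings(env=None):
    """Return OpenRouter settings when a key is present, else None."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        return None
    return {
        "base_url": OPENROUTER_BASE_URL,
        "api_key": api_key,
        "model": VISION_MODEL,
    }


def build_converter(env=None):
    """Create a MarkItDown instance, adding LLM vision only if a key is set."""
    # Imported lazily so vision_settings() works without these packages.
    from markitdown import MarkItDown

    settings = vision_settings(env)
    if settings is None:
        return MarkItDown()  # EXIF metadata only

    from openai import OpenAI

    client = OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])
    return MarkItDown(llm_client=client, llm_model=settings["model"])
```

With this shape, `build_converter().convert("photo.jpg")` behaves like the plain EXIF path when no key is configured.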

Why Qwen 2.5 VL 72B?

Excellent vision understanding
Affordable: ~$0.25/M input, ~$0.75/M output tokens
131K context window
Available on OpenRouter
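To get a feel for what the quoted prices mean per batch, the arithmetic can be written out as below. The rates are the approximate figures from the list above and may change; the token counts in the usage note are invented for illustration, since real usage depends on image size and the length of the description.

```python
# Approximate rates quoted above, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 0.75


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost in USD for the given token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000
```

For example, 100 images at a hypothetical 1,500 input and 300 output tokens each would come to `estimate_cost_usd(150_000, 30_000)`, roughly $0.06.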

OCR inside documents: the markitdown-ocr plugin is installed. Enable it with:

md = MarkItDown(enable_plugins=True, llm_client=client, llm_model="qwen/qwen2.5-vl-72b-instruct")

This extracts text from images embedded in PDFs, Word, PowerPoint, and Excel files.

Security Notes

MarkItDown performs I/O with current process privileges
Sanitize inputs in untrusted environments
Only convert files from trusted sources
The [pdf,docx,pptx,xlsx] extras are installed; audio transcription and Azure AI are NOT installed
Image analysis requires an OpenRouter API key and costs tokens per image
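As one possible reading of "sanitize inputs", the sketch below checks a path before it is handed to the converter: it must resolve to a regular file inside an allowed directory and stay under a size limit. The function name `is_safe_input`, the allowed-root idea, and the 50 MiB default are all assumptions for illustration, not something MarkItDown provides.

```python
from pathlib import Path


def is_safe_input(path, allowed_root, max_bytes=50 * 1024 * 1024):
    """Return True if path is a regular file inside allowed_root
    and no larger than max_bytes."""
    root = Path(allowed_root).resolve()
    # resolve() follows symlinks and collapses "..", so escapes are caught.
    candidate = Path(path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        return False
    if not candidate.is_file():
        return False
    return candidate.stat().st_size <= max_bytes
```

This only guards the file's location and size. It does not inspect file contents, so it complements the advice above about trusted sources and does not replace it.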
