PDFsmith

Why pdfsmith?

One API for 19+ backends: switch parsers without changing your code
Benchmark-informed defaults: auto-selects a backend based on pdf-bench findings
Frontier LLM support: Claude, GPT-4o, and Gemini for challenging documents
Modular installation: pip install only the backends you need
Production ready: consistent error handling and a unified output format

from pdfsmith import parse

# Auto-select best available backend
markdown = parse("document.pdf")

# Use specific backend
markdown = parse("document.pdf", backend="docling")

# Use frontier LLM for complex documents
markdown = parse("document.pdf", backend="anthropic")

One API, any backend
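
Because every backend is reached through the same parse(path, backend=...) call, a fallback chain is easy to sketch. The helper below is hypothetical and not part of pdfsmith: the name parse_with_fallback and its parse_fn parameter are assumptions made for illustration, and it assumes that a failing backend raises an exception.

```python
from typing import Callable, Optional, Sequence


def parse_with_fallback(
    path: str,
    backends: Sequence[str],
    parse_fn: Callable[..., str],
) -> str:
    """Try each backend in order and return the first successful result.

    Hypothetical helper: parse_fn is expected to accept
    parse_fn(path, backend=name), like the parse() calls shown above.
    """
    last_error: Optional[Exception] = None
    for name in backends:
        try:
            return parse_fn(path, backend=name)
        except Exception as exc:  # assumption: failures surface as exceptions
            last_error = exc  # remember the failure and try the next backend
    raise RuntimeError(f"no backend could parse {path!r}") from last_error
```

Passing pdfsmith's own parse as parse_fn, for example parse_with_fallback("document.pdf", ["docling", "anthropic"], parse), would try the local backend first and reach for the LLM backend only if that attempt fails. Taking the parse function as an argument also keeps the helper testable without any backend installed.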

pip install pdfsmith                   # Core only
pip install "pdfsmith[light]"          # pypdf, pdfplumber, pymupdf
pip install "pdfsmith[recommended]"    # Balanced stack
pip install "pdfsmith[frontier]"       # Claude, GPT-4o, Gemini
pip install "pdfsmith[all]"            # Everything

Install what you need
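
After installing an extra, it can help to confirm which backend packages actually landed in the environment. The sketch below uses only the standard library and is not part of pdfsmith: the function name installed_versions is an assumption, and the package list is taken from the [light] comment above.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the [light] extra's comment above.
LIGHT_PACKAGES = ("pypdf", "pdfplumber", "pymupdf")


def installed_versions(packages=LIGHT_PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

This only reports what pip installed; it does not say which backend pdfsmith's auto-selection would pick.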