Why pdfsmith?
- One API for 19+ backends: switch parsers by changing a single `backend` argument
- Benchmark-informed defaults: auto-selects a backend based on pdf-bench findings
- Frontier LLM support: Claude, GPT-4o, Gemini for challenging documents
- Modular installation: pip install only what you need
- Production ready: consistent error handling, unified output format
from pdfsmith import parse
# Auto-select best available backend
markdown = parse("document.pdf")
# Use specific backend
markdown = parse("document.pdf", backend="docling")
# Use frontier LLM for complex documents
markdown = parse("document.pdf", backend="anthropic")
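
Because every backend sits behind the same `parse` call, a cheap parser can be tried first and a frontier LLM used only when it fails. The following is a minimal sketch of that pattern, not part of pdfsmith: `parse_with_fallback` is a hypothetical helper, and it assumes a failing backend raises an exception (the exact exception types are not specified here, so it catches `Exception`).

```python
from typing import Callable, List, Sequence


def parse_with_fallback(
    path: str,
    backends: Sequence[str],
    parse_fn: Callable[..., str],
) -> str:
    """Try each backend in order and return the first successful result.

    parse_fn is expected to accept (path, backend=...) like pdfsmith's parse.
    """
    errors: List[str] = []
    for name in backends:
        try:
            return parse_fn(path, backend=name)
        except Exception as exc:  # assumption: failures surface as exceptions
            errors.append(f"{name}: {exc}")
    # Every backend failed (or none were given); report what went wrong.
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical usage with pdfsmith installed:
#   from pdfsmith import parse
#   markdown = parse_with_fallback("document.pdf", ["docling", "anthropic"], parse)
```

Passing `parse` in as an argument keeps the helper independent of any one backend and makes it easy to exercise with a stub function in tests.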
pip install pdfsmith # Core only
pip install "pdfsmith[light]" # pypdf, pdfplumber, pymupdf
pip install "pdfsmith[recommended]" # Balanced stack
pip install "pdfsmith[frontier]" # Claude, GPT-4o, Gemini
pip install "pdfsmith[all]" # Everything
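
After installing an extra, it can help to confirm which optional parser packages actually landed in the environment. The sketch below uses only the standard library and is independent of pdfsmith: `first_installed` is a hypothetical helper, and the preference order in the usage line is purely illustrative, not pdfsmith's own selection logic.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def first_installed(modules: Sequence[str]) -> Optional[str]:
    """Return the first top-level module name that is importable, else None."""
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    # Import names of the packages in the "light" extra; the order is an assumption.
    print(first_installed(["pymupdf", "pdfplumber", "pypdf"]))
```

The result depends on what is installed; `None` means none of the listed packages is importable.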