Our Projects
Production-tested tools and libraries that solve real problems. All open source, all battle-hardened.
Enterprise Agents
Reference implementations for production AI agents
We tested AI agents in four real-world automation scenarios, each requiring a different level of sophistication:
- Customer message routing — classifying support tickets into departments (solvable with keyword matching)
- Multi-turn customer support — handling complete conversations with tool access (requires LLM + tools)
- Contract clause extraction — identifying risk provisions in legal documents (LLMs excel here)
- Server issue remediation — diagnosing and fixing infrastructure problems (dangerous without guardrails)
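The first scenario is described as solvable with keyword matching. Below is a minimal sketch of such a router. It assumes a hypothetical set of departments and keywords; `ROUTES`, `route_message`, and the department names are illustrative, not taken from the project. Each message is scored by counting keyword hits per department, and messages with no hits fall through to a catch-all queue.

```python
import re

# Hypothetical departments and keywords, for illustration only.
ROUTES: dict[str, list[str]] = {
    "billing": ["invoice", "refund", "charge", "payment"],
    "technical": ["error", "crash", "bug", "login"],
    "shipping": ["delivery", "tracking", "shipped", "package"],
}
DEFAULT_DEPARTMENT = "general"


def route_message(text: str) -> str:
    """Return the department whose keywords appear most often in `text`."""
    # Whole lowercase words only: "charged" does not match "charge".
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        dept: sum(words.count(kw) for kw in keywords)
        for dept, keywords in ROUTES.items()
    }
    best_dept, best_score = max(scores.items(), key=lambda item: item[1])
    # Fall back to a catch-all queue when no keyword matched at all.
    return best_dept if best_score > 0 else DEFAULT_DEPARTMENT


if __name__ == "__main__":
    print(route_message("I was charged twice, please refund my payment"))
    print(route_message("The app crashes with an error on login"))
    print(route_message("Hello there"))
```

Because matching is on exact whole words, a real deployment would likely add stemming or synonym lists before concluding an LLM is needed.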
Each experiment includes evaluation harnesses so you can reproduce results on your own data.
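As a rough illustration of what a minimal evaluation harness can look like, here is a sketch that assumes labelled `(text, expected_label)` pairs and a classifier exposed as a plain function. `evaluate` is a hypothetical name, not the project's API. It reports accuracy plus a sparse confusion matrix, which shows which labels get mistaken for each other.

```python
from collections import Counter
from typing import Callable, Iterable


def evaluate(
    classify: Callable[[str], str],
    examples: Iterable[tuple[str, str]],
) -> tuple[float, Counter]:
    """Score `classify` on (text, expected_label) pairs.

    Returns overall accuracy and a Counter keyed by (expected, predicted),
    i.e. a sparse confusion matrix.
    """
    confusion: Counter = Counter()
    total = correct = 0
    for text, expected in examples:
        predicted = classify(text)
        confusion[(expected, predicted)] += 1
        correct += predicted == expected  # bool counts as 0 or 1
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


def always_billing(text: str) -> str:
    """Trivial baseline classifier, for illustration only."""
    return "billing"


if __name__ == "__main__":
    # Toy labelled data (hypothetical), standing in for your own tickets.
    data = [
        ("refund my order", "billing"),
        ("app crashes on start", "technical"),
        ("where is my package", "shipping"),
        ("card was charged twice", "billing"),
    ]
    acc, confusion = evaluate(always_billing, data)
    print(f"accuracy={acc:.2f}")
    for (expected, predicted), n in sorted(confusion.items()):
        print(f"{expected:>10} -> {predicted:<10} {n}")
```

Running a trivial baseline like this first gives a floor that any agent should beat on the same data.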
PDFbench
PDF parser benchmark suite with 610+ real documents
The most comprehensive PDF parser benchmark available. It covers:
- 19 parsers — pymupdf, pdfplumber, marker, docling, Claude, GPT-4o, and more
- 610 documents — contracts, invoices, scientific papers, financial reports, forms, scans
- 6 metrics — including text accuracy, structure recovery, table extraction (TEDS), and speed
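For reference, TEDS (Tree-Edit-Distance-based Similarity) as commonly defined represents each table as an HTML tree and normalizes the tree edit distance by the size of the larger tree. This page does not state which TEDS variant or cost settings PDFbench uses, so treat this as the standard form rather than the benchmark's exact implementation:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|,\, |T_b|\right)}
```

Here $|T|$ is the number of nodes in tree $T$. In the usual definition, insertions and deletions cost 1, substituting two cell nodes with matching spans costs the normalized Levenshtein distance between their contents, and any other substitution costs 1, so a score of 1 means the predicted table matches the reference exactly.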
Raw results, evaluation scripts, and full corpus available for independent verification.
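The page does not say how the text accuracy metric is computed. One common choice, shown here as an assumption rather than PDFbench's actual scoring, is one minus the character-level edit distance normalized by the longer string. This sketch also collapses whitespace first (a hypothetical design choice), since parsers often differ only in line breaks and spacing.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line-break differences are not penalized."""
    return " ".join(text.split())


def text_accuracy(predicted: str, reference: str) -> float:
    """1 - normalized edit distance, in [0, 1]; 1.0 means an exact match."""
    predicted, reference = _normalize(predicted), _normalize(reference)
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(predicted, reference) / longest
```

For example, `text_accuracy("kitten", "sitting")` gives 1 - 3/7, because three single-character edits separate the strings and the longer one has seven characters.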
PDFsmith
Unified PDF parser interface with 11+ backend options
One API for 19+ PDF parsing backends. Switch parsers without changing code:
- Smart routing — send scanned docs to OCR backends, table-heavy docs to pdfplumber, complex layouts to Claude
- Modular install — core has zero heavy dependencies; add backends as needed
- Benchmark-informed — defaults based on PDFbench findings
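PDFsmith's actual API is not shown on this page, so what follows is a hypothetical sketch of how smart routing and a modular install can fit together. `DocProfile`, `choose_backend`, the backend names, and the module each backend is assumed to need are all illustrative, as is the final fallback. The router builds a preference list from document features and returns the first backend whose module is installed, checked with `importlib.util.find_spec` so nothing heavy is imported just to probe.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Callable


@dataclass
class DocProfile:
    """Hypothetical per-document features a router might inspect."""
    is_scanned: bool = False
    has_tables: bool = False
    complex_layout: bool = False


# Hypothetical mapping: backend name -> top-level module it is assumed to need.
BACKEND_MODULES = {
    "ocr": "pytesseract",
    "pdfplumber": "pdfplumber",
    "claude": "anthropic",
    "pymupdf": "fitz",
}


def module_available(module: str) -> bool:
    """True if `module` can be imported, without actually importing it."""
    return find_spec(module) is not None


def choose_backend(
    profile: DocProfile,
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Pick a backend by document type, skipping ones that are not installed."""
    preferences: list[str] = []
    if profile.is_scanned:
        preferences.append("ocr")
    if profile.has_tables:
        preferences.append("pdfplumber")
    if profile.complex_layout:
        preferences.append("claude")
    preferences.append("pymupdf")  # assumed fast text-layer fallback
    for backend in preferences:
        if is_available(BACKEND_MODULES[backend]):
            return backend
    raise RuntimeError("no installed backend can handle this document")
```

Injecting `is_available` keeps the routing logic testable without installing any backend; a scanned, table-heavy PDF on a machine without an OCR package would fall through to `pdfplumber`.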
Claude Code Toolkit
Production-grade plugins and workflows for Claude Code
A plugin system for Claude Code built from 6+ months of daily use. Includes:
- 28 slash commands — structured workflows for explore → plan → implement → review
- 5 specialized agents — architect, test-engineer, code-reviewer, data-scientist, report-generator
- 6 domain skills — including writing frameworks, D2 diagrams, and ML experimentation
For practitioners who want repeatable agentic workflows, not ad-hoc prompting.