Benchmark Results
- Baseline (keyword): 68.6% accuracy at $0 per request
- Best value (Native): 87.6% accuracy at $0.000074/request — 19 point lift
- Best accuracy (CrewAI): 88.4% at 2.3x cost — only 0.8 points better
- Framework spread was only 4 points (84-88%) across 6 implementations
- 250 samples evaluated, not the 50-sample pilot that showed inflated 94%
# Route customer messages to departments
from customer_triage import classify
result = classify("I need to return my order")
# → {"department": "orders", "confidence": 0.94}