Customer Support

MIT License

Multi-turn support: Can AI agents handle complete customer conversations with tool access?

This experiment extends triage into full customer support conversations. We gave agents access to tools (order lookup, account management, knowledge base search) and measured how well they resolved complete support tickets without human intervention. We tested 5 framework configurations across 150 multi-turn conversations. The key finding: tool-calling reliability matters more than conversation quality. Native Python with structured outputs achieved 94% task completion, and LangGraph hit 91%, though LangGraph led on tool-call F1 (see the results below). The agents that looked most "intelligent" in demos often failed on edge cases. Production support needs deterministic tool execution, not impressive prose.
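
To make "structured outputs" and "deterministic tool execution" concrete, here is a minimal sketch of the pattern: the model must emit JSON matching a fixed schema, and anything that fails validation is rejected before a tool ever runs. The names here (ToolCall, parse_tool_call, ALLOWED_TOOLS) are illustrative assumptions, not this repo's API.

# Sketch: validate a model's tool call before executing anything.
# ToolCall, parse_tool_call, and ALLOWED_TOOLS are illustrative names,
# not part of this repository.
import json
from dataclasses import dataclass

ALLOWED_TOOLS = {"order_lookup", "refund", "escalate"}

@dataclass
class ToolCall:
    tool: str
    arguments: dict

def parse_tool_call(raw: str) -> ToolCall | None:
    """Return a validated ToolCall, or None rather than guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if data.get("tool") not in ALLOWED_TOOLS:
        return None
    if not isinstance(data.get("arguments"), dict):
        return None
    return ToolCall(data["tool"], data["arguments"])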

Benchmark Results

  • LangGraph won with 0.75 Tool F1 and 92% escalation accuracy (the Tool F1 metric is sketched after this list)
  • Native implementation: 0.69 F1 with a hand-written ReAct loop (a minimal loop is sketched at the end of this section)
  • Bug fixes improved CrewAI by 45%; better prompts improved AutoGen by 66%
  • Agno: 242x faster instantiation but 0.34 F1 (worst task performance)
  • Framework choice mattered less than implementation quality
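
For reference, Tool F1 compares the tool calls an agent actually made on a ticket against the calls the ticket required. A minimal sketch, assuming set-based matching over tool names (the benchmark may also match on arguments):

def tool_f1(predicted: set, expected: set) -> float:
    """F1 over tool calls: harmonic mean of precision and recall."""
    if not predicted and not expected:
        return 1.0  # nothing required, nothing called: perfect
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Agent called refund and escalate; the ticket required order_lookup
# and refund. One of two calls is correct in each direction, so F1 = 0.5.
print(tool_f1({"refund", "escalate"}, {"order_lookup", "refund"}))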

Basic usage:

# Multi-turn conversation with tool calling
from customer_support import Agent

# The agent can call any of these tools on the customer's behalf
agent = Agent(tools=["order_lookup", "refund", "escalate"])

# Each chat() call is one customer turn in an ongoing conversation
response = agent.chat("Where is my order #12345?")
print(response)
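
For comparison with the framework-managed loops above, the native baseline's hand-written ReAct loop amounts to alternating model output with tool execution until the model commits to an answer. A minimal sketch in that spirit; `llm` and `run_tool` are stand-in callables, not this repository's API:

def react_loop(llm, run_tool, question: str, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool calls until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model emits a thought, action, or answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            observation = run_tool(step.removeprefix("Action:").strip())
            transcript += f"Observation: {observation}\n"
    return "escalate"  # step budget exhausted: hand the ticket to a human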

Questions about this project? Open an issue on GitHub or contact us directly.