DevOps Remediation

MIT License · View on GitHub

Infrastructure automation: Can AI agents safely diagnose and fix server issues?

This is the experiment that made us build the Warden pattern. We gave AI agents SSH access to servers and asked them to diagnose and remediate common issues—disk space, memory pressure, service failures, configuration drift. Within 5 seconds of the first test, an agent attempted "rm -rf /" to "clean up disk space." This experiment isn't about accuracy—it's about safety. We compare Ansible playbooks (deterministic, auditable) against AI agents (flexible, terrifying) and demonstrate the Warden pattern: a supervisory layer that blocks dangerous commands while allowing the AI to handle the diagnostic reasoning. Automation handles 77% of incidents; AI handles the remaining 23% of weird edge cases—under strict supervision.
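The Warden idea described above can be sketched as a thin supervisory filter that vets every command the agent proposes before it reaches a shell. This is a minimal illustration, not the project's actual implementation; the deny patterns and function names are hypothetical.

```python
import re

# Hypothetical deny-list for the Warden sketch; the real project's rules
# are richer than these few illustrative patterns.
DENY_PATTERNS = [
    r"rm\s+-[a-z]*r[a-z]*f[a-z]*\s+/(\s|$)",  # recursive force-delete of /
    r"rm\s+-[a-z]*f[a-z]*r[a-z]*\s+/(\s|$)",  # same, flags reversed
    r"\bmkfs",                                # reformatting a filesystem
    r"\bdd\b\s.*\bof=/dev/",                  # overwriting a block device
]

def warden_check(command: str) -> bool:
    """Return True if the proposed command is allowed, False if blocked."""
    return not any(re.search(p, command) for p in DENY_PATTERNS)

def execute(command: str) -> None:
    # Every agent-proposed command passes through the Warden first.
    if not warden_check(command):
        raise PermissionError(f"Warden blocked: {command!r}")
    # ...hand off to the real executor (subprocess, SSH session, etc.)
```

Because the check sits between the agent and the executor, the AI keeps full freedom to reason about diagnostics while losing the ability to run destructive commands: `warden_check("rm -rf /")` returns `False`, while a harmless `df -h` passes.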

Benchmark Results

  • Warden pattern: 100% attack blocking (25/25), 0% false positives
  • Within 5 seconds, the AI tried rm -rf / — not malicious, just troubleshooting
  • Automation handles 76.7% via pattern matching at $0
  • Native uses 47% fewer tokens than LangGraph (28,529 vs 54,060)
  • Hybrid approach saves $80K/year at 1M incidents/month
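The hybrid split behind the numbers above can be sketched as a simple router: deterministic pattern matching handles known incident signatures at zero cost, and only unmatched incidents escalate to the supervised AI agent. The signatures, playbook names, and function below are illustrative assumptions, not the project's real routing table.

```python
# Illustrative hybrid router: known incident signatures run deterministic
# playbooks; anything unrecognized escalates to the (Warden-supervised) AI.
SIGNATURES = {
    "disk_full": "No space left on device",
    "oom_kill": "Out of memory: Killed process",
}

PLAYBOOKS = {
    "disk_full": "ansible-playbook cleanup_disk.yml",   # hypothetical playbook
    "oom_kill": "ansible-playbook restart_service.yml", # hypothetical playbook
}

def route(log_line: str) -> str:
    """Return the remediation path for one incident log line."""
    for name, signature in SIGNATURES.items():
        if signature in log_line:
            return PLAYBOOKS[name]   # deterministic, auditable, $0
    return "escalate_to_ai"          # the weird edge cases
```

In this sketch, `route("write failed: No space left on device")` resolves to the disk-cleanup playbook, while an unfamiliar log line falls through to AI escalation; the cost savings come from keeping the common ~77% on the deterministic path.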
```python
# Safe command execution with Warden
from devops_remediation import diagnose

# AI diagnoses and proposes fixes;
# Warden blocks dangerous commands before execution
result = diagnose(log_file="hdfs_error.log")
```

Questions about this project? Open an issue on GitHub or contact us directly.