Chapter contents
Appendix A: business case template for adopting AI agents
Target audience: CTOs, VPs of Engineering, engineering managers, business stakeholders
Goal: help justify adopting AI agents without false precision—through variables, risks, and verifiable success criteria
Date: 2026-01-17
Executive summary (template)#
Proposal: adopt AI agents to automate repeatable engineering tasks and accelerate feedback loops (analysis, first-pass triage, change drafts, documentation, checklist-based verification).
Why now: growing load and complexity make manual work increasingly expensive; risk is rising too (security, quality, incidents). Without process discipline, speed turns into debt.
How we reduce risk: guardrails, stop conditions, verification plans, least privilege, audit trail, eval datasets, and golden tests.
How we decide: run a pilot, measure against pre-agreed criteria, and make a go/no-go call (continue/stop).
1. Problem: the cost of inaction (status quo) without numbers#
1.1 What happens without agents#
Typical symptoms in an engineering organization where “speed” is sustained by manual work:
- a meaningful share of time goes to toil (data gathering, digging through logs, manual triage, coordination, repeatable changes)
- key knowledge lives in people’s heads (low bus factor); the team is blocked on experts
- incidents consume focus and turn the roadmap into an illusion
- quality fluctuates because some issues are caught too late
1.2 Why it is expensive#
It is expensive through three channels:
- direct cost: paid engineer time goes to mechanical work
- opportunity cost: strategic initiatives stall because the team has no deep-work window
- risk cost: change failures, incidents, security mistakes, and regulatory outcomes
2. Solution: AI agents as a “discipline accelerator”#
2.1 What exactly we are adopting (scope/boundaries)#
AI agents are not a “magic engineer.” Framed correctly, they are:
- an execution tool for repeatable SOP steps
- an amplifier for analysis and decision preparation (drafts, hypotheses, alternatives)
- a standardization mechanism (templates, checklists, quality gates) that reduces execution variance
2.2 What we are not doing#
- we do not grant broad production permissions by default
- we do not replace engineering judgment with “the model said so”
- we do not scale without quality and safety loops
3. Effect model (template, without numbers)#
Instead of a single ROI number, use variables. That makes the model portable across teams and domains.
3.1 Variables (plug in your values)#
Team and time:
- <TEAM_SIZE> — team size
- <COST_PER_ENGINEER_HOUR> — internal cost per engineer hour (or a rate for modeling)
- <ROUTINE_SHARE> — current share of routine work (estimate via time tracking / survey / sampling)
Incidents and operations:
- <INCIDENTS_PER_PERIOD> — incidents per period
- <DOWNTIME_COST_MODEL> — how the business models downtime/degradation cost (not necessarily money/hour; could be SLA penalties, churn, funnel loss)
- <MTTR_BASELINE> — baseline mean time to recovery
Quality and security:
- <CHANGE_FAILURE_RATE_BASELINE> — baseline change failure rate
- <SECURITY_RISK_MODEL> — risk model (expected impact × probability) or a list of critical scenarios
Adoption:
- <ADOPTION_TARGET> — expected adoption level
- <TRAINING_INVESTMENT> — time to train and set up
- <MAINTENANCE_BUDGET> — time to maintain templates, eval sets, and guardrails
3.2 Effect hypotheses (what should change)#
Write these as hypotheses you can confirm or refute:
- routine share decreases materially because repeatable steps are delegated
- feedback loops improve (facts/diagnostics/drafts appear faster)
- quality improves via gates, verification, and lower execution variance
- operational resilience improves: fewer fires, better escalation, less expert dependency
3.3 How to evaluate impact (without promises)#
Use a simple framing:
- Value = reclaimed engineering time + reduced risk cost + accelerated strategic initiatives
- Cost = adoption time + maintenance + cost of mistakes/regressions along the way + infra/licenses (if applicable)
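The Value/Cost framing above can be sketched as a small function over the section 3.1 placeholders. All variable names and the example numbers below are illustrative assumptions, not measured data; the point is to make the model's inputs explicit so each team can plug in its own values.

```python
# Minimal sketch of the section 3 effect model. Every input is a placeholder
# (assumption) that maps to a <VARIABLE> from section 3.1.

def effect_model(
    team_size: int,                # <TEAM_SIZE>
    cost_per_engineer_hour: float, # <COST_PER_ENGINEER_HOUR>
    routine_share: float,          # <ROUTINE_SHARE>, 0..1
    routine_reduction: float,      # hypothesis: fraction of routine delegated, 0..1
    hours_per_period: float,       # working hours per engineer per period
    risk_cost_reduction: float,    # from <DOWNTIME_COST_MODEL> / <SECURITY_RISK_MODEL>
    adoption_cost: float,          # <TRAINING_INVESTMENT> + <MAINTENANCE_BUDGET>, monetized
) -> dict:
    """Value = reclaimed time + reduced risk cost; Cost = adoption + maintenance."""
    reclaimed_hours = team_size * hours_per_period * routine_share * routine_reduction
    value = reclaimed_hours * cost_per_engineer_hour + risk_cost_reduction
    return {
        "reclaimed_hours": reclaimed_hours,
        "value": value,
        "cost": adoption_cost,
        "net": value - adoption_cost,
    }

# Example with made-up numbers, purely to show the mechanics:
result = effect_model(team_size=8, cost_per_engineer_hour=100.0,
                      routine_share=0.3, routine_reduction=0.5,
                      hours_per_period=160.0, risk_cost_reduction=5000.0,
                      adoption_cost=20000.0)
```

Because the hypotheses in 3.2 are expressed as the same inputs (`routine_reduction`, `risk_cost_reduction`), the pilot can confirm or refute them by replacing assumed values with measured ones.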
4. Risk register (template)#
For each risk, capture: scenario → consequences → mitigations → how we verify.
Risk: the agent proposes or applies a wrong fix#
- Scenario: the agent proposes or performs an incorrect fix
- Mitigations: human review, allowlist, conservative escalation on uncertainty, dry run (--check) + canary/gradual rollout (serial) + rollback path/plan, kill switch
- Verification: training scenarios + golden tests
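The allowlist + human review + dry-run mitigations compose into a simple decision function. This is a hypothetical sketch: the command names and the `apply_fix` helper are assumptions, not a real API, but the ordering of checks (allowlist first, approval second, dry run before execution) is the pattern this risk entry describes.

```python
# Hypothetical guardrail chain for agent-proposed fixes.
# ALLOWED_COMMANDS and the return strings are illustrative assumptions.

ALLOWED_COMMANDS = {"restart_service", "clear_cache", "rotate_log"}

def apply_fix(command: str, human_approved: bool, dry_run: bool = True) -> str:
    # 1. Allowlist: anything not explicitly permitted escalates to a human.
    if command not in ALLOWED_COMMANDS:
        return "escalate: command not on allowlist"
    # 2. Human review: no write without explicit approval.
    if not human_approved:
        return "escalate: awaiting human review"
    # 3. Dry run by default; real execution is an explicit opt-in.
    if dry_run:
        return f"dry-run: would execute {command}"
    return f"executed: {command}"
```

Canary rollout, rollback, and the kill switch sit outside this function: they govern what happens after "executed", and golden tests exercise the chain end to end.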
Risk: prompt injection via logs / input data#
- Scenario: logs/tickets contain instructions that the agent treats as commands
- Mitigations: sanitization, strict guardrails, ban dangerous commands, out-of-band approval
- Verification: staged injection tests in a sandbox
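A minimal form of the sanitization mitigation is to wrap log content so it is always treated as data, never as instructions, and to flag suspicious lines for review. The wrapper tags and regex patterns below are assumptions for illustration; they are nowhere near a complete injection defense, which is exactly why the staged sandbox tests matter.

```python
import re

# Illustrative patterns only (assumption): real injection attempts are far more
# varied, so flagging is a review signal, not a filter you can rely on alone.
SUSPICIOUS = re.compile(
    r"(ignore (all|previous) instructions|you are now|system prompt)",
    re.IGNORECASE,
)

def sanitize_log_line(line: str) -> tuple[str, bool]:
    """Wrap a log line as inert data and flag it if it looks like an instruction."""
    flagged = bool(SUSPICIOUS.search(line))
    return f"<log-data>{line}</log-data>", flagged
```

The wrapped form goes into the agent's context with an explicit rule that `<log-data>` content is never executable; flagged lines are routed to out-of-band approval.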
Risk: secrets/PII leakage#
- Scenario: the agent includes secrets or PII in reports/comments
- Mitigations: redaction, secret scanning, ban raw-log publishing, safe channels
- Verification: test data + output checks
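The redaction mitigation can be applied as a last-mile filter on everything the agent publishes. The patterns below (email shapes, a few token prefixes) are assumptions for a sketch, not a complete DLP ruleset; secret scanning with dedicated tooling should back it up.

```python
import re

# Illustrative redaction patterns (assumption): cover only obvious shapes.
PATTERNS = [
    re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),  # emails
    re.compile(r"\b(?:AKIA|ghp_|sk-)[A-Za-z0-9_-]{8,}\b"),              # token-like strings
]

def redact(text: str) -> str:
    """Replace anything matching a known sensitive pattern before publishing."""
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Verification pairs this with seeded test data: plant known fake secrets in inputs and assert they never appear in reports or comments.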
5. Pilot plan (go/no-go)#
5.1 Pilot format#
- pick a constrained scope: one workflow / one team / one incident class
- define artifacts up front: prompt templates, SOPs, quality gates, verification plan, threat model—minimal sufficient set
- define permissions: default to read_only; writes only for approved scenarios with explicit approval
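The permission rule above (read_only by default, writes only for approved scenarios) can be captured as a small policy check. The policy structure and scenario names are illustrative assumptions; the invariant is that an unlisted scenario can never write, regardless of approval.

```python
# Hypothetical pilot permission policy (names and structure are assumptions).
PILOT_POLICY = {
    "default_mode": "read_only",
    "write_scenarios": {
        "restart_staging_service": {"requires_approval": True, "environments": ["staging"]},
    },
}

def can_write(scenario: str, approved: bool, policy: dict = PILOT_POLICY) -> bool:
    entry = policy["write_scenarios"].get(scenario)
    if entry is None:
        return False  # unlisted scenario: stays read_only, no exceptions
    if entry["requires_approval"] and not approved:
        return False  # listed but not yet approved: still read_only
    return True
```

Keeping the policy as data (rather than scattered if-statements) also gives the audit trail a single artifact to diff when scope changes during the pilot.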
5.2 Success criteria (no numbers, but verifiable)#
- Adoption: the team uses the practice repeatedly (not “tried once”)
- Quality: errors/regressions are caught before they reach production
- Safety: no incidents caused by ad-hoc agent actions; actions are auditable
- Productivity: routine load decreases enough to free strategic capacity
5.3 Decision log#
At the end of the pilot, record:
- what worked
- where the agent failed and why
- which guardrails/templates are needed before scaling
6. ROI dashboard (no numbers, but with fields)#
## AI Agents dashboard — [Period]
### Adoption
- Who uses the practice and for which workflows
- Where resistance shows up and why
### Quality
- Which agent errors repeat
- Which gates/verification catch them earlier
### Security
- Any attempts at dangerous actions
- How stop conditions and approval flows behaved
### Productivity
- Which toil categories moved into delegation
- Where strategic capacity appeared
### Business narrative
- Which risk was reduced
- Which initiatives accelerated
7. Stakeholder communication (template)#
For the CEO / the board#
- we are not “adopting a toy”; we are building execution discipline at scale
- risk is covered by guardrails and quality loops
- the decision is made by pilot evidence
For the CTO / VP Engineering#
- impact shows up in predictability and lower execution variance
- governance and a security baseline are part of the project, not “later”
For engineers#
- agents remove toil, but responsibility stays with humans
- “trust, but verify” is the default rule