Appendix A: business case template for adopting AI agents

Target audience: CTOs, VPs of Engineering, engineering managers, business stakeholders
Goal: help justify adopting AI agents without false precision, using variables, risks, and verifiable success criteria
Date: 2026-01-17


Executive summary (template)

Proposal: adopt AI agents to automate repeatable engineering tasks and accelerate feedback loops (analysis, first-pass triage, change drafts, documentation, checklist-based verification).

Why now: growing load and complexity make manual work increasingly expensive; risk is rising too (security, quality, incidents). Without process discipline, speed turns into debt.

How we reduce risk: guardrails, stop conditions, verification plans, least privilege, audit trail, eval datasets, and golden tests.

How we decide: run a pilot, measure against pre-agreed criteria, and make a go/no-go call (continue/stop).


1. Problem: the cost of inaction (status quo) without numbers

1.1 What happens without agents

Typical symptoms in an engineering organization where “speed” is sustained by manual work:

  • a meaningful share of time goes to toil (data gathering, digging through logs, manual triage, coordination, repeatable changes)
  • key knowledge lives in people’s heads (low bus factor); the team is blocked on experts
  • incidents consume focus and turn the roadmap into an illusion
  • quality fluctuates because some issues are caught too late

1.2 Why it is expensive

The status quo is expensive through three channels:

  • direct cost: paid engineer time goes to mechanical work
  • opportunity cost: strategic initiatives stall because the team has no deep-work window
  • risk cost: change failures, incidents, security mistakes, and regulatory consequences

2. Solution: AI agents as a “discipline accelerator”

2.1 What exactly we are adopting (scope/boundaries)

AI agents are not a “magic engineer.” Framed correctly, they are:

  • an execution tool for repeatable SOP steps
  • an amplifier for analysis and decision preparation (drafts, hypotheses, alternatives)
  • a standardization mechanism (templates, checklists, quality gates) that reduces execution variance

2.2 What we are not doing

  • we do not grant broad production permissions by default
  • we do not replace engineering judgment with “the model said so”
  • we do not scale without quality and safety loops

3. Effect model (template, without numbers)

Instead of a single ROI number, use variables. That makes the model portable across teams and domains.

3.1 Variables (plug in your values)

Team and time:

  • <TEAM_SIZE> — team size
  • <COST_PER_ENGINEER_HOUR> — internal cost per engineer hour (or a rate for modeling)
  • <ROUTINE_SHARE> — current share of routine work (estimate via time tracking / survey / sampling)

Incidents and operations:

  • <INCIDENTS_PER_PERIOD> — incidents per period
  • <DOWNTIME_COST_MODEL> — how the business models downtime/degradation cost (not necessarily money/hour; could be SLA penalties, churn, funnel loss)
  • <MTTR_BASELINE> — baseline mean time to recovery

Quality and security:

  • <CHANGE_FAILURE_RATE_BASELINE> — baseline change failure rate
  • <SECURITY_RISK_MODEL> — risk model (expected impact × probability) or a list of critical scenarios

Adoption:

  • <ADOPTION_TARGET> — expected adoption level
  • <TRAINING_INVESTMENT> — time to train and set up
  • <MAINTENANCE_BUDGET> — time to maintain templates, eval sets, and guardrails

3.2 Effect hypotheses (what should change)

Write these as hypotheses you can confirm or refute:

  • routine share decreases materially because repeatable steps are delegated
  • feedback loops improve (facts/diagnostics/drafts appear faster)
  • quality improves via gates, verification, and lower execution variance
  • operational resilience improves: fewer fires, better escalation, less expert dependency

3.3 How to evaluate impact (without promises)

Use a simple framing:

  • Value = reclaimed engineering time + reduced risk cost + accelerated strategic initiatives
  • Cost = adoption time + maintenance + cost of mistakes/regressions along the way + infra/licenses (if applicable)
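The framing above can be sketched as a small calculation over the section 3.1 variables. This is a minimal sketch, not a forecasting model. Two inputs are assumptions that do not appear in the variable list: `hours_per_period` (working hours per engineer per period) and `routine_reduction` (the hypothesis the pilot is meant to test). The remaining Value and Cost terms are passed in as already-estimated amounts, because how to estimate them depends on your <DOWNTIME_COST_MODEL> and <SECURITY_RISK_MODEL>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectInputs:
    team_size: int                 # <TEAM_SIZE>
    cost_per_engineer_hour: float  # <COST_PER_ENGINEER_HOUR>
    routine_share: float           # <ROUTINE_SHARE>, as a fraction 0..1
    hours_per_period: float        # assumption: working hours per engineer per period
    routine_reduction: float       # hypothesis to test: fraction of routine work delegated, 0..1


def reclaimed_time_value(x: EffectInputs) -> float:
    """Value of reclaimed engineering time for one period."""
    for name in ("routine_share", "routine_reduction"):
        if not 0.0 <= getattr(x, name) <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    routine_hours = x.team_size * x.hours_per_period * x.routine_share
    return routine_hours * x.routine_reduction * x.cost_per_engineer_hour


def net_effect(value_terms: dict[str, float], cost_terms: dict[str, float]) -> float:
    """Value minus Cost, with every term named so it can be challenged."""
    return sum(value_terms.values()) - sum(cost_terms.values())
```

Keeping each term as a named entry makes it easy to rerun the calculation with pessimistic and optimistic inputs instead of defending a single number.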

4. Risk register (template)

For each risk, capture: scenario → consequences → mitigations → how we verify.
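One way to keep register entries uniform is to store each risk as a record with these four fields and to flag entries that leave a field empty. A minimal sketch; the `RiskEntry` type and the `missing_fields` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RiskEntry:
    title: str
    scenario: str = ""
    consequences: str = ""
    mitigations: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)


def missing_fields(entry: RiskEntry) -> list[str]:
    """Return the names of register fields that are still empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = RiskEntry(
    title="Prompt injection via logs / input data",
    scenario="Logs or tickets contain instructions that the agent treats as commands",
    mitigations=["sanitization", "strict guardrails", "out-of-band approval"],
    verification=["staged injection tests in a sandbox"],
)
print(missing_fields(entry))  # consequences has not been written down yet
```

A check like this can run before a review meeting so that incomplete entries are visible up front.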

Risk: the agent proposes or applies a wrong fix

  • Scenario: the agent proposes or performs an incorrect fix
  • Mitigations: human review, allowlist, conservative escalation on uncertainty, dry run (--check) + canary/gradual rollout (serial) + rollback path/plan, kill switch
  • Verification: training scenarios + golden tests

Risk: prompt injection via logs / input data

  • Scenario: logs/tickets contain instructions that the agent treats as commands
  • Mitigations: sanitization, strict guardrails, ban dangerous commands, out-of-band approval
  • Verification: staged injection tests in a sandbox
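The sanitization mitigation can be sketched as two small steps: flag instruction-like lines in untrusted input, and mark the input as quoted data before it reaches the agent. This is a minimal sketch under stated assumptions: the patterns are illustrative and far from complete, the `<untrusted_data>` delimiter is a hypothetical convention, and pattern matching alone does not stop prompt injection, which is why it sits alongside guardrails and out-of-band approval.

```python
import re

# Illustrative patterns only; a real list would come from your own staged injection tests.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\b"),
]


def flag_suspicious_lines(untrusted_text: str) -> list[int]:
    """Return 1-based line numbers that match any suspicious pattern."""
    return [
        number
        for number, line in enumerate(untrusted_text.splitlines(), start=1)
        if any(p.search(line) for p in SUSPICIOUS)
    ]


def wrap_as_data(untrusted_text: str) -> str:
    """Mark log or ticket content as quoted data, separate from instructions."""
    return "<untrusted_data>\n" + untrusted_text + "\n</untrusted_data>"
```

A non-empty result from `flag_suspicious_lines` is a natural stop condition: the agent halts and the input goes to a human for approval.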

Risk: secrets/PII leakage

  • Scenario: the agent includes secrets or PII in reports/comments
  • Mitigations: redaction, secret scanning, ban raw-log publishing, safe channels
  • Verification: test data + output checks
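Redaction and the output check can be sketched together: redact before publishing, then verify that planted test values did not survive. A minimal sketch; the two patterns are illustrative and are not a complete secret or PII detector, and `redact` and `leaks` are hypothetical helper names.

```python
import re

# Illustrative patterns only; extend them from your own secret-scanning rules.
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    (re.compile(r"(?i)\b(password|token|api_key)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace every match of the known patterns before text leaves the agent."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def leaks(output: str, planted_values: list[str]) -> list[str]:
    """Output check: return planted test values that survived into the output."""
    return [value for value in planted_values if value in output]


raw = "user alice@example.com failed login, password=hunter2 token: abc123"
safe = redact(raw)
print(safe)
print(leaks(safe, ["alice@example.com", "hunter2", "abc123"]))
```

Planting known fake secrets in test data and checking for them in the output tests the whole pipeline, not only the patterns.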

5. Pilot plan (go/no-go)

5.1 Pilot format

  • pick a constrained scope: one workflow / one team / one incident class
  • define artifacts up front, limited to the minimal sufficient set: prompt templates, SOPs, quality gates, verification plan, threat model
  • define permissions: default to read_only; writes only for approved scenarios with explicit approval
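The permission rule in the last bullet can be sketched as a default-deny check. A minimal sketch, assuming the pilot keeps an explicit allowlist of write scenarios; the `ActionRequest` type, the `is_allowed` function, and the `docs_update` scenario name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    READ_ONLY = "read_only"
    WRITE = "write"


@dataclass(frozen=True)
class ActionRequest:
    scenario: str                      # hypothetical scenario identifier
    mode: Mode = Mode.READ_ONLY        # read_only is the default
    approved_by: Optional[str] = None  # who explicitly approved this action, if anyone


# Assumption: scenarios where writes may happen are listed explicitly.
APPROVED_WRITE_SCENARIOS = frozenset({"docs_update"})


def is_allowed(request: ActionRequest) -> bool:
    """Reads always pass; writes need an approved scenario and a named approver."""
    if request.mode is Mode.READ_ONLY:
        return True
    return request.scenario in APPROVED_WRITE_SCENARIOS and bool(request.approved_by)
```

Recording `approved_by` on every write request also gives the audit trail a concrete field to log.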

5.2 Success criteria (no numbers, but verifiable)

  • Adoption: the team uses the practice repeatedly (not “tried once”)
  • Quality: errors/regressions are caught earlier than production
  • Safety: no incidents caused by ad-hoc agent actions; actions are auditable
  • Productivity: routine load decreases enough to free strategic capacity
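“Verifiable” can be made concrete by requiring each criterion to carry a verdict and a pointer to supporting evidence. A minimal sketch under one loud assumption: the pilot passes only when every criterion is met. Your organization may weigh the criteria differently. The `Criterion` type and the `pilot_decision` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str      # e.g. "Adoption", "Quality", "Safety", "Productivity"
    met: bool
    evidence: str  # link or reference to the artifact that supports the verdict


def pilot_decision(criteria: list[Criterion]) -> str:
    """Return "go" only when every criterion is met and backed by evidence."""
    if not criteria:
        return "no-go"
    unsupported = [c.name for c in criteria if not c.evidence.strip()]
    if unsupported:
        raise ValueError(f"criteria without evidence: {unsupported}")
    # Assumption: all criteria must be met; a weighted rule is equally possible.
    return "go" if all(c.met for c in criteria) else "no-go"
```

Raising an error on missing evidence, rather than silently returning “no-go”, separates “the pilot failed” from “the pilot was not measured”.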

5.3 Decision log

At the end of the pilot, record:

  • what worked
  • where the agent failed and why
  • which guardrails/templates are needed before scaling

6. ROI dashboard (no numbers, but with fields)

## AI Agents dashboard — [Period]

### Adoption
- Who uses the practice and for which workflows
- Where resistance shows up and why

### Quality
- Which agent errors repeat
- Which gates/verification catch them earlier

### Security
- Any attempts at dangerous actions
- How stop conditions and approval flows behaved

### Productivity
- Which toil categories moved into delegation
- Where strategic capacity appeared

### Business narrative
- Which risk was reduced
- Which initiatives accelerated

7. Stakeholder communication (template)

For the CEO / the board

  • we are not “adopting a toy”; we are building execution discipline at scale
  • risk is covered by guardrails and quality loops
  • the decision is based on pilot evidence

For the CTO / VP Engineering

  • impact shows up in predictability and lower execution variance
  • governance and a security baseline are part of the project, not “later”

For engineers

  • agents remove toil, but responsibility stays with humans
  • “trust, but verify” is the default rule