
Cost of delay, as a triage signal

cubby ai team

An AI agent with a queue has the same problem a product manager has. Too many things to do, finite throughput, no obvious answer to which one first. The standard answers — FIFO, severity tiers, whoever shouts loudest — are all bad in the same direction. They optimize for the work item in front of you, not for the economic loss the queue is incurring while it waits.

Don Reinertsen's Principles of Product Development Flow (2009) supplied the cleanest economic answer the field has, and it ports almost unchanged to agent triage. Reinertsen's definition, quoted in DX's writeup of Cost of Delay, is operational: cost of delay is "the partial derivative of lifecycle profit with respect to a change in the availability of a product." In plain English: dollars per unit time of waiting. He pairs the metric with a rule of thumb that has aged well — "if you only quantify one thing, quantify the cost of delay" — and an observation that roughly 85 percent of product managers can't tell you what a month of delay on a given item would cost.

If you cannot price the wait, you cannot triage.

WSJF: the formula, not the ceremony

The Scaled Agile Framework's WSJF guidance operationalizes Reinertsen for a backlog: rank items by cost of delay divided by job duration. SAFe decomposes cost of delay into three relative inputs (user-business value, time criticality, risk reduction or opportunity enablement) and pairs it with a fourth (job size) as the denominator. The intuition is queue-theoretic: when service capacity is finite, the policy that maximizes throughput-weighted value is shortest-job-first weighted by economic urgency. That is WSJF.
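As a minimal sketch of that arithmetic: SAFe scores the three cost-of-delay components on a relative scale, sums them, and divides by job size. The class name, field names, and backlog entries below are invented for illustration and are not from SAFe or from our system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    name: str
    business_value: int       # user-business value, relative score
    time_criticality: int     # relative score
    risk_or_opportunity: int  # risk reduction or opportunity enablement, relative score
    job_size: int             # relative size, used as the proxy for duration

    @property
    def cost_of_delay(self) -> int:
        # Cost of delay is the sum of the three relative components.
        return self.business_value + self.time_criticality + self.risk_or_opportunity

    @property
    def wsjf(self) -> float:
        # Weighted shortest job first: cost of delay over job size.
        return self.cost_of_delay / self.job_size


def rank_by_wsjf(items):
    """Return items ordered from highest WSJF to lowest."""
    return sorted(items, key=lambda item: item.wsjf, reverse=True)


# Illustrative scores only.
backlog = [
    BacklogItem("migrate auth provider", 8, 2, 5, 13),
    BacklogItem("fix checkout timeout", 5, 8, 3, 2),
    BacklogItem("refresh onboarding copy", 3, 1, 1, 1),
]
ranked = rank_by_wsjf(backlog)
for item in ranked:
    print(f"{item.name}: CoD={item.cost_of_delay}, WSJF={item.wsjf:.2f}")
```

The large migration has a cost of delay close to the top item's, but it sorts last because its size dilutes it. That is the whole effect of the denominator.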

You can keep the math and ditch the rituals. The math is the part that survives translation into a system where the operator is an agent and the queue is updated every few seconds.

Why this maps cleanly onto agent triage

Three structural reasons.

1. Agent queues are dominated by tail latency. An agent juggling research, drafts, and tool calls has hard throughput limits: context window, tool rate limits, model inference time. Allocating finite capacity by economic weight is exactly what queuing theory was built for, and the SAFe team is explicit that WSJF was adapted from queuing theory and Reinertsen's cost-of-delay framing.

2. Most tasks have a knowable urgency curve. A research task for "should we move auth providers" has a flat urgency curve over weeks. A research task for "is the auth provider's incident affecting our users right now" is a step function with the step at minutes. The 2025 survey AIOps in the Era of Large Language Models found that across 183 papers spanning 2020–2024, the empirically hardest part of agent design is not the model — it is failure perception and prioritization under partial observability. Cost of delay is the cleanest scalar to feed into that.

3. Agents have legible job size. Estimating job size for a human team is famously hard. Estimating job size for an agent is comparatively easy: tokens, tool calls, expected wall-clock. That is the WSJF denominator handed to you on a plate.
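Reasons 2 and 3 can be sketched together: an urgency curve is a function from minutes waited to value still on the table, and an agent's job size is arithmetic over tokens and tool calls. Every name, rate, and dollar figure below is a hypothetical placeholder chosen for illustration, not a measured value or a real provider's pricing.

```python
from dataclasses import dataclass


def flat_value(value_usd):
    """Urgency curve whose value does not decay over the horizon we care about."""
    def remaining(wait_minutes):
        return value_usd
    return remaining


def step_value(value_usd, deadline_minutes):
    """Urgency curve that keeps full value until a deadline, then drops to zero."""
    def remaining(wait_minutes):
        return value_usd if wait_minutes < deadline_minutes else 0.0
    return remaining


def value_lost(remaining, wait_minutes):
    """Value given up by waiting, relative to starting immediately."""
    return remaining(0) - remaining(wait_minutes)


@dataclass(frozen=True)
class JobEstimate:
    input_tokens: int
    output_tokens: int
    tool_calls: int


@dataclass(frozen=True)
class Rates:
    # Hypothetical unit prices and speeds; substitute your own.
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float
    usd_per_tool_call: float
    output_tokens_per_second: float
    seconds_per_tool_call: float


def estimate_cost_usd(job, rates):
    """Dollars of inference and tools for one job."""
    return (
        job.input_tokens / 1000 * rates.usd_per_1k_input_tokens
        + job.output_tokens / 1000 * rates.usd_per_1k_output_tokens
        + job.tool_calls * rates.usd_per_tool_call
    )


def estimate_seconds(job, rates):
    """Rough wall-clock: generation time plus time spent in tool calls."""
    return (
        job.output_tokens / rates.output_tokens_per_second
        + job.tool_calls * rates.seconds_per_tool_call
    )


auth_migration = flat_value(5000.0)                     # "should we move auth providers"
live_incident = step_value(5000.0, deadline_minutes=10)  # "is the incident hitting us now"

rates = Rates(0.003, 0.015, 0.01, 50.0, 2.0)
job = JobEstimate(input_tokens=20_000, output_tokens=2_000, tool_calls=5)

for wait in (1, 10, 60):
    print(wait, value_lost(auth_migration, wait), value_lost(live_incident, wait))
print(estimate_cost_usd(job, rates), estimate_seconds(job, rates))
```

The two curves start at the same value and diverge only with waiting, which is why a severity tier cannot separate them and a cost-of-delay figure can.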

The cubby ai-shaped version

We do not run our agent on a SAFe backlog. We run it on a stream. So we use a degenerate, real-time version of WSJF, expressed in two numbers and a verb:

  • CoD/min for each pending task: the responder's stated dollars-per-minute of waiting, or the system's default if unstated. Default beats blank.
  • Expected cost-to-complete: seconds of wall-clock plus dollars of inference and tools.
  • Triage verb: now, batch, later, drop.

The ranking score is (CoD/min × expected_remaining_value) ÷ expected_cost_to_complete, highest first. The verb is the result, surfaced first. Hirschorn's example in the DX writeup (pricing engineer salaries against build-time reduction to justify internal investment) is the same operation at a different time scale.
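A minimal sketch of that ranking, under assumptions the text does not pin down. It treats expected remaining value as a 0-to-1 fraction of the value still capturable. It folds wall-clock seconds into dollars through a made-up USD_PER_AGENT_SECOND rate. It maps scores to verbs with arbitrary thresholds. The default CoD and all task figures are placeholders too.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_COD_PER_MIN = 1.0    # hypothetical default: "default beats blank"
USD_PER_AGENT_SECOND = 0.01  # hypothetical price of the agent's wall-clock time


@dataclass(frozen=True)
class PendingTask:
    name: str
    cod_per_min: Optional[float]  # stated USD per minute of waiting, None if unstated
    remaining_value: float        # assumed: fraction of value still capturable, 0..1
    wall_clock_seconds: float
    spend_usd: float              # inference plus tools


def cost_to_complete(task):
    """Collapse seconds and dollars into one dollar figure (an assumption)."""
    return task.wall_clock_seconds * USD_PER_AGENT_SECOND + task.spend_usd


def score(task):
    """CoD/min times expected remaining value, over expected cost to complete."""
    cod = task.cod_per_min if task.cod_per_min is not None else DEFAULT_COD_PER_MIN
    cost = cost_to_complete(task)
    if cost <= 0:
        raise ValueError(f"cost to complete must be positive for {task.name!r}")
    return cod * task.remaining_value / cost


def verb(task, now_at=10.0, batch_at=1.0):
    """Map a score to a triage verb; the thresholds are illustrative."""
    if task.remaining_value <= 0:
        return "drop"  # nothing left to capture
    s = score(task)
    if s >= now_at:
        return "now"
    if s >= batch_at:
        return "batch"
    return "later"


queue = [
    PendingTask("incident impact check", 50.0, 1.0, 30, 0.20),
    PendingTask("auth migration research", None, 1.0, 600, 2.00),
    PendingTask("weekly digest draft", 2.0, 0.5, 20, 0.10),
    PendingTask("expired promo copy", 5.0, 0.0, 15, 0.05),
]
verbs = [verb(task) for task in queue]
for task, v in zip(queue, verbs):
    print(f"{v:>5}  {task.name}")
```

The unstated task still gets a score because the default fills in for the blank. The thresholds are the part most worth arguing about, so they are parameters rather than constants.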

The point is not the precision of the number. The point is that the number is there, visible, and can be argued with. A team that cannot show its cost-of-delay column is not triaging. It is reacting to whoever has the most recent message.

Stance

Borrow the math. Drop the spreadsheet. The shape we want, every time, is the same shape the rest of cubby ai uses: the next decision first (which task to run), then the evidence (CoD/min, expected cost), then the rejected alternatives (what we are choosing not to run, and what it would cost to skip it further).
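One way to picture that shape is a small sketch that prints the decision, then the evidence, then what was passed over. The function name, the dict keys, and the five-minute review interval are hypothetical. The ranking here is simplified to CoD/min over expected cost, leaving out remaining value.

```python
def triage_report(tasks, review_interval_min=5.0):
    """Format a triage decision: choice first, evidence second, rejections last.

    Each task is a dict with 'name', 'cod_per_min' (USD per minute of waiting)
    and 'cost_usd' (expected cost to complete). These keys are illustrative.
    """
    if not tasks:
        raise ValueError("nothing to triage")
    ranked = sorted(tasks, key=lambda t: t["cod_per_min"] / t["cost_usd"], reverse=True)
    chosen, rejected = ranked[0], ranked[1:]
    lines = [
        f"RUN NEXT: {chosen['name']}",
        f"  evidence: CoD/min=${chosen['cod_per_min']:.2f}, "
        f"expected cost=${chosen['cost_usd']:.2f}",
        "  not running:",
    ]
    for task in rejected:
        # Cost of leaving this task in the queue until the next triage pass.
        skip_cost = task["cod_per_min"] * review_interval_min
        lines.append(
            f"    {task['name']}: waiting {review_interval_min:g} more min "
            f"costs ~${skip_cost:.2f}"
        )
    return "\n".join(lines)


report = triage_report([
    {"name": "auth migration research", "cod_per_min": 1.0, "cost_usd": 8.0},
    {"name": "incident impact check", "cod_per_min": 50.0, "cost_usd": 0.5},
])
print(report)
```

Printing the skip cost next to each rejected task is what makes the decision arguable: anyone who disagrees can point at a number, not at the agent.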

If you only quantify one thing, quantify the cost of delay. Reinertsen was right in 2009. He is more right when the operator is an agent.

Sources