Software has always been about managing complexity through better abstractions. Data got databases. Communication got APIs. State changes got events. Each abstraction emerged not as a theoretical invention, but as a response to operational pain that teams could no longer manage informally.
We're approaching that moment for decisions.
The Pattern We Keep Rediscovering
When AI systems make thousands of consequential choices per second (credit approvals, fraud flags, content moderation, clinical risk scores), the question shifts. "Did the system work?" becomes "Can we prove this specific outcome was appropriate?"
That requires a different kind of accountability. Our current abstractions aren't built for it.
Consider what happens when something goes wrong. An automated system flags a transaction as fraudulent. A customer complains. A regulator asks questions. You go to your logs and find... events. State changes. Metadata. But reconstructing why that specific decision happened, under which rules and with what evidence, turns into archaeology. The data exists, but it isn't organised around the question the organisation now needs to answer.
The problem isn't logging. It's the abstraction itself.
Events Tell You What Happened. Decisions Tell You What Was Chosen.
The distinction is worth unpacking.
An event captures state change: LoanApproved, ClaimDenied, AccountFlagged. With good discipline, you can attach metadata like timestamps, user IDs, maybe even model versions. But events are optimised for integration, not governance. They answer "what happened" without necessarily answering "what was permitted to happen, under which constraints, by whose authority." They don't tell you whether the system should have been allowed to choose that outcome at all.
A decision, treated as a first-class object, carries different commitments:
| Attribute | What it captures |
|---|---|
| Authority | Who or what was authorised to make this choice? What delegation chain applies? |
| Policy context | Which specific rules were in force? What versions? |
| Evidence | What inputs were used? The actual artefacts available at decision time, not current state. |
| Rationale | A trace of which checks fired, which constraints bound, which exceptions applied. |
| Replay-ability | Enough structure to reproduce "why this outcome" under the historical configuration. |
| Lifecycle | Was this decision challenged, overturned, superseded, or time-bounded? |
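To make these commitments concrete, here is one minimal sketch of a decision record as a data type. The field names, values, and shape are illustrative assumptions, not a standard; real systems would add schemas, signing, and storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Decision:
    """A decision as a first-class object (field names are illustrative)."""
    decision_id: str
    outcome: str                 # e.g. "approved", "denied"
    authority: str               # who or what was authorised to decide
    delegation_chain: list       # how that authority was delegated
    policy_id: str               # which body of rules was in force
    policy_version: str          # exact version, so the outcome can be replayed
    evidence: dict               # inputs as they existed at decision time
    rationale: list              # which checks fired, which constraints bound
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_by: Optional[str] = None  # lifecycle: link to a later decision

d = Decision(
    decision_id="dec-001",
    outcome="approved",
    authority="underwriting-service",
    delegation_chain=["credit-committee", "underwriting-service"],
    policy_id="consumer-lending",
    policy_version="v2.3",
    evidence={"credit_score": 712, "income_verified": True},
    rationale=["score >= threshold passed", "income verification passed"],
)
```

Making the record immutable (`frozen=True`) matters: a decision is a historical fact, and corrections happen by superseding it, not by editing it.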
You can encode all of this in events. Teams rarely do, because that's not what events are for.
Consider: two identical loan applications approved a week apart under different policy versions. Without decision-level context, that's just two LoanApproved events. With it, you can see that the first was approved under Policy v2.3 and the second under v2.4, and explain why the outcomes were consistent (or flag that they weren't).
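The comparison above can be sketched in a few lines. The records and field names are invented for illustration:

```python
# Two outcomes that are indistinguishable as events become
# comparable as decisions (data is illustrative).
decisions = [
    {"event": "LoanApproved", "policy_version": "v2.3",
     "evidence": {"credit_score": 712}},
    {"event": "LoanApproved", "policy_version": "v2.4",
     "evidence": {"credit_score": 712}},
]

# Event view: the two records look identical.
assert decisions[0]["event"] == decisions[1]["event"]

# Decision view: the policy changed between them, which is exactly
# the fact an auditor needs to see.
versions = {d["policy_version"] for d in decisions}
print(sorted(versions))  # ['v2.3', 'v2.4']
```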
The Failure Mode Is Accountability Diffusion
When decisions aren't explicit, responsibility spreads until no one feels answerable for outcomes. "The system decided" becomes the default explanation. Which is no explanation at all.
A recurring pattern in post-mortems of automated decision-making failures bears this out. Australia's Robodebt scheme. Michigan's MiDAS unemployment fraud system. The Dutch childcare benefits scandal. In each case, the core problem wasn't that logs were missing. It was that no one could reconstruct, with confidence, which rules applied to which inputs to produce which outcomes for which individuals.
A decision object doesn't prevent discrimination or error. But it makes it much harder to operate a system that cannot explain itself in terms of inputs, policies, and authority.
The Idea Isn't New. The Stakes Are.
Decision management systems have existed for decades. IBM ODM, FICO Blaze Advisor, and their descendants became standard infrastructure in credit underwriting, insurance pricing, and claims triage. The Decision Model and Notation (DMN) standard attempted to make decisions as model-able as processes.
So why didn't "decisions as infrastructure" become ubiquitous?
Partly cost and complexity. Partly cultural fit, since developers preferred logic in code, governed by tests and code review. Partly because the compliance burden didn't justify the overhead for most decisions. Decision infrastructure didn't fail. It paid for itself only when decisions were clearly tied to risk and accountability.
What's changed is the intersection of AI and regulation. When decisions involve machine learning models, the governance surface expands to include model versions, training data lineage, feature drift, and confidence thresholds. When regulations like the EU AI Act impose record-keeping requirements for high-risk systems (including traceability of inputs, outputs, and the humans who verified results), the cost of not having decision infrastructure rises sharply.
The ROI calculus is shifting. Not for every decision. But for decisions that create material risk.
The Convergence Is Already Happening
Look at where practitioners are arriving independently:
Policy-as-code (Open Policy Agent and its ecosystem) treats policy decisions as a distinct layer with dedicated logging. Every OPA decision carries a decision ID, policy revision, timestamp, inputs, and result. Not as optional metadata, but as the core schema.
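For a sense of what that core schema looks like, here is an abridged decision-log entry in the shape OPA emits, written as a Python dict. The field names follow OPA's decision-log format; the values (IDs, policy path, revision) are invented for illustration, and the real schema carries more fields.

```python
# An OPA-style decision log entry, abridged. Values are illustrative.
opa_decision = {
    "decision_id": "4ca636c1-55e4-417a-b1d8-4aceb67960d1",
    "path": "authz/allow",                        # which policy was queried
    "input": {"user": "alice", "action": "approve_loan"},
    "result": True,
    "bundles": {"authz": {"revision": "v2.4"}},   # policy revision in force
    "timestamp": "2024-06-01T12:00:00Z",
}
```

Note the overlap with the decision attributes above: identity, policy context (the bundle revision), evidence (the input), and outcome (the result) are part of the core record, not optional metadata.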
Decision Intelligence (as framed in the analytics and operations literature) reframes the unit of analysis from "model" to "decision," with explicit attention to how decisions are made, evaluated, and improved through feedback.
AI governance frameworks (NIST AI RMF, EU AI Act compliance guides) specify record-keeping requirements that map cleanly onto decision objects: inputs, outputs, model versions, human oversight, and contestability mechanisms.
These aren't coordinated efforts. They're independent responses to the same underlying pressure: when systems make consequential choices at scale, someone eventually asks for proof that those choices were appropriate.
What This Implies
If decisions are becoming a first-class integration surface, a few things follow:
Scope carefully. Not every decision needs this treatment. The test is whether a decision creates accountability requirements, whether legal, regulatory, financial, or reputational. Internal tooling and low-stakes automation can stay simple.
Learn from what didn't work. Heavy decision-management platforms struggled with adoption because they demanded too much upfront investment and cultural change. In particular, avoid treating "decision governance" as a monolithic platform that everything must pass through. A useful decision abstraction should be incrementally adoptable, starting with the highest-risk decisions.
Treat latency seriously. Adding a governance layer means adding a hop. Treating a decision as an interface means explicitly requesting a judgement, with inputs, constraints, and authority, rather than logging a side-effect after the fact. For high-stakes decisions, this is acceptable. Treat it like payments infrastructure, with SLO-grade reliability and fail-safe defaults. For everything else, don't over-engineer.
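A sketch of what "explicitly requesting a judgement" might look like as an interface. Everything here is an assumption for illustration: the function names, the stub policy engine standing in for a real evaluation call, and the fail-safe default.

```python
import uuid
from datetime import datetime, timezone

def evaluate_policy(inputs: dict, policy_id: str) -> str:
    # Stand-in for a real policy engine call (e.g. a query to a
    # policy service). Hypothetical logic for illustration only.
    return "approved" if inputs.get("credit_score", 0) >= 680 else "denied"

def request_decision(inputs: dict, policy_id: str, authority: str) -> dict:
    """Request a judgement, rather than logging a side-effect after the fact.

    On failure, fall back to a fail-safe default (escalate to a human)
    instead of a silent approval, mirroring payments-style reliability.
    """
    try:
        outcome = evaluate_policy(inputs, policy_id)
    except Exception:
        outcome = "refer_to_human"  # fail-safe default, never silent approval
    return {
        "decision_id": str(uuid.uuid4()),
        "outcome": outcome,
        "policy_id": policy_id,
        "authority": authority,
        "evidence": inputs,  # inputs as seen at decision time
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }

d = request_decision({"credit_score": 712},
                     policy_id="consumer-lending-v2.4",
                     authority="underwriting-service")
```

The extra hop buys you a record that answers "what was permitted, under which constraints, by whose authority" at the moment of the choice, not reconstructed afterwards.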
Expect the boundary to be fuzzy. Where does a "decision" begin and end? When AI recommends and a human selects, is that one decision or two? The practical answer: define a decision as the smallest unit at which accountability, explanation, and redress are required. That boundary is domain-specific, not universal.
The Question Worth Sitting With
The interesting question isn't whether to treat decisions as APIs. The industry is already doing it piecemeal, under different names, in different contexts. The question is whether we recognise this as a coherent pattern and build accordingly, or keep reinventing it project by project.
Better discipline inside existing abstractions won't get us there. We need an abstraction designed around accountability rather than execution. Less "did we log enough?" and more "did we log the right thing?"
APIs succeeded because they compressed complexity into a governable interface that aligned with business value. Events succeeded because they made state changes explicit and replay-able. Decisions may be next. The abstraction isn't particularly clever. But the pain of operating without it is becoming harder to ignore.
At least for the decisions that matter.
This is the first post in "The Decision Layer," a series on governing AI decisions in production.
We're always looking for collaborators exploring how decisions can become verifiable. If your institution is working on AI governance, policy-as-code, or explainable infrastructure, let's build the future of compliant AI together.