
The Missing Layer in AI Infrastructure
The AI stack has consensus layers for data, models, orchestration, and observability. Governance is called "cross-cutting," which in practice means scattered across tools and owned by no one. This post maps the gap between artefact governance (which version is deployed) and decision governance (what just happened, and was it allowed).
The AI stack has settled into a recognisable shape. Data infrastructure at the bottom. Model training and serving in the middle. Orchestration to coordinate workflows. Observability to watch what happens. Application interfaces at the top.
This architecture has consensus. Cloud providers, MLOps vendors, and platform teams all speak roughly the same language. The layers are well-understood, well-tooled, and well-funded.
What's missing is governance. Not governance as a concept. Governance as a layer.
The Stack Everyone Agrees On
Walk through any enterprise AI architecture and you'll find the same components:
Data infrastructure handles ingestion, storage, quality, and lineage. Feature stores manage the inputs models consume. Data catalogs track what exists and who owns it.
Model layer covers training, versioning, and serving. Model registries track which version is deployed where. MLOps pipelines automate the path from experiment to production.
Orchestration coordinates workflows. Airflow, Prefect, Temporal, and their competitors schedule tasks, manage dependencies, and handle retries.
Observability watches what happens. Monitoring tracks performance. Drift detection flags when inputs or outputs shift. Alerts fire when thresholds breach.
Each layer has mature tooling. Each has clear ownership. Each has a well-defined interface to the layers above and below.
Where Governance Lives Today
Ask where governance fits and you'll get a familiar answer: it's cross-cutting. Governance touches every layer. It's everyone's concern.
In practice, cross-cutting means scattered.
Model registries provide artefact governance. They track which model version is deployed, when it was promoted, who approved it. They maintain lineage from training data to production endpoint. They enable rollback when something goes wrong.
This is valuable. But it's incomplete.
A model registry can tell you that credit-scoring-model v2.1 was deployed on Tuesday. It cannot tell you whether the system allowed that model to deny a mortgage to Applicant X at 14:32 because they matched a higher-risk threshold. It tracks the artefact. It doesn't track the decision.
The distinction matters: artefact governance asks "which version is running?" Decision governance asks "what just happened, and was it allowed?"
The Partial Solutions
Teams have built workarounds. Each addresses part of the problem. None addresses the whole.
Feature flags decouple deployment from release. They enable gradual rollouts, canary testing, and rapid rollback. They control which version of code runs. They don't control whether a specific decision should execute.
Guardrails filter inputs and outputs. They block toxic content, detect PII, prevent jailbreaks. NeMo Guardrails, Guardrails AI, and similar tools have matured rapidly. But guardrails check for safety (is this output harmful?). Governance checks for permissibility (is this user allowed to ask this? does this request violate business rules? is this action consistent with policy?). Safety is a subset of governance. It isn't the whole picture.
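The distinction between the two questions can be made concrete. Below is a minimal sketch, not any real guardrail or policy API: the role table, blocked terms, and field names are all invented for illustration. The point is that the two checks take different inputs and answer different questions, so passing one says nothing about the other.

```python
from dataclasses import dataclass

# Hypothetical request shape; every name here is illustrative, not a real API.
@dataclass
class Request:
    user_role: str
    action: str
    output_text: str

# Crude stand-ins: a content-safety filter and a business-rule table.
BLOCKED_TERMS = {"ssn:", "password:"}
ROLE_PERMISSIONS = {
    "analyst": {"read_report"},
    "underwriter": {"read_report", "issue_decision"},
}

def is_safe(output: str) -> bool:
    """Guardrail question: is this output harmful? (Here: does it leak secrets?)"""
    return not any(term in output.lower() for term in BLOCKED_TERMS)

def is_permitted(req: Request) -> bool:
    """Governance question: is this caller allowed to take this action at all?"""
    return req.action in ROLE_PERMISSIONS.get(req.user_role, set())

req = Request(user_role="analyst", action="issue_decision", output_text="Approved.")
print(is_safe(req.output_text), is_permitted(req))  # True False
```

The final line is the failure mode in miniature: the output is perfectly safe, but the analyst had no authority to issue the decision. A stack with guardrails alone lets it through.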
Policy engines like OPA provide the logic for governance. They evaluate rules, return allow/deny decisions, and can be embedded in API gateways or Kubernetes admission controllers. But policy engines are the evaluation mechanism, not the system of record. They answer "is this allowed?" They don't inherently track who invoked what, maintain decision history, or coordinate escalation. The engine exists. The layer around it often doesn't.
Approval workflows route decisions through human reviewers. They're effective for pre-deployment gates: should this model go live? They don't help with runtime decisions. Approval workflows operate at deployment frequency (once per release). Governance operates at inference frequency (once per request). The gap between those frequencies is where decisions slip through unrecorded.
Monitoring tracks what happened. It detects drift, measures latency, flags anomalies. But monitoring is reactive. It tells you something went wrong after the decision executed. It doesn't prevent the decision from executing in the first place.
Orchestration coordinates workflows. It knows when to run tasks and in what order. It doesn't inherently know whether a specific decision is authorised. You can encode approval logic in a DAG, but then orchestration teams inherit governance responsibilities they shouldn't own.
Each tool solves a real problem:
| Tool | What it handles | What it doesn't handle |
|---|---|---|
| Feature flags | Safe deployment, rollouts | Whether a specific decision should execute |
| Guardrails | Content safety, PII, jailbreaks | Permissibility, business rules, policy compliance |
| Policy engines | Rule evaluation, allow/deny | Decision history, escalation, system of record |
| Approval workflows | Pre-deployment gates | Runtime decisions (operates at deployment frequency) |
| Monitoring | Performance tracking, drift | Prevention (reactive, not proactive) |
| Orchestration | Workflow coordination | Decision authorisation |
None of them answers the runtime question: who is invoking this decision, under what authority, and should it proceed?
The Gap
The missing layer sits between model output and business action. It would answer:
Who is invoking this decision? Not just which service, but which user, role, or delegation chain.
What are they trying to do? Not just "call the model" but the specific action and its parameters.
Whether this action is allowed, given the current context, policies, and constraints.
Then approve (and record), refuse (and record), or escalate (and record).
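The flow above can be sketched in a few lines. Everything specific in this example is assumed for illustration: the allowed actions, the escalation threshold, and the record fields are invented, and a real layer would evaluate policies far richer than a threshold check. What the sketch shows is the shape: evaluate before executing, and record every outcome, including refusals.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    REFUSE = "refuse"
    ESCALATE = "escalate"

@dataclass
class DecisionRequest:
    caller: str   # user, role, or delegation chain, not just a service name
    action: str   # the specific action being attempted, with its parameters
    params: dict

ALLOWED_ACTIONS = {"score_credit", "flag_review"}  # invented policy
ESCALATION_THRESHOLD = 0.9                         # invented threshold

def evaluate(req: DecisionRequest, risk_score: float) -> Verdict:
    if req.action not in ALLOWED_ACTIONS:
        return Verdict.REFUSE
    if risk_score > ESCALATION_THRESHOLD:
        return Verdict.ESCALATE   # route to a human reviewer
    return Verdict.APPROVE

def govern(req: DecisionRequest, risk_score: float) -> dict:
    """Evaluate before execution; every outcome produces a record."""
    verdict = evaluate(req, risk_score)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "verdict": verdict.value,
        **asdict(req),
    }

record = govern(
    DecisionRequest("jdoe (underwriter)", "score_credit", {"applicant": "X"}), 0.95
)
print(record["verdict"])  # escalate
```

Note that refusal here is a normal return value with a record attached, not an exception: a refused decision is data the layer keeps, not an error it swallows.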
The timing matters. This happens before the action executes, not after. Governance isn't observation. It's enforcement.
Most organisations reconstruct compliance evidence after incidents, pulling fragments from scattered systems. The audit trail exists in pieces: some in the model registry, some in orchestration logs, some in application databases. Stitching them together is manual, slow, and often incomplete.
Why Cross-Cutting Doesn't Work
The standard answer is that governance should be embedded everywhere. Every tool should handle its piece. The model registry tracks versions. The orchestrator encodes approvals. The monitoring system logs decisions. The application enforces policies.
This sounds sensible until you trace a decision through the stack.
A user invokes a model. The orchestrator schedules the task. The model serves a prediction. The application acts on it. The monitoring system logs the outcome.
Now answer: was that decision authorised? Which policy applied? What version of the policy? Who had authority to invoke it? Where's the evidence?
The answer is: it depends. It depends on what each tool logged, whether the logs are consistent, whether anyone stitched them together, whether the timestamps align. Cross-cutting governance means no single system can answer the question. You reconstruct the answer from fragments, if you can reconstruct it at all.
In regulated domains, "it depends" isn't acceptable. Auditors don't want fragments. They want a decision record: who, what, when, under which rules, with what outcome.
What a Governance Layer Would Do
A distinct layer would coordinate what's currently scattered:
Bind actions to authority. Every decision knows who invoked it, in what role, with what permissions. Not just authentication (are they logged in?) but authorisation for this specific action.
Enforce constraints at runtime. Policies evaluate before execution, not after. Refusal is a designed outcome, not an error. Escalation routes to the right reviewer.
Generate decision records. Every action produces a sealed, timestamped artefact: who tried to act, what they tried, what the system decided, under which rules. Suitable for audit, discovery, or replay.
Enable reconstruction. When something goes wrong, you can trace the exact sequence. When regulations change, you can identify which decisions were made under the old policy.
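One way to make a record "sealed" is to make it tamper-evident. The sketch below is an assumption about implementation, not a prescription: it hash-chains records so that editing any past record breaks every seal after it. A production system would add signatures and durable storage, but the core property fits in a few lines.

```python
import hashlib
import json
from datetime import datetime, timezone

def seal_record(payload: dict, prev_hash: str = "") -> dict:
    """Seal a decision record: the hash covers the payload, the timestamp,
    and the previous record's seal, so any later edit breaks the chain."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
        **payload,
    }
    canonical = json.dumps(body, sort_keys=True).encode()
    body["seal"] = hashlib.sha256(canonical).hexdigest()
    return body

def verify(record: dict) -> bool:
    """Recompute the hash over everything except the seal itself."""
    body = {k: v for k, v in record.items() if k != "seal"}
    canonical = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest() == record["seal"]

r1 = seal_record({"actor": "jdoe", "action": "score_credit", "verdict": "approve"})
r2 = seal_record({"actor": "jdoe", "action": "score_credit", "verdict": "refuse"},
                 prev_hash=r1["seal"])
print(verify(r1), verify(r2))  # True True
```

Flip a single field in `r2` and `verify(r2)` returns False, which is exactly the property reconstruction needs: the evidence either checks out or visibly doesn't.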
This layer doesn't replace existing tools. Model registries still track artefacts. Orchestrators still coordinate workflows. Monitoring still watches performance. The governance layer coordinates them into a coherent decision lifecycle.
The Counterargument
Sceptics point out that adding a layer adds complexity. Another hop. Another failure mode. Another team to coordinate with.
Fair. But the same was true of every layer in the stack.
Data infrastructure added complexity. Before data lakes, teams managed files. Before feature stores, teams copied datasets. The complexity was worth it because scattered data created worse problems than centralised data infrastructure.
The same logic applies to governance. Scattered governance creates worse problems than a coordination layer would. Alert fatigue. Fragmented audit trails. Policies written but not enforced. Compliance reconstructed post-hoc instead of recorded in real-time.
There's also the latency concern. A governance layer that adds 500ms to every request is a non-starter. But this layer must operate at application speed, not compliance-team speed. The pattern is familiar: API gateways, load balancers, and in-process policy engines all sit in the critical path and add milliseconds or less per request. A governance layer that can't do the same isn't viable. One that can is just another high-performance component.
The question isn't whether a governance layer adds complexity. It's whether the complexity of not having one is worse.
For autonomous systems operating in regulated domains, making thousands of decisions without human review, the answer is increasingly clear.
The Question This Raises
The previous post argued that high-stakes decisions need to be treated as explicit, auditable objects. The question is where those objects live.
Today, they live nowhere. They're inferred from logs, reconstructed from fragments, assembled after the fact.
A governance layer would give them a home. Not as a replacement for existing infrastructure, but as the coordination point that turns scattered governance into operational governance.
The stack has layers for storing data, training models, serving predictions, orchestrating workflows, and observing behaviour. It doesn't yet have a layer for governing decisions.
That gap is becoming harder to ignore.
This is the second post in "The Decision Layer," a series on governing AI decisions in production.
We're always looking for collaborators exploring how decisions can become verifiable. If your institution is working on AI governance, policy-as-code, or explainable infrastructure, we'd like to hear from you.


