Feb 4, 2026

Logs Are Not Proof

High-stakes decisions are becoming a first-class integration surface, requiring the same disciplines APIs earned: explicit contracts, versioning, and audit-grade evidence. This post explores why events and logs fall short for AI governance, and why decisions may be the missing abstraction.

TL;DR: Events tell you what happened. Decisions tell you what was chosen, under which constraints, with what authority, using what evidence. As AI systems make more consequential choices autonomously, the absence of decision-level records creates accountability diffusion: outcomes happen, but no one can reconstruct why, or whether the outcome was appropriate. The decision is becoming the unit at which governance, explanation, and redress are required. Treating it as a first-class object isn't a new idea. The stakes are new.

Every system has logs. Logging is table stakes. If something goes wrong, you check the logs. If an auditor asks what happened, you pull the logs. If a regulator wants evidence, you export the logs.

The assumption is that if we log enough, with enough structure, we can prove compliance.

The assumption is wrong.

What Logs Capture

Logs are good at recording what systems did. A well-structured log entry might include: timestamp, user ID, service name, action type, request payload, response code, latency. With effort, you can add trace IDs, correlation keys, and structured metadata. With more effort, you can enforce schemas, centralise aggregation, and build dashboards.
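
Concretely, such an entry might look like the following sketch. Field names and values are illustrative, not a real schema:

```python
import json
from datetime import datetime, timezone

# A hypothetical well-structured log entry: rich in system facts,
# silent on authority, policy, and rationale. All fields illustrative.
entry = {
    "timestamp": datetime(2026, 2, 4, 14, 32, tzinfo=timezone.utc).isoformat(),
    "user_id": "user-a",
    "service": "service-b",
    "action": "score_applicant",
    "request": {"applicant_id": "x-1009"},
    "response_code": 200,
    "latency_ms": 42,
    "trace_id": "trace-7f3c",
}
print(json.dumps(entry, indent=2))
```

Every field describes system behaviour; nothing records whether the call was authorised.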

This is useful. It tells you that User A called Service B at Time T and received Response R. It tells you the sequence of events. It tells you what the system observed about its own behaviour.

What it doesn't tell you is whether any of that should have happened.

The Distinction That Matters

Regulators and auditors draw a line that practitioners often blur: the difference between a log file and an audit trail.


| | Log file | Audit trail |
|---|---|---|
| Focus | System-centric | Governance-centric |
| Contents | Events, errors, performance metrics | Who, what, when, why, before/after values |
| Purpose | Troubleshooting | Compliance, accountability |
| Question answered | What did the system do? | Was this appropriate? |

A log file can tell you that a record was modified. An audit trail can tell you who approved the modification, under what authority, for what reason, and whether the approval followed policy.
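
The difference shows up in the shape of the records themselves. A hypothetical audit-trail entry for that same modification (field names are illustrative, not a real schema) might carry:

```python
# Hypothetical audit-trail record: the same event a log would reduce
# to "record updated", plus who, why, under what authority, and the
# before/after values. Field names are illustrative.
audit_record = {
    "action": "record_modified",
    "record_id": "cust-4411",
    "actor": "j.doe",
    "approved_by": "ops-lead",
    "authority": "change-management procedure v3",
    "reason": "address correction requested by customer",
    "before": {"postcode": "SW1A 1AA"},
    "after": {"postcode": "SW1A 2AA"},
    "policy_check": "passed",
}
```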

Most systems have logs. Few have tamper-evident, policy-linked, decision-centric audit trails.


Logs explain what happened. Proof explains why it was allowed.

What Logs Don't Capture

Consider a credit decision. The log shows: Model v2.1 returned score 0.73 for applicant X at 14:32 UTC. Application denied.

The governance questions are different:

  • Was the model authorised for this decision type?

  • Which policy version governed the denial threshold?

  • Did the applicant fall into a protected category requiring additional review?

  • Was the decision recorded in a way that can't be altered after the fact?

  • Can we prove the log entry is complete and hasn't been tampered with?

The log answers none of these. It records what the system did. It doesn't establish whether the system did what it should.
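
A record built to answer those questions looks less like a log line and more like a decision object. A sketch, with illustrative field names and an assumed approval threshold of 0.80:

```python
# Hypothetical decision record for the same denial, structured to answer
# governance questions rather than system questions. All names and the
# 0.80 threshold are assumptions for illustration.
decision = {
    "decision_id": "cr-2026-02-04-0001",
    "outcome": "denied",
    "model": {"version": "2.1", "authorised_for": ["consumer_credit"]},
    "score": 0.73,
    "policy": {"id": "lending-policy", "version": "4.7",
               "approval_threshold": 0.80},
    "rationale": "score 0.73 below approval threshold 0.80",
    "protected_category_review": {"required": False, "checked": True},
    "recorded_at": "2026-02-04T14:32:00Z",
}

# The rationale is derivable from the record itself, not reconstructed later.
assert (decision["score"] < decision["policy"]["approval_threshold"]) == \
    (decision["outcome"] == "denied")
```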

The Auditor's Question

When auditors examine a process, they don't ask "can you show me what happened?" They ask "can you prove this was appropriate?"

Audit standards are specific about what constitutes proof. Evidence must be relevant, reliable, sufficient, authentic, and complete. Logs typically fail on multiple dimensions:

Reliability: Logs are internal, single-source records. Auditors consider evidence from independent external sources more reliable. Logs generated by the system being audited are presumptively weaker evidence.

Authenticity: Logs stored in standard filesystems can be modified by anyone with administrative access. Without cryptographic controls, logs are evidence of what was recorded, not proof of what occurred.

Completeness: Logs have no inherent mechanism to prove they're complete. The absence of a log entry might mean nothing happened, or it might mean the logging failed, or it might mean someone deleted the entry.
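
Tamper-evidence and completeness can both be approximated with sequence numbers and a hash chain: each record commits to its predecessor, so deletions, insertions, and edits all break verification. A minimal sketch, not a production design:

```python
import hashlib
import json

def append(chain, event):
    """Append an event with a sequence number and a link to the
    previous record's hash, so gaps or edits become detectable."""
    prev = chain[-1]["hash"] if chain else "genesis"
    record = {"seq": len(chain), "event": event, "prev_hash": prev}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)

def verify(chain):
    """Recompute every link; True only if the chain is unbroken."""
    prev = "genesis"
    for i, rec in enumerate(chain):
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["seq"] != i or body["prev_hash"] != prev:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

chain = []
append(chain, "access granted")
append(chain, "record modified")
assert verify(chain)

chain[0]["event"] = "nothing happened"   # tamper with history
assert not verify(chain)
```

The sequence numbers make silent deletion detectable; the hash links make silent edits detectable. Neither property exists in an ordinary append-to-file log.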

One audit guidance document puts it bluntly: "Without documentation on control or records regarding performance, we must rely on inquiry alone. Inquiry is the weakest form of audit evidence."

The distinction plays out in practice. An organisation with complete access logs for an entire year was asked by auditors for "evidence of access reviews." The logs showed every access event. What the auditor wanted was evidence that someone had reviewed those logs and verified the access was appropriate. The logs existed. The governance action was undocumented.

Logs are documentation of system behaviour. They're not documentation of control effectiveness.

The Reconstruction Problem

When compliance gaps surface, organisations often try to reconstruct evidence from logs. This process reveals both the effort required and its limits.

Reconstructing a single decision might require: correlating logs across multiple services, reconciling timestamp discrepancies, identifying which policy version was active at decision time, verifying that no log entries were modified or deleted, and establishing that the logging itself was functioning correctly throughout the period.

One study of financial services firms found that reconstructing evidence of trade authorisation took months of manual work. The logs showed execution, but not authorisation. Email archives showed approval requests, but subject lines were truncated. Access logs showed system login, but not which team member was at the terminal. The result was qualified evidence, not conclusive proof.

The firm had comprehensive logging. What it lacked was records designed for governance.

What Proof Requires

The gap between logs and proof isn't about logging better. It's about what proof structurally requires:

| Property | What it means | Why logs fail |
|---|---|---|
| Tamper-evidence | Each record cryptographically linked so alterations are detectable | File permissions aren't proof; hash chains are |
| Policy binding | Each decision linked to the specific policy that governed it | Logs record actions, not which rules applied |
| Decision rationale | Why this outcome, not just what outcome | Logs show the model returned 0.73, not why that triggered denial |
| Completeness assurance | Positive proof that all relevant events were captured | Absence of entries proves nothing |
| Non-repudiation | Proof that the actor cannot deny the action | Requires signatures at decision time, not retroactive logging |

These aren't logging features. They're properties of records designed for accountability rather than visibility.

The Structural Gap

Better logging infrastructure won't close this gap. Structured JSON with consistent schemas makes logs more searchable. Centralised aggregation makes correlation easier. Retention policies prevent premature deletion. Real-time alerting catches anomalies faster. Immutable storage and cryptographic chaining make tampering detectable.

All of this is valuable. Observability tools are essential for speed: reducing mean time to resolution, debugging production issues, understanding system behaviour. But speed is not truth, and governance requires truth. You need both, and you cannot use one to do the job of the other.

None of the above transforms logs into proof.

A system can have beautiful, structured, centralised, tamper-evident logs and still fail an audit. Because the auditor isn't asking "do you have logs?" The auditor is asking "can you prove your controls worked?"

The honest framing: logs are necessary but not sufficient. You also need governance artifacts (policies, approval records, review procedures) that logs don't capture. These can be simple (emails, spreadsheets, meeting notes) if they're contemporaneous and documented. But they have to exist. If you're doing governance well, logs will support your proof. If you're not doing governance, no amount of logging infrastructure fixes that.

Proof requires records that answer governance questions, not system questions. Records that bind decisions to policies, capture rationale, ensure completeness, and resist tampering. Records designed for accountability from the start, not retrofitted from operational telemetry.

The Recognition

Most organisations suspect this. The compliance team knows that assembling audit evidence is painful. The engineering team knows that logs weren't designed for governance. The security team knows that log integrity is an unsolved problem.

What's less common is naming it clearly: logs are visibility infrastructure. They show you what happened. They don't prove it happened correctly.

The gap between those two isn't a logging problem. It's a governance problem. And governance problems don't get solved by logging harder.

Part of "The Decision Layer," a series on governing AI decisions in production.

Have thoughts on where AI and governance meet?

We’re always looking for collaborators exploring how decisions can become verifiable.

Let’s build the future of compliant AI together.

If your institution is exploring AI governance, policy-as-code, or explainable infrastructure, we’d like to collaborate.