Research Preview · MRP-2026-02 · v1.0 · 18 May 2026 · Public Distribution · 26 pages · 1.5 MB

When AI hedges and policy commits

Why an AI agent reaches for 'needs review' where a policy engine gives a straight yes or no

By Sam Carter

TL;DR

We ran 283 real UK procurement decisions through both an AI agent and MeshQu's policy engine at the same moment, binding every verdict to a signed receipt.
They almost never agreed — but not because the agent was reckless. The policy engine gave a straight allow or deny; the agent kept reaching for 'needs review' when the evidence was incomplete.
The fix turned out to be about how the policy is written, not the agent: relaxing one over-strict rule lifted agreement roughly elevenfold.
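To make the policy-authoring point concrete, here is a minimal sketch of how a single rule's failure band can decide whether an evaluator commits to DENY or falls back to REVIEW. Everything in it is an assumption for illustration: the `Rule` type, the `evaluate` and `demote` helpers, the rule name `award_value_present`, and the record fields are hypothetical and do not describe MeshQu's actual engine or the rule that was relaxed in the study.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Ordering used to pick the most severe verdict among failed rules.
SEVERITY = {"ALLOW": 0, "REVIEW": 1, "DENY": 2}


@dataclass(frozen=True)
class Rule:
    name: str
    fails: Callable[[dict], bool]  # True when the record violates the rule
    on_fail: str                   # verdict band applied on failure


def evaluate(rules, record):
    """Return the most severe verdict triggered by any failed rule."""
    verdict = "ALLOW"
    for rule in rules:
        if rule.fails(record) and SEVERITY[rule.on_fail] > SEVERITY[verdict]:
            verdict = rule.on_fail
    return verdict


def demote(rules, name):
    """Copy the rule set, moving one named rule from DENY to the REVIEW band."""
    return [replace(r, on_fail="REVIEW") if r.name == name else r for r in rules]


# Hypothetical rule: a missing award value is treated as a critical failure.
strict = [Rule("award_value_present",
               lambda rec: rec.get("award_value") is None,
               "DENY")]
relaxed = demote(strict, "award_value_present")

# Hypothetical record with incomplete evidence.
incomplete = {"title": "Example filing", "award_value": None}

print(evaluate(strict, incomplete))   # DENY
print(evaluate(relaxed, incomplete))  # REVIEW
```

Under the strict rule set the incomplete record is denied outright; under the relaxed set it lands in REVIEW, the same band the agent tends to choose when evidence is missing. The agent is unchanged in both cases; only the policy text differs.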

Abstract

Regulated firms cannot routinely answer how a specific AI-augmented decision was made. We ran 300 public UK Contracts Finder filings through an LLM agent and the same records through a MeshQu policy evaluator at the same moment. Both verdicts, the agent's reasoning, the exact policy snapshot, and a substrate provenance envelope were bound into a single Ed25519-signed receipt anchored to Sigstore Rekor. Across 283 unique decisions, MeshQu produced 144 ALLOW and 139 DENY; the agent produced 7 ALLOW, 276 REVIEW, and zero DENY. Naive agreement is 7 of 283 (about 2.5%): disagreement shaped as non-commitment under incomplete evidence, not over-permissiveness. A counterfactual demoting one rule from critical-DENY to a REVIEW band lifts agreement roughly elevenfold, a finding about policy authoring, not agent capability.
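The agreement figure can be checked directly from the reported counts. The sketch below is not the study's code; it uses only the numbers quoted in the abstract, and the helper name `agreement_ceiling` is ours. It computes the largest exact-label agreement that the two verdict distributions could possibly allow, which shows how much of the disagreement follows from the mismatch in verdict spaces alone.

```python
from collections import Counter

LABELS = ("ALLOW", "REVIEW", "DENY")

# Verdict counts as reported in the abstract.
engine = Counter({"ALLOW": 144, "DENY": 139})
agent = Counter({"ALLOW": 7, "REVIEW": 276})

total = sum(engine.values())
assert total == sum(agent.values()) == 283


def agreement_ceiling(a, b):
    """Upper bound on exact-label matches given only each system's totals."""
    return sum(min(a[label], b[label]) for label in LABELS)


ceiling = agreement_ceiling(engine, agent)
reported_agreement = 7  # "7 of 283" from the abstract
rate = reported_agreement / total

print(f"ceiling on agreement: {ceiling} of {total}")
print(f"reported agreement:   {reported_agreement} of {total} ({rate:.1%})")
```

Because the engine never emits REVIEW and the agent never emitted DENY, the only label on which the two can match is ALLOW, so the ceiling is 7. The reported agreement sits exactly at that ceiling, which also implies that every agent ALLOW coincided with an engine ALLOW.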

“MeshQu produced 144 ALLOW and 139 DENY; the agent produced 7 ALLOW, 276 REVIEW, and zero DENY across 283 unique decisions.”
“What the corpus measures is not 'agent right or wrong'. It is two systems with different verdict spaces examining the same evidence.”
“MeshQu produces a committed binary verdict; the agent produces a verdict plus a hedge.”
“Demoting a single rule from critical-by-default DENY to a REVIEW band lifts agreement roughly elevenfold.”
“This is a finding about policy authoring, not agent capability.”

