Research Preview · MRP-2026-02 · v1.0 · 18 May 2026 · Public Distribution · 26 pages · 1.5 MB
When AI hedges and policy commits
Why an AI agent reaches for 'needs review' where a policy engine gives a straight yes or no
By Sam Carter

TL;DR
- We ran 283 real UK procurement decisions through both an AI agent and MeshQu's policy engine at the same moment, binding every verdict to a signed receipt.
- They agreed on only 7 of 283 decisions, but not because the agent was reckless. The policy engine gave a straight allow or deny; the agent kept reaching for 'needs review' when the evidence was incomplete, and never once issued a deny.
- The fix turned out to be about how the policy is written, not the agent: demoting one over-strict rule from an automatic deny to a review band lifted agreement roughly elevenfold.
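The mechanism behind that fix can be sketched with a toy evaluator. Everything in it is an assumption made for illustration: the rule names, the record fields (`award_value`, `supplier_id`), the "missing evidence" test, and the stand-in agent are hypothetical and do not reflect MeshQu's policy language or the study's data. When every rule is critical, the evaluator can only commit to ALLOW or DENY, so an agent that hedges on incomplete evidence matches it only on clean records. Demoting one rule gives the evaluator a REVIEW band that those hedges can land in.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    REVIEW = "REVIEW"
    DENY = "DENY"


@dataclass(frozen=True)
class Rule:
    name: str
    field: str      # record field the rule requires to be present (hypothetical)
    critical: bool  # True: a failure forces DENY; False: a failure lands in the REVIEW band


def missing(value) -> bool:
    """Treat None or an empty string as missing evidence (a simplifying assumption)."""
    return value is None or value == ""


def evaluate(record: dict, rules: list) -> Verdict:
    """Toy policy evaluator: any critical failure denies, other failures go to review."""
    failures = [rule for rule in rules if missing(record.get(rule.field))]
    if any(rule.critical for rule in failures):
        return Verdict.DENY
    if failures:
        return Verdict.REVIEW
    return Verdict.ALLOW


def toy_agent(record: dict) -> Verdict:
    """Stand-in for the LLM agent: hedges to REVIEW whenever any evidence is missing."""
    return Verdict.REVIEW if any(missing(v) for v in record.values()) else Verdict.ALLOW


def agreement(left: list, right: list) -> int:
    """Naive agreement: number of records where both verdicts are identical."""
    return sum(a == b for a, b in zip(left, right))


# Illustrative records, not drawn from the study's corpus.
RECORDS = [
    {"award_value": 100, "supplier_id": "S1"},
    {"award_value": None, "supplier_id": "S2"},
    {"award_value": 250, "supplier_id": ""},
    {"award_value": None, "supplier_id": None},
    {"award_value": None, "supplier_id": "S5"},
]

# Every rule critical: the evaluator can only answer ALLOW or DENY.
STRICT = [
    Rule("value_disclosed", "award_value", critical=True),
    Rule("supplier_identified", "supplier_id", critical=True),
]
# Counterfactual: demote exactly one rule to the REVIEW band.
RELAXED = [replace(STRICT[0], critical=False), STRICT[1]]

agent_verdicts = [toy_agent(r) for r in RECORDS]
strict_verdicts = [evaluate(r, STRICT) for r in RECORDS]
relaxed_verdicts = [evaluate(r, RELAXED) for r in RECORDS]

print(agreement(strict_verdicts, agent_verdicts))   # 1
print(agreement(relaxed_verdicts, agent_verdicts))  # 3
```

On these five made-up records agreement rises from 1 to 3. The numbers say nothing about the study's elevenfold figure; they only show why giving the policy a review band can raise exact-match agreement with a hedging agent.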
Abstract
Regulated firms cannot routinely answer how a specific AI-augmented decision was made. We ran 300 public UK Contracts Finder filings through an LLM agent and, at the same moment, through a MeshQu policy evaluator. Both verdicts, the agent's reasoning, the exact policy snapshot, and a substrate provenance envelope were bound into a single Ed25519-signed receipt anchored to Sigstore Rekor. Across 283 unique decisions, MeshQu produced 144 ALLOW and 139 DENY; the agent produced 7 ALLOW, 276 REVIEW, and zero DENY. Naive agreement is 7 of 283 (about 2.5%): the disagreement takes the form of non-commitment under incomplete evidence, not over-permissiveness. A counterfactual that demotes one rule from critical-DENY to a REVIEW band lifts agreement roughly elevenfold, a finding about policy authoring rather than agent capability.
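What binding several artefacts into one receipt buys can be sketched with a content digest: hash a deterministic encoding of every bound element, so that altering any one of them (a verdict, the reasoning, the policy text, the provenance) changes the value that gets signed. This is a minimal sketch under stated assumptions. The field names and the sample inputs are hypothetical, the JSON canonicalisation is simplified rather than a full RFC 8785 implementation, and the Ed25519 signature and Sigstore Rekor anchoring are omitted because the receipt format itself is not described here.

```python
import hashlib
import json


def canonical_bytes(payload) -> bytes:
    """Deterministic JSON: sorted keys, compact separators, UTF-8.
    Simplified; not a full RFC 8785 (JCS) implementation."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def receipt_digest(policy_verdict: str, agent_verdict: str, agent_reasoning: str,
                   policy_snapshot: dict, provenance: dict) -> str:
    """Hash every element the receipt binds, so changing any one changes the digest."""
    payload = {
        "policy_verdict": policy_verdict,
        "agent_verdict": agent_verdict,
        "agent_reasoning": agent_reasoning,
        # Pin the exact policy text by its own hash (hypothetical field name).
        "policy_snapshot_sha256": hashlib.sha256(canonical_bytes(policy_snapshot)).hexdigest(),
        "provenance": provenance,
    }
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


# Hypothetical inputs for illustration only.
policy = {"version": "example", "rules": [{"name": "value_disclosed", "critical": True}]}
provenance = {"source": "Contracts Finder", "record_id": "example-001"}

digest = receipt_digest("DENY", "REVIEW", "Award value not disclosed.", policy, provenance)
print(digest)  # 64 hex characters; signing and transparency-log anchoring are not shown
```

Sorting keys means two systems that assemble the same receipt fields in a different order still produce the same digest, which is what lets a third party recompute it and check the signature later.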
- “MeshQu produced 144 ALLOW and 139 DENY; the agent produced 7 ALLOW, 276 REVIEW, and zero DENY across 283 unique decisions.”
- “What the corpus measures is not 'agent right or wrong'. It is two systems with different verdict spaces examining the same evidence.”
- “MeshQu produces a committed binary verdict; the agent produces a verdict plus a hedge.”
- “Demoting a single rule from critical-by-default DENY to a REVIEW band lifts agreement roughly elevenfold.”
- “This is a finding about policy authoring, not agent capability.”
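The headline 7 of 283 is also the most agreement the reported counts allow. Because MeshQu never emits REVIEW and the agent never emits DENY, ALLOW is the only label the two systems can share, so exact-match agreement is capped at min(144, 7) = 7. The sketch below uses only the counts quoted above to compute that ceiling. Since the reported agreement reaches it, every one of the agent's 7 ALLOW verdicts must have coincided with a MeshQu ALLOW.

```python
from collections import Counter

# Verdict counts as reported, across 283 unique decisions.
MESHQU = Counter({"ALLOW": 144, "DENY": 139})
AGENT = Counter({"ALLOW": 7, "REVIEW": 276, "DENY": 0})


def max_agreement(a: Counter, b: Counter) -> int:
    """Upper bound on exact-match agreement implied by the marginal counts alone."""
    labels = set(a) | set(b)
    # Counter returns 0 for a label one system never produced.
    return sum(min(a[label], b[label]) for label in labels)


total = sum(MESHQU.values())
assert total == sum(AGENT.values()) == 283  # both systems judged the same decisions

bound = max_agreement(MESHQU, AGENT)
print(bound, f"{bound / total:.1%}")  # 7 2.5%
```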