Research Preview · MRP-2026-02 · v1.0 · 18 May 2026 · Public Distribution · 26 pages · 1.5 MB

When AI hedges and policy commits

Why an AI agent reaches for 'needs review' where a policy engine gives a straight yes or no

By Sam Carter

TL;DR

  • We ran 283 real UK procurement decisions through both an AI agent and MeshQu's policy engine at the same moment, binding every verdict to a signed receipt.
  • They almost never agreed — but not because the agent was reckless. The policy engine gave a straight allow or deny; the agent kept reaching for 'needs review' when the evidence was incomplete.
  • The fix turned out to be about how the policy is written, not the agent: relaxing one over-strict rule lifted agreement roughly elevenfold.

Abstract

Regulated firms cannot routinely answer how a specific AI-augmented decision was made. We ran 300 public UK Contracts Finder filings through an LLM agent and the same records through a MeshQu policy evaluator at the same moment. Both verdicts, the agent's reasoning, the exact policy snapshot, and a substrate provenance envelope were bound into a single Ed25519-signed receipt anchored to Sigstore Rekor. MeshQu produced 144 ALLOW and 139 DENY; the agent produced 7 ALLOW, 276 REVIEW, and zero DENY. Naive agreement is 7 of 283: the disagreement takes the shape of non-commitment under incomplete evidence, not over-permissiveness. A counterfactual that demotes one rule from critical-DENY to a REVIEW band lifts agreement roughly elevenfold, a finding about policy authoring, not agent capability.
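The agreement figure can be checked from the verdict counts alone. The sketch below is illustrative and not taken from the report: it assumes "naive agreement" means an exact label match on the same record, and the names `policy_counts`, `agent_counts`, and `max_possible_agreement` are hypothetical. It shows that, with these marginals, exact-match agreement cannot exceed 7, because REVIEW never appears on the policy side and DENY never appears on the agent side.

```python
from collections import Counter

LABELS = ("ALLOW", "REVIEW", "DENY")

# Verdict counts as reported in the abstract (283 paired decisions).
# Variable names are illustrative, not from the report.
policy_counts = Counter({"ALLOW": 144, "REVIEW": 0, "DENY": 139})
agent_counts = Counter({"ALLOW": 7, "REVIEW": 276, "DENY": 0})


def max_possible_agreement(a: Counter, b: Counter) -> int:
    """Upper bound on exact-match agreement given only per-label totals.

    For each label, at most min(a[label], b[label]) records can carry
    that label on both sides.
    """
    return sum(min(a[label], b[label]) for label in LABELS)


total = sum(policy_counts.values())
observed_agreement = 7  # reported in the abstract as "7 of 283"
ceiling = max_possible_agreement(policy_counts, agent_counts)
rate = observed_agreement / total

print(f"paired decisions:   {total}")
print(f"agreement ceiling:  {ceiling}")
print(f"observed agreement: {observed_agreement} ({rate:.1%})")
```

Under that assumption, the reported 7 of 283 (about 2.5%) already sits at the ceiling the label distributions allow, which is consistent with the abstract's point that the gap comes from the agent choosing REVIEW rather than from the two systems committing to opposite verdicts.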
