What to Define Before Any AI Pilot Starts

June 15, 2026

Why Most AI Pilots Fail to Show Results

The danger isn't running a bad pilot. The danger is funding a platform, expanding licenses, hiring around it, and embedding it into operations before anyone has proven the business outcome exists. That's how a 90-day pilot becomes a multi-year cost center nobody can justify canceling.

Most organizations get back a dashboard full of activity metrics with no connection to revenue, cost, or risk. If the buyer doesn't define success before the vendor arrives, the vendor defines it, and their definition will be whatever they can already demonstrate. Utilization rates. Prompt volume. Tasks completed. Metrics that are easy to produce and impossible to trace to a business outcome. The pilot passes. Nothing changes.

What needs to exist before any vendor enters the room is a governance framework the buyer wrote. Seven questions. All of them answered internally before the first meeting gets scheduled.

  1. Name the Business Problem First

Not "we want to use AI." A specific constraint the business wants removed: order processing that takes too long, support teams drowning in ticket volume, security analysts buried in alerts, claims reviews that run for weeks. The problem comes first. The technology follows. If that problem can't be described in one sentence without mentioning AI, the pilot isn't ready to start.

  2. Pick the Metric That Proves the Problem Exists

Whatever the business problem is, a number is attached to it: average resolution time, cost per transaction, incident response time, revenue per rep, audit preparation hours. That number is the metric. If the business doesn't already track one, that's worth knowing before committing budget. It may mean the measurement infrastructure isn't in place yet.

  3. Capture the Baseline Before the Vendor Arrives

What does that metric look like today, before any new technology touches it? Most organizations find they don't have a clean baseline when they go looking. Capture it now, before the vendor's tools land and the numbers get harder to untangle. Without a baseline, the end-of-pilot dashboard can only confirm the pilot happened — there's no way to know whether anything improved.

  4. Connect the Metric to Business Value

Ticket resolution time drops 25%. Then what? Does it reduce labor cost, avoid a hiring decision, improve SLA compliance, increase throughput? Without connecting the metric to an organizational outcome, the value never materializes regardless of what the dashboard shows. Before the pilot starts, answer two questions: what business outcome does this metric impact, and what is the dollar value of hitting the success threshold? A pilot that can't answer those questions isn't ready to be funded.
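One way to approach the second question is a back-of-the-envelope calculation. The sketch below assumes the value comes entirely from recovered labor time, which will not hold for every pilot. The function name `annual_labor_value` and every input figure (minutes saved, ticket volume, loaded hourly cost) are hypothetical placeholders, not data from any real pilot; substitute the organization's own numbers.

```python
def annual_labor_value(minutes_saved_per_ticket: float,
                       tickets_per_month: int,
                       loaded_cost_per_hour: float) -> float:
    """Yearly dollar value of recovered time, assuming all of it is redeployable labor."""
    hours_saved_per_month = minutes_saved_per_ticket * tickets_per_month / 60
    return hours_saved_per_month * loaded_cost_per_hour * 12


# Placeholder inputs for illustration only.
value = annual_labor_value(minutes_saved_per_ticket=10,
                           tickets_per_month=3000,
                           loaded_cost_per_hour=40.0)
print(f"${value:,.0f} per year")
```

If the result cannot be tied to a real budget line, such as a deferred hire or reduced overtime, the figure is still only an estimate of capacity, not of savings.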

  5. Set the Number That Justifies the Investment

A specific figure — not a range, not "meaningful improvement" — that the board would accept as proof the investment was worth making. That call gets made before the first vendor meeting. Once the vendor is in the room, every number they suggest will be optimized for what they can demonstrate, not what the business needs to prove.

  6. Name the Person Accountable for the Outcome

One executive whose name is attached to the result twelve months from now, someone above the project manager running day-to-day and the implementation lead managing the vendor. This person owns the metric, owns the baseline, and owns the call on whether the pilot worked. Without a named owner, accountability spreads across a steering committee and quietly disappears once the dashboard comes back green. Steering committees are where ownership goes to die.

  7. Define What Stop Looks Like

What happens if the threshold isn't met? Kill it, expand it, or redesign it, and decide in advance which data points to each outcome. Most organizations skip this step not because they forgot, but because nobody wants to define failure when everyone already wants the project to move forward. Executives want the initiative. Teams want the budget. Sponsors want the win. Defining failure creates political risk, so it gets deferred until there's no clean way out. Without kill criteria defined before the pilot starts, a pilot doesn't end. It becomes a permanent line item.

What This Looks Like Before the Vendor Shortlist Gets Built

A support organization wants to cut ticket resolution time. The baseline is 42 minutes. Success is a 25% reduction, down to about 31 minutes, within 90 days. Each minute recovered per ticket frees up support hours every month, delays the next headcount request, and improves SLA compliance. The Director of IT Operations owns the outcome. If improvement is under 10% at the 60-day mark, the pilot gets reviewed for redesign before the full timeline runs out. All of this exists before the RFP goes out.
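The example above can be written down as a small plan that is fixed before the pilot starts. The sketch below is one possible illustration, not a prescribed tool: the class name `PilotPlan`, its field names, and the three status labels are assumptions made for this example. The figures (42-minute baseline, 25% target, 10% floor at the 60-day checkpoint) come from the scenario; a 25% cut from 42 minutes works out to 31.5, which the scenario rounds to 31.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotPlan:
    """Success criteria recorded before the pilot starts (hypothetical structure)."""
    baseline_minutes: float   # metric value before any new tooling
    target_reduction: float   # fraction that counts as success, e.g. 0.25
    checkpoint_floor: float   # minimum fraction expected at the checkpoint, e.g. 0.10

    def target_minutes(self) -> float:
        return self.baseline_minutes * (1 - self.target_reduction)

    def reduction(self, observed_minutes: float) -> float:
        return (self.baseline_minutes - observed_minutes) / self.baseline_minutes

    def checkpoint_status(self, observed_minutes: float) -> str:
        achieved = self.reduction(observed_minutes)
        if achieved >= self.target_reduction:
            return "on target"
        if achieved < self.checkpoint_floor:
            return "review for redesign"
        return "continue"


plan = PilotPlan(baseline_minutes=42, target_reduction=0.25, checkpoint_floor=0.10)
print(plan.target_minutes())         # 31.5
print(plan.checkpoint_status(39.0))  # review for redesign
```

Because the dataclass is frozen, the thresholds cannot be quietly adjusted after results start coming in; changing them requires creating a new plan, which is the point of deciding them in advance.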

Why This Keeps Getting Skipped

Nobody wants to define failure upfront. When the executive already wants the project, the team already wants the budget, and the sponsor already wants the initiative, writing down what cancellation looks like creates political risk. So it gets deferred. The vendor fills the vacuum. Six months later the board asks what changed and nobody has an answer that maps to revenue, cost, or risk.

The Checklist

Before any AI pilot starts, these need to exist in writing:

The business problem, in one sentence, without mentioning AI
The metric that proves the problem exists
The baseline: what that metric is today
The business value: what outcome the metric impacts and what hitting the threshold is worth in dollars
The success threshold: the specific number that justifies the investment
The owner: one named executive accountable for the outcome
Kill criteria: what happens if the threshold isn't met

If any of these are missing when the first vendor meeting is scheduled, the pilot is not ready to start. The checklist is what makes the dashboard at the end mean something. Without it, all the dashboard can tell you is that the pilot happened.
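The same gate can be checked mechanically before a vendor meeting goes on the calendar. What follows is a sketch under assumptions: the key names in `REQUIRED_ITEMS` and the `missing_items` helper are hypothetical, and an absent or blank entry is treated as missing. The sample values are taken from the support-ticket scenario.

```python
REQUIRED_ITEMS = [
    "business_problem",
    "metric",
    "baseline",
    "business_value",
    "success_threshold",
    "owner",
    "kill_criteria",
]


def missing_items(charter: dict) -> list:
    """Return the checklist items that are absent or blank in the charter."""
    return [item for item in REQUIRED_ITEMS
            if not str(charter.get(item) or "").strip()]


# A partially completed draft: three of the seven items are not yet written down.
draft = {
    "business_problem": "Support tickets take too long to resolve.",
    "metric": "Average ticket resolution time",
    "baseline": "42 minutes",
    "owner": "Director of IT Operations",
}

gaps = missing_items(draft)
print("Ready to start" if not gaps else f"Not ready, missing: {', '.join(gaps)}")
```

A check like this only confirms that each item has been written down; whether the answers are any good still takes human judgment.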
