The case for fixed-fee pilots in ML consulting
Time and materials pricing in machine learning consulting is an incentive alignment problem dressed as a pricing model.
The mechanics, for anyone who hasn't had to negotiate one: the consultancy charges the client a rate per engineer per day. Work continues for as long as the client wants it to continue. The consultancy bills for the time. There is no defined endpoint, no defined deliverable that ends the engagement, and no financial consequence for the consultancy if what they build doesn't work. This arrangement is standard across the industry. It is also, for any engagement with genuine uncertainty, almost exactly the wrong way to pay for the work.
The problem is not that time and materials is intrinsically wrong. For production maintenance, for ongoing engineering augmentation, for well-understood tasks at predictable scale, it's fine. The problem is that machine learning pilots are not well-understood tasks at predictable scale. A pilot exists precisely to determine whether a thing will work. The entire point is exploration. Time and materials applied to exploration gives the exploring party a direct financial interest in continuing to explore, regardless of how the exploration is going.
Consider the dynamics at month three of a struggling pilot. The data turned out to be messier than expected. The model is stuck at sixty-eight percent accuracy when the business requires eighty-five. A consultancy billing by the hour has two choices. It can recommend stopping, which loses them the next six months of billings, or it can recommend another six weeks of effort to try a different approach, which keeps the meter running. Most consultancies, most people, faced with this incentive, will tell you the second story with complete sincerity. They may even believe it. But the incentive is real, and it tilts the whole structure toward persistence past the point where persistence makes sense.
How fixed-fee changes the dynamic
Fixed-fee pilots remove this incentive. The consultancy is paid a defined amount to reach a defined outcome by a defined date. If they miss, they either eat the cost or they have to disclose the miss and ask for more money with a good explanation. Both paths are uncomfortable; both paths force honesty earlier. The vendor has a financial interest in scoping honestly at the front end, because they're betting their margin on being able to deliver. The client knows the maximum downside before signing. If the pilot doesn't hit its target, both parties walk away without the weight of a sunk-cost argument.
The standard objection
Here is the standard objection, and the right response. 'But machine learning is exploratory. You can't fix-fee exploration.' The premise is right. The conclusion is not. What you cannot fix-fee is a production build that will take twelve to eighteen months and has to accommodate unforeseen integration work. What you absolutely can fix-fee is the pilot: the first two to four weeks, scoped narrowly, with a written success criterion agreed up front. The pilot is by definition the part you can bound. The production build, which follows only if the pilot works, can be time and materials or milestone-based. Fix-fee the pilot; negotiate the production phase separately.
A scopable fixed-fee pilot has four properties. The first is one dataset, chosen before the engagement starts. The second is one problem, defined tightly enough that a single metric answers the question. The third is one success criterion, written into the statement of work as something like 'extraction accuracy of ninety percent on a held-out sample of two hundred documents', not 'the system works'. The fourth is one timebox, usually two to four weeks, with the understanding that if the target is not hit by the end date, the pilot has failed and the next phase does not happen. All four of these are negotiable before the contract is signed. None of them is negotiable after.
One consequence is worth flagging for buyers. A consultancy that insists on time and materials for a pilot is telling you something about its confidence in its own capability. If a vendor has done this kind of work before, they can estimate the effort. If they've done it many times, they can estimate it accurately enough to bear the risk. A refusal to fix-fee a small, narrowly-scoped pilot is usually a refusal to put money behind judgement. That's the most important information you will get in the sales conversation. Note it, and let it weigh accordingly.
# sow/pilot.yaml: the four properties every fixed-fee pilot must specify
dataset: "supplier-invoices/2025-q1.parquet" # frozen sample, named in writing
problem: "extract { vendor, total, due_date } per invoice"
success: { metric: "field_accuracy", threshold: 0.90, holdout: 200 }
timebox: { start: "2026-05-04", end: "2026-05-31" }
on_failure: "no production phase, no further cost"If any line is missing or vague, it's not a fixed-fee pilot. It's a time-and-materials engagement with a deadline.