Where ML teams sit decides what they ship

A company has an ML problem and three options for who owns it. The Research team has the strongest credentials and the most-cited papers. The Engineering team has the best operational discipline. The Product team has the closest understanding of what the customer is actually doing.

The choice of who owns the work decides what the work looks like. The Research team will produce a more sophisticated model that lives in a notebook for nine months. The Engineering team will ship a less sophisticated model that survives in production for three years. The Product team will ship something simpler than either and revise it weekly based on feedback.

There is no objectively right answer to this organisational choice, but one rule consistently predicts success: the team that owns the model must also own the consequences of the model in production. When the model breaks, when accuracy regresses, when the business stakeholder has questions, the team that built it has to be the team responding. If those are different teams, the gap between them is where projects fail.

Three patterns to recognise

Research-led ML. The model is sophisticated. The papers are real. The notebook is beautiful. The handoff to engineering is where the system either ships or doesn't, and most of the time it doesn't. The mitigation is for research and engineering to work on the same model from week one, with deployment tests running from that first week too. The notebook isn't the product; the production service is. If the production engineer can't run the notebook, the project hasn't started yet.

Engineering-led ML. The model is functional and lives in production. The risk is that it has been tuned to the engineering team's intuitions, not the user's needs, and that there is no eval set the user-facing team would recognise as fair. The mitigation is to write the eval set with the user-facing team, not with the modelling team alone. The goal is that the metric the engineers chase is the metric the customers feel.

Product-led ML. The model is simple, lives in production, ships fast, and revises weekly. The product team has visibility into both customer behaviour and engineering reality. The risk is that the model never grows up: it stays at the demo-quality level forever, because there's no one with deep modelling expertise pushing for the harder cases. The mitigation is to hire one ML specialist into the product team rather than running them as a separate function.

A useful diagnostic question is to ask, before the project starts, who gets paged at 3am when the model fails. If the answer is 'we'll figure that out at handover', the project hasn't been scoped. If the answer is 'the platform team', and the platform team didn't build the model, you've already created the gap that will produce the future incident. The right answer is that the team that ships the model is the team that gets paged, and they are also the team that decides what 'good' looks like.

A second diagnostic: who writes the eval set. If the eval set is written by the same team that trains the model, you've put the test and the test-taker in the same room. If the eval set is written by a different team that doesn't understand the modelling constraints, you'll get an eval set that's impossible to satisfy. The right structure is that the user-facing team and the modelling team write the eval set together, in the same week, with the same coffee.

A third diagnostic: who owns the deprecation decision. Models reach the end of useful life. Sometimes the world has shifted; sometimes a simpler approach has caught up; sometimes the maintenance cost has exceeded the value. Whoever can call that decision and turn the model off has the real ownership. If nobody can call it, the model lives forever, occupying compute and attention that could go elsewhere.
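The three diagnostics can be written down as a small checklist. The sketch below is only one way to do that, and every name in it (`ModelOwnership`, `ownership_gaps`, the team names) is hypothetical rather than taken from any real tool. It assumes each responsibility can be recorded as a team name, with `None` meaning nobody has been assigned yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ModelOwnership:
    """Hypothetical record of who holds each responsibility for one model."""
    builder: str                      # team that trains and ships the model
    user_facing_team: str             # team closest to the customer
    on_call: Optional[str] = None     # team paged at 3am; None means undecided
    eval_authors: Set[str] = field(default_factory=set)  # teams that wrote the eval set
    deprecation_owner: Optional[str] = None  # team that can turn the model off


def ownership_gaps(m: ModelOwnership) -> List[str]:
    """Return one message per diagnostic check that the record fails."""
    gaps = []

    # Diagnostic 1: who gets paged when the model fails.
    if m.on_call is None:
        gaps.append("paging: undecided, so the project has not been scoped")
    elif m.on_call != m.builder:
        gaps.append(f"paging: {m.on_call} is paged but {m.builder} built the model")

    # Diagnostic 2: the eval set needs both a user-facing and a modelling author.
    if m.user_facing_team not in m.eval_authors:
        gaps.append("eval set: the user-facing team did not help write it")
    if m.builder not in m.eval_authors:
        gaps.append("eval set: the modelling team did not help write it")

    # Diagnostic 3: who can call the deprecation decision.
    if m.deprecation_owner is None:
        gaps.append("deprecation: nobody can turn the model off")

    return gaps


# Example: a research-built model handed to a platform team for on-call.
handover = ModelOwnership(
    builder="research",
    user_facing_team="product",
    on_call="platform",
    eval_authors={"research"},
)
for gap in ownership_gaps(handover):
    print(gap)
```

For the handover example this reports three gaps: the pager mismatch, the missing user-facing eval author, and the unassigned deprecation decision. A record where one team builds, is paged, co-writes the eval set with the user-facing team, and can turn the model off comes back with an empty list.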

Organisational structure determines technical outcomes more than people usually admit. The team that ships ML projects reliably is the team that owns the eval set, ships the model, gets paged when it breaks, answers to the business stakeholder when accuracy regresses, and has authority to turn the model off. A team that does fewer of those things ships less reliable models, regardless of its technical sophistication. That makes the org chart, not the architecture, the highest-leverage decision in many ML projects.
