From first email to production, in four steps.
Discovery call
A 30-minute conversation. We learn what you're trying to do and tell you whether this is something we should be the ones to take on.
Scoping doc
A short written proposal covering the problem, the approach, the deliverables, the price, and the timeline. Fixed-fee wherever the work is bounded enough to be quoted that way.
Pilot build
The smallest useful version of the system, working end-to-end on your data. You see results from real inputs before deciding whether to commit to the production build.
Production
We harden the pilot, document everything a successor engineer would need, and hand the system over. You get the repositories, the trained weights, and the runbooks. Ongoing support is available if you want it; not assumed.
The technical depth behind everything we build.
Machine Learning and Predictive Modelling
Gradient-boosted models, random forests, logistic regression, time-series forecasting. We pick the approach based on the data and the decision the model has to support.
Deep Learning and Neural Networks
Transformers, CNNs, and sequence models for problems that need them: document layout understanding, demand forecasting under heavy seasonality, adaptive question selection in education products.
Natural Language Processing
Text classification, entity extraction, semantic search, summarisation. Fine-tuned models when the cost-per-call justifies it; prompt-engineered pipelines when it doesn't.
Computer Vision and Document AI
OCR, layout-aware extraction, image classification, object detection. This is the backbone of the IDP service and of most workflows that start with a scanned page or a camera feed.
Data Engineering and Pipelines
Ingestion, cleaning, transformation, orchestration. Bad data quietly caps the accuracy of every model downstream of it, and no amount of modelling effort recovers what's been lost upstream. We treat data engineering as part of every ML build, not a separate workstream.
MLOps and Model Monitoring
Automated retraining, drift detection, A/B testing, rollback. Models in production need the same operational rigour as any other deployed service, and they have a few failure modes that conventional DevOps tooling doesn't cover.
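Drift detection is one of the failure modes that ordinary service monitoring does not cover. A minimal sketch of one common check, the Population Stability Index (PSI) between a reference sample and live traffic, is below. The function name, the ten-bin layout, and the 0.25 alert threshold are illustrative assumptions (0.25 is a widely quoted rule of thumb, not a figure from this page).

```python
import math
from typing import List, Sequence

# Assumption: a commonly quoted rule of thumb, to be tuned per feature.
DRIFT_THRESHOLD = 0.25


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against the `expected` reference."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so live values outside the reference range land in the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def has_drifted(expected: Sequence[float], actual: Sequence[float]) -> bool:
    """True when the PSI exceeds the (assumed) alert threshold."""
    return psi(expected, actual) > DRIFT_THRESHOLD
```

A check like this would typically run per feature on a schedule, with a retraining or rollback decision triggered only after a human or a second signal confirms the shift.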
Generative AI and LLMs
Retrieval-augmented generation, fine-tuning, prompt engineering, agent workflows. We use them where they add value over a simpler approach. A lot of the time the simpler approach wins, and we'll tell you when that's the case.
Analytics and Dashboards
Interactive reporting, KPI tracking, scenario simulation. The interface layer that lets your team act on what the models are finding without waiting for the next quarterly slide deck.
The tools we reach for, and when.
Modelling
Deep-learning models for the problems that need them: vision, sequence modelling, custom fine-tunes.
Linear baselines and classical classifiers. Usually the first model we fit on any new problem, even when it's clearly going to be replaced.
Gradient-boosted trees on tabular data. They quietly beat deep learning on tabular problems more often than the conference circuit suggests.
Pre-trained models, tokenizers, and fine-tuning pipelines for most of the text and vision work.
LLMs
Frontier model access. We pick per task and keep the prompt portable across providers, partly to avoid lock-in and partly because the leaderboard keeps shifting.
Used selectively. When the framework starts adding more complexity than it saves, we write the retrieval pipeline from scratch instead.
Vector search inside Postgres, so there's one database to operate instead of two. Goes in when the corpus size lets us get away with it; we'll move to a dedicated vector DB once it doesn't.
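For a small corpus, vector search amounts to ranking documents by distance to the query embedding. The sketch below does that by brute force in plain Python using cosine distance, the same quantity pgvector exposes through its `<=>` operator. The function names and the toy two-dimensional embeddings are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the ids of the k documents closest to the query (exact, O(n) scan)."""
    ranked = sorted(corpus.items(), key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]
```

An exact scan like this stays workable while the corpus is small; the point at which an index or a dedicated vector store pays for itself depends on corpus size and latency budget.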
Data and pipelines
Tabular data wrangling. We reach for Polars when the dataset gets large enough that pandas runtime starts to be a project of its own.
Analytical queries over Parquet files without standing up a full warehouse. The pilot-stage workhorse on a lot of forecasting and analytics engagements.
The default operational database. Durable, well-understood, and already running in most modern stacks we'd integrate into.
Transformations as version-controlled SQL. We pull it in when there's a real warehouse to maintain.
Production and MLOps
Experiment tracking and model registry. The default choice, unless you already run Weights & Biases. We won't make you re-platform onto a tool we'd find more familiar.
Python services for inference. Type-safe, async, and quick enough for most real-time workloads we encounter.
Containerised everything. The handover artefact for any model or service we ship.
CI for evaluation gates, drift checks, and deployment. This is where the eval-set test from the homepage actually runs in production.
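As an illustration of what an evaluation gate can look like, the sketch below scores a model on a held-out eval set and reports whether it clears a minimum accuracy. The function name, the accuracy metric, and the threshold are assumptions for the example; a real gate would use whatever metric the engagement agreed on.

```python
from typing import Any, Callable, Sequence, Tuple


def evaluation_gate(
    predict: Callable[[Any], Any],
    eval_set: Sequence[Tuple[Any, Any]],
    min_accuracy: float,
) -> Tuple[bool, float]:
    """Return (passed, accuracy) for `predict` on (input, expected) pairs."""
    correct = sum(1 for x, expected in eval_set if predict(x) == expected)
    accuracy = correct / len(eval_set)
    return accuracy >= min_accuracy, accuracy
```

In a CI job, the caller would exit non-zero when `passed` is false, so that the deployment step never runs on a model that regressed.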
Cloud and platform
Whichever cloud you already pay for. Engagements run in your environment, not in one of ours.
Infrastructure as code, on engagements large enough for it to actually pay back the upfront investment.
Marketing and web work. Zero-config deploys, an edge runtime that mostly stays out of the way, sensible defaults.
Web
The default for marketing sites and product front-ends. Server components for performance, type-safe routing, and a reasonable migration path to whatever React standardises on next.
Always. The runtime cost of untyped JavaScript on a real production codebase is hard to overstate.
Utility-first styling. Faster than writing component CSS for everything we ship at this scale, and easier to onboard a new engineer onto.
Schema validation at every system boundary. The contact form on this site uses it; so does most of the API surface in our other web work.
Pick your constraints. See the stack we would reach for.
Building from scratch, no users yet. Optimise for ops simplicity.
Hosted API (Anthropic Claude or OpenAI)
At low scale and small team, the cost of running infra exceeds the cost of API tokens. Use the API, instrument it, and revisit the build-vs-buy decision once volume changes the maths.
Serverless (Cloud Run or Lambda)
Small team plus modest scale means ops time is the scarcest resource. Serverless removes provisioning, autoscaling, and patching from your plate.
Managed Postgres (Supabase / Neon / RDS) + pgvector
Managed Postgres covers transactional, full-text, and (with pgvector) embedding workloads up to several million rows without specialist infra. One database, multiple jobs.
Platform-native logs + Sentry
Solo teams cannot maintain a full observability stack. Cloud-platform logs cover the basics, and Sentry catches the errors that actually matter, for almost no setup cost.
Pick latency, sensitivity, scale, and team size. The matrix outputs the four-layer stack we would reach for first: inference, runtime, data, and observability. The rules are honest defaults: a sensible starting point, not a prescription.
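A rules matrix of this kind can be sketched as a default stack plus a handful of overrides. In the sketch below, the default mirrors the example shown above (low scale, small team); the field values and the single override rule are hypothetical placeholders, not the rules behind the demo.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Constraints:
    latency: str      # hypothetical values: "relaxed" or "realtime"
    sensitivity: str  # hypothetical values: "low" or "high"
    scale: str        # hypothetical values: "low" or "high"
    team_size: str    # hypothetical values: "solo", "small" or "large"


@dataclass(frozen=True)
class Stack:
    inference: str
    runtime: str
    data: str
    observability: str


# Taken from the example on this page: building from scratch, low scale, small team.
DEFAULT_STACK = Stack(
    inference="Hosted API (Anthropic Claude or OpenAI)",
    runtime="Serverless (Cloud Run or Lambda)",
    data="Managed Postgres (Supabase / Neon / RDS) + pgvector",
    observability="Platform-native logs + Sentry",
)


def recommend(c: Constraints) -> Stack:
    """Start from the default and apply overrides one layer at a time."""
    stack = DEFAULT_STACK
    # Hypothetical rule, for illustration only: highly sensitive data
    # moves inference off third-party APIs.
    if c.sensitivity == "high":
        stack = replace(stack, inference="Self-hosted open-weights model")
    return stack
```

Keeping each rule to a single layer makes the output easy to explain: every deviation from the default traces back to one named constraint.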
Note · this is a simplified demo
In a real engagement we probe further: residency requirements, existing vendor commitments, on-call coverage, cost ceilings, which paradigm the team is comfortable with, what the company already runs in production, and whether the project warrants a clean-room build or has to slot into a legacy estate. The matrix above is the first pass; the conversation is where the real architecture decision lives.
Models that are auditable, explainable, and fair.
Explainability
Every model prediction is accompanied by SHAP values, global feature importance, and a plain-language explanation. A regulator's question of 'why this decision' can be answered without recourse to the data scientist who originally trained the model.
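SHAP values are estimates of Shapley values: each feature's average marginal contribution to a prediction, taken over all orders in which features could be revealed. The sketch below computes them exactly by enumerating feature subsets, which is only feasible for a handful of features. It does not use the SHAP library, and it assumes that a "missing" feature is represented by a fixed baseline value; both choices are simplifications for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in the subset take the instance's value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        attributions.append(phi)
    return attributions
```

For a linear model the result reduces to weight times (value minus baseline) per feature, and the attributions always sum to the gap between the prediction and the baseline prediction, which is what makes them usable in a plain-language explanation.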
Audit trails
Every data transformation, model version, and prediction is logged with a stable identifier and retained for the period your jurisdiction requires. Audit-ready logging is built into the system from the first commit, since retrofitting it is operationally expensive and frequently incomplete.
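One way to obtain a stable identifier is to derive it from the content of the record itself. The sketch below hashes a canonical JSON form of the model version, inputs, and prediction, so the same event always yields the same id regardless of key order. The field names and the choice of SHA-256 are assumptions for illustration; retention and storage are out of scope here.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def audit_record(model_version: str, features: Dict[str, Any], prediction: Any) -> Dict[str, Any]:
    """Build a log entry whose id is a content hash of what was predicted and by which model."""
    payload = {
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    # Sorted keys and fixed separators make the serialisation deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # The timestamp is stored alongside the payload but kept out of the hash.
    return {"id": record_id, "logged_at": datetime.now(timezone.utc).isoformat(), **payload}
```

Because the timestamp is excluded from the hash, replaying the same input through the same model version reproduces the id, which makes duplicate detection and later verification straightforward.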
Bias testing
Systematic fairness checks across protected attributes are performed before a model is deployed and repeated on a defined cadence in production. The tests are integrated into the CI pipeline and block any deployment that regresses against the established baseline.
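A fairness check that can block a deployment needs a number to compare against the baseline. The sketch below uses one simple metric, the demographic parity gap (the spread in positive-prediction rates across groups), and a gate that fails when the gap exceeds the baseline by more than a tolerance. The metric choice, function names, and tolerance are assumptions for illustration; real engagements usually track several metrics.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def demographic_parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def fairness_gate(gap: float, baseline_gap: float, tolerance: float = 0.01) -> bool:
    """True when the candidate model has not regressed beyond the baseline plus tolerance."""
    return gap <= baseline_gap + tolerance
```

Wired into CI, a false result from `fairness_gate` would fail the job in the same way a failing unit test does.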
Data sovereignty
Client data remains within your environment and jurisdiction throughout the engagement. It is not copied onto Aurum Quanta infrastructure, transferred across borders without your explicit written consent, or used to train systems beyond your own.