Process

From first email to production, in four steps.

Week 0

Discovery call

A 30-minute conversation. We learn what you're trying to do and tell you whether this is something we should be the ones to take on.

Week 1

Scoping doc

A short written proposal covering the problem, the approach, the deliverables, the price, and the timeline. Fixed-fee wherever the work is bounded enough to be quoted that way.

Weeks 2 to 4

Pilot build

The smallest useful version of the system, working end-to-end on your data. You see results from real inputs before deciding whether to commit to the production build.

Week 5 onwards

Production

We harden the pilot, document everything a successor engineer would need, and hand the system over. You get the repositories, the trained weights, and the runbooks. Ongoing support is available if you want it; not assumed.

Capabilities

The technical depth behind everything we build.

Machine Learning and Predictive Modelling

Gradient-boosted models, random forests, logistic regression, time-series forecasting. We pick the approach based on the data and the decision the model has to support.

Deep Learning and Neural Networks

Transformers, CNNs, and sequence models for problems that need them: document layout understanding, demand forecasting under heavy seasonality, adaptive question selection in education products.

Natural Language Processing

Text classification, entity extraction, semantic search, summarisation. Fine-tuned models when the cost-per-call justifies it; prompt-engineered pipelines when it doesn't.

Computer Vision and Document AI

OCR, layout-aware extraction, image classification, object detection. This is the backbone of the IDP service and of most workflows that start with a scanned page or a camera feed.

Data Engineering and Pipelines

Ingestion, cleaning, transformation, orchestration. Bad data quietly caps the accuracy of every model downstream of it, and no amount of modelling effort recovers what's been lost upstream. We treat data engineering as part of every ML build, not a separate workstream.

MLOps and Model Monitoring

Automated retraining, drift detection, A/B testing, rollback. Models in production need the same operational rigour as any other deployed service, and they have a few failure modes that conventional DevOps tooling doesn't cover.
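Input drift is one of the failure modes conventional tooling tends to miss: the service stays up while the data quietly moves away from what the model was trained on. Below is a minimal sketch of one way to check for it, assuming the Population Stability Index (PSI) as the metric. The metric choice, the function name `psi`, the bin count, and the 0.25 alert threshold (a common rule of thumb) are illustrative assumptions, not a description of a specific client setup.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample.

    Bins are equal-width over the range of the reference sample; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: a reference feature and a live feature shifted upwards.
reference = [i / 100 for i in range(1000)]
shifted = [v + 5 for v in reference]

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cut-off for "significant" drift
drifted = psi(reference, shifted) > ALERT_THRESHOLD
```

In practice a check like this would run per feature on a schedule, with the alert wired to whatever triggers retraining or rollback.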

Generative AI and LLMs

Retrieval-augmented generation, fine-tuning, prompt engineering, agent workflows. We use them where they add value over a simpler approach. A lot of the time the simpler approach wins, and we'll tell you when that's the case.

Analytics and Dashboards

Interactive reporting, KPI tracking, scenario simulation. The interface layer that lets your team act on what the models are finding without waiting for the next quarterly slide deck.

Stack

The tools we reach for, and when.

Modelling

PyTorch

Deep-learning models for the problems that need them: vision, sequence modelling, custom fine-tunes.

scikit-learn

Linear baselines and classical classifiers. Usually the first model we fit on any new problem, even when it's clearly going to be replaced.

XGBoost / LightGBM

Gradient-boosted trees on tabular data. They quietly beat deep learning on tabular problems more often than the conference circuit suggests.

Hugging Face Transformers

Pre-trained models, tokenizers, and fine-tuning pipelines for most of the text and vision work.

LLMs

OpenAI / Anthropic / Google APIs

Frontier model access. We pick per task and keep the prompt portable across providers, partly to avoid lock-in and partly because the leaderboard keeps shifting.

LangChain / LlamaIndex

Used selectively. When the framework starts adding more complexity than it saves, we write the retrieval pipeline from scratch instead.

pgvector

Vector search inside Postgres, so there's one database to operate instead of two. Goes in when the corpus size lets us get away with it; we'll move to a dedicated vector DB once it doesn't.

Data and pipelines

pandas / Polars

Tabular data wrangling. We reach for Polars when the dataset gets large enough that pandas runtime starts to be a project of its own.

DuckDB

Analytical queries over Parquet files without standing up a full warehouse. The pilot-stage workhorse on a lot of forecasting and analytics engagements.

PostgreSQL

The default operational database. Durable, well-understood, and already running in most modern stacks we'd integrate into.

dbt

Transformations as version-controlled SQL. We pull it in when there's a real warehouse to maintain.

Production and MLOps

MLflow

Experiment tracking and model registry. The default choice, unless you already run Weights & Biases. We won't make you re-platform onto a tool we'd find more familiar.

FastAPI

Python services for inference. Type-safe, async, and quick enough for most real-time workloads we encounter.

Docker

Containerised everything. The handover artefact for any model or service we ship.

GitHub Actions

CI for evaluation gates, drift checks, and deployment. This is where the eval-set test from the homepage actually runs in production.

Cloud and platform

AWS / GCP / Azure

Whichever cloud you already pay for. Engagements run in your environment, not in one of ours.

Terraform

Infrastructure as code, on engagements large enough for it to actually pay back the upfront investment.

Vercel

Marketing and web work. Zero-config deploys, an edge runtime that mostly stays out of the way, sensible defaults.

Web

Next.js (App Router)

The default for marketing sites and product front-ends. Server components for performance, type-safe routing, and a reasonable migration path to whatever React standardises on next.

TypeScript

Always. The cost of untyped JavaScript in a real production codebase, paid in runtime errors and risky refactors, is hard to overstate.

Tailwind CSS

Utility-first styling. Faster than writing component CSS for everything we ship at this scale, and easier to onboard a new engineer onto.

Zod

Schema validation at every system boundary. The contact form on this site uses it; so does most of the API surface in our other web work.

Architecture matrix

Pick your constraints. See the stack we would reach for.

Start from an example, then tweak

Building from scratch, no users yet. Optimise for ops simplicity.

Problem characteristics

Latency target

Data sensitivity

Scale

Team size

Recommended stack · what we would reach for first

Inference

Hosted API (Anthropic Claude or OpenAI)

At low scale and with a small team, the cost of running your own infrastructure exceeds the cost of API tokens. Use the API, instrument it, and revisit the build-vs-buy decision once volume changes the maths.
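The build-vs-buy maths can be sketched as a break-even volume: the monthly request count at which the fixed cost of self-hosting is paid back by the per-request saving. The function name and every figure below are hypothetical placeholders, not quoted prices.

```python
import math
from typing import Optional


def break_even_requests(
    fixed_monthly_infra: float,
    api_cost_per_request: float,
    selfhost_cost_per_request: float,
) -> Optional[float]:
    """Monthly requests above which self-hosting is cheaper than the hosted API.

    Returns None when self-hosting never pays back, i.e. when its marginal
    cost per request is not lower than the API's.
    """
    saving = api_cost_per_request - selfhost_cost_per_request
    if saving <= 0:
        return None
    return fixed_monthly_infra / saving


# Hypothetical figures: 3,000 per month of fixed infrastructure and on-call time,
# 0.010 per request via the API, 0.002 marginal cost per request self-hosted.
volume = break_even_requests(3000.0, 0.010, 0.002)
```

Below that volume the API is the cheaper option; the instrumentation mentioned above is what tells you when you are approaching it.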

Runtime

Serverless (Cloud Run or Lambda)

Small team plus modest scale means ops time is the scarcest resource. Serverless removes provisioning, autoscaling, and patching from your plate.

Data

Managed Postgres (Supabase / Neon / RDS) + pgvector

Managed Postgres covers transactional, full-text, and (with pgvector) embedding workloads up to several million rows without specialist infra. One database, multiple jobs.

Observability

Platform-native logs + Sentry

Solo teams cannot maintain a full observability stack. Cloud-platform logs cover the basics, and Sentry catches the errors that actually matter, for almost no setup cost.

Pick latency, sensitivity, scale, and team size. The matrix outputs the four-layer stack we would reach for first: inference, runtime, data, and observability. The rules are honest defaults: a sensible starting point, not a prescription.
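As a rough illustration, the matrix amounts to a rule lookup from four inputs to four layers. The sketch below encodes only the example shown above (low scale, small team). The category values, the conditions on latency and sensitivity, and the names `Constraints`, `Stack`, and `recommend` are assumptions made for illustration, and any combination the text does not cover deliberately returns `None` rather than a guess.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Constraints:
    # The value sets are assumed; the page names the four dimensions only.
    latency: Literal["relaxed", "strict"]
    sensitivity: Literal["standard", "regulated"]
    scale: Literal["low", "high"]
    team: Literal["small", "large"]


@dataclass(frozen=True)
class Stack:
    inference: str
    runtime: str
    data: str
    observability: str


# The one recommendation spelled out in the text above.
GREENFIELD = Stack(
    inference="Hosted API (Anthropic Claude or OpenAI)",
    runtime="Serverless (Cloud Run or Lambda)",
    data="Managed Postgres (Supabase / Neon / RDS) + pgvector",
    observability="Platform-native logs + Sentry",
)


def recommend(c: Constraints) -> Optional[Stack]:
    """Return a first-pass stack, or None when no rule is defined here."""
    if (
        c.scale == "low"
        and c.team == "small"
        and c.latency == "relaxed"        # assumed for the greenfield example
        and c.sensitivity == "standard"   # assumed for the greenfield example
    ):
        return GREENFIELD
    return None  # outside the documented example: needs the conversation


first_pass = recommend(Constraints("relaxed", "standard", "low", "small"))
```

Returning `None` for uncovered combinations mirrors the point of the matrix: it is a starting default, and anything outside it is decided in conversation.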

Note · this is a simplified demo

In a real engagement we probe further: residency requirements, existing vendor commitments, on-call coverage, cost ceilings, which paradigm the team is comfortable with, what the company already runs in production, and whether the project warrants a clean-room build or has to slot into a legacy estate. The matrix above is the first pass; the conversation is where the real architecture decision lives.

Responsible AI

Models that are auditable, explainable, and fair.

Explainability

Every model prediction is accompanied by SHAP values and a plain-language explanation, and every model ships with its global feature importance. A regulator's question of 'why this decision' can be answered without recourse to the data scientist who originally trained the model.
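To make the idea concrete, here is a minimal sketch for the simplest case: a linear model with features treated as independent, where the SHAP value of feature i reduces to weight_i * (x_i - mean_i). This is a hand-rolled special case for illustration, not the `shap` library and not a description of how any delivered system computes its explanations. The feature names, weights, and values are hypothetical.

```python
from typing import List, Sequence


def linear_shap(
    weights: Sequence[float], x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Per-feature contributions for a linear model.

    `baseline` holds the mean of each feature over the training data, so the
    contributions sum to prediction(x) minus prediction(baseline).
    """
    return [w * (xi - bi) for w, xi, bi in zip(weights, x, baseline)]


def explain(names: Sequence[str], contributions: Sequence[float], top: int = 2) -> str:
    """Turn the largest non-zero contributions into a plain-language sentence."""
    ranked = sorted(
        ((n, c) for n, c in zip(names, contributions) if c != 0),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    parts = [
        f"{name} {'raised' if c > 0 else 'lowered'} the score by {abs(c):.2f}"
        for name, c in ranked[:top]
    ]
    return "; ".join(parts)


# Hypothetical credit-style example.
names = ["income", "debt", "tenure"]
weights = [2.0, -1.0, 0.5]
applicant = [3.0, 1.0, 4.0]
training_means = [1.0, 2.0, 4.0]

contributions = linear_shap(weights, applicant, training_means)
summary = explain(names, contributions)
```

For non-linear models the contributions need an estimator rather than a closed form, but the output contract is the same: one signed number per feature, plus a sentence a non-specialist can read.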

Audit trails

Every data transformation, model version, and prediction is logged with a stable identifier and retained for the period your jurisdiction requires. Audit-ready logging is built into the system from the first commit, since retrofitting it is operationally expensive and frequently incomplete.
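One way to get a stable identifier is to derive it from the content of the record itself, so the same event always hashes to the same ID regardless of key order or which service wrote it. The sketch below assumes that approach. The field names and the functions `stable_id` and `audit_record` are hypothetical, and whether IDs should be content-derived or assigned per event is a design choice that depends on the system.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def stable_id(payload: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(
    kind: str,
    payload: Dict[str, Any],
    model_version: str,
    timestamp: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build one log entry; the ID covers kind, model version, and payload."""
    ts = timestamp or datetime.now(timezone.utc)
    body = {"kind": kind, "model_version": model_version, "payload": payload}
    return {"id": stable_id(body), "logged_at": ts.isoformat(), **body}


# Hypothetical prediction event, written twice with different key order.
first = audit_record("prediction", {"input_ref": "doc-001", "score": 0.82}, "v3")
second = audit_record("prediction", {"score": 0.82, "input_ref": "doc-001"}, "v3")
retrained = audit_record("prediction", {"input_ref": "doc-001", "score": 0.82}, "v4")
```

The timestamp is kept outside the hashed body on purpose, so replaying the same event later yields the same identifier.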

Bias testing

Systematic fairness checks across protected attributes are performed before a model is deployed and repeated on a defined cadence in production. The tests are integrated into the CI pipeline and block any deployment that regresses against the established baseline.
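A deployment gate of this kind can be sketched as a script whose exit code CI interprets. The example assumes demographic parity difference (the gap between the highest and lowest positive-prediction rate across groups) as the fairness metric. The metric choice, the tolerance, the function names, and the data are illustrative assumptions; a real engagement would pick metrics per use case and jurisdiction.

```python
from typing import Dict, Hashable, Sequence


def selection_rates(
    predictions: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Share of positive (1) predictions within each group."""
    totals: Dict[Hashable, int] = {}
    positives: Dict[Hashable, int] = {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Demographic parity difference: max minus min selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def gate(candidate_gap: float, baseline_gap: float, tolerance: float = 0.01) -> int:
    """Exit code for CI: 0 passes, 1 blocks the deployment."""
    return 0 if candidate_gap <= baseline_gap + tolerance else 1


# Hypothetical evaluation set with two groups of four records each.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

candidate = parity_gap(preds, groups)
exit_code = gate(candidate, baseline_gap=0.20)
# In a CI job this would end with: sys.exit(exit_code)
```

Comparing against a stored baseline rather than an absolute threshold is what makes the check a regression test: the gate fails only when a candidate model is measurably worse than the one already deployed.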

Data sovereignty

Client data remains within your environment and jurisdiction throughout the engagement. It is not copied onto Aurum Quanta infrastructure, transferred across borders without your explicit written consent, or used to train systems beyond your own.