Software that thinks and gets straight to work.
What we build
A boutique AI-focused software engineering firm. We build AI-driven systems where the answer comes from learning patterns in data (LLMs, classifiers, forecasts), and rules-based systems where the answer follows defined logic (APIs, data pipelines, rule engines).
Where we work
Generative AI, conversational AI, document processing, vision, forecasting, customer analytics, MLOps, and web. Eight specialisations, one discipline.
How we deliver
Small teams. Short cycles. Honest measurement. Every model we ship will reach production fully instrumented, so you can see what it does and where it falls short.
Background: a Hilbert curve, a space-filling fractal that maps 2D into 1D while preserving locality. It is used in spatial databases, image codecs, and ML embeddings.
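For readers curious how that 2D-to-1D mapping works, here is a minimal sketch of the widely published iterative conversion from a grid cell to its position along the curve. The function name `hilbert_index` and the grid size are illustrative choices, not part of any library, and the grid side `n` is assumed to be a power of two.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) on an n-by-n grid to its distance along the Hilbert curve.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current square does the cell fall in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip so the sub-square is in the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Walk a 4x4 grid in curve order: each step moves to a neighbouring cell.
order = sorted(
    ((x, y) for x in range(4) for y in range(4)),
    key=lambda cell: hilbert_index(4, *cell),
)
print(order)
```

Sorting records by this index tends to keep cells that are close in 2D close in the 1D order, which is the locality property the caption refers to.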
AI projects rarely fail at the model. They fail in the systems around it: post-launch ownership, data drift, and continuous monitoring. We engineer for each from day one.
Aurum Quanta methodology
Free discovery call, no commitment
To a working pilot on your data
Services across the AI stack
Your code, your models, your cloud
What we build for you.
See service detail →
Text, code, images, audio, and video. Generated, reviewed, iterated.
Assistants that hand off to a human when they should.
Forms, invoices, and statements read and routed without retyping.
Object detection and inspection for things that aren't documents.
Demand and price forecasts built from your own data.
Churn, LTV, and next-best-action scores landing in the CRM you already use.
The infrastructure that keeps models working on Tuesdays.
Production websites and apps, handed over with the keys.
Every model we ship will pass through a quality gate.
# eval/check.py: block deploys when quality regresses
import statistics

# load_golden_set, rouge_l, and model are project helpers defined elsewhere.
def test_summarisation_quality(baseline: float = 0.84):
    cases = load_golden_set("summarisation/v3.jsonl")
    # Score each model output against its expected summary.
    scores = [rouge_l(c.expected, model.run(c.input)) for c in cases]
    median = statistics.median(scores)
    assert median >= baseline, f"Regressed: median={median:.3f} (baseline {baseline})"

An evaluation gate in CI, wired into every deploy that reaches production.
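The gate above calls a `rouge_l` helper that the snippet does not show. As a rough sketch of what such a helper might compute, the version below scores the longest common subsequence of tokens between the expected and produced summaries. The whitespace tokenisation, lowercasing, and balanced F1 weighting are all assumptions made for illustration; a real project may well use an established ROUGE package instead.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(expected: str, produced: str) -> float:
    """LCS-based F1 between a reference summary and a model summary (0.0 to 1.0)."""
    ref = expected.lower().split()    # assumption: whitespace tokens
    cand = produced.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # Assumption: precision and recall weighted equally.
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l("the invoice was paid in full", "invoice paid in full"), 3))
```

Using the median rather than the mean in the gate keeps a handful of unusually bad cases from failing the build on their own, at the cost of hiding them, so a per-case floor may be worth adding alongside it.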
How we'd scope it.
A mid-sized Australian retailer processes 30,000+ supplier invoices a year, all manually keyed into the ERP.
Pilot: 3 weeks. Production handover: 4–6 weeks.
A government agency or professional services firm holds 5,000+ pages of internal policy and regulation.
Pilot: 3 weeks. Production handover: 6–10 weeks.
A model has been live for eight months.
Audit: 1–2 weeks. Implementation: 4–10 weeks.
Illustrative scoping, not delivered case studies. Real ones go up here once first engagements wrap.
Three beats from first email to hand-over.
Short cycles. Honest measurement. You see working software on your own data before committing to anything bigger.
Read the full four-step process →
- 01
Discover
A 30-minute call and a written scope. Fixed-fee wherever the work is bounded enough to be quoted that way, and an honest read on whether it's a fit.
- 02
Pilot
The smallest useful version of the system, working on your data inside two to four weeks.
- 03
Hand over
The repositories, the trained weights, and the runbooks. Ongoing support is available if you want it.
Pick a metric that reflects what you actually want
Most ML projects that fail didn't fail at modelling. They optimised the wrong metric and succeeded, by that metric, all the way to an unusable system.
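A small worked example of that failure mode, using made-up numbers: on a hypothetical churn dataset where 2% of customers leave, a "model" that predicts nobody churns scores well on accuracy while finding none of the customers the business cares about. The data and function names here are illustrative only.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Share of actual positives that the model caught."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return 0.0
    return sum(preds_for_positives) / len(preds_for_positives)


# Hypothetical data: 1,000 customers, 20 of whom churn (label 1).
y_true = [1] * 20 + [0] * 980
always_stay = [0] * 1000  # predicts that nobody churns

acc = accuracy(y_true, always_stay)
rec = recall(y_true, always_stay)
print(f"accuracy={acc:.3f} recall={rec:.3f}")  # accuracy=0.980 recall=0.000
```

If the goal is to reach churners before they leave, recall (or a cost-weighted score) is closer to what is actually wanted than raw accuracy.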
See if we're a fit.
A 30-minute discovery call. We learn what you're trying to do and tell you whether we're the right shop for it.
Book a discovery call →