A mid-sized Australian retailer processes 30,000+ supplier invoices a year, all manually keyed into the ERP. The finance team spends sixty-plus hours a week retyping them, and ledger reconciliation keeps surfacing errors introduced during keying.
The invoices are semi-structured, the volume justifies the investment, and the errors are recurring and well-understood. This is the kind of problem IDP was built for. It doesn't need novel model research.
- 01
Pull a representative sample of 200 invoices from the last 90 days. Reject anything scanned below 300 DPI or rotated incorrectly. If too many documents fail that check, the scan process is the bottleneck, and that's where the first week of work goes.
- 02
Run a one-day workshop with finance, AP, and procurement to write down what each field means in plain language. "Invoice number" is whichever value flows into column X of the ERP. If a supplier prints two of them, the workshop decides which one wins and writes the decision down before any modelling starts.
- 03
Document the target schema in the ERP and run one record end-to-end (extracted → ERP write → reconciled) before training anything. About half of failed IDP projects fail in the integration layer, not the extraction layer, so it's the first place to look.
- 04
Set a per-field confidence threshold. This is the model's self-reported confidence in each extraction, not the per-field accuracy measured against ground truth in the success criterion below. The default starting point is 0.85, recalibrated as the reviewer queue settles. Fields above the threshold pass straight through. Fields below it queue for human review until the model has learned enough to clear them on its own.
- 05
Write the success criterion in the SOW before any code: ≥90% per-field accuracy on a 200-document holdout, measured at week 3. Miss the target and the production phase doesn't happen.
≥90% per-field accuracy on a 200-document holdout, with low-confidence routing in place. Measured at week 3.
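To make the two numbers concrete, here is a minimal Python sketch of confidence routing (step 04) and per-field accuracy on a holdout (step 05). The `Extraction` type, the function names, and the field names are hypothetical, not part of any particular IDP product. Two choices are assumptions: a confidence exactly at the threshold passes, and accuracy is counted by exact string match.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # starting point from step 04; recalibrate later


@dataclass
class Extraction:
    """One extracted field with the model's self-reported confidence (hypothetical type)."""
    field: str
    value: str
    confidence: float


def route(extractions, threshold=CONFIDENCE_THRESHOLD):
    """Split extractions into straight-through and human-review lists.

    Assumption: a confidence exactly equal to the threshold passes.
    """
    passed, review = [], []
    for e in extractions:
        (passed if e.confidence >= threshold else review).append(e)
    return passed, review


def per_field_accuracy(predicted, truth):
    """Fraction of exact matches per field across a holdout set.

    `predicted` and `truth` are parallel lists of {field: value} dicts,
    one dict per document. A field missing from a prediction counts as wrong.
    """
    hits, totals = {}, {}
    for pred_doc, true_doc in zip(predicted, truth):
        for field, true_value in true_doc.items():
            totals[field] = totals.get(field, 0) + 1
            if pred_doc.get(field) == true_value:
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}


# Usage with made-up values
passed, review = route([
    Extraction("invoice_number", "INV-1001", 0.97),
    Extraction("total", "1250.00", 0.62),
])
accuracy = per_field_accuracy(
    predicted=[{"invoice_number": "INV-1001", "total": "1250.00"},
               {"invoice_number": "INV-1002", "total": "99.00"}],
    truth=[{"invoice_number": "INV-1001", "total": "1250.00"},
           {"invoice_number": "INV-1002", "total": "90.00"}],
)
```

Keeping the two functions separate mirrors the distinction in step 04: confidence decides what a reviewer sees, while accuracy against ground truth decides whether the pilot passes.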
Pilot: 3 weeks. Production handover: 4–6 weeks.
Not every document category clears 90%. Handwritten remittance advice, fax-quality scans, and multi-page composite documents tend to land below threshold and stay there. They need human-in-the-loop review indefinitely. We'll flag those categories by week 2, not at the end.
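One way to surface those categories early is to break holdout accuracy down by document category and list whatever falls short of the target. The sketch below assumes each scored document is summarised as a `(category, correct_fields, total_fields)` tuple; that shape, the function name, and the category labels are illustrative only.

```python
from collections import defaultdict

TARGET = 0.90  # success criterion from the SOW


def categories_below_target(results, target=TARGET):
    """Return the categories whose pooled field accuracy is below `target`.

    `results` is an iterable of (category, correct_fields, total_fields),
    one tuple per scored document (hypothetical shape).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, n_correct, n_total in results:
        correct[category] += n_correct
        total[category] += n_total
    return sorted(
        category for category in total
        if total[category] > 0 and correct[category] / total[category] < target
    )


# Usage with made-up counts
flagged = categories_below_target([
    ("typed_invoice", 19, 20),
    ("handwritten_remittance", 12, 20),
    ("typed_invoice", 18, 20),
])
```

Run weekly against the reviewed queue, a report like this would show by week 2 which categories are likely to stay on human review.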