Vision systems fail on the lighting, not the model
A manufacturer trains a defect-detection vision model on 5,000 sample images of widgets coming off the line. The model achieves 98 percent precision on a held-out set. The model is deployed. Two weeks later it is flagging perfectly fine widgets at a 30 percent rate. The team retrains. Two weeks later the rate is back. Eventually someone walks the floor and notices that the new fluorescent lights installed during the line refurbishment cast a slightly bluer light than the old ones. The model was correctly classifying widgets-as-the-old-light-rendered-them.
The model wasn't wrong. The training distribution was. This is the standard failure mode of computer vision systems in production, and it is almost always misdiagnosed as a model problem.
The pilot is run in conditions that the production environment doesn't reproduce. The training images are taken on a sunny weekend with a researcher's phone. Production runs at 6am on a winter morning under emergency lighting. The training images are framed neatly. Production cameras have been bumped, mis-aimed, or smudged. The training set has clean backgrounds. Production is full of the bag the operator left next to the conveyor.
The fix is structural, not algorithmic. Engineering vision systems for production means engineering for the conditions they will actually encounter, not for the model that will see them.
Three failure modes to anticipate, and a fourth
First, lighting drift. Indoor lighting changes with bulb type, bulb age, time of day (sun through windows), seasonal angle, and after any building work. Outdoor lighting changes by hour, weather, and season. A model trained in summer will fail differently in winter, and the failures are confidently wrong, not obviously wrong. Mitigation: collect training data across as much lighting variation as exists in production. If you're going to deploy somewhere with a glass facade, make sure the training data covers a sunny morning, a cloudy noon, and an evening after dark with the security lights on. Synthetic colour jitter helps but does not substitute for real captures under those conditions.
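One way to check lighting coverage is to compare a per-image lighting statistic between the training set and recent production captures. The sketch below assumes you already compute one number per image upstream (mean brightness, or a blue-to-red channel ratio); the function names and the 1st-to-99th percentile band are illustrative choices, not an established recipe.

```python
from statistics import quantiles
from typing import Sequence


def covered_range(train_values: Sequence[float],
                  lo_pct: int = 1, hi_pct: int = 99) -> tuple[float, float]:
    """Return the band of the statistic that the training set covers.

    quantiles(..., n=100) yields 99 cut points; index k-1 is the k-th percentile.
    """
    cuts = quantiles(train_values, n=100)
    return cuts[lo_pct - 1], cuts[hi_pct - 1]


def fraction_outside(train_values: Sequence[float],
                     prod_values: Sequence[float]) -> float:
    """Fraction of production images whose statistic falls outside the training band."""
    lo, hi = covered_range(train_values)
    outside = sum(1 for v in prod_values if v < lo or v > hi)
    return outside / len(prod_values)


if __name__ == "__main__":
    # Hypothetical mean-brightness values (0-255 scale), one per image.
    train = list(range(100, 201))
    production = [150, 160, 230, 240]
    print(f"{fraction_outside(train, production):.0%} of production images "
          "are outside the lighting range seen in training")
```

A rising fraction is a cue to collect more data under the new conditions before the model's error rate makes the point for you.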
Second, hardware drift. Cameras get bumped. Lenses gather dust. CMOS sensors gain hot pixels. Auto-focus mechanisms fail. The image you got at install is not the image you are getting six months later. Mitigation: capture an image of a known reference target every shift (a printed colour chart, or a fixed object in frame), monitor pixel-level statistics on it, and alert when it drifts. This is the equivalent of feature drift monitoring for tabular data, and it is just as important.
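A minimal sketch of the reference-target check, assuming the crop of the colour chart has already been extracted as a flat list of RGB pixels. The names (PatchStats, patch_stats, drift_alerts) and the thresholds are hypothetical and would need tuning against your own camera.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence

Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


@dataclass(frozen=True)
class PatchStats:
    mean_rgb: tuple[float, float, float]
    contrast: float  # std-dev of luminance; tends to drop with dust or defocus


def patch_stats(pixels: Sequence[Pixel]) -> PatchStats:
    """Summarise the reference-target crop from one capture."""
    r, g, b = (mean(p[c] for p in pixels) for c in range(3))
    luma = [0.299 * pr + 0.587 * pg + 0.114 * pb for pr, pg, pb in pixels]
    return PatchStats(mean_rgb=(r, g, b), contrast=pstdev(luma))


def drift_alerts(baseline: PatchStats, current: PatchStats,
                 max_channel_shift: float = 10.0,
                 min_contrast_ratio: float = 0.8) -> list[str]:
    """Compare this shift's capture against the install-time baseline."""
    alerts = []
    for name, base, cur in zip("RGB", baseline.mean_rgb, current.mean_rgb):
        if abs(cur - base) > max_channel_shift:
            alerts.append(f"{name} mean shifted by {cur - base:+.1f}")
    if baseline.contrast > 0 and current.contrast / baseline.contrast < min_contrast_ratio:
        alerts.append("contrast dropped: check lens and focus")
    return alerts


if __name__ == "__main__":
    install = patch_stats([(200, 200, 200), (50, 50, 50)] * 10)
    today = patch_stats([(200, 200, 220), (50, 50, 70)] * 10)  # bluer light
    print(drift_alerts(install, today))
```

The same check catches a colour cast from new lighting, which is why a reference target earns its place in the frame.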
Third, scene drift. The conveyor moved, the box stack got taller, a new station was added, an operator changed the standard placement. The thing the camera sees is different now, even though nothing about the model changed. Mitigation: track the bounding box of the relevant region over time, alert on significant shifts, and treat sustained shifts as a retraining trigger.
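The tracking step can be sketched as an overlap check between the region of interest recorded at install and the one detected in each new capture. How the region is detected is left out, and the class name, IoU threshold, and "sustained" window below are assumptions for illustration.

```python
from collections import deque
from typing import NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


class SceneDriftMonitor:
    """Flags single shifts as alerts and sustained shifts as a retraining trigger."""

    def __init__(self, baseline: Box, min_iou: float = 0.8, sustain: int = 5):
        self.baseline = baseline
        self.min_iou = min_iou
        self.recent: deque[bool] = deque(maxlen=sustain)

    def update(self, box: Box) -> str:
        shifted = iou(self.baseline, box) < self.min_iou
        self.recent.append(shifted)
        if len(self.recent) == self.recent.maxlen and all(self.recent):
            return "retrain"
        return "alert" if shifted else "ok"


if __name__ == "__main__":
    monitor = SceneDriftMonitor(Box(100, 100, 300, 300), sustain=3)
    for box in [Box(100, 100, 300, 300)] + [Box(160, 100, 360, 300)] * 3:
        print(monitor.update(box))
```

Separating "alert" from "retrain" keeps a one-off bump from triggering a retraining run while still surfacing it to whoever walks the floor.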
A fourth, smaller, but underrated failure: motion blur and focus. The model trained on still photos may behave very differently on moving objects under autofocus that the photographer didn't have to think about. If your production system images things in motion, train on motion. If your camera autofocuses, train on slightly off-focus images too.
The boring discipline that prevents most of this is what we call the production-environment dry run. Before training the model, install the camera in production conditions and capture 24 hours of representative data: different times, different operators, different conditions. That data is the basis of the eval set. The training set is built to cover that distribution, not the distribution of carefully-staged shots from the dev kitchen.
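As a sketch of turning the dry-run capture into an eval set, the snippet below samples evenly across hours of the day so that no single shift dominates, and reports hours with no data at all. The input format (timestamp plus file path) and the function name are assumptions; stratifying by operator or station would follow the same pattern.

```python
import random
from collections import defaultdict
from datetime import datetime


def stratified_eval_sample(captures: list[tuple[datetime, str]],
                           per_hour: int,
                           seed: int = 0) -> tuple[list[str], list[int]]:
    """Pick up to `per_hour` images from each hour of the dry run.

    Returns the sampled paths and the hours (0-23) with no captures.
    """
    rng = random.Random(seed)  # fixed seed so the eval set is reproducible
    by_hour: dict[int, list[str]] = defaultdict(list)
    for timestamp, path in captures:
        by_hour[timestamp.hour].append(path)

    sample: list[str] = []
    for hour in sorted(by_hour):
        paths = by_hour[hour]
        sample.extend(rng.sample(paths, min(per_hour, len(paths))))

    missing_hours = [h for h in range(24) if h not in by_hour]
    return sample, missing_hours


if __name__ == "__main__":
    # Hypothetical dry run: ten frames per hour, but the camera was off at 23:00.
    captures = [(datetime(2024, 1, 15, hour, minute), f"frames/{hour:02d}_{minute:02d}.png")
                for hour in range(23) for minute in range(0, 60, 6)]
    eval_paths, gaps = stratified_eval_sample(captures, per_hour=3)
    print(len(eval_paths), "eval images; hours with no data:", gaps)
```

The list of missing hours is the useful part: it tells you which conditions the eval set cannot speak to yet.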
The strong claim
Put strongly: most vision projects' production accuracy is not bounded by the model architecture or the training compute. It is bounded by the variance of the deployment environment that wasn't in the training data. A model that reaches 92 percent on held-out data after training on real-world variation will outperform, in production, a model that reaches 99 percent after training on the perfect-day distribution, and it will keep outperforming when the lights change.
# vision/augment.py: augment for the conditions you will actually meet
# Argument names follow the albumentations 1.x API; newer releases rename some
# of them (for example var_limit and quality_lower), so check your installed version.
import albumentations as A

train_transform = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.35, contrast_limit=0.25, p=0.7),  # lighting drift
    A.HueSaturationValue(hue_shift_limit=12, sat_shift_limit=20, val_shift_limit=15, p=0.5),  # colour cast from new bulbs
    A.RandomShadow(p=0.3),  # bag on the conveyor blocking light
    A.MotionBlur(blur_limit=7, p=0.2),  # part moving under fixed camera
    A.GaussNoise(var_limit=(8, 30), p=0.4),  # CMOS sensor noise after a year
    A.ImageCompression(quality_lower=70, p=0.3),  # production codec recompression
])

These augmentation parameters were chosen by walking the production floor, not by copying a tutorial. The 92-percent model that survives lighting changes wins.