Problem: Quarterly evaluation cycle meant the model could underperform for up to 89 days undetected. Post-mortem analysis confirmed two historical scoring periods in which elevated default rates correlated with feature drift; continuous monitoring would have caught both within a week.
Solution: Daily PSI computation per input feature against the training distribution. PSI > 0.2 on any high-SHAP feature triggered a Slack alert. PSI > 0.25 on 3+ features triggered automatic retraining on the prior 6 months of data with the same hyperparameter config. The new model was promoted only if its Gini exceeded the current production model's Gini by more than 0.01.
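Illustrative sketch of the decision rules above, not the production code. Assumptions: PSI uses the standard definition, sum over bins of (actual% - expected%) * ln(actual% / expected%), with 10 quantile bins taken from the training sample; the retrain rule counts any input feature, not only high-SHAP ones; the names psi, decide and should_promote are hypothetical. The Evidently, Airflow, MLflow and Slack wiring is omitted.

```python
import numpy as np

# Thresholds taken from the Solution text.
ALERT_PSI = 0.2            # Slack alert: any high-SHAP feature above this
RETRAIN_PSI = 0.25         # retrain: this many features above this value
RETRAIN_MIN_FEATURES = 3
MIN_GINI_GAIN = 0.01       # promotion gate: candidate must beat production by more than this


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` (today) against `expected` (training).

    Assumption: quantile bins derived from the training sample; the source
    does not state the binning scheme.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.unique(np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)))
    inner = edges[1:-1]  # inner edges only, so out-of-range values land in the end bins
    n = len(inner) + 1
    e_counts = np.bincount(np.searchsorted(inner, expected, side="right"), minlength=n)
    a_counts = np.bincount(np.searchsorted(inner, actual, side="right"), minlength=n)
    # Clip empty bins to a small floor to avoid log(0) and division by zero.
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


def decide(psi_by_feature, high_shap_features):
    """Apply the alert and retrain rules to one day's per-feature PSI values."""
    alert = sorted(f for f in high_shap_features
                   if psi_by_feature.get(f, 0.0) > ALERT_PSI)
    drifted = [f for f, v in psi_by_feature.items() if v > RETRAIN_PSI]
    return {"alert_features": alert,
            "retrain": len(drifted) >= RETRAIN_MIN_FEATURES}


def should_promote(candidate_gini, production_gini):
    """Promote the retrained model only if Gini improves by more than 0.01."""
    return candidate_gini - production_gini > MIN_GINI_GAIN
```

Usage: compute psi(train_col, today_col) for each input feature, pass the resulting dict to decide(), and call should_promote() on the retrained candidate before replacing the production model.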
Technology: Python · Evidently · MLflow · Airflow · Postgres · Slack
Optimisation pattern: quarterly-manual-evaluation-to-daily-psi-drift-monitoring-with-auto-retrain
Outcomes: 2 below-SLA periods prevented. Detection to new model in production: 38 hours average. Quarterly evaluation retained as governance check. Regulatory review cited drift monitoring as positive model governance practice.