From Alerts to Explanations: LLMs as an Interpretation Layer for Production Machine Learning Systems
Keywords:
LLM-Powered Observability, ML Platform Operations, Retrieval-Augmented Diagnosis, Root Cause Analysis, Confidence Calibration, Evaluation Methodology.
Abstract
Production machine learning systems emit rich telemetry, but incident response is often limited by human interpretation rather than by signal availability. Existing observability tools detect many symptomatic deviations, but they rarely connect signals across stack layers into evidence-backed root-cause hypotheses. This paper proposes an LLM-mediated interpretation layer for production ML operations. The layer assembles incident context from telemetry, lineage, code and configuration changes, historical incidents, and runbooks; generates ranked diagnostic hypotheses; and presents evidence-backed recommendations to human operators. We argue that ML systems require an evidence model that differs materially from that of generic AIOps because failures span feature pipelines, training dynamics, serving behavior, and experiment systems. We also propose an evaluation methodology covering metrics, ground-truth construction, and calibration requirements. This paper presents a reference architecture and research agenda, not an implemented system. We position the interpretation layer as a human-in-the-loop decision-support component, not an autonomous remediation system.




