The paper introduces TingIS, a system for real-time detection and mitigation of technical anomalies in large-scale cloud-native services, a capability that is essential for preventing financial losses and maintaining user trust. TingIS uses a multi-stage event linking engine that combines efficient indexing with Large Language Models (LLMs) to extract actionable insights from noisy customer incident data. It also includes a cascaded routing mechanism for accurate business attribution and a multi-dimensional noise reduction pipeline that draws on domain knowledge and statistical patterns. In production, TingIS processes more than 2,000 messages per minute, with a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. On benchmarks built from real-world data, it outperforms baseline methods in routing accuracy and clustering quality.
TingIS: A Novel System for Real-Time Risk Event Discovery in Cloud Services
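For readers unfamiliar with the P90 latency figure quoted in the summary, the following is a minimal sketch of how a 90th-percentile latency can be computed with the nearest-rank method. The function name `percentile_nearest_rank` and the sample latencies are hypothetical and chosen only for illustration; the summary does not say which percentile definition the TingIS authors use.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method.

    Assumption: nearest-rank is one common percentile definition; the paper
    summary does not state which definition TingIS reports.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers p percent of the samples
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical alert latencies in minutes (illustrative, not from the paper)
latencies_min = [1.2, 0.8, 2.5, 3.1, 1.9, 2.2, 4.0, 1.1, 2.8, 3.4]
p90 = percentile_nearest_rank(latencies_min, 90)
print(f"P90 alert latency: {p90} minutes")
```

Read this way, a P90 of 3.5 minutes would mean that 90% of alerts were raised within 3.5 minutes, while the slowest 10% took longer.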
