Presentation: Massive Scale Anomaly Detection Framework
This presentation is now available to view on InfoQ.com
Abstract
Early detection of abnormal events can be critical for many business applications. However, there are numerous challenges when implementing real-time anomaly models at scale. Server failure, developer error, and malicious activities are very different scenarios with different engineering requirements. Moreover, most analytical models have traditionally been designed for the batch processing paradigm and usually cannot be easily adapted to unbounded datasets and real-time latencies.
At PayPal, we must be able to analyze billions of events every day in real time across a wide range of services, devices and locations. In a collaboration between our platform engineering and data science teams, we have built a generic framework for developing robust and scalable streaming applications for anomaly detection, with a focus on the flexibility to support different types of statistical and machine learning models. Inspired by the design of scikit-learn and Spark MLlib, we have designed a simple pipeline-based API on top of Spark Structured Streaming that captures common patterns of the anomaly detection domain.
At the base of the framework, we took advantage of Spark Structured Streaming's fast and scalable execution engine, together with stream-oriented building blocks, to allow easy extension to new production-grade models. We found that real-time anomaly detection provides powerful capabilities in many different fields. Internally, we use the framework for a variety of use cases, ranging from fraud prevention to operations and even security.
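To make the idea of a pipeline-based API built from stream-oriented building blocks more concrete, here is a minimal sketch in plain Python. It is not the PayPal framework and does not use Spark: the names Pipeline, RollingZScore, ThresholdDetector, and the latency_ms field are all hypothetical, chosen only to illustrate how scikit-learn-style stages might be chained over an unbounded stream of events. The scoring stage is assumed to be a simple running z-score (Welford's online algorithm), standing in for whatever statistical or machine learning model a real stage would wrap.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, Iterator, List, Protocol


class Stage(Protocol):
    """Hypothetical building block: consumes one event, emits an enriched event."""

    def process(self, record: dict) -> dict: ...


@dataclass
class RollingZScore:
    """Scores each value against the running mean and standard deviation
    of the events seen before it (Welford's online algorithm)."""

    input_col: str
    output_col: str = "score"
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def process(self, record: dict) -> dict:
        x = record[self.input_col]
        # Score against statistics of previous events only.
        score = 0.0
        if self.n >= 2:
            std = sqrt(self.m2 / (self.n - 1))
            if std > 0:
                score = abs(x - self.mean) / std
        # Update the running statistics with the current event.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return {**record, self.output_col: score}


@dataclass
class ThresholdDetector:
    """Flags an event as anomalous when its score exceeds a fixed threshold."""

    input_col: str = "score"
    output_col: str = "is_anomaly"
    threshold: float = 3.0

    def process(self, record: dict) -> dict:
        return {**record, self.output_col: record[self.input_col] > self.threshold}


@dataclass
class Pipeline:
    """Chains stages and applies them lazily, one event at a time."""

    stages: List[Stage]

    def run(self, events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            for stage in self.stages:
                event = stage.process(event)
            yield event


# Usage: a short stream of made-up latency readings with one obvious outlier.
events = [{"latency_ms": v} for v in [100, 102, 98, 101, 99, 100, 500]]
pipeline = Pipeline([RollingZScore("latency_ms"), ThresholdDetector(threshold=3.0)])
flags = [e["is_anomaly"] for e in pipeline.run(events)]
```

In this sketch only the final reading is flagged. Because each stage keeps its own state and handles one event at a time, a new model could in principle be added by writing another stage with the same process method, which is the kind of extensibility the abstract describes. A production version on Spark Structured Streaming would need to manage that state through the engine rather than in Python objects.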