You are viewing content from a past/completed QCon -

Presentation: Massive Scale Anomaly Detection Framework

Track: Predictive Architectures in the Real World

Location: Cyril Magnin I

Duration: 11:40am - 12:20pm


Slides: Download Slides

This presentation is now available to view on

Watch video with transcript


Early detection of abnormal events can be critical for many business applications, but implementing real-time anomaly models at scale poses numerous challenges. Server failures, developer errors, and malicious activity are very different scenarios with different engineering requirements. Moreover, most analytical models have traditionally been designed for the batch processing paradigm and usually cannot be easily adapted to unbounded datasets and real-time latencies.
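To illustrate the gap between batch models and unbounded data, here is a minimal sketch in plain Python of a z-score check rewritten to work incrementally. It uses Welford's online algorithm for the running mean and variance; this is a generic textbook technique chosen for illustration, not a method attributed to the talk. The names `RunningStats` and `detect`, the threshold, and the warm-up length are all assumptions.

```python
import math


class RunningStats:
    """Welford's online algorithm: one pass, constant memory."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; defined as 0.0 until two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly) per event, scoring each value before learning from it."""
    stats = RunningStats()
    for x in stream:
        is_anomaly = False
        if stats.n >= warmup and stats.std > 0:
            is_anomaly = abs(x - stats.mean) / stats.std > threshold
        stats.update(x)
        yield x, is_anomaly
```

A batch z-score needs the whole dataset to compute the mean and standard deviation, whereas this version keeps three numbers of state and can score each event as it arrives. Note that flagged values still update the statistics here; a production model would likely handle that differently.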


At PayPal, we must be able to analyze billions of events every day in real time across a wide range of services, devices, and locations. In a collaboration between our platform engineering and data science teams, we have built a generic framework for developing robust and scalable anomaly detection streaming applications, with a focus on the flexibility to support different types of statistical and machine learning models. Inspired by the design of scikit-learn and Spark MLlib, we have designed a simple pipeline-based API on top of Spark Structured Streaming that captures common patterns of the anomaly detection domain.
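The abstract does not show the framework's API, so the following is only a hypothetical sketch of what a pipeline-based design in the spirit of scikit-learn could look like. It is written in plain Python rather than Spark so that it runs standalone, and every name in it (`Stage`, `Derive`, `EwmaDetector`, `Pipeline`, the column names, the tolerance) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


class Stage:
    """Hypothetical stage interface: takes a micro-batch, returns a micro-batch."""

    def transform(self, batch: List[Record]) -> List[Record]:
        raise NotImplementedError


@dataclass
class Derive(Stage):
    """Stateless stage that adds one derived column to every record."""
    name: str
    fn: Callable[[Record], float]

    def transform(self, batch: List[Record]) -> List[Record]:
        return [{**r, self.name: self.fn(r)} for r in batch]


@dataclass
class EwmaDetector(Stage):
    """Stateful stage: flags values far from an exponentially weighted moving average."""
    column: str
    alpha: float = 0.3       # weight given to the newest value
    tolerance: float = 0.5   # allowed relative deviation from the average
    _avg: Optional[float] = field(default=None, init=False, repr=False)

    def transform(self, batch: List[Record]) -> List[Record]:
        out = []
        for r in batch:
            x = r[self.column]
            flagged = (
                self._avg is not None
                and abs(x - self._avg) > self.tolerance * abs(self._avg)
            )
            if not flagged:
                # Learn only from values considered normal.
                self._avg = x if self._avg is None else (
                    self.alpha * x + (1 - self.alpha) * self._avg
                )
            out.append({**r, "is_anomaly": flagged})
        return out


class Pipeline:
    """Chains stages; state inside each stage persists across micro-batches."""

    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)

    def process(self, batch: List[Record]) -> List[Record]:
        for stage in self.stages:
            batch = stage.transform(batch)
        return batch
```

Under these assumptions, a pipeline is built once, for example `Pipeline([Derive("error_rate", lambda r: r["errors"] / r["requests"]), EwmaDetector("error_rate")])`, and `process` is then called on each incoming micro-batch. Keeping feature derivation and detection as separate stages is what would let different models be swapped in without changing the rest of the pipeline.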


At the base of the framework, we took advantage of Spark Structured Streaming's fast and scalable execution engine, together with stream-oriented building blocks, so that the framework can be extended easily with new production-grade models. We have found that real-time anomaly detection provides powerful capabilities in many different fields. Internally, we use the framework for a variety of use cases, ranging from fraud prevention to operations and even security.

Speaker: Guy Gerson

Big Data Developer @PayPal

Guy Gerson is a software engineer on the core team of PayPal’s next-generation stream processing platform. He is currently working on adapting statistical and machine learning methodologies for use in real-time data pipelines. Prior to PayPal, he was a researcher in the IBM Cloud and Data Technologies group, focusing on designing large-scale Internet of Things analytics architectures.
