Track: Predictive Architectures in the Real World

Location: Cyril Magnin I + II

Day of week: Tuesday

Predictive data pipelines have become essential to building engaging experiences on the web today. Whether you enjoy personalized news feeds on LinkedIn and Facebook, profit from near real-time updates to search engines and recommender systems, or benefit from near real-time fraud detection on a lost or stolen credit card, you have come to rely on the fruits of predictive data pipelines as an end user.

Running a successful machine learning project in production takes more than a clever algorithm. In this track, the experts who built some of the most successful commercial recommendation systems will tell us what it really takes. How do you build the architectures, data pipelines, and devops best practices that help drive real-world machine learning?

Track Host: Gwen Shapira

Principal Data Architect @Confluent, PMC Member @Kafka, & Committer Apache Sqoop

Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating microservices, relational, and big data technologies. She currently specializes in building reliable real-time data processing pipelines using Apache Kafka. Gwen is an author of "Kafka: The Definitive Guide" and "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or building data pipelines, you can find her pedaling on her bike, exploring the roads and trails of California and beyond.

10:40am - 11:20am

Instrumentation, Observability & Monitoring of Machine Learning Models

Josh Wills, Software Engineer, Search, Learning, and Intelligence @SlackHQ

1:20pm - 2:00pm

Massive Scale Anomaly Detection Framework

Guy Gerson, Big Data Developer @PayPal
Uri Silberstein, Senior Cloud & Big Data Developer @PayPal

2:20pm - 3:00pm

People You May Know: Fast Recommendations Over Massive Data

Sumit Rangwala, Staff Software Engineer - Artificial Intelligence @LinkedIn
Felix GV, Staff Software Engineer @LinkedIn

3:20pm - 4:00pm

Michelangelo Palette: A Feature Engineering Platform at Uber

Feature Engineering can be loosely described as the process of extracting useful signals from the underlying raw data for use in predictive decisioning systems such as Machine Learning (ML) models or business rules engines. The raw data is often available via heterogeneous types of underlying systems, such as offline/batch computed data in Hadoop or other data warehouses, key-value datastores, production microservices, and streaming data jobs or services. Traditionally, such engineering has been achieved via ad hoc data pipelines or feature serving layers/services. In our experience at Uber, such practices have turned out to be quite fragile, resulting in hard-to-maintain infrastructure and a large amount of redundant engineering. Moreover, with ML models, they have exposed serious problems such as training/serving skew.

In this talk, we'll present the infrastructure we're building within Uber's Michelangelo ML Platform that:

  1. Enables a general approach to Feature Engineering across diverse data systems such as offline/batch data warehouses (e.g., Apache Hive), real-time data in Uber's key-value stores (such as Cassandra) or production microservices, and near real-time data from stream processing infrastructure based on, for example, Apache Kafka.
  2. Demonstrates how the ML training/serving skew problem is addressed by ensuring data parity across online/serving and offline/training systems.
  3. Discusses the scalability challenges and the sensitivities around serving data in single-digit milliseconds.
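
As a rough illustration of the data-parity idea in point 2, the sketch below defines each feature once and reuses that single definition on both the offline (training) and online (serving) paths. It is not Palette's API: the registry `FEATURES`, the feature names, the raw-record fields, and all function names are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical feature registry: exactly one transformation per feature name.
# Field names ("trip_count", "active_days") are invented for illustration.
FEATURES: Dict[str, Callable[[dict], float]] = {
    "trips_per_day": lambda raw: raw["trip_count"] / max(raw["active_days"], 1),
    "is_new_user": lambda raw: 1.0 if raw["active_days"] < 7 else 0.0,
}


def compute_features(raw: dict) -> Dict[str, float]:
    """Apply every registered transformation to one raw record."""
    return {name: fn(raw) for name, fn in FEATURES.items()}


def build_training_rows(batch: List[dict]) -> List[Dict[str, float]]:
    """Offline path: compute features for a batch of historical records."""
    return [compute_features(raw) for raw in batch]


def serve_features(raw: dict) -> Dict[str, float]:
    """Online path: compute features for a single request."""
    return compute_features(raw)


def parity_mismatches(batch: List[dict]) -> List[int]:
    """Return indices of records whose online and offline features differ."""
    offline = build_training_rows(batch)
    return [i for i, raw in enumerate(batch) if serve_features(raw) != offline[i]]


if __name__ == "__main__":
    sample = [
        {"trip_count": 14, "active_days": 7},
        {"trip_count": 3, "active_days": 0},
    ]
    print(build_training_rows(sample))
    print(parity_mismatches(sample))
```

Because both paths call the same `compute_features`, a skew can only arise from differences in the raw inputs, which is the kind of mismatch a check like `parity_mismatches` is meant to surface. A production system would also have to reconcile data freshness and storage differences between batch and real-time sources, which this sketch ignores.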

Amit Nene, Staff Engineer, Tech Lead Manager @Uber
Eric Chen, Tech Lead & Manager @Uber

2019 Tracks

  • Grokking Timeseries & Sequential Data

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and crunching of related algorithms.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.