You are viewing content from a past/completed QCon

Presentation: Michelangelo Palette: A Feature Engineering Platform at Uber

Track: Predictive Architectures in the Real World

Location: Cyril Magnin I

Duration: 2:20pm - 3:00pm

Day of week: Tuesday

This presentation is now available to view on InfoQ.com

Abstract

Feature Engineering can be loosely described as the process of extracting useful signals from underlying raw data for use in predictive decision-making systems such as Machine Learning (ML) models or business rules engines. The raw data is often spread across heterogeneous systems: offline/batch-computed data in Hadoop or other data warehouses, key-value datastores, production microservices, and streaming data jobs or services. Traditionally, such engineering has been achieved with ad hoc data pipelines or feature serving layers/services. In our experience at Uber, such practices have turned out to be quite fragile, resulting in hard-to-maintain infrastructure and a large amount of redundant engineering. Moreover, with ML models, they have exposed serious problems such as training/serving skew.

In this talk, we present the infrastructure we're building within Uber's Michelangelo ML Platform. We will:

  1. Describe how it enables a general approach to Feature Engineering across diverse data systems, such as offline/batch data warehouses (e.g., Apache Hive), real-time data in Uber's key-value stores (such as Cassandra) or production microservices, and near-real-time data from stream processing infrastructure based on, for example, Apache Kafka.
  2. Demonstrate how it addresses the ML training/serving skew problem by ensuring data parity across online/serving and offline/training systems.
  3. Discuss the scalability challenges and the sensitivities around serving data in single-digit milliseconds.
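
As a rough illustration of the data parity idea in the second point, the sketch below routes both the offline (training) path and the online (serving) path through one shared feature definition, then checks that the two paths agree. This is not Palette's API: the function names, the trip fields (`distance_m`, `duration_s`), and the derived features are all hypothetical, chosen only to make the idea concrete.

```python
from typing import Dict, List

Row = Dict[str, float]


def trip_features(raw: Row) -> Row:
    """Hypothetical shared feature definition used by BOTH paths."""
    distance_km = raw["distance_m"] / 1000.0
    duration_h = raw["duration_s"] / 3600.0
    return {
        "distance_km": distance_km,
        # Guard against zero-duration rows instead of dividing by zero.
        "avg_speed_kmh": distance_km / duration_h if duration_h > 0 else 0.0,
    }


def build_training_rows(batch: List[Row]) -> List[Row]:
    """Offline path: apply the shared definition to historical rows."""
    return [trip_features(row) for row in batch]


def serve_features(request: Row) -> Row:
    """Online path: apply the same definition to a single live request."""
    return trip_features(request)


def has_skew(batch: List[Row], tol: float = 1e-9) -> bool:
    """Return True if any feature differs between the two paths."""
    offline = build_training_rows(batch)
    online = [serve_features(row) for row in batch]
    return any(
        abs(off[name] - on[name]) > tol
        for off, on in zip(offline, online)
        for name in off
    )


if __name__ == "__main__":
    sample = [{"distance_m": 5000.0, "duration_s": 600.0}]
    print(build_training_rows(sample))
    print("skew detected:", has_skew(sample))
```

Because both paths call the same function, skew can only enter through the input data, not through divergent transformation logic. A production system would additionally need to keep the underlying stores in sync, which this sketch does not attempt.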

Speaker: Amit Nene

Staff Engineer, Tech Lead Manager @Uber

Amit Nene has led engineering teams on the Michelangelo ML Platform and Risk Platform at Uber, driving several projects such as the Palette feature store, feature pipelines and data engineering, and feature transformers. Prior to Uber, he led several datacenter infrastructure initiatives at companies such as VMware.

Speaker: Eric Chen

Tech Lead & Manager @Uber

Eric Chen leads the offline model processing pipelines and the online and offline model serving accuracy of the Michelangelo ML Platform at Uber, driving several projects such as customizable workflows, customizable transformers, and model training across multiple computing environments. Prior to Uber, he worked on search quality and maps at Google.
