You are viewing content from a past/completed QCon

Track: Predictive Data Pipelines & Architectures

Location: Cyril Magnin I

Day of week: Tuesday

Predictive data pipelines have become essential to building engaging experiences on the web today. Whether you enjoy personalized news feeds on LinkedIn and Facebook, profit from near-realtime updates to search engines and recommender systems, or benefit from near-realtime fraud detection on a lost or stolen credit card, you have come to rely on the fruits of predictive data pipelines as an end user. As an ops-focused engineer, you may employ these pipelines to understand complex call trees in your microservice-based infrastructure, with the aim of eliminating redundant system load or improving mobile and web application performance. Come to this track to learn about interesting applications of predictive systems and the fundamentals that underlie them.

Track Host: Sid Anand

Chief Data Engineer @PayPal

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions including Agari's Data Architect, a Technical Lead in Search @ LinkedIn, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid spends time with his wife, Shalini, and their 2 kids.

10:40am - 10:50am

Transmogrification: The Magic of Feature Engineering

Leah McGuire, Principal Member of Technical Staff @Salesforce
Mayukh Bhaowal, Director of Product Management @Salesforce

11:00am - 11:50am

The Black Swan of Perfectly Interpretable Models

Leah McGuire, Principal Member of Technical Staff @Salesforce
Mayukh Bhaowal, Director of Product Management @Salesforce

2:25pm - 2:35pm

Building (Better) Data Pipelines with Apache Airflow

Sid Anand, Chief Data Engineer @PayPal

2:45pm - 3:35pm

Data Pipelines for Real-Time Fraud Prevention at Scale

Mikhail Kourjanski, Lead Data Architect @PayPal

4:20pm - 5:10pm

pDB: Scalable Prediction Infrastructure With Precision and Provenance

We describe an extensible, cloud-independent data science platform based on Celect’s pDB framework for non-parametric machine learning. The pDB framework provides a common abstraction for almost all machine learning problems of interest, including classification, personalization, time series prediction, and linear and nonlinear regression. We developed an extensible and flexible data platform around the core pDB framework. This platform was born out of our need to provide scalable and flexible predictive analytics solutions for retailers and the federal government.
In this talk, I will describe the pDB formalism associated with the platform, architectural aspects for data import/ETL, data transformation, compute and query architecture, cross-validation, cluster management, pipeline definition and workflow orchestration. We will illustrate the use of the platform through multiple use cases such as online personalization, document classification, and geospatial anomaly detection.

Balaji Rengarajan, Senior Data Scientist @Celect

12:50pm - 1:00pm

Two Effective Algorithms for Time Series Forecasting

Danny Yuan, Real-time Streaming Lead @Uber

1:10pm - 2:00pm

Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace

Danny Yuan, Real-time Streaming Lead @Uber
Chong Sun, Senior Software Engineer @Uber
