Predictive Data Pipelines & Architectures

Day of week: Tuesday

Predictive data pipelines have become essential to building engaging experiences on the web today. Whether you enjoy personalized news feeds on LinkedIn and Facebook, profit from near-realtime updates to search engines and recommender systems, or benefit from near-realtime fraud detection on a lost or stolen credit card, you have come to rely on the fruits of predictive data pipelines as an end user. As an ops-focused engineer, you may employ these pipelines to understand complex call trees in your microservice-based infrastructure, with the aim of eliminating redundant system load or improving mobile and web application performance. Come to this track to learn about interesting applications of predictive systems and the fundamentals that underlie them.

Track Host:
Sid Anand
Chief Data Engineer @PayPal

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions, including Data Architect at Agari, Technical Lead in Search at LinkedIn, Cloud Data Architect at Netflix, VP of Engineering at Etsy, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid spends time with his wife, Shalini, and their two kids.

by Danny Yuan
Real-time Streaming Lead @Uber

by Chong Sun
Senior Software Engineer @Uber

Uber's Marketplace is the algorithmic brain behind Uber's ride-sharing services. To help Marketplace systems make proactive and efficient decisions, the Marketplace Forecasting team builds and operates multiple machine learning models that produce forecasts of many metrics, including supply and demand, at fine time granularity and across a large number of geospatial dimensions.

To empower both data scientists and engineers to build and manage models that range from regressions to neural...

by Gurinder Grewal
Risk Chief Architect @PayPal

by Mikhail Kourjanski
Lead Data Architect @PayPal

PayPal processes about a billion dollars of payment volume daily ($354bn in FY2016). Complex decisions are made for each transaction or user action to manage risk and compliance while also ensuring a good user experience. PayPal users can make payments immediately in 200 countries with the assurance that the company’s transactions are secure.

How does PayPal achieve this goal in today's complex environment filled with "high-level" fraudsters as well as constantly increasing...

by Balaji Rengarajan
Senior Data Scientist @Celect

We describe an extensible, cloud-independent data science platform based on Celect’s pDB framework for non-parametric machine learning. The pDB framework provides a common abstraction for almost all machine learning problems of interest, including classification, personalization, time series prediction, and linear and nonlinear regression. We developed an extensible and flexible data platform around the core pDB framework. This platform was born out of the need for us to provide scalable...

by Leah McGuire
Principal Member of Technical Staff @Salesforce

by Mayukh Bhaowal
Director of Product Management @Salesforce

Machine Learning (ML) software differs from traditional software in the sense that outcomes are not based on a set of hand-coded rules and hence not easily predictable. The behavior of such software changes over time based on data and feedback loops. At Salesforce Einstein, we care deeply about building trust and confidence in such intelligent software programs. Why does a particular email have a higher likelihood of being opened than another? What are the shapes and patterns in the dataset...


  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud.

  • Real-world Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.