Track: Handling Sequential Data Like an Expert / ML Applied to Operations

Location: Cyril Magnin II

Day of week: Wednesday

Discussing the complexities of time, including HyperLogLog, Count-Min Sketch, and more / Machine learning in the data center, exploring topics like dynamic rebalancing in Dataflow, predictive auto-scaling, and fault prediction.

Track Host: Brad Klingenberg

VP Data Science @StitchFix

Brad Klingenberg leads a team of 20+ data scientists working on human-in-the-loop machine learning at Stitch Fix. His team develops the recommendation algorithms that guide Stitch Fix's stylists, the human experts who curate the items selected for clients. The team also matches clients with stylists and measures, monitors, and optimizes the role of human selection in the recommendation system.

SHORT TALK (10 MIN)

9:00am - 9:10am

Introduction to Forecasting

Franziska Bell, Senior Data Science Manager @Uber
CASE STUDY TALK (50 MIN)

9:20am - 10:10am

Understanding Software System Behavior With ML and Time Series Data

Powered by the rise of cloud technology and ubiquitous mobile connectivity, software systems have utterly transformed daily life and the global economy. However, the reliable operation of these systems has been made increasingly difficult by their sheer scale, complexity, and rapid pace of evolution.

In this talk, we discuss how time series datasets collected from running software can be combined with machine learning techniques to aid the understanding of system behaviors and, in turn, improve performance and uptime.

David Andrzejewski, Engineering Manager @SumoLogic
SHORT TALK (10 MIN)

10:35am - 10:45am

Deep Learning for Language Understanding (at Google Scale)

Anjuli Kannan, Software Engineer @GoogleBrain
CASE STUDY TALK (50 MIN)

10:55am - 11:45am

Counting is Hard: Probabilistic Algorithms for View Counting at Reddit

While counting votes has always been a core feature of Reddit's platform, only recently did we begin counting and displaying view numbers. In this talk, we explain the challenges of building a view counting system at scale, and how we used probabilistic counting algorithms to make scaling easier.

Krishnan Chandra, Data Engineer @Reddit
SHORT TALK (10 MIN)

12:45pm - 12:55pm

Serverless for Data Science

Mike Lee Williams, Research Engineer @Cloudera Fast Forward Labs
CASE STUDY TALK (50 MIN)

1:05pm - 1:55pm

A Cost-Sensitive Approach for Resource Allocation in Virtual Machines

In recent years, ING has shifted from hosting processes on designated physical servers to virtual machine (VM) warehouses. While this transition has given ING's development teams agility and elasticity in resource allocation, the potential for cost reduction on infrastructure spending has not been fully realized. Many VMs do not actively adjust their resource allocation according to their usage, resulting in a yearly expense of over 60M EUR on (often idle) computing infrastructure.

In this application talk, Dor will take the audience step by step through the process of building an internal data science solution. Dor will share insights on the time-series model for predicting usage, the optimization that minimizes costs and risks, the process of deploying data science models to production, and some best practices for creating a data science model in an agile methodology.

Dor Kedem, Senior Data Scientist @ING Nederland
SHORT TALK (10 MIN)

2:20pm - 2:30pm

A/B Testing for Logistics: It All Depends

Jingjie Xiao, Data Scientist @Instacart
CASE STUDY TALK (50 MIN)

2:40pm - 3:30pm

Demand Modeling @StitchFix

Stitch Fix's mission is to transform the way people find what they love. As a human-in-the-loop retailer, Stitch Fix requires visibility into demand from our clients to avoid stockouts, minimize excess supply, and manage fulfillment centers and staffing.

This talk will describe the company’s demand model, which takes a client-behavior-based approach to forecasting. Rather than relying on non-explanatory auto-regressive components, Stitch Fix’s demand model is interpretable and more accessible to business partners. By providing visibility into how different client groups are behaving, the demand model helps Stitch Fix not only operate well but also guide strategic decision making.

Stephanie Yee, Data Scientist @StitchFix

Tracks

  • Deep Learning Applications & Practices

    Deep learning lessons using tooling such as TensorFlow & PyTorch, across domains like large-scale cloud-native apps and fintech, and tackling concerns around the interpretability of ML models.

  • Predictive Data Pipelines & Architectures

    Best practices for building real-world data pipelines doing interesting things like predictions, recommender systems, fraud prevention, ranking systems, and more.

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, including a heavy concentration in the space of security and fraud.

  • Real-world Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.