You are viewing content from a past QCon conference.

Track: Handling Sequential Data Like an Expert / ML Applied to Operations

Location: Cyril Magnin II

Day of week: Wednesday

Discussing the complexities of time, including HyperLogLog, count-min sketch, and more / Machine learning in the data center, exploring topics like dynamic rebalancing in Dataflow, predictive autoscaling, and fault prediction.
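The track description mentions the count-min sketch without explaining it. Below is a minimal Python sketch of the data structure, assuming string items and per-row hash functions derived from SHA-256 salted with the row number (a simplicity choice made here, not taken from any talk). It keeps `depth` rows of `width` counters, increments one counter per row on each update, and reports the minimum across rows. Hash collisions can only inflate counters, so the estimate never undercounts.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int = 2048, depth: int = 5) -> None:
        self.width = width   # counters per row (more width -> fewer collisions)
        self.depth = depth   # independent rows (more depth -> lower failure probability)
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number to get a different hash function per row
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest one is the best estimate
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch()
for event in ["login", "login", "logout", "login"]:
    cms.add(event)
print(cms.estimate("login"))  # at least 3; exactly 3 unless "logout" collides with it in every row
```

Memory is fixed at `width * depth` counters no matter how many distinct items arrive, which is the property that makes the sketch attractive for high-volume event streams.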

Track Host: Brad Klingenberg

VP Data Science @StitchFix

Brad Klingenberg leads a team of 20+ data scientists working on human-in-the-loop machine learning at Stitch Fix. His team develops the recommendation algorithms that guide Stitch Fix's stylists, the human experts who curate the items selected for clients. The team also matches clients with stylists and measures, monitors, and optimizes the role of human selection in the recommendation system.


9:00am - 9:10am

Introduction to Forecasting

Franziska Bell, Senior Data Science Manager @Uber

9:20am - 10:10am

Understanding Software System Behavior With ML and Time Series Data

Powered by the rise of cloud technology and ubiquitous mobile connectivity, software systems have utterly transformed daily life and the global economy. However, the reliable operation of these systems has been made increasingly difficult by their sheer scale, complexity, and rapid pace of evolution.

In this talk, we discuss how time series datasets collected from running software can be combined with machine learning techniques to help understand system behavior and improve performance and uptime.

David Andrzejewski, Engineering Manager @SumoLogic
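The abstract does not name specific techniques. As one minimal illustration of turning an operational time series into a behavioral signal, here is a rolling z-score detector in Python. It is a simple statistical baseline, not the method from the talk. The `window` and `threshold` defaults and the synthetic latency series are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=30, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    away from the mean of the preceding `window` points."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(values):
        if len(history) == window:          # wait until the baseline window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(x)                   # the oldest point drops out automatically
    return anomalies


# Synthetic latency series (ms): steady load with a single spike at index 60
latencies = [100 + (i % 5) for i in range(60)] + [400] + [100 + (i % 5) for i in range(10)]
print(rolling_zscore_anomalies(latencies))  # [60]
```

Because the baseline comes only from past points, the detector can run online as metrics arrive. Its weakness is that a spike inflates the window's variance for the next `window` points, which is one reason learned models of system behavior are attractive.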

10:35am - 10:45am

Deep Learning for Language Understanding (at Google Scale)

Anjuli Kannan, Software Engineer @GoogleBrain

10:55am - 11:45am

Counting is Hard: Probabilistic Algorithms for View Counting at Reddit

While counting votes has always been a core feature of Reddit's platform, only recently did we begin counting and displaying view numbers. In this talk, we explain the challenges of building a view counting system at scale, and how we used probabilistic counting algorithms to make scaling easier.

Krishnan Chandra, Data Engineer @Reddit
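The abstract does not say which probabilistic counting algorithm Reddit used. The track description mentions HyperLogLog, so here is a minimal HyperLogLog sketch in Python to illustrate the idea of counting distinct viewers in fixed memory. It is not Reddit's implementation. The SHA-256 hash and the precision `p = 14` (16,384 registers) are assumptions. Each item's hash picks a register with its top `p` bits. The register keeps the largest "rank" (position of the leftmost 1-bit) seen in the remaining bits, and the count comes from a bias-corrected harmonic mean of the registers.

```python
import hashlib
import math


class HyperLogLog:
    """Estimate the number of distinct items using 2**p small registers."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p                               # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)    # bias correction for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest feed the rank
        x = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        idx = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Harmonic mean of 2**register values, scaled by alpha * m**2
        estimate = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over empty registers
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for view in range(100_000):
    hll.add(f"user-{view % 50_000}")   # each of 50,000 users views twice
print(round(hll.count()))  # roughly 50000; standard error is about 1.04/sqrt(m), ~0.8% here
```

Repeat views from the same user leave the registers unchanged, so the sketch counts unique viewers rather than raw events. A production implementation would pack each register into 6 bits (about 12 KB for p = 14) instead of using a Python list.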

12:45pm - 12:55pm

Serverless for Data Science

Mike Lee Williams, Research Engineer @Cloudera Fast Forward Labs

1:05pm - 1:55pm

A Cost-Sensitive Approach for Resource Allocation in Virtual Machines

In recent years, ING has shifted from hosting processes on dedicated physical servers to virtual machine (VM) warehouses. While this transition has given ING's development teams agility and elasticity in resource allocation, the potential for reducing infrastructure spending has not been fully realized. Many VMs have not been actively adjusting their resource allocation to match their usage, resulting in a yearly expense of over 60M EUR on (often idle) computing infrastructure.

In this application talk, Dor will take the audience step by step through the process of building an in-house data science solution. Dor will share insights on the time-series model for predicting usage, the optimization that minimizes costs and risks, the process of deploying data science models to production, and some best practices for building a data science model using an agile methodology.

Dor Kedem, Senior Data Scientist @ING Nederland
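The abstract mentions an optimization that minimizes costs and risks but does not give its form. One standard way to frame that trade-off is the newsvendor model: paying for idle capacity costs `cost_over` per unit, and under-provisioning carries a risk cost of `cost_under` per unit. The expected cost is then minimized by allocating at the `cost_under / (cost_under + cost_over)` quantile of predicted usage. The Python sketch below assumes that model. It is not ING's published method; `cost_over`, `cost_under`, and the sample values are hypothetical, with `usage_samples` standing in for draws from a time-series usage forecast.

```python
import math


def expected_cost(allocation, usage_samples, cost_over, cost_under):
    """Average cost of an allocation over sampled usage (piecewise-linear costs)."""
    total = 0.0
    for u in usage_samples:
        if allocation >= u:
            total += cost_over * (allocation - u)    # paying for idle capacity
        else:
            total += cost_under * (u - allocation)   # shortfall: performance/availability risk
    return total / len(usage_samples)


def cost_sensitive_allocation(usage_samples, cost_over, cost_under):
    """Allocation minimizing expected_cost: the critical-ratio quantile of usage."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ratio = cost_under / (cost_under + cost_over)    # newsvendor critical ratio
    ordered = sorted(usage_samples)
    # Smallest sample whose empirical CDF value reaches the critical ratio
    k = max(math.ceil(ratio * len(ordered)) - 1, 0)
    return ordered[k]


# Hypothetical forecast draws of peak vCPU usage for one VM next month
samples = [2, 3, 3, 4, 4, 4, 5, 6, 8, 12]
print(cost_sensitive_allocation(samples, cost_over=1.0, cost_under=3.0))  # 6
```

Raising `cost_under` relative to `cost_over` pushes the allocation toward the upper tail of the forecast. With equal costs it falls back to the median, so the two cost parameters directly encode how much risk the team is willing to take for savings.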

2:20pm - 2:30pm

A/B Testing for Logistics: It All Depends

Jingjie Xiao, Data Scientist @Instacart

2:40pm - 3:30pm

Demand Modeling @StitchFix

Stitch Fix's mission is to transform the way people find what they love. As a human-in-the-loop retailer, Stitch Fix requires visibility into demand from our clients to avoid stockouts, minimize excess supply, and manage fulfillment centers and staffing.

This talk will describe the company’s demand model, which takes a client-behavior-based approach to forecasting. Rather than relying on non-explanatory autoregressive components, Stitch Fix’s demand model is interpretable and more accessible to business partners. By providing visibility into how different client groups are behaving, the demand model helps Stitch Fix not only operate well but also make strategic decisions.

Stephanie Yee, Data Scientist @StitchFix
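The abstract describes the model only at a high level. As a hedged illustration of what a "client-behavior-based" and interpretable forecast can look like, this Python sketch decomposes demand into client segments, each described by how many clients are active and how often they order. The segment names, fields, and numbers are hypothetical and are not Stitch Fix's actual model.

```python
from dataclasses import dataclass


@dataclass
class ClientSegment:
    name: str
    active_clients: int        # clients expected to be active in the period
    orders_per_client: float   # expected orders per active client in the period


def forecast_demand(segments):
    """Total forecast plus a per-segment breakdown that explains where demand comes from."""
    breakdown = {s.name: s.active_clients * s.orders_per_client for s in segments}
    return sum(breakdown.values()), breakdown


segments = [
    ClientSegment("new clients", 1_000, 0.8),
    ClientSegment("returning clients", 5_000, 0.5),
]
total, breakdown = forecast_demand(segments)
print(total)      # 3300.0
print(breakdown)  # {'new clients': 800.0, 'returning clients': 2500.0}
```

Unlike an autoregressive term, every number in the breakdown corresponds to a client behavior that a business partner can question or adjust. For example, "what if returning clients order 10% less often?" becomes a one-field change.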


  • Grokking Timeseries & Sequential Data

    Techniques, practices, and approaches, including image recognition, NLP, predictions, & modeling.

  • Deep Learning in Practice

    Deep learning lessons using TensorFlow, Keras, PyTorch, and Caffe, including use cases in machine translation, computer vision, & image recognition.

  • AI Meets the Physical World

    Where AI touches the physical world: think drones, ROS, NVIDIA, TPUs, and more.

  • Papers to Production: CS in the Real World

    Groundbreaking papers that make real-world impact.

  • Solving Software Engineering Problems with Machine Learning

    Anomaly detection, ML in IDEs, Bayesian optimization for configuration: machine learning techniques for more effective software engineering.

  • Predictive Architectures in the Real World

    A case-study-focused look at end-to-end predictive pipelines from companies like Salesforce, Uber, LinkedIn, & Netflix.