Presentation: End to End ML Without a Data Scientist

Track: Hands-on Codelabs & Speakers Office Hours

Location: Cyril Magnin I

Duration: 4:20pm - 5:10pm

Day of week: Tuesday

Machine learning is super cool, but what about those of us who maybe got a D in statistics (or maybe didn't bother taking the class)? With modern systems, it's relatively simple to train a model regardless of your background, but how do you know whether the model you've trained does the "right" thing, and how do you actually use your model? This talk will explore how to train models (using big data because that's what the presenter works with, but it will work just fine on small data as well) and how to serve them. We'll then talk about basic validation techniques, why you should A/B test, and the importance of keeping your models up to date (the world & humans keep _changing_ right after we've fit our models; it's very frustrating).

Despite how fun deep learning is, this talk will focus on models that are easier to explain & spot check, like linear regression and decision trees.
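
As a rough illustration of the "train it, then check it does the right thing" idea, here is a minimal sketch in plain Python. It is not taken from the talk: the synthetic data, the 75/25 split, and the helper names `fit_line` and `mean_abs_error` are all assumptions made for the example. It fits a one-variable linear regression on a training split and compares its error on held-out data against a trivial "always predict the average" baseline.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(preds, ys):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


# Synthetic data, assumed for illustration: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Hold out the last 25% of the rows; the model never sees them while fitting.
split = int(len(xs) * 0.75)
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

slope, intercept = fit_line(train_x, train_y)

# Basic validation: score on held-out data and compare against a trivial
# baseline that always predicts the average of the training targets.
model_mae = mean_abs_error([slope * x + intercept for x in test_x], test_y)
baseline = sum(train_y) / len(train_y)
baseline_mae = mean_abs_error([baseline] * len(test_y), test_y)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"held-out MAE: model={model_mae:.2f} baseline={baseline_mae:.2f}")
```

Part of the appeal of a model like this is that the fitted slope and intercept can be read and sanity checked directly. If the held-out error is no better than the baseline, the model has not learned anything useful, whatever the training score says.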

Speaker: Holden Karau

Spark Committer & Open Source Developer Advocate

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. Prior to joining Google as a Developer Advocate, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys playing with fire, riding scooters, and dancing.

Proposed Tracks

  • Real-World Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.

  • Deep Learning Applications & Practices

    Deep learning lessons using TensorFlow, Keras, PyTorch, and Caffe across machine translation and computer vision.

  • AI Meets the Physical World

    The track where AI touches the physical world: think drones, ROS, NVIDIA, TPUs, and more.

  • Data Architectures You've Always Wondered About

    How did they do that? Real-time predictive pipelines at places like Uber, self-driving cars at Google, and robotic warehouses from Ocado in the UK are all possible examples.

  • Applied ML for Software

    Practical machine learning inside the data centers and on software engineering teams.

  • Time Series Patterns & Practices

    Stocks, ad tech/real-time bidding, and anomaly detection. Patterns and practices for more effective Time Series work.