You are viewing content from a past/completed QCon

Presentation: End to End ML Without a Data Scientist

Track: Hands-on Codelabs & Speakers Office Hours

Location: Cyril Magnin I

Duration: 4:20pm - 5:10pm

Day of week: Tuesday

Machine Learning is super cool, but what about those of us who maybe got a D in statistics (or maybe didn't bother taking the class)? With modern systems, it's relatively simple to train a model regardless of your background, but how do you know if the model you've trained does the "right" thing, and how do you actually use your model? This talk will explore how to train models (using big data because that's what the presenter works with, but it will work just fine on small data as well) and how to serve them. We'll then talk about basic validation techniques, why you should A/B test, and the importance of keeping your models up to date (the world & humans keep _changing_ right after we've fit our models; it's very frustrating).

Despite how fun deep learning is, this talk will focus on models that are easier to explain & spot check, like linear regression and decision trees.

Speaker: Holden Karau

Spark Committer & Open Source Developer Advocate

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. Prior to joining Google as a Developer Advocate, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

2019 Tracks

  • Groking Timeseries & Sequential Data

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and crunching of related algorithms.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.