Presentation: End to End ML Without a Data Scientist

Track: Hands-on Codelabs & Speakers Office Hours

Location: Cyril Magnin I

Duration: 4:20pm - 5:10pm

Day of week: Tuesday


Machine Learning is super cool, but what about those of us who maybe got a D in statistics (or maybe didn't bother taking the class)? With modern systems, it's relatively simple to train a model regardless of your background, but how do you know if the model you've trained does the "right" thing, and how do you actually use your model? This talk will explore how to train models (using big data because that's what the presenter works with, but it will work just fine on small data as well) and how to serve them. We'll then talk about basic validation techniques, why you should A/B test, and the importance of keeping your models up to date (the world & humans keep _changing_ right after we've fit our models; it's very frustrating).

Despite how fun deep learning is, this talk will focus on models that are easier to explain and spot-check, like linear regression and decision trees.
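To give a flavor of the "basic validation" idea, here is a minimal sketch in plain Python, not taken from the talk itself. It fits a one-variable linear regression on a training split and checks it on held-out data against a naive baseline that always predicts the training mean. The synthetic data and the helper names `fit_line` and `mean_abs_error` are assumptions made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(preds, ys):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


# Synthetic data (an assumption for this sketch): y is roughly 3x + 2 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points; the model never sees them during fitting.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

slope, intercept = fit_line(train_x, train_y)

# Compare against a baseline that always predicts the training mean.
model_mae = mean_abs_error([slope * x + intercept for x in test_x], test_y)
baseline = sum(train_y) / len(train_y)
baseline_mae = mean_abs_error([baseline] * len(test_y), test_y)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"held-out MAE: model={model_mae:.2f} baseline={baseline_mae:.2f}")
```

The fitted slope and intercept can be read directly, which is part of what makes a linear model easy to spot-check. If the model does not clearly beat the baseline on held-out data, that is a hint something is off before it goes anywhere near an A/B test.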

Speaker: Holden Karau

Spark Committer & Open Source Developer Advocate

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. Prior to joining Google as a Developer Advocate, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys playing with fire, riding scooters, and dancing.


  • Deep Learning Applications & Practices

    Deep learning lessons using tooling such as TensorFlow & PyTorch, across domains like large-scale cloud-native apps and fintech, and tackling concerns around interpretability of ML models.

  • Predictive Data Pipelines & Architectures

    Best practices for building real-world data pipelines doing interesting things like predictions, recommender systems, fraud prevention, ranking systems, and more.

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud.

  • Real-world Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.