You are viewing content from a past/completed QCon

Presentation: Simplifying ML Workflows With Apache Beam

Track: Real-world Data Engineering

Location: Cyril Magnin III

Duration: 10:55am - 11:45am

Day of week: Wednesday


Come learn how Apache Beam is simplifying pre- and post-processing for ML pipelines. Apache Beam provides a portability layer that allows Beam pipelines to be written once and executed on any supported runtime. 2018 will be the year in which the Beam community completes the portability vision laid out when the project was founded, with full cross-language portability and robust open source runner support for Apache Flink and Spark.

Come see where we are in that journey, and learn how Beam is being integrated into the world of AI.

Speaker: Tyler Akidau

Founder/Committer on Apache Beam & Engineer @Google

Tyler Akidau is a senior staff software engineer at Google, where he is the technical lead for the Data Processing Languages & Systems group, responsible for Google's Apache Beam efforts, Google Cloud Dataflow, and internal data processing tools like Google Flume, MapReduce, and MillWheel. He is also a founding member of the Apache Beam PMC. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is a firm believer in batch and streaming as two sides of the same coin, with the real endgame for data processing systems being the seamless merging of the two. He is the author of the 2015 Dataflow Model paper, the Streaming 101 and Streaming 102 articles, and the upcoming Streaming Systems book. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.

2019 Tracks

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.

  • Handling Sequential Data Like an Expert / ML Applied to Operations

    Two half-day tracks: one on the complexities of time, and one on machine learning in the data center. Topics range from HyperLogLog to predictive auto-scaling.
