Presentation: Building (Better) Data Pipelines with Apache Airflow

Track: Predictive Data Pipelines & Architectures

Location: Cyril Magnin I

Duration: 2:25pm - 2:35pm

Day of week: Tuesday

Abstract

Apache Airflow is an up-and-coming platform to programmatically author, schedule, manage, and monitor workflows. Central to Airflow’s design is that it requires users to define DAGs (directed acyclic graphs), a.k.a. workflows, in Python code, so that DAGs can be managed via the same software engineering principles and practices used to manage any other code.
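
To make the idea of a workflow defined as a DAG in Python code concrete, here is a minimal sketch. It deliberately uses only Python's standard library (`graphlib`, Python 3.9+) rather than Airflow's own API, and the task names and the `run_order` helper are hypothetical, chosen only for illustration. It shows a dependency graph declared as ordinary data and resolved into a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for an illustrative predictive pipeline (not from the talk).
# Each key is a task; its value is the set of tasks that must finish first.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train_model": {"transform"},
    "score": {"train_model"},
}


def run_order(deps):
    """Return one valid execution order for the tasks in the DAG.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is what enforces the "acyclic" part of a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph is plain Python, it can be version-controlled, code-reviewed, and unit-tested like any other code, which is the property the abstract highlights.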

With more than 7600 GitHub stars, 2400 forks, 430 contributors, 150 companies officially using it, and 4600 commits, it is quickly gaining traction among data science, ETL engineering, data engineering, and devops communities at large. What makes Apache Airflow so popular? Come to this talk to get a whirlwind intro based on a real-world predictive data pipeline example.

Note: This is a short talk. Short talks are 10-minute talks designed to offer breadth across the areas of machine learning, artificial intelligence, and data engineering. The short talks are focused on the tools and practices of data science with an eye towards the software engineer.

Host: Sid Anand

Chief Data Engineer @PayPal

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions including Agari's Data Architect, a Technical Lead in Search @ LinkedIn, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid spends time with his wife, Shalini, and their 2 kids.

Proposed Tracks

  • Real-World Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.

  • Deep Learning Applications & Practices

    Deep learning lessons using TensorFlow, Keras, PyTorch, and Caffe across machine translation and computer vision.

  • AI Meets the Physical World

    The track where AI touches the physical world: think drones, ROS, NVIDIA, TPU, and more.

  • Data Architectures You've Always Wondered About

    How did they do that? Real-time predictive pipelines at places like Uber, self-driving cars at Google, and robotic warehouses from Ocado in the UK are all possible examples.

  • Applied ML for Software

    Practical machine learning inside the data centers and on software engineering teams.

  • Time Series Patterns & Practices

    Stocks, ad tech/real-time bidding, and anomaly detection. Patterns and practices for more effective Time Series work.