Presentation: Petastorm: A Light-Weight Approach to Building ML Pipelines @Uber
This presentation is now available to view on InfoQ.com
Abstract
Data produced and managed by Big Data systems like Apache Spark and Hive cannot be directly consumed by Deep Learning systems like TensorFlow and PyTorch. Petastorm bridges this gap by letting TensorFlow and PyTorch consume data stored in Apache Parquet format directly. In this talk, we describe how Petastorm facilitates tighter integration between the Big Data and Deep Learning worlds, simplifies data management and data pipelines, and speeds up model experimentation.