Presentation: Scaling Deep Learning to Petaflops and Beyond!
This presentation is now available to view on InfoQ.com
Abstract
NERSC has successfully applied Deep Learning (DL) to a range of scientific workloads. Motivated by the volume and complexity of scientific datasets, and by the computationally demanding nature of DL, we have undertaken several projects targeted at scaling DL on the largest CPU- and GPU-based systems in the world. This talk will explore 2D and 3D convolutional architectures for solving pattern classification, regression and segmentation problems in high-energy physics, cosmology and climate science. Our efforts have produced a number of first-time results: scaling Caffe to 9600 Cori/KNL nodes, obtaining 15 PF performance (SC’17); scaling TensorFlow to 8192 Cori/KNL nodes, obtaining 3.5 PF performance (SC’18); and finally, scaling TensorFlow to 4560 Summit/Volta nodes, obtaining 1 EF performance (SC’18). The talk will review lessons learnt from these projects and outline future challenges for the DL community.