Presentation: wav2letter++: Facebook's Fast Open-Source Speech Recognition System
This presentation is now available to view on InfoQ.com
Abstract
In this talk, I will introduce wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++ and uses the ArrayFire tensor library for maximum efficiency. I will explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases, wav2letter++ is more than 2x faster than other optimized frameworks at training end-to-end neural networks for speech recognition. I will also show that wav2letter++'s training times scale linearly up to 64 GPUs, the largest number tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and in tuning models on new datasets and tasks.