You are viewing content from a past/completed QCon

Presentation: wav2letter++: Facebook's Fast Open-Source Speech Recognition System

Track: Papers in Production: Modern CS in the Real World

Location: Cyril Magnin III

Duration: 11:40am - 12:20pm

Day of week: Tuesday

In this talk I will introduce wav2letter++, a fast open-source deep learning framework for speech recognition. wav2letter++ is written entirely in C++ and uses the ArrayFire tensor library for maximum efficiency. I will explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks at training end-to-end neural networks for speech recognition. I will also show that wav2letter++ training scales linearly up to 64 GPUs, the highest number tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and in tuning models on new datasets and tasks.

Speaker: Vitaliy Liptchinsky

Research Engineering Manager @Facebook AI Research

Vitaliy Liptchinsky earned his PhD in the Distributed Systems Group at Vienna Technical University (TU Wien). In his professional career, Vitaliy has worked on a wide variety of engineering problems, ranging from prehistoric mobile applications and enterprise systems to highly optimized storage engines and large-scale deep learning systems.
Vitaliy currently focuses on scaling research efforts at Facebook AI Research (FAIR) at its Menlo Park and Seattle locations. Prior to joining FAIR, Vitaliy worked at Microsoft Research (MSR) on natural language processing, and long before that on probabilistic and statistical models of various sporting events.
