Nvidia has released a new batch of pre-trained deep learning models and software aimed at interactive conversational AI services, promising automated speech recognition with improved accuracy for enterprise applications.
The Jarvis translation platform, announced during this week's Nvidia GPU Technology Conference, casts a wide net across industries and domains. The Jarvis models are designed to deliver more accurate speech recognition, real-time translation to and from five languages (with more to come), and text-to-speech capabilities for conversational AI agents.
Nvidia (NASDAQ: NVDA) promotes Jarvis as a GPU-accelerated deep learning AI platform for speech recognition and generation, language understanding, and translation. "Jarvis interacts in about 100 milliseconds," Nvidia CEO Jensen Huang noted in his GTC21 keynote address.
The machine translator was trained over several million GPU-hours on more than 1 billion pages of text and 60,000 hours of speech in different languages. Huang claimed Jarvis achieved 90-percent recognition accuracy "out of the box."
The initial output from Jarvis can be fine-tuned with internal data using Nvidia’s new model training framework dubbed TAO, which customizes pre-trained models for “domain-specific applications” across different industries.
Jarvis currently supports English translations to and from French, German, Japanese, Russian, and Spanish, with more languages coming.
Huang noted that Jarvis can be deployed in the cloud or on EGX AI edge accelerators in data centers, as well as in edge implementations running on Nvidia's new 5G application framework, EGX Aerial.
Nvidia launched an early access program for Jarvis last year. So far, the conversational AI tools have attracted more than 45,000 downloads.
Among the early adopters is T-Mobile (NASDAQ: TMUS), which is using Jarvis for real-time customer service applications.
Huang also announced a partnership with Mozilla Common Voice, a crowdsourcing project that hosts the largest open multi-lingual voice data set, covering 60 different languages. Nvidia DGX systems will be used to train Jarvis, producing pre-trained models from the public-domain data set. Those models will be released for free to the open source community, Huang said.
"Let's make universal translation possible, and help people around the world understand each other," the Nvidia CEO added.
Nvidia also said new Jarvis features will be released during the second quarter as part of its ongoing beta program. The Jarvis toolkit, released in March, can be downloaded now from the Nvidia NGC catalog, a hub for GPU-based deep learning, machine learning and HPC applications.
Nvidia has also posted a Jarvis explainer video.
Investors responded favorably to a slew of GPU-related announcements during the first day of the GTC event: Nvidia shares jumped more than 5 percent at the close of trading on Monday (April 12).