
Build Speech AI in Multiple Languages and Train Large Language Models with the Latest from Riva and NeMo Megatron

Read a recap of conversational AI announcements from NVIDIA GTC.

At NVIDIA GTC 2022 last week, NVIDIA announced major updates to Riva, an SDK for building speech AI applications, along with Riva Enterprise, a new paid offering. NVIDIA also announced several key updates to NeMo Megatron, a framework for training large language models (LLMs).

Riva 2.0 general availability

Riva offers world-class accuracy for real-time automatic speech recognition (ASR) and text-to-speech (TTS) skills across multiple languages, and it can be deployed on premises or in any cloud. Industry leaders such as Snap, T-Mobile, RingCentral, and Kore.ai use Riva in customer care center applications, transcription, and virtual assistants.

The latest Riva version includes:

  • ASR in multiple languages: English, Spanish, German, Russian, and Mandarin.
  • High-quality TTS voices customizable for unique voice fonts.
  • Domain-specific customization with TAO Toolkit or NVIDIA NeMo for higher accuracy on accents, domain jargon, and country-specific terminology.
  • Support for deployment in the cloud, on premises, and on embedded platforms.
Figure 1: NVIDIA Riva controllable text-to-speech makes it easy to adjust pitch and speed using SSML tags.
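Figure 1 shows pitch and speed being controlled through SSML. As a rough illustration of what that markup looks like, the sketch below wraps text in the standard W3C SSML `<speak>` and `<prosody>` elements with `pitch` and `rate` attributes. The `prosody_ssml` helper is hypothetical, the attribute values are generic SSML examples that Riva may restrict to a narrower set, and the call that sends the string to the Riva TTS service is not shown; check the Riva documentation for the tags and values it supports.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, pitch: Optional[str] = None, rate: Optional[str] = None) -> str:
    """Wrap plain text in an SSML <speak> document with an optional <prosody> element.

    Hypothetical helper: `pitch` and `rate` are passed through unchanged as SSML
    attribute values; which values a TTS engine accepts is engine-specific.
    """
    body = escape(text)  # escape &, <, > so user text cannot break the markup
    attrs = []
    if pitch is not None:
        attrs.append(f"pitch={quoteattr(pitch)}")  # quoteattr adds the surrounding quotes
    if rate is not None:
        attrs.append(f"rate={quoteattr(rate)}")
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(prosody_ssml("Hello & welcome to GTC.", pitch="high", rate="110%"))
# <speak><prosody pitch="high" rate="110%">Hello &amp; welcome to GTC.</prosody></speak>
```

Escaping the input text matters here: without it, a stray `&` or `<` in user content would make the SSML document malformed.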

Try Riva automatic speech recognition on the Riva product page.

Defined.ai has collaborated with NVIDIA to provide a smooth workflow for enterprises looking to purchase speech training and validation data across languages, domains, and recording types. A sample of the DefinedCrowd dataset is available to NVIDIA developers.

Download Riva from NGC; it is available free to members of the NVIDIA Developer Program.

Riva Enterprise

NVIDIA also introduced Riva Enterprise, a paid offering for enterprises deploying Riva at scale with business-standard support from NVIDIA experts. 

Benefits include:

  • Unlimited use of ASR and TTS services on any cloud and on-prem platforms.
  • Access to NVIDIA AI experts during local business hours for guidance on configurations and performance.
  • Long-term support, giving control over maintenance and upgrade schedules.
  • Priority access to new releases and features.

Riva Enterprise is available as a free trial on NVIDIA LaunchPad, where enterprises can evaluate and prototype their applications.

Riva Enterprise on LaunchPad includes guided labs to:

  • Interact with Real-Time Speech AI APIs.
  • Add Speech AI Capabilities to a Conversational AI Application. 
  • Fine-Tune a Speech AI Pipeline on Custom Data for Higher Accuracy.

Apply for your Riva Enterprise trial.

Learn more about how to build, optimize, and deploy speech AI applications from the Conversational AI Demystified GTC session.


NeMo Megatron

NVIDIA announced new updates to NVIDIA NeMo Megatron, a framework for training large language models (LLMs) with up to trillions of parameters. Built on innovations from the Megatron paper, NeMo Megatron enables research institutions and enterprises to train any LLM to convergence. NeMo Megatron provides data preprocessing, parallelism (data, tensor, and pipeline), orchestration and scheduling, and auto-precision adaptation.
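The three kinds of parallelism compose multiplicatively: each model replica is split across tensor-parallel × pipeline-parallel GPUs, and the data-parallel size is the number of such replicas that fit on the cluster, so world size = TP × PP × DP. The sketch below only checks that arithmetic. `data_parallel_size` is a hypothetical helper, not part of the NeMo Megatron API, which sets these sizes through its own configuration.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Number of model replicas that fit on `world_size` GPUs (hypothetical helper).

    Each replica is split across tensor_parallel * pipeline_parallel GPUs, so the
    GPU count must be an exact multiple of that product.
    """
    if tensor_parallel < 1 or pipeline_parallel < 1:
        raise ValueError("parallel sizes must be positive")
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not a multiple of TP*PP={model_parallel}"
        )
    return world_size // model_parallel


# Hypothetical cluster: 8 nodes x 8 GPUs = 64 GPUs.
# TP=8 keeps each tensor-parallel group inside one node; PP=4 spreads the layers over 4 nodes.
print(data_parallel_size(64, tensor_parallel=8, pipeline_parallel=4))  # -> 2
```

Keeping tensor parallelism within a node is a common heuristic, because it exchanges activations at every layer and benefits most from fast intra-node links.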

It consists of thoroughly tested recipes, popular LLM architecture implementations, and necessary tools for organizations to quickly start their LLM journey.

AI Sweden, JD.com, Naver, and the University of Florida are early adopters of NVIDIA technologies for building large language models.

The latest version includes:

  • Hyperparameter tuning tool—automatically creates recipes based on customers’ needs and infrastructure limitations. 
  • Reference recipes for T5 and mT5 models.
  • Support for training LLMs in the cloud, starting with Azure.
  • Distributed data preprocessing scripts to shorten end-to-end training time.
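Distributed preprocessing generally shards the raw corpus so that each worker handles a disjoint slice, then combines the per-worker outputs. The sketch below is a minimal sketch, assuming round-robin sharding by rank and a toy whitespace tokenizer that just counts tokens. `shard_for_rank` and `preprocess` are hypothetical helpers, not the NeMo Megatron scripts, which use real tokenizers and produce training-ready files.

```python
from collections import Counter
from typing import List


def shard_for_rank(items: List[str], rank: int, world_size: int) -> List[str]:
    """Round-robin split: rank r gets items r, r + world_size, r + 2*world_size, ..."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    return items[rank::world_size]


def preprocess(docs: List[str]) -> Counter:
    """Toy preprocessing step (hypothetical): lowercase, whitespace-tokenize, count tokens."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


corpus = [
    "Speech AI with Riva",
    "Train LLMs with NeMo Megatron",
    "Riva and NeMo",
]
world_size = 2
# In a real job each rank would run in its own process or on its own node;
# here the ranks run one after another in a loop.
partials = [preprocess(shard_for_rank(corpus, r, world_size)) for r in range(world_size)]
total = sum(partials, Counter())  # merge per-rank results
print(total["riva"], total["nemo"], total["with"])  # 2 2 2
```

Because the shards are disjoint, the merged result matches what a single worker would produce on the whole corpus, while each worker only touches its own slice.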

Apply for NeMo Megatron early access.

Learn more about interesting applications of LLMs and best practices to deploy them in the Natural Language Understanding in Practice: Lessons Learned from Successful Enterprise Deployments GTC session.
