Reducing Development Time for Intelligent Virtual Assistants in Contact Centers

As the global service economy grows, companies rely increasingly on contact centers to drive better customer experiences, increase customer satisfaction, and lower costs through increased efficiency. Customer demand has grown far more rapidly than contact center employment ever could. Combined with the high agent churn rate, this demand creates a need for more automated, real-time customer communication that augments the agents.

Researchers recognized these trends as early as the 1970s and began developing primitive voice menus navigable through touch-tone phones. While voice menus may answer frequently asked questions and reduce pressure on contact center agents, customers often find it frustrating to interact with them.

Chances are that you have been one of the callers who wanted to speak to an agent directly instead of listening to multiple layers of prerecorded voice prompts, for any of the following reasons:

Listening to menu options that best match your queries takes time. Moreover, after you reach a contact center agent, your issue may be complex enough that it cannot be resolved in one call.
Your issue may not closely match the menu options, or it might fall under multiple options.
You and the contact center agent may not speak the same native languages, particularly if the contact center is outsourced to another country.
Some contact centers may not be staffed at a convenient time for you to call.

To effectively resolve these issues, companies have begun integrating intelligent virtual assistants (IVAs), also known as AI virtual assistants, into their contact center solutions.

In this post, we provide an overview of building and deploying contact center IVAs with the NVIDIA contact center IVA workflow and components such as NVIDIA Riva voice technology and speech AI skills:

Automatic speech recognition (ASR) or speech-to-text (STT)
Text-to-speech (TTS)

Reducing development time for IVA applications

IVAs are AI-powered software that recognize human speech, understand the intent, and provide precise and personalized responses in human-like voices while engaging with customers in conversation.

Around the clock, IVAs collect customer information and reasons for the call and manage customer issues without the need for a live agent. For complex cases, this information is automatically prepared for the live agent, to optimize servicing customers with a personal touch.

You can use NVIDIA Riva speech AI building blocks to create IVA applications. To reduce development time, you can use the NVIDIA contact center IVA workflow, which comes with Riva skills already integrated.

This NVIDIA AI solution workflow provides a reference for you to get started without preparation, helping you achieve the desired AI outcome more quickly.

NVIDIA contact center IVA workflow and components

The NVIDIA contact center IVA workflow (Figure 1) was designed as a microservice, which means it can be deployed on Kubernetes alone or with other microservices to create a production-ready application for seamless scaling.

Diagram showing full architecture design to build and deploy an intelligent virtual assistant using NVIDIA Riva, Rasa Dialog Manager, and Haystack. — *Figure 1. NVIDIA Contact Center IVA architecture with NVIDIA Riva ASR and TTS, Rasa Dialog Manager, and Haystack NLP IRQA components*

How services and dialog managers are integrated for deployment

This workflow integrates NVIDIA Riva ASR and TTS services with Haystack, a third-party open-source natural language information retrieval question answering (NLP IRQA) service, and Rasa, an open-source dialog manager.

Figure 1 shows that the Riva ASR service transcribes a user’s spoken question. Rasa and Haystack are used to interpret the user’s intent in the question and construct a relevant response. This response is delivered to the user in synthesized natural speech using Riva TTS.
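The flow in Figure 1 can be sketched as three chained stages. This is a minimal illustration, not the workflow's actual code: `IVAPipeline` and the `fake_*` functions are hypothetical stand-ins for the Riva ASR, Rasa and Haystack, and Riva TTS services, and text is passed as UTF-8 bytes in place of real audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IVAPipeline:
    """Chains the three stages from Figure 1.

    Each stage is a plain callable, so a real service client could be
    swapped in for the hypothetical stubs defined below.
    """
    transcribe: Callable[[bytes], str]   # stands in for the Riva ASR service
    respond: Callable[[str], str]        # stands in for Rasa + Haystack
    synthesize: Callable[[str], bytes]   # stands in for the Riva TTS service

    def handle_turn(self, audio_in: bytes) -> bytes:
        question = self.transcribe(audio_in)   # speech -> text
        answer = self.respond(question)        # text -> response text
        return self.synthesize(answer)         # response text -> speech


# Hypothetical stubs: UTF-8 bytes take the place of audio buffers.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_dialog(question: str) -> str:
    if "balance" in question.lower():
        return "Your balance is available in the mobile app."
    return "Let me connect you to an agent."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = IVAPipeline(fake_asr, fake_dialog, fake_tts)
reply = pipeline.handle_turn(b"What is my account balance?")
print(reply.decode("utf-8"))
```

Keeping each stage behind a simple callable mirrors the microservice layout: any one stage can be replaced without touching the other two.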

For context, NVIDIA Riva provides tools for building and deploying conversational AI and speech AI pipelines to any device containing an NVIDIA GPU, whether on the edge, in a data center, or in the cloud. The tools also run inference with those pipelines.

Language-specific customizations for the financial industry

The NVIDIA contact center IVA workflow features Riva ASR customizations for the financial services industry use case.

These Riva ASR customizations are performed in two sample Jupyter notebooks and serve the following purposes:

To improve the recognition of finance-specific terms.
To enhance recognition of finance terms in challenging acoustic environments, including noise, accents, and dialects.
To provide explicit guides for pronunciation of finance-specific words.
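One common way to bias a recognizer toward domain terms is to boost them when ranking candidate transcripts. The toy sketch below illustrates that general idea only. It is not taken from the notebooks and does not use the Riva API; the `rescore` function, the scores, and the boost value are all hypothetical.

```python
def rescore(hypotheses, boosted_words, boost=2.0):
    """Return the best transcript after boosting domain terms.

    hypotheses: list of (transcript, score) pairs, higher score is better.
    boosted_words: set of lowercase words that should be favored.
    boost: hypothetical bonus added per occurrence of a boosted word.
    """
    def boosted_score(item):
        text, score = item
        hits = sum(1 for word in text.lower().split() if word in boosted_words)
        return score + boost * hits

    return max(hypotheses, key=boosted_score)[0]


# Hypothetical recognizer output: the generic model slightly prefers the
# wrong transcript because "APR" is rare outside finance.
hypotheses = [
    ("i want to check my a pair", -4.5),
    ("i want to check my apr", -5.0),
]

print(rescore(hypotheses, boosted_words=set()))
print(rescore(hypotheses, boosted_words={"apr"}))
```

With no boosted words, the higher-scoring generic transcript wins. With `apr` boosted, the finance-specific transcript overtakes it.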

For more information about customizing Riva ASR models, see ASR Customization Best Practices.

Dialog manager training and IRQA components

After Riva ASR customization, you can work on the IVA dialog manager and the information retrieval and question-answering (IRQA) components. Every IVA requires a way to manage the state and flow of the conversation.

A dialog manager employs a language model like BERT to recognize the user intent in the transcribed text obtained from the Riva ASR service. It then routes the question to the correct prepared response or a fulfillment service. This provides context for the question and frames how the IVA can give the proper response.

The Rasa dialog manager also maintains the dialog state, by filling slots set by the developer for remembering the context of the conversation. It can be trained to understand user intent by giving it a few examples of each intent and the slots to be recognized.
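To make intents and slots concrete, here is a small sketch that does not use Rasa. It swaps in simple word overlap where a real dialog manager would use a trained language model, and the intent names, example utterances, and `account_type` slot are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical training data: a few example utterances per intent.
INTENT_EXAMPLES = {
    "check_balance": ["what is my balance", "how much money do i have"],
    "transfer_money": ["transfer money to savings",
                       "send money to my checking account"],
}


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_intent(utterance):
    """Pick the intent whose examples share the most words with the utterance."""
    words = tokenize(utterance)

    def best_overlap(intent):
        return max(len(words & tokenize(ex)) for ex in INTENT_EXAMPLES[intent])

    return max(INTENT_EXAMPLES, key=best_overlap)


@dataclass
class DialogState:
    """Remembers slot values across turns so later turns keep their context."""
    slots: dict = field(default_factory=dict)

    def update(self, utterance):
        intent = classify_intent(utterance)
        # Fill the hypothetical account_type slot when the user mentions it.
        account = re.search(r"\b(savings|checking)\b", utterance.lower())
        if account:
            self.slots["account_type"] = account.group(1)
        return intent


state = DialogState()
first = state.update("I want to transfer money to my savings account")
second = state.update("What is my balance?")
print(first, second, state.slots)
```

The second utterance never mentions an account, yet the `account_type` slot filled in the first turn is still available, which is the kind of context the dialog state is meant to preserve.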

IRQA with Haystack NLP is then used to search a list of given documents and generate a long-form response to the user’s question. This assists companies with massive amounts of unstructured data that need to be consumed in a form that is helpful to the customer. After IRQA generates the answer, Riva TTS synthesizes a human-like audio response.
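The retrieval half of IRQA can be illustrated with a toy ranker. This sketch does not use Haystack and only covers document search, not answer generation. The documents are invented, and term overlap stands in for whatever retriever a production pipeline would use.

```python
import re
from collections import Counter

# Hypothetical knowledge base of short documents.
DOCUMENTS = [
    "Wire transfers submitted before 3 PM are processed the same business day.",
    "You can reset your online banking password from the login page.",
    "Savings accounts earn interest that is paid monthly.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, top_k=1):
    """Rank documents by how often they contain the question's terms."""
    question_terms = set(tokenize(question))

    def score(doc):
        counts = Counter(tokenize(doc))
        return sum(counts[term] for term in question_terms)

    ranked = sorted(documents, key=score, reverse=True)
    return ranked[:top_k]


print(retrieve("How do I reset my password?", DOCUMENTS))
```

In a full pipeline, the retrieved passages would be handed to a reader or generator model that composes the long-form answer, which is then passed to TTS.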

To summarize, the NVIDIA contact center IVA workflow can be deployed on any cloud Kubernetes distribution as a collection of Helm charts, each running a microservice.

While the NVIDIA contact center IVA architecture uses Haystack and Rasa components, you can use your preferred components.

All the components packaged in the NVIDIA contact center IVA workflow include enterprise-ready implementation best practices that range from authentication, monitoring, and reporting to load balancing, while still enabling customization.

Optimal inference based on usage metrics

The NVIDIA contact center IVA workflow includes NVIDIA Triton Inference Server, which provides Prometheus with metrics indicating GPU and request statistics. The metrics are exposed in plain-text format, so you can view them directly in the Grafana dashboard.

Some of the metrics available are shown in Table 1.

Category	Metric	Name	Description
Count	Success Count	`nv_inference_request_success`	Number of successful inference requests received by NVIDIA Triton (each request is counted as 1, even if the request contains a batch)
Count	Failure Count	`nv_inference_request_failure`	Number of failed inference requests received by NVIDIA Triton (each request is counted as 1, even if the request contains a batch)
Count	Inference Count	`nv_inference_count`	Number of inferences performed (a batch of n is counted as n inferences and does not include cached requests)
Count	Execution Count	`nv_inference_exec_count`	Number of inference batch executions (see Count Metrics, does not include cached requests)
Latency	Request Time	`nv_inference_request_duration_us`	Cumulative end-to-end inference request handling time (includes cached requests)
Latency	Queue Time	`nv_inference_queue_duration_us`	Cumulative time requests spend waiting in the scheduling queue (includes cached requests)
Latency	Compute Input Time	`nv_inference_compute_input_duration_us`	Cumulative time requests spend processing inference inputs (in the framework backend, does not include cached requests)
Latency	Compute Time	`nv_inference_compute_infer_duration_us`	Cumulative time requests spend executing the inference model (in the framework backend, does not include cached requests)
Latency	Compute Output Time	`nv_inference_compute_output_duration_us`	Cumulative time requests spend processing inference outputs (in the framework backend, does not include cached requests)

Table 1. NVIDIA Triton Inference Server metrics used for manual or automatic scaling of Riva pods

Depending on these usage metrics, the Riva pods can be scaled manually or automatically.
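As a rough sketch of how such a decision could be derived, the code below parses a plain-text metrics sample, computes the average queue time per request, and suggests a replica count. The sample values, the `asr` model label, the 10 ms threshold, and the one-replica step are all assumptions for illustration. The workflow's actual scaling policy is not described here.

```python
import re

# Hypothetical sample in Prometheus text exposition format.
SAMPLE_METRICS = """\
# HELP nv_inference_request_success Number of successful inference requests
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="asr",version="1"} 400
nv_inference_queue_duration_us{model="asr",version="1"} 6000000
"""

LINE = re.compile(r'^(?P<name>\w+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')


def parse_metrics(text):
    """Parse metric lines into a {(name, labels): value} dictionary."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP, and TYPE
            continue
        match = LINE.match(line)
        if match:
            key = (match["name"], match["labels"] or "")
            metrics[key] = float(match["value"])
    return metrics


def average_queue_ms(metrics, labels):
    """Cumulative queue time divided by successful requests, in milliseconds."""
    requests = metrics[("nv_inference_request_success", labels)]
    queue_us = metrics[("nv_inference_queue_duration_us", labels)]
    return queue_us / requests / 1000.0 if requests else 0.0


def desired_replicas(current, avg_queue_ms, threshold_ms=10.0):
    """Suggest one more replica when average queue time exceeds the threshold."""
    return current + 1 if avg_queue_ms > threshold_ms else current


metrics = parse_metrics(SAMPLE_METRICS)
avg_ms = average_queue_ms(metrics, 'model="asr",version="1"')
print(avg_ms, desired_replicas(current=2, avg_queue_ms=avg_ms))
```

Because these counters are cumulative, a real autoscaler would more likely compare rates over a recent time window than totals since startup.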

Conclusion

NVIDIA Riva provides speech AI tools that enable companies to build and deploy IVAs in contact centers. These assistants relieve the pressure on human agents while granting customers the interactivity and personal treatment that they expect from live employees. This all drives a better customer experience.

IVAs can also significantly increase contact center efficiency by reducing customer wait times, providing real-time translation, resolving customer challenges faster, reducing agent onboarding time, and enabling customers to reach contact centers 24/7. Companies can also use contact center call transcripts to further hone their products and services.

Related resources

The NVIDIA contact center IVA workflow will be available on NGC for NVIDIA AI Enterprise software customers at the end of December.

In the meantime, you can sign up for NVIDIA LaunchPad to gain hands-on experience and immediately tap into the necessary hardware and software stacks to test and prototype your conversation-based solutions. The workflow solutions will be available on LaunchPad beginning January 20, 2023.

For step-by-step instructions on enhancing contact centers with Riva’s speech AI services, see the webinar, How to Build and Deploy an AI Voice-Enabled Virtual Assistant for Financial Services Contact Centers.

To learn how real companies have benefited from Riva speech AI skills in their contact centers, see the T-Mobile and Floatbot use case stories.