Automated voice services deliver better customer experiences

July 16th, 2020 Written by Ben Childs

As many businesses have made their ‘always available’ services easier to use, customers have become accustomed to managing their day-to-day needs through mobile apps and self-service websites. When satisfied, engaged customers can look after themselves, a company’s customer service function can focus on customers who have problems or are considering cancelling their service. In sectors such as banking and utilities, where switching providers has become increasingly easy, many businesses recognise that excellent customer service can be as important to customer loyalty and satisfaction as core service features. However, delivering a high level of customer service is complicated and resource-intensive. To improve the customer experience and reduce costs, traditional channels such as call centres are therefore increasingly supplemented by automated voice services that give customers a degree of self-service over the phone.

An automated voice service can benefit a business by reducing costs, improving data insights, integrating more easily with other systems and lowering operational overheads. From the customer’s perspective, whilst some people will always prefer to speak to a real person, the shortcomings of earlier automated systems can now be largely overcome: a more ‘human-like’ experience, the ability to self-serve whenever it suits the customer, and fewer privacy concerns, because personal or account details no longer have to be given to a human call centre operator.

Although an automated voice service can benefit both the business and the customer, voice is an inherently more human interaction than using a website. Businesses therefore face the challenge of conveying care and personalisation through an ‘interface’ that people have used and mastered since childhood. Until recently, automated voice experiences sounded quite robotic and offered only limited, complex customisation options, but a new generation of voice technologies enables businesses to build automated voice services that reduce operational costs, increase resilience, improve data insights and enhance the customer experience at the same time.

Designing improved automated voice services

Cloud-based, automated customer service solutions such as Amazon Connect primarily support customer engagement through two channels: voice and online chat. In the voice channel, the audio a caller hears and responds to is generated from predefined text by a text-to-speech (TTS) engine. Modern speech synthesis engines such as Amazon Polly and Google Text-to-Speech use machine learning trained on large volumes of speech data to mimic human voice patterns, producing a more natural, more human-sounding experience than earlier ‘robotic’ engines. The text can be annotated with Speech Synthesis Markup Language (SSML) to control how the engine pronounces words, where it pauses or where it inserts breaths as it converts text to speech. However, creating a better automated voice experience requires a broader approach than simply understanding and applying the markup used to customise the synthesised speech.
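To make the role of SSML more concrete, here is a minimal sketch that wraps plain-text sentences in SSML before they are sent to a TTS engine. It assumes the engine accepts the standard `<speak>`, `<prosody>` and `<break>` elements (Amazon Polly does); the function name `to_ssml` and the default pause length and speaking rate are illustrative choices, not values taken from any particular service.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=300, rate="medium"):
    """Wrap plain-text sentences in SSML with a short pause between each one.

    pause_ms and rate are illustrative defaults; tune them by ear.
    """
    # Escape &, < and > so customer-facing text cannot break the markup.
    parts = [escape(sentence) for sentence in sentences]
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(parts)
    # <prosody rate="..."> accepts keywords such as "slow" or percentages like "90%".
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


print(to_ssml(["Thanks for calling.", "Press 1 for billing & payments."]))
```

With Amazon Polly, a string like this would be passed to `synthesize_speech` with `TextType="ssml"`; escaping the text first matters because a stray `&` in a product name would otherwise make the whole request invalid.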

1. Define the locale, tone and personality of the voice service

An automated voice that is localised and sounds familiar to the customer is a crucial part of a better voice experience. The Amazon and Google solutions offer a wide range of voices across many locales and languages, so understanding which voices are available and which best suit a voice service is an important early decision. Global brands may choose one of the many English-language voices (for example American, British or Australian accents), but because most companies provide customer service through country-specific, local-rate numbers, there is no need to use the same voice in every country. Choosing an appropriately localised voice matters even more for services in languages other than English, as it ensures the speech engine handles language-specific pronunciation and sounds native to the customer.
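Because each country typically has its own local-rate number, the voice can be chosen per number rather than globally. The sketch below is a hypothetical lookup from the number a customer dialled to a locale and voice; the table, the default and the function name `voice_for` are assumptions for illustration, and the voice IDs are examples of Amazon Polly voices whose current availability should be checked before use.

```python
# Hypothetical routing table: the country calling code of the local-rate
# number the customer dialled, mapped to a (locale, voice ID) pair.
VOICE_BY_NUMBER_PREFIX = {
    "+44": ("en-GB", "Amy"),
    "+61": ("en-AU", "Nicole"),
    "+1": ("en-US", "Joanna"),
    "+49": ("de-DE", "Vicki"),
}
DEFAULT_VOICE = ("en-US", "Joanna")  # hypothetical fallback


def voice_for(dialled_number):
    """Return the (locale, voice ID) configured for a dialled number."""
    # Try longer prefixes first so a more specific entry beats a shorter one.
    for prefix in sorted(VOICE_BY_NUMBER_PREFIX, key=len, reverse=True):
        if dialled_number.startswith(prefix):
            return VOICE_BY_NUMBER_PREFIX[prefix]
    return DEFAULT_VOICE


print(voice_for("+442071234567"))
```

Keeping this mapping as data rather than logic means adding a new country is a configuration change that content and design teams can review.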

Depending on the voices available in the chosen locale, the tone of the service can be further shaped by choosing a male or female voice. The gender and regional accent of a voice influence whether it conveys a formal, authoritative tone or a friendlier, more supportive character. When improving an automated voice service, it is important to experiment with different voices early: the chosen voice affects not only the tone and personality conveyed but, in some cases, which customisation options are technically available in the final implementation. For example, Amazon Polly’s neural voices do not support every SSML tag available to its standard voices.
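One way to experiment early is to shortlist voices by locale, gender and synthesis engine before any content is written. The sketch below filters a small, hand-written catalogue shaped like the `Voices` entries returned by Amazon Polly’s `describe_voices` call (`Id`, `LanguageCode`, `Gender`, `SupportedEngines`). The catalogue is an illustrative subset, not an authoritative list; in practice it would come from the live API, and `candidate_voices` is a hypothetical helper.

```python
# Illustrative subset of voice metadata, shaped like Polly's DescribeVoices
# output. Do not rely on this data; query the live voice list instead.
CATALOGUE = [
    {"Id": "Amy", "LanguageCode": "en-GB", "Gender": "Female", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Brian", "LanguageCode": "en-GB", "Gender": "Male", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Nicole", "LanguageCode": "en-AU", "Gender": "Female", "SupportedEngines": ["standard"]},
    {"Id": "Russell", "LanguageCode": "en-AU", "Gender": "Male", "SupportedEngines": ["standard"]},
]


def candidate_voices(language_code, gender=None, engine=None):
    """Return voice IDs for a locale, optionally filtered by gender and engine."""
    return [
        voice["Id"]
        for voice in CATALOGUE
        if voice["LanguageCode"] == language_code
        and (gender is None or voice["Gender"] == gender)
        and (engine is None or engine in voice["SupportedEngines"])
    ]


print(candidate_voices("en-GB", gender="Male"))
print(candidate_voices("en-AU", engine="neural"))
```

In this illustrative subset, requiring the neural engine leaves no Australian voice at all, which is exactly the kind of constraint that is cheaper to discover before the tone of the service has been designed around one voice.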

Just as a visual interface should apply a brand’s logo, colour scheme and visual identity, a voice experience should be coherent with the brand’s personality. The locale and gender of the voice shape the initial engagement, but its personality conveys the values and identity of the business whilst framing the experience for the customer. For example, within a single automated voice service using the same voice, a customer trying to upgrade their account may be handled slightly differently from one trying to cancel it, yet the overall personality must remain congruent with the brand identity. Larger companies often define this personality in ‘tone of voice’ brand guidelines; where these exist, they are an essential asset for creating a voice experience that appropriately reflects the company’s identity.
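The idea of one brand voice with slightly different handling per journey can be expressed as configuration. In this minimal sketch the voice is fixed for the whole service and only delivery settings vary by journey; the voice choice, the journey names, the `JourneyStyle` fields and the values themselves are all hypothetical and would in practice come from the brand’s tone of voice guidelines and listening tests.

```python
from dataclasses import dataclass

# One brand voice for the whole service (hypothetical choice).
BRAND_VOICE = "Amy"


@dataclass(frozen=True)
class JourneyStyle:
    rate: str      # SSML prosody rate, e.g. "medium" or "95%"
    pause_ms: int  # pause inserted between sentences


# Hypothetical styles: a slightly slower, more measured delivery for a
# customer who is cancelling than for one who is upgrading.
JOURNEY_STYLES = {
    "upgrade": JourneyStyle(rate="medium", pause_ms=250),
    "cancel": JourneyStyle(rate="95%", pause_ms=400),
}
NEUTRAL_STYLE = JourneyStyle(rate="medium", pause_ms=300)


def delivery_for(journey):
    """Return the brand voice and the delivery style for a journey."""
    # Journeys without a specific entry fall back to a neutral style,
    # but always keep the same brand voice.
    return BRAND_VOICE, JOURNEY_STYLES.get(journey, NEUTRAL_STYLE)


print(delivery_for("cancel"))
```

Separating the constant (the voice) from the variable (the delivery) makes it easy to check that no journey accidentally drifts away from the brand personality.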

2. Understand how voice, speech and language work together

To create the best possible automated voice experience, it helps to consider the components of this form of human communication, which is so familiar to us and to a company’s customers. Voice and speech are closely related: both refer to the physical act of producing audible communication. Voice, which animals also have, is the production of sounds such as laughter, growling or speech, shaped by attributes such as pitch and volume. Speech, unique to humans, is the construction of spoken language from intelligible, replicable sounds, and it also encompasses pronunciation, intonation and accent.

Recognising that a speech synthesis engine essentially simulates human voice and speech helps focus the customisation effort on making the voice sound as good as possible. The language, that is, what the synthesised voice says, should be written by a content designer with the same care as content for the web. Dynamic content can further enhance the experience, for example by using the customer’s name, a greeting appropriate to the time of day or other personalised, contextual information. Depending on the type of service and the customer’s expectations of the brand, the content designer may consider whether to adopt a personal or formal tone, whether to use slang and how far to simplify relevant terminology.
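As an example of the dynamic content described above, here is a minimal sketch of a greeting that adapts to the time of day and, when the caller has been identified, uses their name. The function name, the hour boundaries for morning, afternoon and evening, and the assumption that `now` is already in the caller’s local time zone are all illustrative choices.

```python
from datetime import datetime
from xml.sax.saxutils import escape


def greeting(now: datetime, first_name=None):
    """Build a time-appropriate, optionally personalised greeting.

    Assumes `now` is already in the caller's local time zone.
    """
    # Hypothetical boundaries: before 12:00, before 18:00, otherwise evening.
    if now.hour < 12:
        opening = "Good morning"
    elif now.hour < 18:
        opening = "Good afternoon"
    else:
        opening = "Good evening"
    if first_name:
        # The name is customer data, so escape it before it goes into SSML.
        return f"{opening}, {escape(first_name)}."
    return f"{opening}."


print(greeting(datetime(2020, 7, 16, 9, 0), "Sam"))
```

Falling back to an impersonal greeting when no name is known keeps the same prompt usable before and after the caller has been identified.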

Understanding how these aspects of voice communication relate to one another helps designers and developers break down the task of producing an improved voice service and focus on the parts they can influence most.

3. Curating rather than creating improved voice experiences

Having identified and planned what to say and how to say it, the success of the final execution requires consideration and design of the end-to-end experience. Whilst a typical digital product design process involving user research and various design skills can and should be applied, designing the user experience of an automated voice service can feel more constrained than designing a mobile app or website. Understanding and working within the customisation limitations of the speech synthesis engine requires frequent iteration of the content and voice customisation whilst designing an overall experience that meets both user and business requirements.

The user experience design task can therefore feel more like curation than creation, directing the final voice ‘performance’ to deliver a customer experience that is as close as possible to engaging with a human voice. In addition to typical user experience design tasks such as designing the user journey and interactive menus, the designer must consider varied issues such as the rhythm of speech, how many words a user can retain without needing to replay a message or how specific pronunciations affect the overall experience. For instance, incorrectly pronouncing or accentuating specific phrases such as ‘App Store’ can distract the user and sound unnatural, which at best reduces the effectiveness of the experience or at worst results in the customer ending the call or opting to speak to a more expensive human call operator.
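Two of the details mentioned above, problem pronunciations and message length, lend themselves to a simple pre-processing step that runs on every prompt during iteration. The sketch below applies pronunciation overrides using the SSML `<phoneme>` element and flags prompts that exceed a word budget. The override table, the IPA string, the 25-word limit and the name `prepare_prompt` are hypothetical; the IPA in particular is illustrative and would need tuning by ear against the chosen voice. It also assumes the input text has already been escaped for SSML. Amazon Polly additionally supports uploaded pronunciation lexicons, which may suit larger vocabularies better than inline markup.

```python
import re

# Hypothetical pronunciation overrides, tuned by ear during iteration.
# The IPA string is illustrative and not verified against any voice.
PRONUNCIATIONS = {
    "App Store": '<phoneme alphabet="ipa" ph="ˈæp stɔː">App Store</phoneme>',
}
# Hypothetical limit on words per prompt before callers may need a replay.
MAX_WORDS = 25


def prepare_prompt(text):
    """Apply pronunciation overrides and flag prompts that may be too long.

    Assumes `text` is already escaped for SSML.
    """
    # Count words before adding markup, which would inflate the count.
    too_long = len(text.split()) > MAX_WORDS
    for phrase, ssml in PRONUNCIATIONS.items():
        # Match the whole phrase only, so "App Storefront" is left alone;
        # a function replacement avoids backslash processing in the markup.
        text = re.sub(rf"\b{re.escape(phrase)}\b", lambda _m, s=ssml: s, text)
    return text, too_long


print(prepare_prompt("Download our app from the App Store today."))
```

Running every prompt through the same step keeps pronunciation fixes consistent across the service instead of relying on each prompt being hand-edited.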

Designers and developers working with Amazon Polly can quickly improve automated voice services by applying these methods and this design process.

A better solution for businesses and their customers

Managing and maintaining physical call centres can be a significant management and financial overhead, and it creates an operational dependency that can be difficult to sustain during unexpected events such as the Covid-19 lockdown. The new generation of cloud-based digital technologies therefore presents an attractive opportunity to reduce costs and improve resilience, even if it only supplements, rather than fully replaces, a traditional call centre operation. Furthermore, by fully understanding the technology and carefully designing the execution, it is possible to give customers an automated service comparable to the ‘human’ experience whilst offering faster response times, better self-service and consistent brand engagement.

As with many digital products, the smallest details can significantly affect the overall experience, but by establishing a multi-skilled team and understanding how to get the best out of the technology, it is possible to maximise the benefits for both a business and its customers. Although the new generation of speech synthesis engines cannot yet fully replicate natural human conversation, a well-designed automated voice service can, in many cases, provide a better customer experience than speaking to a real person.