As a voice user experience designer, I’ve worked with Fortune 500 companies to help improve their speech applications. In this article, I want to share some of the things you should think about when designing for voice.
The rise of voice interfaces is obvious, as every major company rolls out its own idea of a virtual assistant. With voice interfaces, we can suddenly search, send messages, and even control our connected devices seamlessly, all with our natural voice. But there’s a catch: according to VoiceLabs, 69 percent of the 7,000-plus Alexa “Skills” (voice apps, if you will) have zero or one customer review, signaling low usage. Those are not encouraging statistics. So why is that? I believe VentureBeat said it best:
“We are not creating conversations, we are building old-school commands hidden behind voice requests.”
With these voice interfaces becoming more and more common, the need to make them more conversational and user-centered is significant. In this article, I’ll explain my process for voice design, based on my experience working with clients in a variety of industries, from telecommunications to retail.
What is a Voice User Interface?
Let’s look at what exactly a voice user interface is. A Voice User Interface, or VUI (pronounced “voo-e”), is simply an application that the user interacts with by speaking. Most of us are familiar with voice interfaces from interacting with automated phone systems. Sadly, a lot of phone systems have very badly designed interfaces. There’s a reason for this: the developers who design these systems don’t understand how to design a voice experience.
Here are some of the reasons you may have had a bad experience with voice devices:
- They lack context in speech and are not truly conversational in nature.
- They’re designed to act as “information collectors.”
- The dialogues are written the way we write, not the way we speak.
It’s important to realize that it’s usually not the technology that’s causing a bad experience but rather the design of the interface you’re interacting with. Let’s look at some ways to get started on designing a great experience for your voice devices.
Everything has a personality, even your bots and voice devices. I remember a friend once telling me that he prefers Amazon Alexa over Google Home because he can wake Alexa up by using a name rather than “OK Google.” He was referring to its persona, and persona matters as technology tries its best to connect with its users.
Clifford Nass, who was a professor of Communications at Stanford University and a renowned authority on human-computer interaction, claimed in his book, The Media Equation:
People tend to treat computers and other media as if they were either real people or real places. (The Media Equation)
We’re emotional beings and tend to bring that association to everything. Don Norman, a giant in the field of human-computer interaction, even wrote a book on this called Emotional Design. Norman states:
People can more easily relate to a product, a service, a system, or an experience when they are able to connect with it at a personal level. (Emotional Design)
As you can tell, a great user experience is not about just making usable or functional apps but also about creating an emotion. Likewise, a great persona in voice is not about just having a pretty voice. It’s also about connecting with the user on the other end. When we hear a voice, we unconsciously make a lot of assumptions about that person. These assumptions include how intelligent that person might be or which region or country they’re from.
A persona or personality can be thought of as a character in the voice user interface world, just as in a film or book. It’s one way a company can brand itself, because the persona is an extension of the brand. Therefore, it’s very important to pick the persona carefully. I remember running a usability study with a major healthcare brand and hearing comments from participants such as, “That voice sounds too happy. I’m calling to refill a prescription. Why does she sound too happy?” We eventually had to coach the voice talent to adjust the prompts to match the proper tone. One common misunderstanding is that you can hire a voice actor with a pleasant voice and that will be the end of persona creation.
Creating a well-crafted persona takes time and research. According to the book Voice User Interface Design, here are some of the things you should think about when designing a persona for your voice interface.
- The Role: What’s the role of the application to the user? In other words, is it an assistant that the user is familiar with that gives advice on stock options? Is it a bank clerk?
- Company Brand / Image: The persona that you pick for the system or application should be at least compatible with the brand or company’s image.
- Familiarity & Target Audience: Your persona should be familiar to your users. Therefore, for a compelling persona, we need to consider the demographics, attitudes, frequency of usage, and lifestyle of the user. A persona that works well in one culture might not work in another.
Designing the Blueprint
While it’s very tempting to design your voice experience using a flow-chart, it’s not the best way to achieve a great experience. Although it’s important to have a flow-chart explaining the workflow or information architecture, the conversational design prompts shouldn’t be focused on that. Remember that speech happens in a context and is not strictly based on logic.
How many automated systems have you heard with prompts such as, ‘If you want to return to the main menu, press 1’? This is not recommended, since we tend to move conversations forward, not backward. For example, if I ask you to repeat something, you don’t go back and retrieve it. Instead, you simply restate it, for example, “Sure, here it is again.” The more I ask you to repeat, the more you vary your explanation. The key ingredient in making a design naturally conversational is understanding linguistics. In a conversational interface, the focus is on building an interaction based on the way people speak, not how they write. A lot of voice interfaces are written the way we write rather than the way we speak.
Designing a VUI dialog
In voice interfaces, we don’t have wireframes; we have sample dialogues. But before we get to that, let’s start with user stories. Start by describing scenarios in which your users will find your interface useful. Identify the major scenarios to describe the purpose of your interface and the ways your users can interact with it.
Next, start crafting a few sample dialogs for your scenarios. A sample dialog is a script that shows a conversational flow between the system and the user. You can think of this as the VUI version of a wireframing process. A sample dialog can be something you show your client to get feedback before you start prototyping.
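To make this concrete, a sample dialog can be kept as simple structured data so that it is easy to review, reorder, and share with a client. The following is a minimal sketch in Python, not a prescribed tool: the `Turn` class, the `format_dialog` function, and the example lines are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One turn in a sample dialog (hypothetical structure)."""
    speaker: str  # for example "System" or "Caller"
    line: str     # what is spoken, written the way people talk


def format_dialog(scenario, turns):
    """Render a scenario and its turns as a plain-text script for review."""
    lines = [f"Scenario: {scenario}"]
    lines += [f"{turn.speaker}: {turn.line}" for turn in turns]
    return "\n".join(lines)


# Hypothetical example turns, written conversationally.
sample = [
    Turn("System", "Hi there! What can I help you with today?"),
    Turn("Caller", "I'd like to check my balance."),
    Turn("System", "Sure, let me look that up."),
]

print(format_dialog("Caller checks an account balance", sample))
```

Keeping each scenario as its own list of turns makes it straightforward to add the not-so-happy variations later without rewriting the whole script.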
A Sample Dialog
Here’s an example of what a sample dialogue might look like:
Caller using the phone to make a payment; payment info stored.
System: Thanks for calling BankOMatic. I looked up your number and found an account. I see you have a payment that’s due today. Are you calling about that?
System: OK. For security, tell me the last four digits of your Social Security number.
System: Thanks, let me look it up.
Found it. You have a payment of $40.98 that’s due today.
Would you like to make a payment now?
System: All right, I see you have a VISA card ending in 789 on file; would you like me to use that?
Caller: Yes, please.
System: Just a second… All done! I’ve put that through. Just so you know, it might take up to 48 hours to show up on your account. Now, if that’s all you needed, feel free to hang up, and thanks for using BankOMatic.
While the dialog above is a happy journey, you should also write for not-so-happy scenarios, such as what happens when the user says something out of context, how errors progress, and so on. We’ll talk more about errors later on. You probably noticed the use of contractions in the dialogue above. In conversation, we tend to use contractions (e.g., ‘you’re,’ ‘I’m’), and it’s not because we’re lazy. Some things would sound really odd without contractions. For example, how would you say the following without a contraction?
- “Wasn’t John’s presentation great?” → “Was not John’s presentation great?”
The question above makes you want to say, “Yes, your honor. It was indeed a great one!”
Whilst we do this unconsciously, it’s important to realize once again — the way we speak is different from the way we write, so please do keep that in mind when designing your VUIs.
Another idea to keep in mind is what’s called a discourse marker. A discourse marker is a word or phrase that you can use to connect and arrange what you say or write (e.g., ‘anyway,’ ‘now’). In the dialogue above, I used the word ‘Now’ to make an obvious transition from one idea to another.
You see a lot of speech systems these days using technical jargon or words that are often only familiar to engineers and developers. Let’s take this example:
System: Your request has been processed.
How would you rewrite this in everyday language? Here’s one way:
System: Done! You’re all set!
In early 2017, Google published In Conversation, There Are No Errors, and according to the article, one should think of errors as opportunities to create meaningful conversations. I’ll let you read the article on the principles behind handling errors on your own time. For now, I’ll just jump into some examples.
The rapid re-prompt approach doesn’t provide detailed information right away. This is similar to the kinds of statements people might use in a typical conversation to show that they didn’t really understand the speaker:
- “What was that?”
- “Say it again?”
- “I’m sorry?”
From my personal experience of running usability studies with clients from a variety of industries, I’ve found that escalating errors can save time and improve task completion, especially for power users.
System: What’s your date of birth?
System: Just tell me your date of birth using 2 digits for the month, 2 digits for the day, and 4 for the year.
As you can see, the error strategy escalates from general to specific rather than giving all of the information right away. This also works well for power users, who have heard the prompts many times and don’t need the detailed instructions.
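One way to picture an escalating error strategy is as an ordered list of prompts, where each recognition error moves one step toward the most detailed wording. The Python sketch below is only an illustration under stated assumptions: the `DOB_PROMPTS` list and the `next_prompt` function are hypothetical names, and placing a rapid re-prompt (“I’m sorry?”) between the two prompts shown above is my own assumption about how the levels might be ordered.

```python
# Hypothetical escalation ladder for the date-of-birth question.
# Each entry is more specific than the one before it.
DOB_PROMPTS = [
    "What's your date of birth?",  # initial prompt
    "I'm sorry?",                  # rapid re-prompt (assumed middle step)
    "Just tell me your date of birth using 2 digits for the month, "
    "2 digits for the day, and 4 for the year.",  # detailed re-prompt
]


def next_prompt(prompts, error_count):
    """Pick the prompt for the given number of errors so far.

    The index is clamped so that once the most detailed prompt is
    reached, further errors keep returning it.
    """
    index = max(0, min(error_count, len(prompts) - 1))
    return prompts[index]


# Walk through a caller who is misunderstood three times in a row.
for errors in range(4):
    print(f"errors={errors}: {next_prompt(DOB_PROMPTS, errors)}")
```

A real system would likely hand off to another strategy, such as a different input method or a human agent, rather than repeat the last prompt indefinitely; that decision is outside the scope of this sketch.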
When writing prompts and error strategies, consider being cooperative by applying Grice’s Maxims. Not only will this help the user experience, it will also build your users’ confidence and trust when they interact with the device.
- The maxim of quantity — try to be as informative as one possibly can, and give as much information as is needed, and no more.
- The maxim of quality — try to be truthful, and don’t give information that’s false or not supported by evidence.
- The maxim of relation — try to be relevant, and say things that are pertinent to the discussion.
- The maxim of manner — try to be as clear as possible; avoid obscurity and ambiguity.
While there are many ways you can design a great experience for voice, remember that the end focus is on users like you and me. So, it’s important to understand not only the target users but also the context in which the dialogues will appear.
In other words, study the way we speak and write. Study user-centered design processes and learn how to approach the challenge in a human way. It’s usually not the limitations of speech technology that are responsible for a horrible voice experience. It’s usually designers not knowing how to apply these processes that results in a less-than-desirable voice interface. Hopefully, this article will help you design for mere mortals.