This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.
If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions are in the first installment of the series.
Yesterday, I added the ability for my fictional Strato Pizza order taking application to ask the user what topping they’d like on their pizza. Now I need to ask them for a phone number, in case Strato is out of a topping and needs to call them.
When putting in a phone number, a lot of callers are comfortable with punching in their number on their phone keypads, while others would prefer to simply speak their number. I want my application to behave in the way that’s most comfortable for the caller, so I’m going to handle both methods of input.
First I create my field and validation code:
<field name="phone">
  Please say or enter your phone number.
  <noinput>
    <reprompt/>
  </noinput>
  <nomatch>
    I didn't understand that. Please try again.
    <reprompt/>
  </nomatch>
</field>
I’m doing something a little different with the UI here when someone doesn’t enter or say anything. Instead of giving an error message and replaying the prompt, I’m simply replaying the prompt. In the case of a phone number where we’re accepting DTMF and voice input, saying “I didn’t hear that” seems a little silly. Just asking for the caller’s phone number a second time should suffice.
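For comparison, a more conventional handler that apologizes before replaying the prompt would look roughly like this. The wording is only an example, not taken from the application:

```xml
<!-- Conventional noinput handler, for comparison only:
     speak an error message, then replay the field's prompt -->
<noinput>
  Sorry, I didn't hear that.
  <reprompt/>
</noinput>
```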
For the grammar, I could write one consisting of every spoken digit…
<grammar type="text/gsl">
[one two three four five six seven eight nine zero]
</grammar>
… and to make it work with touch-tone input, add a grammar for DTMF digits …
<grammar type="text/gsl">
[dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-9 dtmf-0]
</grammar>
… but that only accepts a single digit. Now what? I could try to write a grammar that lists every possible combination of digits, but for a ten-digit phone number that's 10^10, or ten billion, entries. That isn't practical. Or I could ask the caller for their phone number one digit at a time, which is hardly usable. The easiest approach is to use a special built-in grammar, provided by VoiceXML, that accepts a string of digits.
To use this built-in grammar, I simply add a type attribute to my <field> element and tell it the field is intended to hold digits.
<field name="phone" type="digits">
Now the caller can say or key in any number of digits. Since this is a phone number, I don't want a caller telling me their phone number is “six,” so I'll add some restrictions. Strato is in the United States, so the caller should enter at least 7 digits (a local number) and no more than 10 (a number with area code).
<field name="phone" type="digits?minlength=7;maxlength=10">
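As I understand the VoiceXML 2.0 specification, the type attribute is shorthand for referencing two built-in grammars, one for speech and one for DTMF, through builtin: URIs. A minimal sketch of the same field written with explicit grammar references and the same length parameters might look like this; platform support for these URIs may vary, so treat it as an illustration rather than a drop-in replacement:

```xml
<!-- Sketch: roughly equivalent to type="digits?minlength=7;maxlength=10",
     written as explicit built-in grammar references -->
<field name="phone">
  <!-- spoken digits, 7 to 10 of them -->
  <grammar src="builtin:grammar/digits?minlength=7;maxlength=10"/>
  <!-- keypad (DTMF) digits, 7 to 10 of them -->
  <grammar src="builtin:dtmf/digits?minlength=7;maxlength=10"/>
  Please say or enter your phone number.
</field>
```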
But what if the caller has an extension? I could ask a separate question to find out. Or I could use a different built-in grammar, one designed specifically for phone numbers, which also recognizes extensions.
<field name="phone" type="phone">
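According to the VoiceXML 2.0 specification, the phone type fills the field with a string of digits, with an "x" separating the number from any extension (for example, "8005551234x123"). A minimal sketch of a `<filled>` block, placed inside the phone field, that pulls the extension out might look like this. The `ext` variable and the log message are hypothetical, for illustration only:

```xml
<filled>
  <!-- phone holds a digit string such as "8005551234x123";
       the "x" is present only when the caller gave an extension -->
  <if cond="phone.indexOf('x') != -1">
    <!-- hypothetical variable: everything after the "x" -->
    <var name="ext" expr="phone.substring(phone.indexOf('x') + 1)"/>
    <log>Caller gave extension <value expr="ext"/></log>
  </if>
</filled>
```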
You can see a list of all built-in grammars and different ways of including them in the Built-In Grammar Types VoiceXML documentation.
Because I’m using a built-in grammar for the phone number, I don’t need an additional grammar here. This means my complete field definition looks like this:
<field name="phone" type="phone">
  Please say or enter your phone number.
  <noinput>
    <reprompt/>
  </noinput>
  <nomatch>
    I didn't understand that. Please try again.
    <reprompt/>
  </nomatch>
</field>
This snippet goes inside my existing form element, right after the toppings field definition.
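A rough outline of where the new field sits is sketched below. The form id "order" and the field name "topping" are placeholders I've made up; the real names are in the application code:

```xml
<!-- Outline only: "order" and "topping" are placeholder names -->
<form id="order">
  <field name="topping">
    <!-- existing toppings prompt, grammar, and handlers, unchanged -->
  </field>

  <!-- new: the phone field goes right after the toppings field -->
  <field name="phone" type="phone">
    <!-- prompt, noinput, and nomatch handlers from the phone field definition -->
  </field>
</form>
```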
You can get the code for this example and all the others from Voxeo’s GitHub account, where you can fork or download the VoiceXML application as it stands so far.
Next up, I’ll take the user’s input and do something with it.