Posts Tagged ‘web’

Processing Input (VoiceXML for Web Developers)

Monday, January 4th, 2010

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Today I’m continuing the development of our application for the fictional Strato Pizza. Previously, I asked the caller for their pizza topping preference and their phone number, using both speech recognition and touch-tone input. Today I’m going to do something with that input and repeat the order to the customer.

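Within VoiceXML, I can read back what the caller said for any field with <value expr="fieldName$.utterance"/>. The $.utterance shadow variable holds the raw words (or keyed digits) that matched my grammar. The interpreted result lives in the field variable itself, fieldName.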

Since I want to simply repeat the order and the phone number, I’m going to add a <block> element to my existing form. Inside the block, I’ll add a <prompt> element with the text I want to speak.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza.
      </prompt>
    </block>

When the VoiceXML browser reaches this block, it will speak my text, substituting whatever the caller said in response to the field named topping for topping$.utterance. If the caller asked for ham, they will hear exactly what they would if my prompt said, “You ordered ham on your pizza.”
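For comparison, here is a small sketch of the other standard ways to reach the same answer. The field variable itself (topping) holds the result returned by the grammar, while the shadow variables topping$.utterance and topping$.confidence describe the recognition. It assumes the field is named topping, as in the earlier posts; the exact text of the utterance and where <log> output ends up are platform-specific.

```xml
<block>
  <prompt>
    <!-- topping is the field variable: the value the grammar returned -->
    You ordered <value expr="topping"/> on your pizza.
  </prompt>
  <!-- Shadow variables describe how the answer was recognized -->
  <log>
    heard: <value expr="topping$.utterance"/>
    confidence: <value expr="topping$.confidence"/>
  </log>
</block>
```

For a simple one-word grammar the two values will usually match. They can differ when the grammar maps several phrasings onto one result.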

You can use multiple value expressions in a single prompt. I also want to tell the customer that they’ll get a call if there’s a problem with their order. I’ll repeat their phone number to them. Then I’ll thank them for their order and hang up.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza. If we have any questions we will call you at <value expr="phone$.utterance"/>. Thank you for your order.
      </prompt>
    </block>
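Putting the pieces together, the whole document could look roughly like the sketch below. The topping field and its inline grammar are stand-ins, not the ones from the earlier post: the prompt wording, the three toppings, and the form id are assumptions made for illustration. The <exit/> at the end makes the "hang up" step explicit, though what happens to the call after the application exits is up to the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Hypothetical topping field; this grammar is only a placeholder -->
    <field name="topping">
      <prompt>What topping would you like?</prompt>
      <grammar version="1.0" root="toppings" mode="voice"
               xml:lang="en-US" type="application/srgs+xml">
        <rule id="toppings">
          <one-of>
            <item>ham</item>
            <item>pepperoni</item>
            <item>mushrooms</item>
          </one-of>
        </rule>
      </grammar>
    </field>

    <!-- Built-in phone grammar: accepts speech or touch tones -->
    <field name="phone" type="phone">
      <prompt>Please say or enter your phone number.</prompt>
    </field>

    <!-- Form items run in document order, so this block runs
         after both fields have been filled -->
    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza.
        If we have any questions we will call you at
        <value expr="phone$.utterance"/>. Thank you for your order.
      </prompt>
      <!-- End the application explicitly -->
      <exit/>
    </block>
  </form>
</vxml>
```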

Remember that for the phone number field, I allowed the caller to use either voice or touch-tone input with a built-in grammar like so:

    <field name="phone" type="phone">
      Please say or enter your phone number.
    </field>

When I access this value with <value expr="phone$.utterance"/>, it doesn’t matter whether the caller used voice or DTMF input. Either way, the built-in grammar matches a phone number. So when I read it back, they’ll hear the digits of their phone number spoken back to them.
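If I ever do need to know which method the caller used, the standard phone$.inputmode shadow variable reports it as either "dtmf" or "voice". The sketch below only logs the answer; the log messages are made up for illustration.

```xml
<block>
  <!-- inputmode is "dtmf" for keypad entry, "voice" for speech -->
  <if cond="phone$.inputmode == 'dtmf'">
    <log>Caller keyed in the phone number</log>
  <else/>
    <log>Caller spoke the phone number</log>
  </if>
  <prompt>
    If we have any questions we will call you at
    <value expr="phone$.utterance"/>.
  </prompt>
</block>
```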

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.


If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Collecting touch tone input (VoiceXML for Web Developers)

Tuesday, December 22nd, 2009

Yesterday, I added the ability for my fictional Strato Pizza order taking application to ask the user what topping they’d like on their pizza. Now I need to ask them for a phone number, in case Strato is out of a topping and needs to call them.

When putting in a phone number, a lot of callers are comfortable with punching in their number on their phone keypads, while others would prefer to simply speak their number. I want my application to behave in the way that’s most comfortable for the caller, so I’m going to handle both methods of input.

First I create my field and validation code:

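  <field name="phone">
    Please say or enter your phone number.

    <!-- Caller said nothing: just replay the prompt -->
    <noinput>
      <reprompt/>
    </noinput>

    <!-- Caller said something the grammar didn't match -->
    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>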

I’m doing something a little different with the UI here when someone doesn’t enter or say anything. Instead of giving an error message and replaying the prompt, I’m simply replaying the prompt. In the case of a phone number where we’re accepting DTMF and voice input, saying “I didn’t hear that” seems a little silly. Just asking for the caller’s phone number a second time should suffice.
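One possible refinement, sketched here as an assumption rather than something the application does, is to stop re-asking after a few silent attempts. Event handlers accept a count attribute, so a second <noinput> can take over on the third silence. The goodbye wording is invented, and the grammar is left out because this is only a fragment.

```xml
<field name="phone">
  <prompt>Please say or enter your phone number.</prompt>
  <!-- grammar omitted in this fragment -->

  <!-- First and second silence: simply ask again -->
  <noinput>
    <reprompt/>
  </noinput>

  <!-- Third silence: give up politely instead of looping forever -->
  <noinput count="3">
    <prompt>Sorry, I still haven't received a phone number. Goodbye.</prompt>
    <exit/>
  </noinput>
</field>
```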

For a grammar, I could create a grammar consisting of every digit…

<grammar type="text/gsl">
  [one two three four five six seven eight nine zero]
</grammar>

… and to make it work with touch-tone input, add a grammar for DTMF digits …

<grammar type="text/gsl">
  [dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-9 dtmf-0]
</grammar>

… but each of those grammars will only accept a single digit. Now what? I could try to create a grammar that captures every possible combination of digits. For a ten-digit phone number, that means I’d have a grammar with ten billion entries in it. That doesn’t sound very practical. Or I could ask the user for every digit of their phone number, one digit at a time. Hardly usable. The easiest way to accomplish this is to use a special built-in grammar provided by VoiceXML that accepts a string of digits.

To use this built-in grammar, I simply add a type attribute to my <field> element and tell it the field is intended to hold digits.

<field name="phone" type="digits">

Now the caller can say or key in any number of digits. Since this is a phone number, I don’t want the caller telling me their phone number is “six,” so I want to add some restrictions. Strato is in the United States, so the caller should enter at least 7 digits and no more than 10.

<field name="phone" type="digits?minlength=7;maxlength=10">

But what if the caller has an extension number to add? I could ask them a separate question to find out if they have an extension. Or I could use a different built-in grammar, one actually designed for phone numbers, that already recognizes any 10-digit phone number, with or without an extension.

<field name="phone" type="phone">
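As I understand the VoiceXML 2.0 description of the phone type, the result is a string of digits with the letter x marking the start of an extension, for example 8005551234x789. A minimal sketch of checking for that inside the field, with an invented confirmation message:

```xml
<field name="phone" type="phone">
  Please say or enter your phone number.

  <filled>
    <!-- The field variable is an ECMAScript string, so indexOf works -->
    <if cond="phone.indexOf('x') != -1">
      <prompt>Got it, including your extension.</prompt>
    </if>
  </filled>
</field>
```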

You can see a list of all built-in grammars and different ways of including them in the Built-In Grammar Types VoiceXML documentation.
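One of those other ways is the builtin: URI scheme, which names the speech and DTMF versions of a built-in grammar separately. A sketch of the digits restriction written that way follows. Built-in grammars are an optional part of the VoiceXML specification, so check that your platform supports this form.

```xml
<field name="phone">
  Please say or enter your phone number.

  <!-- Spoken digits -->
  <grammar src="builtin:grammar/digits?minlength=7;maxlength=10"/>
  <!-- Touch-tone digits -->
  <grammar src="builtin:dtmf/digits?minlength=7;maxlength=10"/>
</field>
```

Listing only one of the two grammars is a simple way to restrict a field to speech-only or keypad-only input.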

Because I’m using a built-in grammar for the phone number, I don’t need an additional grammar here. This means my complete field definition looks like this:

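  <field name="phone" type="phone">
    Please say or enter your phone number.

    <!-- Caller said nothing: just replay the prompt -->
    <noinput>
      <reprompt/>
    </noinput>

    <!-- Caller said something the grammar didn't match -->
    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>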

This XML snippet will be put into my existing form element, right after the toppings field definition.

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.

Next up, I’ll take the user’s input and do something with it.



VoiceXML for web developers: Hello World

Thursday, December 17th, 2009

If you missed it, in the first installment of this series I created an application on Evolution and assigned it some phone numbers. For the rest of the series, I’ll be using that application to test my VoiceXML apps. If you want to follow along, go create your own Evolution account.

I’m going to start simple with my first application – just answer and speak some text, then hang up. This way we can get a look at the syntax needed for VoiceXML. Throughout this series, I’ll be building an application for Strato Pizza, a fictional pizza chain. My application here is simply a greeting played when someone calls the chain’s phone number.

As the name implies, VoiceXML is written in XML. So I start with an XML declaration and tell the browser what character encoding to use, just like any other XML document. Then I create a <vxml> element that will hold the application.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

</vxml>

Inside this element I need a couple of structural elements. <form> is a container that separates different areas of input and output, sort of like different HTML forms and pages. <block> is a container that allows you to conditionally execute code. Although I’m not creating separate inputs and outputs or trying to conditionally execute code, these elements are still needed, since the next elements I’m going to create are required to be inside a <block> and a block must be inside a <form>. Since I’m not using them for anything, I don’t have to worry about any attributes right now.

Now my VoiceXML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>

    </block>
  </form>
</vxml>

Great, now the basic structure is in place and I can put in the meat of the application. All I want to do is say something and hang up, so my application is pretty simple. I can say something by using a <prompt> element and the VoiceXML browser will perform text to speech and say whatever I typed.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>
      <prompt>
        Thanks for calling Strato Pizza.
      </prompt>
    </block>
  </form>
</vxml>

That’s it. The whole document. I upload the document to my web server at the URL that I configured my application with in Evolution. When I call this application using the Skype number supplied in Evolution, a text to speech (TTS) engine speaks my text.
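To make the roles of <form> and <block> described above a little more concrete, here is a sketch of a document with two forms and a conditional block. Everything beyond the greeting is hypothetical: the storeOpen variable, the form ids, and the extra prompts are invented for illustration and are not part of the Strato Pizza application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical flag; a real app would set this from business logic -->
  <var name="storeOpen" expr="true"/>

  <form id="greeting">
    <block>
      <prompt>Thanks for calling Strato Pizza.</prompt>
    </block>

    <!-- cond makes this block run only when the expression is true -->
    <block cond="!storeOpen">
      <prompt>We are closed right now.</prompt>
      <exit/>
    </block>

    <block>
      <!-- Jump to another form in the same document, much like a link -->
      <goto next="#hours"/>
    </block>
  </form>

  <form id="hours">
    <block>
      <prompt>We are open every day.</prompt>
    </block>
  </form>
</vxml>
```

The xmlns attribute is the VoiceXML namespace from the W3C specification. The shorter <vxml> tag used in this post works on platforms that don't insist on it.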



VoiceXML for Web developers: Introduction

Tuesday, December 15th, 2009

I’ll admit it. Before joining Voxeo, I wasn’t much of a voice guy. I’m a web guy. I was pretty sure that voice applications were created through witchcraft. Turns out, there’s no magic involved, just some standards and markup languages. If you can create a web app, you can create a voice app. Voxeo has some great developer documentation and detailed tutorials available through Evolution, our developer portal. Over the next few weeks, I’ll be walking through some examples as I learn VoiceXML, CCXML, and Voxeo’s own CallXML from the perspective of a web developer.

I’ll start with VoiceXML. VoiceXML is a W3C standard, just like HTML is. Like HTML, your code is executed in a browser, but instead of a visual browser on a computer screen, in this case it’s a voice browser that you use over the telephone. To test out any of the samples I’m going to create, I’m going to need a VoiceXML browser attached to the telephone network. Voxeo provides developers with free accounts and a phone number so you can build and test your app. You’ll also need a web server to host your XML file, but Voxeo will provide some hosting space for you for free if you’d like.

Go over to Evolution and create an account. Then go to the Application Manager.

[Screenshot: App Manager]

Create a new application and call it anything you’d like. Then decide how you want your app to work. For now, I’m only using voice, so I don’t need text messaging. I can always add it in the future if I change my mind.

[Screenshot: Creating an app]

I need to tell Evolution where my VoiceXML file is by providing a URL for it. Since I’m going to create a Hello World application and host it on my own server, I’m putting in the URL I intend to use for my VoiceXML file. Again, I can change this later if I decide on a different file name or path.

[Screenshot: Creating an app, step 2]

After I create my app, I have a new tab at the top of the page that gives me some phone numbers I can use to call my application.

[Screenshot: App created]

Clicking on that tab reveals a local number, a toll-free number, and numbers to call from Skype, SIP, and iNum providers. I can also add a dedicated local number if I’d like. Since I’m going to test with Skype, I don’t need a local number, but if you’re testing from your phone, grab one.

[Screenshot: Contact numbers]

And that’s it. I now have a VoiceXML browser hooked up to the telephone network that I can use to test my application. In my next post a couple of days from now, I’ll create my first app.
