Archive for the ‘VoiceXML for Web Developers’ Category

Want to learn VoiceXML? Check out our “VoiceXML for Web Developers” series…

Friday, November 19th, 2010
Pizza Making

Flickr credit: kubina

Are you looking to learn how to use VoiceXML to create interactive applications? While we offer all sorts of great documentation and tutorials at www.vxml.org, we also have a series of tutorial blog posts here called “VoiceXML for Web Developers” that walk you through the process of getting started, all in the context of creating an application for a fictional pizza restaurant, “Strato Pizza”.

Here is the series so far:

The VoiceXML files used in those tutorials are all available on our Github account at https://github.com/voxeo/Voicexml-samples.

The first introduction article explains the steps you need to go through to set up a free developer account in our Evolution developer portal if you do not already have an account.

I am going to continue the series a bit more, so stay tuned for further installments. And please, do let us know (in the comments to this post or via email) how helpful this type of article series is. We have ideas for a few others like this.


Want to learn how Voxeo can help unlock your communications and deliver a better customer experience? Please contact us!

If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Processing Input (VoiceXML for Web Developers)

Monday, January 4th, 2010

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Today I’m continuing the development of our application for the fictional Strato Pizza. Previously, I asked the caller for their pizza topping preference and their phone number, using both speech recognition and touch tone input. Today I’m going to do something with that input, and repeat the order to the customer.

Within VoiceXML, I can access what the caller said in any field with <value expr="fieldName$.utterance"/>, where fieldName is the name of the field. This expression returns the words the recognizer matched against my grammar.
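
As a rough sketch of how the pieces relate, assuming a field named topping as in this series: the field variable itself holds the result, while the shadow variable topping$ carries details about the recognition. The VoiceXML 2.0 specification defines utterance, inputmode, interpretation and confidence on it. I’m using <log> here so nothing extra is spoken to the caller; where the log output ends up depends on your platform.

    <block>
      <!-- Illustration only: write the recognition details to the platform log -->
      <log>
        value: <value expr="topping"/>
        heard: <value expr="topping$.utterance"/>
        mode: <value expr="topping$.inputmode"/>
        confidence: <value expr="topping$.confidence"/>
      </log>
    </block>

The inputmode property should be either “voice” or “dtmf”, and confidence is a number between 0.0 and 1.0.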

Since I want to simply repeat the order and the phone number, I’m going to add a <block> element to my existing form. Inside the block, I’ll add a <prompt> element with the text I want to speak.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza.
      </prompt>
    </block>

When the VoiceXML browser reaches this block, it will speak my text, substituting whatever the caller said in response to the field named topping for topping$.utterance. If the caller asked for ham, the spoken text will be just as if my prompt said, “You ordered ham on your pizza.”

You can use multiple value expressions in a single prompt. I also want to tell the customer that they’ll get a call if there’s a problem with their order. I’ll repeat their phone number to them. Then I’ll thank them for their order and hang up.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza. If we have any questions we will call you at <value expr="phone$.utterance"/>. Thank you for your order.
      </prompt>
    </block>

Remember that for the phone number field, I allowed the caller to use either voice or touch tone input with a built-in grammar like so:

    <field name="phone" type="phone">
      Please say or enter your phone number.
    </field>

When I access this value with <value expr="phone$.utterance"/> it doesn’t matter if the caller used voice or DTMF input. The grammar gives the same result. So when I read back the phone number, they’ll hear the digits of their phone number spoken back to them.
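
Putting the whole series so far into one document gives something like the following. This is a sketch rather than the exact file from GitHub: the topping grammar is shortened to keep it readable, and everything else is assembled from the snippets in these posts.

    <?xml version="1.0" encoding="UTF-8"?>
    <vxml version="2.1">

      <form>
        <!-- Greeting from the Hello World post -->
        <block>
          <prompt>
            Thanks for calling Strato Pizza.
          </prompt>
        </block>

        <!-- Topping field with an inline grammar (shortened for illustration) -->
        <field name="topping">
          What topping would you like on your pizza?

          <grammar type="text/gsl">
            <![CDATA[
              [pepperoni olives sausage (extra cheese) [ham (canadian bacon)]]
            ]]>
          </grammar>

          <noinput>
            I did not hear anything.  Please try again.
            <reprompt/>
          </noinput>

          <nomatch>
            I did not recognize that topping. Please try again.
            <reprompt/>
          </nomatch>
        </field>

        <!-- Phone field using the built-in phone grammar (voice or DTMF) -->
        <field name="phone" type="phone">
          Please say or enter your phone number.

          <noinput>
            <reprompt/>
          </noinput>

          <nomatch>
            I didn't understand that. Please try again.
            <reprompt/>
          </nomatch>
        </field>

        <!-- Runs once both fields are filled: read the order back -->
        <block>
          <prompt>
            You ordered <value expr="topping$.utterance"/> on your pizza. If we have any questions we will call you at <value expr="phone$.utterance"/>. Thank you for your order.
          </prompt>
        </block>
      </form>

    </vxml>

The form items are visited in document order, so the final block only runs after both fields have values. Once it finishes there is nothing left to do in the form, and the application ends.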

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.




Collecting touch tone input (VoiceXML for Web Developers)

Tuesday, December 22nd, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Yesterday, I added the ability for my fictional Strato Pizza order taking application to ask the user what topping they’d like on their pizza. Now I need to ask them for a phone number, in case Strato is out of a topping and needs to call them.

When putting in a phone number, a lot of callers are comfortable with punching in their number on their phone keypads, while others would prefer to simply speak their number. I want my application to behave in the way that’s most comfortable for the caller, so I’m going to handle both methods of input.

First I create my field and validation code:

  <field name="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>

I’m doing something a little different with the UI here when someone doesn’t enter or say anything. Instead of giving an error message and replaying the prompt, I’m simply replaying the prompt. In the case of a phone number where we’re accepting DTMF and voice input, saying “I didn’t hear that” seems a little silly. Just asking for the caller’s phone number a second time should suffice.

For a grammar, I could create a grammar consisting of every digit…

<grammar type="text/gsl">
  [one two three four five six seven eight nine zero]
</grammar>

… and to make it work with touch-tone input, add a grammar for DTMF digits …

<grammar type="text/gsl">
  [dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-9 dtmf-0]
</grammar>

… but that will only accept a single digit. Now what? I could try to create a grammar that captures every possible combination of digits. For a ten digit phone number, that means I’d have a grammar with ten billion words in it. That doesn’t sound very practical. Or I could ask the user for every digit of their phone number, one digit at a time. Hardly usable. The easiest way to accomplish this is to use a special built-in grammar provided by VoiceXML that accepts a group of digits.

To use this built-in grammar, I simply add a type attribute to my <field> element and tell it the field is intended to hold digits.

<field name="phone" type="digits">

Now the caller can say or key in any number of digits. Since this is a phone number, I don’t want the caller telling me his phone number is “six” so I want to add some restrictions to that. Strato is in the United States, so the caller should enter at least 7 digits and no more than 10.

<field name="phone" type="digits?minlength=7;maxlength=10">
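
For reference, here is how that attribute might sit in the complete field. This is a sketch that assumes the same prompt and handlers I used earlier in this post. If every number had exactly ten digits, the digits type also accepts a length parameter (type="digits?length=10") in place of the minimum and maximum.

  <field name="phone" type="digits?minlength=7;maxlength=10">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <!-- Typically also reached when the caller gives too few or too many digits -->
    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>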

But what if the caller has an extension number to add? I could ask them a separate question to find out if they have an extension. Or I could use a different built-in grammar, one actually designed for phone numbers that already recognizes any 10 digit phone number, including extensions.

<field name="phone" type="phone">

You can see a list of all built-in grammars and different ways of including them in the Built-In Grammar Types VoiceXML documentation.
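
To give a feel for another built-in type, here is a minimal sketch of a yes/no confirmation using the boolean grammar. The field name confirm is my own invention and is not part of the Strato Pizza sample. The field would go right after the phone field in the same form.

  <field name="confirm" type="boolean">
    <prompt>
      I heard <value expr="phone$.utterance"/>. Is that correct?
    </prompt>

    <filled>
      <!-- On "no", forget both answers so the form asks for the number again -->
      <if cond="!confirm">
        <clear namelist="phone confirm"/>
      </if>
    </filled>
  </field>

The boolean type fills the field with true or false, and by default callers can also press 1 for yes or 2 for no. Clearing phone and confirm sends the form back to the first unfilled item, which is the phone field.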

Because I’m using a built-in grammar for the phone number, I don’t need an additional grammar here. This means my complete field definition looks like this:

  <field name="phone" type="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>

This XML snippet will be put into my existing form element, right after the toppings field definition.

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.

Next up, I’ll take the user’s input and do something with it.



VoiceXML for Web Developers: Collecting Input

Monday, December 21st, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Last time out, I created a Hello World VoiceXML app that simply answers an incoming call and speaks some text. Now what if we want to add some interactivity and let the caller talk to the application?

Unlike some of the telephony services out there, Voxeo performs speech recognition. Our engine allows someone to punch buttons on their touch tone keypad (known as DTMF, for Dual Tone Multi-Frequency) or to speak to the application using natural language. Why ask your customers to listen to a menu of pizza toppings and remember which number to press when you can just let them say the names of the toppings?

Throughout this series, I’m building an application for Strato Pizza, a fictional pizza chain. In this installment, I’ll ask the caller which topping they’d like. For now, I’m only letting them order a one-topping pizza. Then I’m going to hang up.

The first step in adding either voice recognition or DTMF input is to add an input field to your document. In HTML if you want your user to give you information you use input tags inside a form tag. In VoiceXML you use <field> elements inside a <form> element. Fields have names and just like HTML, you can use those field names to get the values input by the caller. The field name must be a valid JavaScript variable name (so no spaces or dots in the name), and cannot start with an underscore (“_”) or end in a dollar sign (“$”).

Here’s what my form field looks like for asking the caller for their pizza topping.

<form>
  <field name="topping">
    What topping would you like on your pizza?
  </field>
</form>

In my first application, I used <prompt> to speak the text and had to put that prompt element inside a <block> element. Here, I don’t need a block element, because form fields can live directly inside forms. I also don’t need to use prompt – the contents of my field will be spoken to the user and then the application will wait for their response.
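
For comparison, here is the same field written with an explicit prompt element. As far as I can tell the two forms behave the same; the shorthand only applies when the prompt is plain text and doesn’t need any attributes of its own (such as bargein).

<form>
  <field name="topping">
    <!-- Explicit equivalent of putting the text directly inside the field -->
    <prompt>
      What topping would you like on your pizza?
    </prompt>
  </field>
</form>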

For speech recognition to work, I need to provide a list of what the caller is going to say using a grammar. These grammars allow the speech recognition engine to pick out what the user said. Essentially I’m training the recognition engine.

A grammar can have a list of single words, can allow compound words (like “extra cheese”), and can even have synonyms so it understands that Ham and Canadian Bacon are the same thing.

Grammars go inside the body of a <grammar> element. Because you might be using reserved XML characters in your grammar, it’s a good idea to place this inside a CDATA section. The attribute type specifies the MIME type of the grammar file and is required. Grammar file? That sounds like I can use an external file for my grammars. I’ll look into external files in a later installment of this series. For now, I’m using an inline grammar with a type of text/gsl.

<grammar type="text/gsl">
  <![CDATA[
    ;Lines starting with a semicolon are comments.
    ;Match one of the enclosed terms
    [
      ;Terms are separated by a space
      pepperoni olives sausage anchovies

      ;They can also be on separate lines.
      ; Each line is recognized as a separate term
      onions
      peppers

      ;Parentheses require all of the enclosed terms
      ;to be matched. A logical AND
      (extra cheese) (roasted garlic)

      ;Square brackets are the same as OR
      [mushrooms portobello]

      ;You can mix AND & OR together
      [ham (canadian bacon)]
    ]

  ]]>
</grammar>

This grammar applies only to the pizza toppings field, so I’m putting the grammar element inside the “topping” field. There are other places it can go, but I’ll show those in a later installment. Putting these together, you get:

<form>
  <field name="topping">
    What topping would you like on your pizza?

    <grammar type="text/gsl">
      <![CDATA[
        ;Lines starting with a semicolon are comments.
        ;Match one of the enclosed terms
        [
          ;Terms are separated by a space
          pepperoni olives sausage anchovies

          ;They can also be on separate lines.
          ; Each line is recognized as a separate term
          onions
          peppers

          ;Parentheses require all of the enclosed terms
          ;to be matched. A logical AND
          (extra cheese) (roasted garlic)

          ;Square brackets are the same as OR
          [mushrooms portobello]

          ;You can mix AND & OR together
          [ham (canadian bacon)]
        ]
      ]]>
    </grammar>
  </field>

</form>

Now when someone calls, they can speak their topping and the application will understand it – as long as their topping fits within the grammar I’ve defined. There are single-word toppings like “pepperoni” and “onions” as well as multiple-word toppings like “extra cheese.” Because I’ve put parentheses around “extra cheese”, the recognizer won’t match if the caller says simply “cheese”. Callers have a tendency to say things you might not expect, like asking for “canadian bacon” instead of just “ham”, so the grammar can handle synonyms as well.
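
GSL isn’t the only grammar format. As a rough comparison, here is how a shortened version of the same topping list might look in the W3C’s XML grammar format (SRGS), which this series doesn’t otherwise use. Treat it as a sketch: the rule name topping is my own choice, and I’m assuming your platform accepts inline SRGS grammars.

<grammar type="application/srgs+xml" version="1.0" root="topping" mode="voice" xml:lang="en-US">
  <rule id="topping">
    <!-- one-of plays the role of GSL's square brackets -->
    <one-of>
      <item>pepperoni</item>
      <item>olives</item>
      <item>sausage</item>
      <!-- A multi-word item must be spoken in full, like GSL's parentheses -->
      <item>extra cheese</item>
      <item>ham</item>
      <item>canadian bacon</item>
    </one-of>
  </rule>
</grammar>

It is wordier than GSL, but it is plain XML, so it needs no CDATA section and can be checked with ordinary XML tools.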

What if a caller asks for a topping that Strato Pizza doesn’t offer? If Barbara calls up Strato and asks for her favorite potato pizza, my application should know what to do with her request.

On a web form, you generally perform some validation on your form submissions to make sure the user said what you expected them to say. In VoiceXML, I can use the <nomatch> element as a trigger for the caller saying something that doesn’t match the grammar I supplied. Inside the nomatch element, I add a <reprompt/> element to replay the question.

<!-- The caller said something that was not defined in our grammar -->
<nomatch>
  I did not recognize that topping. Please try again.
  <reprompt/>
</nomatch>

In a voice application, I have another type of validation to perform. One that doesn’t happen on the web. In a web form, I can present the user with a form and wait all day for them to fill it out and hit the submit button. But in a voice application, after I ask the caller a question, if they don’t respond, I probably want to ask them again. For this, I can use the <noinput> element to determine what to do when a caller is silent in response to a question. In my noinput I’m going to ask the question again using the reprompt element.

<!-- The caller was silent, restart the field -->
<noinput>
  I did not hear anything.  Please try again.
  <reprompt/>
</noinput>

These two validation elements go inside the form field, just like my grammar did. So now my field looks like this:

<form>
  <field name="topping">
    What topping would you like on your pizza?

    <grammar type="text/gsl">
      <![CDATA[
        ;Lines starting with a semicolon are comments.
        ;Match one of the enclosed terms
        [
          ;Terms are separated by a space
          pepperoni olives sausage anchovies

          ;They can also be on separate lines.
          ; Each line is recognized as a separate term
          onions
          peppers

          ;Parentheses require all of the enclosed terms
          ;to be matched. A logical AND
          (extra cheese) (roasted garlic)

          ;Square brackets are the same as OR
          [mushrooms portobello]

          ;You can mix AND & OR together
          [ham (canadian bacon)]
        ]
      ]]>
    </grammar>
    <!-- The caller was silent, restart the field -->
    <noinput>
      I did not hear anything.  Please try again.
      <reprompt/>
    </noinput>

    <!-- The caller said something that was not defined in our grammar -->
    <nomatch>
      I did not recognize that topping. Please try again.
      <reprompt/>
    </nomatch>
  </field>
</form>
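
If a caller stays silent several times in a row, replaying the same message forever isn’t very friendly. VoiceXML event handlers accept a count attribute, so one option is to give up after a few tries. Here is a minimal sketch of that idea; the limit of three attempts, the wording and the shortened grammar are my own assumptions.

<form>
  <field name="topping">
    What topping would you like on your pizza?

    <grammar type="text/gsl">
      <![CDATA[
        [pepperoni olives sausage]
      ]]>
    </grammar>

    <!-- First and second silence: replay the question -->
    <noinput>
      I did not hear anything.  Please try again.
      <reprompt/>
    </noinput>

    <!-- Third silence onward: give up politely and hang up -->
    <noinput count="3">
      Sorry, I still can't hear you. Please call back later.
      <disconnect/>
    </noinput>
  </field>
</form>

The browser picks the handler with the highest count that does not exceed the number of times the event has occurred, so the plain <noinput> (count 1) covers the first two silences and the count="3" handler takes over after that. The same attribute works on <nomatch>.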

Now my application is able to find out what sort of pizza a caller would like and can handle mistakes, distracted callers, and toppings I don’t have. Adding this to my greeting from the last post, I have:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

  <form>
    <block>
      <prompt>
        Thanks for calling Strato Pizza.
      </prompt>
    </block>

    <field name="topping">
      What topping would you like on your pizza?

      <grammar type="text/gsl">
        <![CDATA[
          ;Lines starting with a semicolon are comments.
          ;Match one of the enclosed terms
          [
            ;Terms are separated by a space
            pepperoni olives sausage anchovies

            ;They can also be on separate lines.
            ; Each line is recognized as a separate term
            onions
            peppers

            ;Parentheses require all of the enclosed terms
            ;to be matched. A logical AND
            (extra cheese) (roasted garlic)

            ;Square brackets are the same as OR
            [mushrooms portobello]

            ;You can mix AND & OR together
            [ham (canadian bacon)]
          ]
        ]]>
      </grammar>
      <!-- The caller was silent, restart the field -->
      <noinput>
        I did not hear anything.  Please try again.
        <reprompt/>
      </noinput>

      <!-- The caller said something that was not defined in our grammar -->
      <nomatch>
        I did not recognize that topping. Please try again.
        <reprompt/>
      </nomatch>
    </field>
  </form>

</vxml>

The next requirement for my application is to collect the caller’s phone number so Strato can call if there’s a problem with the order. I’ll take a look at that tomorrow in my next blog post.




VoiceXML for web developers: Hello World

Thursday, December 17th, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you missed it, in the first installment of this series I created an application on Evolution and assigned it some phone numbers. For the rest of the series, I’ll be using that application to test my VoiceXML apps. If you want to follow along, go create your own Evolution account.

I’m going to start simple with my first application – just answer and speak some text, then hang up. This way we can get a look at the syntax needed for VoiceXML. Throughout this series, I’ll be building an application for Strato Pizza, a fictional pizza chain. My application here is simply a greeting played when someone calls the chain’s phone number.

As the name implies, VoiceXML is written in XML. So I start with an XML declaration and tell the browser what character encoding to use, just like any other XML document. Then I create a <vxml> element that will hold the application.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

</vxml>

Inside this element I need a couple of structural elements. <form> is a container that separates different areas of input and output, sort of like different HTML forms and pages. <block> is a container that allows you to conditionally execute code. Although I’m not creating separate inputs and outputs or trying to conditionally execute code, these elements are still needed, since the next elements I’m going to create are required to be inside a <block> and a block must be inside a <form>. Since I’m not using them for anything, I don’t have to worry about any attributes right now.

Now my VoiceXML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>

    </block>
  </form>
</vxml>

Great, now the basic structure is in place and I can put in the meat of the application. All I want to do is say something and hang up, so my application is pretty simple. I can say something by using a <prompt> element and the VoiceXML browser will perform text to speech and say whatever I typed.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>
      <prompt>
        Thanks for calling Strato Pizza.
      </prompt>
    </block>
  </form>
</vxml>

That’s it. The whole document. I upload the document to my web server at the URL that I configured my application with in Evolution. When I call this application using the Skype number supplied in Evolution, a text to speech (TTS) engine speaks my text.




VoiceXML for Web developers: Introduction

Tuesday, December 15th, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

I’ll admit it. Before joining Voxeo, I wasn’t much of a voice guy. I’m a web guy. I was pretty sure that voice applications were created through witchcraft. Turns out, there’s no magic involved, just some standards and markup languages. If you can create a web app, you can create a voice app. Voxeo has some great developer documentation and detailed tutorials available through Evolution, our developer portal. Over the next few weeks, I’ll be walking through some examples as I learn, from the perspective of a web developer, VoiceXML, CCXML, and Voxeo’s own CallXML.

I’ll start with VoiceXML. VoiceXML is a W3C standard, just like HTML is. Like HTML, your code is executed in a browser, but instead of a visual browser on a computer screen, in this case it’s a voice browser that you use over the telephone. To test out any of the samples I’m going to create, I’m going to need a VoiceXML browser attached to the telephone network. Voxeo provides developers with free accounts and a phone number so you can build and test your app. You’ll also need a web server to host your XML file, but Voxeo will provide some hosting space for you for free if you’d like.

Go over to Evolution and create an account. Then go to the Application Manager.

App Manager

Create a new application and call it anything you’d like. Then decide how you want your app to work. For now, I’m only using voice, so I don’t need text messaging. I can always add it in the future if I change my mind.

Creating an app

I need to tell Evolution where my VoiceXML file is by providing a URL for it. Since I’m going to create a Hello World application and host it on my own server, I’m putting in the URL I intend to use for my VoiceXML file. Again, I can change this later if I decide on a different file name or path.

Creating an app, step 2

After I create my app, I have a new tab at the top of the page that gives me some phone numbers I can use to call my application.

app created

Clicking on that tab reveals a local number, a toll-free number, and numbers to call from Skype, SIP, and iNum providers. I can also add a dedicated local number if I’d like. Since I’m going to test with Skype, I don’t need a local number, but if you’re testing from your phone, grab one.

contact numbers

And that’s it. I now have a VoiceXML browser hooked up to the telephone network that I can use to test my application. In my next post a couple of days from now, I’ll create my first app.
