Posts Tagged ‘VoiceXML’

Using Biometrics in your Voice Applications

Wednesday, February 10th, 2010

How do you authenticate your callers before giving them access to confidential information? What if your application could recognize the caller’s voice? Voxeo has partnered with four of the leading voice biometrics suppliers to make implementing this technology in your VoiceXML application easier.

We’ve written a how-to guide for each biometrics vendor, showing the steps required to get set up with their platform and put together sample code for integrating biometrics into your VoiceXML app.

The intent of these guides, and of the trial accounts that our partners are offering, is to introduce developers to voice biometrics on Voxeo’s platforms and demonstrate voice verification services with each vendor.

With each vendor, the general process is to apply for a developer or trial account and then use your account information in the sample VoiceXML applications that we’ll give you. You’re welcome to explore the documentation from each vendor to create more complex cases and to try biometrics in your own applications.

There are two steps that your application will need to perform: enrollment and verification. Enrollment sets up a user in the biometrics platform and stores their voiceprint for future identification. Verification is the step performed when you want to check a caller against a previously stored voiceprint.

The sample application we provide here is a simple use case. A caller calls in and our application uses their caller ID as the account number. We’ll start enrollment, and if the caller is successfully enrolled, we’ll start verification against this new voiceprint, asking for their password.  Obviously your real biometrics application can be much more complex. For instance, normally you would store the enrollment status of the caller and only start enrollment if they hadn’t previously enrolled. But this simple demo application should give you an idea of the basic steps required to add a similar biometrics feature from each vendor.

Each of the included examples uses a similar process for connecting your application to the biometrics service. The call is processed by your VoiceXML application and you either send data to a remote server using the <data> element or you transfer control of the call to a subdialog hosted on the remote server.

For some vendors, your caller’s voice is recorded on your server and then the voice file is transmitted to the biometrics server. That server makes the decision and passes the result back to you, allowing you to notify your caller. All interaction passes through your application.

biometrics-passthrough

Other vendors use subdialogs to record and process the voiceprints with your caller’s voice transmitted directly to the biometrics server. You choose when to hand off control of the call and the biometrics server gives it back to you when it’s done.

biometrics-subdialog

The end result is the same, however, and your caller won’t notice the difference.
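As a rough illustration of the two connection styles, here is a minimal sketch of each. The host biometrics.example.com, its paths, and the submitted parameter names are placeholders invented for this example, not any vendor’s real interface; the vendor guides contain the actual sample code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <!-- Assumption: the caller ID doubles as the account number, as in the demo. -->
  <var name="accountId" expr="session.connection.remote.uri"/>
  <var name="verifyResult"/>

  <!-- Style 1: record locally, then post the audio with <data>. -->
  <form id="passthrough">
    <record name="sample" beep="true" maxtime="10s">
      <prompt>After the beep, please say your password.</prompt>
    </record>
    <block>
      <!-- Hypothetical URL; the service is expected to answer with XML. -->
      <data name="verifyResult"
            src="http://biometrics.example.com/verify"
            method="post"
            enctype="multipart/form-data"
            namelist="accountId sample"/>
      <prompt>Thank you. Your voice sample has been sent for checking.</prompt>
    </block>
  </form>

  <!-- Style 2: hand the call to a subdialog hosted by the vendor. -->
  <form id="handoff">
    <subdialog name="verification"
               src="http://biometrics.example.com/verify.vxml"
               namelist="accountId">
      <filled>
        <!-- Control returns here when the remote dialog finishes. -->
        <prompt>Verification is complete.</prompt>
      </filled>
    </subdialog>
  </form>
</vxml>
```

A real application would use one form or the other, depending on the vendor. As written, only the first form runs, because nothing transfers control to the second.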

To try biometrics in your application, visit the Voxeo Biometrics page for details on each vendor and a brief guide on how to get started.


If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Processing Input (VoiceXML for Web Developers)

Monday, January 4th, 2010

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Today I’m continuing the development of our application for the fictional Strato Pizza. Previously, I asked the caller for their pizza topping preference and their phone number, using both speech recognition and touch tone input. Today I’m going to do something with that input, and repeat the order to the customer.

Within VoiceXML, I can access the values of any fields with <value expr="fieldName$.utterance"/>. This code will return the matched value from my grammar.

Since I want to simply repeat the order and the phone number, I’m going to add a <block> element to my existing form. Inside the block, I’ll add a <prompt> element with the text I want to speak.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza.
      </prompt>
    </block>

When the VoiceXML browser reaches this line, it will speak my text, substituting whatever the caller said in response to the field named topping for topping$.utterance. If the caller asked for ham, the spoken text will be just as if my prompt said, “You ordered ham on your pizza.”

You can use multiple value expressions in a single prompt. I also want to tell the customer that they’ll get a call if there’s a problem with their order. I’ll repeat their phone number to them. Then I’ll thank them for their order and hang up.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza. If we have any questions we will call you at <value expr="phone$.utterance"/>. Thank you for your order.
      </prompt>
    </block>

Remember that for the phone number field, I allowed the caller to use either voice or touch tone input with a built in grammar like so:

    <field name="phone" type="phone">
      Please say or enter your phone number.
    </field>

When I access this value with <value expr="phone$.utterance"/> it doesn’t matter if the caller used voice or DTMF input. The grammar gives the same result. So when I read back the phone number, they’ll hear the digits of their phone number spoken back to them.
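If the application ever does need to know which method the caller used, VoiceXML also exposes an inputmode shadow variable on the field. A minimal sketch, assuming the same phone field as above (the prompt wording is only illustrative):

```xml
<block>
  <!-- phone$.inputmode is "dtmf" for keypad entry and "voice" for speech -->
  <if cond="phone$.inputmode == 'dtmf'">
    <prompt>You keyed in <value expr="phone$.utterance"/>.</prompt>
  <else/>
    <prompt>You said <value expr="phone$.utterance"/>.</prompt>
  </if>
</block>
```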

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.




Collecting touch tone input (VoiceXML for Web Developers)

Tuesday, December 22nd, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Yesterday, I added the ability for my fictional Strato Pizza order taking application to ask the user what topping they’d like on their pizza. Now I need to ask them for a phone number, in case Strato is out of a topping and needs to call them.

When putting in a phone number, a lot of callers are comfortable with punching in their number on their phone keypads, while others would prefer to simply speak their number. I want my application to behave in the way that’s most comfortable for the caller, so I’m going to handle both methods of input.

First I create my field and validation code:

  <field name="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>

I’m doing something a little different with the UI here when someone doesn’t enter or say anything. Instead of giving an error message and replaying the prompt, I’m simply replaying the prompt. In the case of a phone number where we’re accepting DTMF and voice input, saying “I didn’t hear that” seems a little silly. Just asking for the caller’s phone number a second time should suffice.
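If a caller stays silent several times in a row, a plain reprompt can loop for a while. One possible refinement, sketched here as an assumption rather than as part of the Strato Pizza app, is to use the count attribute so that a later noinput gets a more helpful message:

```xml
<!-- type="phone" is a built-in grammar, introduced later in this post -->
<field name="phone" type="phone">
  Please say or enter your phone number.

  <!-- first and second silence: just ask again -->
  <noinput>
    <reprompt/>
  </noinput>

  <!-- third silence onward: explain both input methods, then ask again -->
  <noinput count="3">
    You can say your phone number or key it in on your phone's keypad.
    <reprompt/>
  </noinput>
</field>
```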

For a grammar, I could create a grammar consisting of every digit…

<grammar type="text/gsl">
  [one two three four five six seven eight nine zero]
</grammar>

… and to make it work with touch-tone input, add a grammar for DTMF digits …

<grammar type="text/gsl">
  [dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-9 dtmf-0]
</grammar>

… but that will only accept a single digit. Now what? I could try to create a grammar that captures every possible combination of digits. For a ten digit phone number, that means I’d have a grammar with ten billion words in it. That doesn’t sound very practical. Or I could ask the user for every digit of their phone number, one digit at a time. Hardly usable. The easiest way to accomplish this is to use a special built-in grammar provided by VoiceXML that accepts a group of digits.

To use this built-in grammar, I simply add a type attribute to my <field> element and tell it the field is intended to hold digits.

<field name="phone" type="digits">

Now the caller can say or key in any number of digits. Since this is a phone number, I don’t want the caller telling me his phone number is “six” so I want to add some restrictions to that. Strato is in the United States, so the caller should enter at least 7 digits and no more than 10.

<field name="phone" type="digits?minlength=7;maxlength=10">

But what if the caller has an extension number to add? I could ask them a separate question to find out if they have an extension. Or I could use a different built-in grammar, one actually designed for phone numbers that already recognizes any 10 digit phone number, including extensions.

<field name="phone" type="phone">

You can see a list of all built-in grammars and different ways of including them in the Built-In Grammar Types VoiceXML documentation.
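One of those other ways is to reference a built-in grammar from a <grammar> element using the builtin: URI scheme instead of the type attribute. A minimal sketch, assuming your platform supports this form (support for built-ins varies between VoiceXML browsers):

```xml
<field name="phone">
  <!-- speech and DTMF versions of the same built-in digits grammar -->
  <grammar src="builtin:grammar/digits?minlength=7;maxlength=10"/>
  <grammar src="builtin:dtmf/digits?minlength=7;maxlength=10"/>
  Please say or enter your phone number.
</field>
```

Listing the two grammars separately makes it possible to accept only speech or only touch tones by leaving one of them out.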

Because I’m using a built-in grammar for the phone number, I don’t need an additional grammar here. This means my complete field definition looks like this:

  <field name="phone" type="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>

  </field>

This XML snippet will be put into my existing form element, right after the toppings field definition.

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.

Next up, I’ll take the user’s input and do something with it.



VoiceXML for web developers: Hello World

Thursday, December 17th, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you missed it, in the first installment of this series I created an application on Evolution and assigned it some phone numbers. For the rest of the series, I’ll be using that application to test my VoiceXML apps. If you want to follow along, go create your own Evolution account.

I’m going to start simple with my first application – just answer and speak some text, then hang up. This way we can get a look at the syntax needed for VoiceXML. Throughout this series, I’ll be building an application for Strato Pizza, a fictional pizza chain. My application here is simply a greeting played when someone calls the chain’s phone number.

As the name implies, VoiceXML is written in XML. So I start with an XML declaration and tell the browser what character encoding to use, just like any other XML document. Then I create a <vxml> element that will hold the application.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

</vxml>

Inside this element I need a couple of structural elements. <form> is a container that separates different areas of input and output, sort of like different HTML forms and pages. <block> is a container that allows you to conditionally execute code. Although I’m not creating separate inputs and outputs or trying to conditionally execute code, these elements are still needed, since the next elements I’m going to create are required to be inside a <block> and a block must be inside a <form>. Since I’m not using them for anything, I don’t have to worry about any attributes right now.

Now my VoiceXML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>

    </block>
  </form>
</vxml>

Great, now the basic structure is in place and I can put in the meat of the application. All I want to do is say something and hang up, so my application is pretty simple. I can say something by using a <prompt> element and the VoiceXML browser will perform text to speech and say whatever I typed.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>
    <prompt>
      Thanks for calling Strato Pizza.
    </prompt>
    </block>
  </form>
</vxml>

That’s it. The whole document. I upload the document to my web server at the URL that I configured my application with in Evolution. When I call this application using the Skype number supplied in Evolution, a text to speech (TTS) engine speaks my text.




VoiceXML for Web developers: Introduction

Tuesday, December 15th, 2009

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

I’ll admit it. Before joining Voxeo, I wasn’t much of a voice guy. I’m a web guy. I was pretty sure that voice applications were created through witchcraft. Turns out, there’s no magic involved, just some standards and markup languages. If you can create a web app, you can create a voice app. Voxeo has some great developer documentation and detailed tutorials available through Evolution, our developer portal. Over the next few weeks, I’ll be walking through some examples as I learn, from the perspective of a web developer, VoiceXML, CCXML, and Voxeo’s own CallXML.

I’ll start with VoiceXML. VoiceXML is a W3C standard, just like HTML is. Like HTML, your code is executed in a browser, but instead of a visual browser on a computer screen, in this case it’s a voice browser that you use over the telephone. To test out any of the samples I’m going to create, I’m going to need a VoiceXML browser attached to the telephone network. Voxeo provides developers with free accounts and a phone number so you can build and test your app. You’ll also need a web server to host your XML file, but Voxeo will provide some hosting space for you for free if you’d like.

Go over to Evolution and create an account. Then go to the Application Manager.

App Manager

Create a new application and call it anything you’d like. Then decide how you want your app to work. For now, I’m only using voice, so I don’t need text messaging. I can always add it in the future if I change my mind.

Creating an app

I need to tell Evolution where my VoiceXML file is by providing a URL for it. Since I’m going to create a Hello World application and host it on my own server, I’m putting in the URL I intend to use for my VoiceXML file. Again, I can change this later if I decide on a different file name or path.

Creating an app, step 2

After I create my app, I have a new tab at the top of the page that gives me some phone numbers I can use to call my application.

app created

Clicking on that tab reveals a local number, a toll-free number, and numbers to call from Skype, SIP, and iNum providers. I can also add a dedicated local number if I’d like. Since I’m going to test with Skype, I don’t need a local number, but if you’re testing from your phone, grab one.

contact numbers

And that’s it. I now have a VoiceXML browser hooked up to the telephone network that I can use to test my application. In my next post a couple of days from now, I’ll create my first app.



How to call your VoiceXML, CCXML or CallXML app directly from Skype

Monday, November 23rd, 2009

When I wrote last week about how you could call a Tropo.com app directly from Skype, it naturally raised the question – can you also call a VoiceXML, CCXML, or CallXML app via Skype?

The answer is… of course!

We have had this feature in our platform for years, as I discussed in a post back in March 2008 called “Skype-ifying your voice applications”. Skype numbers (and SIP addresses and iNum numbers) are automagically assigned to your application when you create it. Once you log in to our Evolution developer portal, simply click on “Application Manager” and then the name of one of your applications.

You’ll then see the “Applications Settings” information and by simply clicking on the “Contact Methods” tab you will see all the contact numbers available to you:

vxmlwithskype.jpg

You also have the ability to add more numbers if you want additional direct numbers (DIDs) associated with your app.

As I mentioned in the Tropo blog post, you can copy/paste that Skype number into Skype and call away… with or without the space in the number. You can try it out by calling:

+990009369991439407

Now, that is just a “Hello, world” type of app that is not nearly as exciting as the Yahoo!Weather app referenced in the Tropo blog post, but you get the idea.

That’s it! Create an app… call it from Skype. Nice and simple.

If you’d like to try it out, you can just head over to Evolution and sign up for a free developer account if you don’t already have one. (And yes, we give you free direct DIDs, free Skype access, free SIP access, etc., etc.)




Make your existing VoiceXML (and CCXML and CallXML) apps multi-channel: add SMS and IM today…

Tuesday, August 25th, 2009

With our announcement of Prophecy 10 today, you can now add SMS and instant messaging (IM) to any existing VoiceXML, CallXML or CCXML application. When you log in to our Evolution developer portal (you can sign up for a free account if you don’t have one) and go into the settings for one of your applications through the Application Manager, you will now see that you can choose to make an application a voice application, a text-messaging application, or both:

voicexmlsmsim.jpg

Once you have added text messaging capability to your application, you can switch to the Contact Methods tab where you can add an SMS-enabled phone number and/or attach IM IDs to the application:

evoconfigsmsim.jpg

Now you have one application accessible via multiple channels. We’ll write more in the next bit about how to tweak your apps to work with multiple channels… but for those who want to log in right now and get started, go right ahead!

P.S. Note that you can do this with Tropo.com applications as well.




Code Walk: Listening to Identi.ca (OSCON 2008 Demo #2: VoiceXML)

Tuesday, January 27th, 2009

Back in the summer of 2008, I gave a presentation at O’Reilly’s Open Source Convention (OSCON) about “Mashing Up Voice and the Web Using Open Source and XML” where I talked primarily about integrating voice with the Identi.ca microblogging service. While I made the slides from that talk available previously, I only made one of my mashup demos available here in this Voxeo Developer’s Corner blog (Demo #1: Is Twitter Down?). I want to change that and start making some more of the demos available.

In my Demo #2, Listening to Identi.ca, I created a VoiceXML application that does the following:

  • Asks the caller if they want to hear:
    • the latest Identi.ca message from the people they follow
    • the latest reply to them
    • the latest public Identi.ca message
  • Uses speech recognition to interpret the result
  • Retrieves the requested information from Identi.ca
  • Relays the information to the caller

Now there is the caveat that this demo is hardcoded to a single identi.ca user (namely me – identi.ca/danyork). You can try it out yourself by calling any of these numbers:

If you would like to try out this code below yourself with your own Identi.ca account, all you need to do is create a free developer account on our Evolution developer portal, create the VoiceXML file and assign it a phone number. (Step-by-step instructions are available.) You also can download a free copy of our Prophecy software, install it on a local system and set up this code there.


With that, let’s jump into the code. The full VoiceXML source code is available down below as something you can copy and paste, but right now I’m going to walk through the pieces of the code.

First we have the standard start of a VoiceXML file and the definition of a variable that is going to be used to store the results of the request to Identi.ca:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>

Next we start the <form> element and begin with defining a JavaScript function that is going to retrieve the text we want from the XML data sent back to us from Identi.ca:

  <form id="F1">
    <script>
      <![CDATA[
          function GetData(d,t,n)  {       
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          } 
      ]]>
    </script>

I can’t claim credit for the JavaScript – it was something I found in one of the VoiceXML tutorials we have available. Basically it is searching the received XML for tags of type “t” and then going to the “n”th tag and retrieving the data from there.
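One caveat with this helper is that it throws an error if the tag is missing or empty, for instance when the feed returns no items. A slightly more defensive variant might look like the sketch below. The name GetDataOr and its fallback parameter are hypothetical additions, not part of the original demo.

```xml
<script>
  <![CDATA[
      // Like GetData, but returns `fallback` instead of failing when the
      // nth <t> element is absent or has no text content.
      function GetDataOr(d, t, n, fallback) {
        var nodes = d.getElementsByTagName(t);
        if (n >= nodes.length) {
          return fallback;
        }
        var first = nodes.item(n).firstChild;
        return (first == null) ? fallback : first.data;
      }
  ]]>
</script>
```

A prompt could then call it as <value expr="GetDataOr(MyData,'title',2,'not available')"/> so the caller hears a placeholder phrase rather than an application error.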

Now we start with a field in the VXML form. Note that I use an audio file that I had previously recorded:

    <field name="Choice">
        <prompt bargein="false">
           <audio src="../audio/identicachoice.wav"/>
        </prompt>

I could have just as easily used Text-To-Speech (TTS) to do the same thing:

    <field name="Choice">
        <prompt bargein="false">
           Welcome to the listen to identi.ca demo. To hear your 
           latest message please say "friends". To hear the latest
           reply please say "replies". To hear the latest public 
           message please say "public".
        </prompt>

Now I define the “grammar” which is the list of the words that I will accept and that the speech recognition engine will listen for:

        <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="MYRULE">
         <rule id="MYRULE">
          <one-of>
            <item>friends</item>
            <item>replies</item>
            <item>public</item>
          </one-of>
         </rule>
        </grammar>

To finish off the field, I’m going to catch two error cases where either the caller said nothing or did not say one of the three words in the grammar:

        <noinput>
          I did not hear anything.  Please try again.
          <reprompt/>
        </noinput>
        <nomatch>
          I did not recognize that word.  Please try again.
          <reprompt/>
        </nomatch>
    </field>

With the <field> defined, I move on to define what happens once acceptable input has been received by using the <filled> element. Note that I am using the Choice name that was defined in the field above.

First I am going to check if the caller said “friends” and if so I am going to use the <data> element to make a web call out to the Identi.ca site. The results of the web call are stored in the MyData variable which is then referenced in the <prompt> element:

    <filled namelist="Choice">
     <if cond="Choice == 'friends'">
      <data name="MyData" src="http://identi.ca/danyork/all/rss?limit=1"/>
      <prompt>
        Your last notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>

Note that I am using the previously defined GetData JavaScript function to walk the XML tree twice: first to get the person sending the Identi.ca notice and second to get the contents of the notice. Now to make this work, I did have to look at the XML sent back by Identi.ca and figure out which were the appropriate tags and position numbers to use.

I next do the same thing for ‘replies’ and ‘public’:

     <elseif cond="Choice == 'replies'"/>
      <data name="MyData" src="http://identi.ca/danyork/replies/rss?limit=1"/>
      <prompt>
        Your last reply is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The reply is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'public'"/>
      <data name="MyData" src="http://identi.ca/rss?limit=1"/>
      <prompt>
        The last public notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',1)"/>.
      </prompt>
     </if>

You will note that I explicitly tested for ‘public’ although I really didn’t need to do so. The grammar only allowed three options, so if it was not one of the first two it would naturally be ‘public’. I could have just used an <else/> here.
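For reference, the same branch logic written with <else/> might look like this sketch. It reuses the URLs and GetData calls from the demo unchanged and only swaps the final test:

```xml
<if cond="Choice == 'friends'">
  <data name="MyData" src="http://identi.ca/danyork/all/rss?limit=1"/>
  <prompt>
    Your last notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
    The notice is: <value expr="GetData(MyData,'title',2)"/>.
  </prompt>
<elseif cond="Choice == 'replies'"/>
  <data name="MyData" src="http://identi.ca/danyork/replies/rss?limit=1"/>
  <prompt>
    Your last reply is from <value expr="GetData(MyData,'dc:creator',0)"/>.
    The reply is: <value expr="GetData(MyData,'title',2)"/>.
  </prompt>
<else/>
  <!-- only 'public' can reach this branch, given the three-word grammar -->
  <data name="MyData" src="http://identi.ca/rss?limit=1"/>
  <prompt>
    The last public notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
    The notice is: <value expr="GetData(MyData,'title',1)"/>.
  </prompt>
</if>
```

The explicit test has one advantage: if a fourth word is ever added to the grammar, it will not silently fall into the 'public' branch.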

Finally I just thank the caller with a final prompt and end the various elements to close off the file:

      <prompt>
        That is all. Thank you for calling in.
      </prompt>
    </filled>
  </form>
</vxml> 

FULL SOURCE CODE

For those who want to see the entire source code or copy/paste the code, here it is. You’ll note that I have here the TTS version since you won’t have access to the audio file I made. If you’d like to use a recorded prompt, you can use the code I had above.

Obviously, wherever you see “danyork”, you can substitute your Identi.ca user name or that of whomever you want to hear the messages from.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>
  <form id="F1">
    <script>
      <![CDATA[
          function GetData(d,t,n)  {
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          }
      ]]>
    </script>
    <field name="Choice">
        <prompt bargein="false">
           Welcome to the listen to identi.ca demo. To hear your 
           latest message please say "friends". To hear the latest
           reply please say "replies". To hear the latest public 
           message please say "public".
        </prompt>
        <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="MYRULE">
         <rule id="MYRULE">
          <one-of>
            <item>friends</item>
            <item>replies</item>
            <item>public</item>
          </one-of>
         </rule>
        </grammar>
        <noinput>
          I did not hear anything.  Please try again.
          <reprompt/>
        </noinput>
        <nomatch>
          I did not recognize that word.  Please try again.
          <reprompt/>
        </nomatch>
    </field>
    <filled namelist="Choice">
     <if cond="Choice == 'friends'">
      <data name="MyData" src="http://identi.ca/danyork/all/rss?limit=1"/>
      <prompt>
        Your last notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'replies'"/>
      <data name="MyData" src="http://identi.ca/danyork/replies/rss?limit=1"/>
      <prompt>
        Your last reply is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The reply is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'public'"/>
      <data name="MyData" src="http://identi.ca/rss?limit=1"/>
      <prompt>
        The last public notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',1)"/>.
      </prompt>
     </if>
      <prompt>
        That is all. Thank you for calling in.
      </prompt>
    </filled>
  </form>
</vxml> 

I hope you found this tutorial useful and please feel free to leave your comments, suggestions or questions here. (Including if you think of a better way for me to write my VXML code.)

Meanwhile, as I said before, if you would like to try out the code above yourself with your own Identi.ca account, all you need to do is create a free developer account on our Evolution developer portal, create the VoiceXML file and assign it a phone number. (Step-by-step instructions are available.)

Also, if you extend this app and do something interesting with it (for instance, allowing the caller to choose between different Identi.ca accounts) and would be open to sharing what you’ve done, please feel free to email me. I’d love to post some follow-up posts that show what else you can do with VoiceXML and services like Identi.ca.

P.S. Because Identi.ca uses the same style of RESTful API as Twitter, this script can be modified to work with Twitter by simply changing the web call in the <data> element to be for the Twitter API. If I recall correctly, I also had to figure out what tag name and item number were necessary for the GetData function as the XML data returned was different between Identi.ca and Twitter.


If you found this post interesting or helpful, please consider either subscribing via RSS, following us on Twitter or following us on Identi.ca.




Leveraging a CCXML Wrapper for Post-call Cleanup

Wednesday, December 17th, 2008

The default CCXML wrapper that quietly powers many VoiceXML applications on Voxeo’s Prophecy platform is certainly sufficient for many simple applications. So why should I bother diving into yet another markup language to reinvent the wheel? Because your application isn’t that simple and you’re a control freak. Or maybe I am. Who knows.

Typically, we can handle post-call cleanup in VoiceXML easily enough by using the <submit> or <data> elements to call our server-side cleanup scripts.

  <catch event="connection.disconnect">
     <submit next="cleanup.php" method="POST"/>
  </catch>

While this is a valid way to handle this, VoiceXML requires that we either transfer control to another document (<submit>) or respond with valid XML (<data>). Instead, we can use the disconnect event, and any other point in the application where we’d like the call to end, to send this data back to our CCXML handler and let it do all the work asynchronously. The bonus here is that we get unfettered control of our voice applications. Once we have this in place, it’s simple work to leverage CCXML to control call duration, pre/post-processing and more.
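For comparison, the <data> form of the handler above might look like the following sketch. The variable names are placeholders, and cleanup.php would have to return well-formed XML for the fetch to succeed.

```xml
<var name="cleanupResult"/>

<catch event="connection.disconnect">
  <!-- POST the call data; the response must be well-formed XML -->
  <data name="cleanupResult" src="cleanup.php" method="post"
        namelist="my_var1 my_var2 my_var3"/>
  <exit/>
</catch>
```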

When using a custom CCXML wrapper to handle post-call cleanup, our VoiceXML disconnect handler will look something like this:

  <catch event="connection.disconnect">
     <exit namelist="my_var1 my_var2 my_var3"/>
  </catch>

Here we’re using the namelist attribute of <exit> to send our values back to CCXML. We can now handle this inside our CCXML wrapper with a dialog.exit transition:

  <transition event="dialog.exit">
    <log expr="'**** DIALOG COMPLETE - SENDING POST CALL CLEANUP'"/>
    <assign name="dialog_active" expr="false"/>
    <log expr="'my_var1 = ' + event$.values.my_var1"/>
    <log expr="'my_var2 = ' + event$.values.my_var2"/>
    <log expr="'my_var3 = ' + event$.values.my_var3"/>
    <!-- copy the value into the CCXML variable named in the send namelist
         (assumes my_var1 is declared with a var element elsewhere) -->
    <assign name="my_var1" expr="event$.values.my_var1"/>
    <!-- hang up the caller's leg before post-processing -->
    <disconnect connectionid="conn_id"/>
    <!-- sends a POST to our cleanup script -->
    <send name="'user.call.cleanup'" target="'cleanup.php'" targettype="'basichttp'" namelist="my_var1"/>
  </transition>

The values of our VXML variables are stored in the object event$.values. We can grab them by referencing them as event$.values.variablename – in this case, event$.values.my_var1. Since our VoiceXML dialog is complete, we’re ready to end the call. We issue a disconnect to the caller’s connectionid and send our post-call cleanup. CCXML’s <disconnect>, unlike VoiceXML’s, physically disconnects the call leg. Now that the call leg has ended, we’re free to handle our post-processing without paying to keep that call leg up. Brilliant!

Now that we’ve shot off our post-processing request, we can handle the send.successful event and close the session out, right? Well, we could do that, but how do we know the request was received successfully? Note: this next portion is a Voxeo-specific extension, is not part of the W3C spec, and requires the Voxeo namespace: <ccxml version="1.0" xmlns:voxeo="http://community.voxeo.com/xmlns/ccxml">.

Instead of assuming everything went swimmingly with our post-processing, we can be sure by having our server-side script send an event back to the CCXML browser. To do that, we format the body of the response like this:

  user.cleanup.successful
  my_var1=foo
  my_var2=bar

Note that we can inject not only an event here but also name/value pairs, though they are entirely optional. Now that our server side has responded with an event, we’ll need to handle it in our CCXML:

  <transition event="user.cleanup.successful">
    <log expr="'**** POST CALL CLEANUP COMPLETED SUCCESSFULLY'"/>
    <log expr="'**** EXITING SESSION'"/>
     <exit/>
  </transition>
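
For illustration, the response body shown earlier (event name on the first line, optional name/value pairs on the lines after it) could be assembled by a small helper like the one below. This is only a sketch: the post’s own cleanup script is cleanup.php, the function name formatEventResponse is made up for this example, and the newline separator and lack of escaping are assumptions that only hold for simple values like foo and bar.

```javascript
// Hypothetical helper (not part of any Voxeo API): builds a plain-text body
// with the event name first and one name=value pair per following line.
function formatEventResponse(eventName, params) {
  var lines = [eventName];
  for (var key in params) {
    // skip inherited properties; only the caller's own pairs are emitted
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      lines.push(key + "=" + params[key]);
    }
  }
  // assumption: lines are separated by a single newline
  return lines.join("\n");
}

var body = formatEventResponse("user.cleanup.successful", {
  my_var1: "foo",
  my_var2: "bar"
});
var eventOnly = formatEventResponse("user.cleanup.successful", {});
```

Since the name/value pairs are optional, calling the helper with an empty object yields just the event name.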

Last, but very definitely not least, we will want to ensure our session does not stay alive after the call leg has disconnected. Since we are doing a little post-processing, we don’t want to end the session immediately, as we want to give the cleanup time to complete. So, we’ll simply shoot off a delayed user event to kill the session 60 seconds after a disconnect and prevent the zombie apocalypse.

  <transition event="connection.disconnected">
     <!-- send to unconditionally end a runaway session -->
     <send name="'user.kill.unconditional'" target="session.id" delay="'60s'"/>
  </transition>

  <transition event="user.kill.unconditional">
     <log expr="'**** UNCONDITIONAL KILL - EXITING SESSION'"/>
       <exit/>
  </transition>

That’s it. Find the complete CCXML below.

<?xml version="1.0"?>
<ccxml version="1.0" xmlns:voxeo="http://community.voxeo.com/xmlns/ccxml">

      <meta name="author" content="Dustin Hayre"/>
      <meta name="maintainer" content="YOUR_EMAIL@HERE.COM"/>

<!-- session variables: connection ID, dialog ID, and the value returned by the dialog -->
<var name="conn_id"/>
<var name="dialog_id"/>
<var name="my_var"/>

<eventprocessor>
  <transition event="connection.alerting">
    <assign name="conn_id" expr="event$.connectionid"/>
    <accept connectionid="conn_id"/>
  </transition>

  <transition event="connection.connected">
    <log expr="'**** STARTING DIALOG TO CONNECTION ID ' + conn_id"/>
     <!-- edit SRC attribute to point to VXML dialog -->
     <dialogstart src="'dialog.vxml'" connectionid="conn_id" dialogid="dialog_id"/>
  </transition>

  <transition event="dialog.exit">
    <log expr="'**** DIALOG COMPLETE - SENDING POST CALL CLEANUP'"/>
    <log expr="'event$.values.my_var1 = ' + event$.values.my_var1"/>
    <assign name="my_var" expr="event$.values.my_var1"/>
    <!-- drop the call leg before post-processing -->
    <disconnect connectionid="conn_id"/>
    <!-- sends a POST to our cleanup script -->
    <send name="'user.call.cleanup'" target="'cleanup.php'" targettype="'basichttp'" namelist="my_var"/>
  </transition>

  <transition event="user.cleanup.successful">
    <log expr="'**** POST CALL CLEANUP COMPLETED SUCCESSFULLY'"/>
    <log expr="'**** EXITING SESSION'"/>
     <exit/>
  </transition>

  <transition event="error.*">
    <log expr="'**** ERROR - REASON: ' + event$.reason"/>
      <exit/>
  </transition>

  <transition event="connection.disconnected">
     <!-- send to unconditionally end a runaway session -->
     <send name="'user.kill.unconditional'" target="session.id" delay="'60s'"/>
  </transition>

  <transition event="user.kill.unconditional">
     <log expr="'**** UNCONDITIONAL KILL - EXITING SESSION'"/>
       <exit/>
  </transition>

  <transition event="connection.failed">
    <log expr="'**** CONNECTION FAILED - REASON: ' + event$.reason"/>
      <exit/>
  </transition>

</eventprocessor>
</ccxml>

Feel free to comment with any questions or suggestions.

-Dustin


If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Demystifying VoiceXML Subdialogs

Monday, October 6th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on October 3, 2008.


Note – this post demonstrates the use of VoiceXML subdialogs. The examples below were tested on the Voxeo Prophecy platform. To download the Prophecy software and run these examples locally, go to http://www.voxeo.com/prophecy/.

What are Subdialogs?

Subdialogs are a powerful, but often misunderstood, part of the VoiceXML specification that can be used to create reusable components for telephone applications.

The official W3C VoiceXML specification defines subdialogs thusly:

A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction, and returning to the original form. Variable instances, grammars, and state information are saved and are available upon returning to the calling document. Subdialogs can be used, for example, to create a confirmation sequence that may require a database query; to create a set of components that may be shared among documents in a single application; or to create a reusable library of dialogs shared among many applications.

One of the most confusing aspects of subdialogs is that they run in a completely separate execution context from the dialog that invokes them (the parent dialog). However, once developers get over this conceptual hurdle, the real power of subdialogs becomes apparent.

Fun with Subdialogs

One of the things I like most about subdialogs is their reusability. I often find myself in situations where I need to collect input from a caller in several steps that are generally the same (i.e., a series of digits), but each has a specific validation requirement.

For example, in order to process a credit card payment there are several pieces of information that need to be obtained from a caller – credit card number, CVV number, expiration date, etc. All share the characteristic that they are a series of digits, but each has unique verification requirements. Valid credit card numbers have specific lengths and must pass a “mod 10” check. CVV numbers are specific lengths for different card types.

As with a typical function call in any programming language, parameters can be passed into a VoiceXML subdialog when it is invoked. These parameters often take the form of a string of text to be read out to the caller, or a number that is used to count actions. However, we also have the option of passing more complex data types into a subdialog call.

We can pass JavaScript arrays and custom objects into a subdialog, or we can pass some of the native object types in JavaScript. For example, every function that is declared in JavaScript is also an instance of the JavaScript Function object. So a JavaScript function that is declared in a parent dialog can be passed into a subdialog as a parameter.
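
To make that concrete in plain JavaScript, outside of any VoiceXML document, here is a minimal sketch. The names isFourDigits and runValidator are invented for this illustration; runValidator simply stands in for whatever code receives the function as a parameter.

```javascript
// A declared function is an ordinary value that can be handed around.
function isFourDigits(x) {
  // true only when x consists of exactly four digits
  return /^\d{4}$/.test(String(x));
}

// Every declared function is an instance of the Function object.
var isFunctionObject = isFourDigits instanceof Function;

// Stand-in for the receiving code: it gets the validator as a parameter
// and calls it without knowing which function it was given.
function runValidator(input, validator) {
  return validator(input) === true;
}

var accepted = runValidator("1234", isFourDigits);
var rejected = runValidator("12a4", isFourDigits);
```

The receiving code never refers to isFourDigits by name, which is what lets one piece of collection logic serve many validation rules.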

For example, consider the following simple subdialog:

<form id="S_1">

<!-- Parameters passed into subdialog -->
<var name="myPrompt" />
<var name="myFunction" />

<catch event="noinput nomatch">
  <prompt>That was not valid input. Try again.</prompt>
  <reprompt/>
</catch>

<field name="getInput" type="digits">

  <prompt><value expr="myPrompt"/></prompt>
  <filled>
    <if cond="myFunction(getInput)">
      <return namelist="getInput"/>
    <else/>
      <clear/>
      <throw event="nomatch"/>
    </if>
  </filled>
</field>

</form>

This subdialog accepts two parameters – a prompt that is read out to the caller, and a function that is used to validate the input. The conditional logic in the <filled> block assumes that the function returns a boolean (true/false).

This same subdialog structure can be used to collect input that meets a wide range of validation criteria. To use this subdialog to collect a 5-digit zip code, we would set the parameters as follows:

// JavaScript function to check that a string is exactly five characters long
function isFive(x) {
  if (x.length == 5) {
    return true;
  }
  return false;
}

<var name="myPrompt" expr="'Please enter your five digit zip code.'"/>
<var name="myFunction" expr="isFive"/>
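
The isFive function above generalizes readily. As a sketch, a small factory can produce a length check for any number of digits, which also covers CVV numbers (three digits on most cards, four on American Express). The names hasLength, isZip, isCvv3 and isCvv4 are made up for this example.

```javascript
// Hypothetical factory: returns a validator that accepts input of exactly n characters.
function hasLength(n) {
  return function (x) {
    return String(x).length === n;
  };
}

// Each result is a single-argument function returning true/false,
// the same shape as isFive, so it could be passed as myFunction.
var isZip = hasLength(5);
var isCvv3 = hasLength(3); // most card types
var isCvv4 = hasLength(4); // American Express

var zipOk = isZip("19801");
var cvvTooLong = isCvv3("1234");
```

Because the field is declared with type digits, the validator only needs to look at the length here, not at whether the characters are numeric.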

To use this subdialog to collect and validate a credit card number, we would set the parameters as follows:

// JavaScript function to perform a mod 10 check
function mod10Check(x) {
  var valid = false;
  // logic of mod 10 check goes here; set valid to true if x passes
  return valid;
}

<var name="myPrompt" expr="'Please enter your credit card number.'"/>
<var name="myFunction" expr="mod10Check"/>
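
The body of the mod 10 check is left out above. One way it might be filled in is with the standard Luhn algorithm, sketched below. This is not taken from the original sample document; it assumes x arrives as the string of digits collected by the field, and it does not check the card number length, which would need a separate test.

```javascript
// Sketch of a mod 10 (Luhn) check on a string of digits.
function mod10Check(x) {
  var digits = String(x);
  if (!/^\d+$/.test(digits)) {
    return false; // reject empty or non-digit input
  }
  var sum = 0;
  var doubleIt = false; // the rightmost digit is not doubled
  for (var i = digits.length - 1; i >= 0; i--) {
    var d = digits.charCodeAt(i) - 48; // '0' has character code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) {
        d -= 9; // same as adding the two digits of the product
      }
    }
    sum += d;
    doubleIt = !doubleIt; // double every second digit, moving left
  }
  return sum % 10 === 0;
}

var testCardPasses = mod10Check("4111111111111111");
var typoFails = mod10Check("4111111111111112");
```

Like isFive, this returns a boolean, so it fits the <if cond="myFunction(getInput)"> test in the subdialog without any changes.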

A sample VoiceXML document demonstrating how this same basic subdialog can be used to collect different kinds of input can be found here.

As this discussion shows, subdialogs are a tremendously powerful tool that can enhance the reusability of code and reduce the maintenance requirements of telephone applications. So, if you find yourself in a situation where you need to collect the same type of input from a caller several times during a call flow, take a look at subdialogs.

Their power and reusability have the potential to make your life a lot easier.

