Posts Tagged ‘Tutorials’

Never forget to make a call with scheduled conferencing

Thursday, July 10th, 2008

We all forget to make important calls from time to time. With this tutorial, you will be able to schedule a call ahead of time: Voxeo’s IVR system calls you at the scheduled time, and then links you with the party you intended to call.

While you’re using your free Voxeo developer account, you might as well keep the whole shebang free, right? Head on over to x10hosting.com or forwardhosting.com, and register for an account. These are two of the very few hosting companies that will allow you to run cron jobs for free. Now that you are all set up, let’s get into the design aspect, cron job first.

While cron web interfaces will certainly differ, the underlying principle is the same: cron waits until the system time matches your job time, and then executes an action. Most web portals to cron jobs will allow you to specify minute, hour, day, month, and weekday.
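To make the matching rule concrete, here is a minimal Python sketch of how a five-field schedule can be compared against the clock. It is not how cron itself is implemented: the function names are made up for illustration, and it only understands “*” and single numbers (no ranges, lists, or step values). The one detail it does mirror from standard cron is the weekday numbering, where 0 is Sunday and 6 is Saturday.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one schedule field ('*' or a single number) matches value."""
    return field == "*" or int(field) == value


def cron_matches(spec: str, when: datetime) -> bool:
    """Check a 'minute hour day month weekday' spec against a point in time."""
    minute, hour, day, month, weekday = spec.split()
    # Python's isoweekday() runs Monday=1 .. Sunday=7; cron uses Sunday=0.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )
```

With this numbering, a schedule of “30 9 * * 1” matches 9:30 on a Monday and nothing else.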

Let’s assume you have to call your wife every Friday night at 5 pm to let her know that you’re coming straight home from work. We’ll set the minute to “00”, the hour to “17”, the weekday to “5” (cron counts weekdays from 0 for Sunday, so Friday is 5), and the rest to “*”. This will make the cron job execute on Fridays at 1700. You may need to adjust the time on your cron job to account for differences between system time and your time. For example, the x10hosting box on which my cron job runs is set to US central time. For the cron job command, use the Unix command “curl” like so:

curl http://api.voxeo.net/SessionControl/CCXML10.start?tokenid=dd5a1d7f44e97f49856eb6e894c9c669d152e89a571f6201eb3b265045b7a1d2bb52ff8d9856fbbbbbbbbbba\&numdial=5551231234

This command sends an HTTP request to api.voxeo.net for our CCXML 1.0 token. The request also contains the variable “numdial”, which carries the number we ultimately want to reach.

(Please note that the backslash is used in cron to escape the ampersand. As written, this request will not work properly from a browser window.)
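If you would rather not hand-escape the URL, the command can be generated. The following Python sketch builds the same request URL and quotes it for the shell; the function name and the placeholder token are hypothetical, and only the endpoint and the two parameter names come from the command above.

```python
import shlex
from urllib.parse import urlencode

START_URL = "http://api.voxeo.net/SessionControl/CCXML10.start"


def build_cron_command(token_id: str, numdial: str) -> str:
    """Return a curl command line suitable for pasting into a cron job."""
    query = urlencode({"tokenid": token_id, "numdial": numdial})
    # shlex.quote wraps the URL in single quotes, so "&" needs no backslash.
    return "curl " + shlex.quote(START_URL + "?" + query)


# "YOUR_TOKEN_ID" is a placeholder, not a real token.
print(build_cron_command("YOUR_TOKEN_ID", "5551231234"))
```

Be aware that cron treats an unescaped “%” in the command field specially, so this approach only suits values such as digits and hex tokens that need no percent-encoding.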


Now for the XML part:

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns:voxeo="http://community.voxeo.com/xmlns/ccxml">

<var name="state0" expr="'init'"/>
<var name="callid_out1"/>
<var name="callid_out2"/>
<var name="pin"/>
<var name="holdMusicDlg"/>

<eventprocessor statevariable="state0">

  <transition state="init" event="ccxml.loaded">
    <createcall dest="'tel:+15555555555'" connectionid="callid_out1" callerid="'1112223333'" timeout="'30s'"/>
  </transition>

  <transition state="init" event="connection.connected">
    <assign name="callid_out1" expr="event$.connectionid"/>
    <assign name="state0" expr="'enterpin'"/>
    <dialogstart src="'null://?termdigits=#&amp;text=Press 1 and then pound if you want to dial' + session.values.numdial"
    type="'application/x-fetchdigits'"/>
  </transition>

  <transition state="enterpin" event="dialog.exit">
    <log expr="'PIN = [' + event$.values.digits + ']'"/>
    <if cond="'1' != event$.values.digits">
      <exit/>
    <else/>
      <assign name="pin" expr="event$.values.digits"/>
    </if>

    <assign name="state0" expr="'calling'"/>
    <dialogstart src="'holdingPattern.vxml'" type="'application/xml+vxml'" namelist="pin" dialogid="holdMusicDlg"/>
    <createcall dest="'tel:+1' + session.values.numdial" connectionid="callid_out2" callerid="'5555555555'"/>

  </transition>

  <transition state="calling" event="connection.failed">
    <assign name="state0" expr="'callfailed'"/>
    <dialogterminate dialogid="holdMusicDlg"/>
  </transition>

  <transition state="callfailed" event="dialog.exit">
    <assign name="state0" expr="'playingCallFailed'"/>
    <dialogstart src="'callFailure.vxml'" type="'application/xml+vxml'" connectionid="callid_out1"/>
  </transition>

  <transition state="playingCallFailed" event="dialog.exit">
    <disconnect/>
  </transition>

  <transition state="calling" event="connection.connected">
    <if cond="event$.connectionid == callid_out1">
      <exit/>
    <else/>
      <assign name="state0" expr="'beforeBridging'"/>
      <dialogterminate dialogid="holdMusicDlg"/>
    </if>
  </transition>

  <transition state="beforeBridging" event="dialog.exit">
    <send name="'pause'" target="session.id" delay="'200ms'"/>
  </transition>

  <transition state="beforeBridging" event="pause">
    <assign name="state0" expr="'bridged'"/>
    <join id1="callid_out1" id2="callid_out2"/>
  </transition>

  <transition event="error.conference.join">
    <log expr="'*** ERROR DURING JOIN ***'"/>
    <exit/>
  </transition>

  <transition event="error.*">
    <log expr="'an error has occurred (' + event$.reason + ')'"/>

    <voxeo:sendemail to="'yourEmail@there.com'"
      from="'myApp@here.com'"
      type="'debug'"
      body="'generic error detected!'"/>
    <exit/>
  </transition>
</eventprocessor>
</ccxml>

This application has a fairly simple flow. It calls 1-555-555-5555, and then asks the callee to press 1 and then # to connect to whatever number was passed in via the HTTP request (in this case numdial=5551231234).
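The flow is easier to follow as a table of state changes. The Python sketch below models only the state names and event names that appear in the CCXML above; it is an illustration, not a CCXML interpreter. As simplifying assumptions, it takes for granted that the callee pressed 1, and it ignores the check on which connection raised the event.

```python
# (current state, event) -> next state, as read from the <transition> elements.
TRANSITIONS = {
    ("init", "connection.connected"): "enterpin",
    ("enterpin", "dialog.exit"): "calling",
    ("calling", "connection.failed"): "callfailed",
    ("callfailed", "dialog.exit"): "playingCallFailed",
    ("calling", "connection.connected"): "beforeBridging",
    ("beforeBridging", "pause"): "bridged",
}


def run(events, state="init"):
    """Feed a sequence of event names through the table and return the final state."""
    for event in events:
        # Events with no entry for the current state leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
    return state


happy_path = [
    "ccxml.loaded",          # first outbound call is placed
    "connection.connected",  # you answer; the digit prompt starts
    "dialog.exit",           # you pressed 1 and #; second call is placed
    "connection.connected",  # the other party answers
    "dialog.exit",           # hold music has stopped
    "pause",                 # short delay has elapsed; the legs are joined
]
print(run(happy_path))
```

Tracing the failure path the same way (the second call raising “connection.failed”) ends in “playingCallFailed”, which is where the CCXML plays callFailure.vxml before disconnecting.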

That’s it. To make this work for you, you need to change four values:

1. token ID in http request
2. numdial variable in http request
3. “5555555555” is found twice in the CCXML file - both instances should be changed to your number
4. sendemail “to” and “from” in CCXML file

Good luck with your development - mix in a little MySQL and PHP action to make adding more cron jobs easier.

Till next time,

Jeremy McCall
Voxeo Network Operations

Certified Tech Tip: Alpha-Numeric voice recognition grammars - part two

Tuesday, May 20th, 2008

In our last entry to the tech-tips blog, we detailed the challenges inherent in capturing alphabetical or alpha-numeric entries from our callers, and described several ways to minimize the chance of mis-recognition when implementing input fields based on these two categories of voice recognition. The long and short of that posting was that IVR developers should refrain from attempting this wherever possible, and instead try these alternatives:

* Pre-compiled Statistical Language Model grammars
* Leveraging TargusInfo services for advanced recognition accuracy

However, the IVR project requirements dictate what we can and can’t do as developers, so in some cases we have to whip out a user grammar that takes alpha or alpha-numeric input. As mentioned in our last blog entry, there are a few things we can do to stack the deck and squeeze more accuracy out of these grammars so that we don’t end up with frustrated callers, but the plain truth is that we will never, ever be able to write a grammar that accepts alphabetical characters with 100% accuracy using today’s recognition technology. What we will do today is twofold:

(1) Craft an SRGS+SISR subgrammar for alphabetical, and numeric characters

(2) Plug this grammar into a mixed-initiative form dialog that will minimize (but not fully address!), the possibility for mis-recognitions.

Those developers who need such a grammar and dialog within their production-grade applications are advised to take this basic framework as a starting point, and then expand on it by:

(a) Testing carefully with a broad range of users, and fully fleshing out alternate utterance values for alphabetic characters

(b) Applying item weighting to specific characters based on the probability of a given character versus another like-sounding character - this will depend greatly on the specific usage of the grammar

(c) Tracking results by using W3C-compliant utterance recording and logging all shadow variables, so that these results can be used to further tune and tweak the grammar for maximum accuracy

(d) Considering n-best post-processing as an additional confirmation step to ensure that the results we receive are indeed accurate
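As a rough picture of what step (d) might look like once the n-best list has been handed to application code, here is a small Python sketch. Everything in it is an assumption made for illustration: the list of (text, confidence) pairs, the 0.5 threshold, and the function name are invented, and a real platform will expose its n-best results in its own way.

```python
import re

# Expected shape of the entry: letter, digit, letter.
PATTERN = re.compile(r"^[A-Z][0-9][A-Z]$")


def pick_best(nbest, min_confidence=0.5):
    """Return the most confident hypothesis that fits the expected format, or None."""
    for text, confidence in sorted(nbest, key=lambda h: h[1], reverse=True):
        if confidence >= min_confidence and PATTERN.match(text):
            return text
    return None


# Hypothetical recognizer output: the top hypothesis has the wrong shape.
hypotheses = [("X18", 0.71), ("X1A", 0.62), ("S1A", 0.40)]
print(pick_best(hypotheses))
```

The idea is simply that a lower-ranked hypothesis which fits the known format can be a better guess than a top-ranked one that does not; when nothing qualifies, the dialog should re-prompt rather than guess.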

For today’s entry, let’s assume that we need to capture a three-character zip (postal) code of the kind used in Canadian locales. Our predefined format for utterance values is “Alpha Digit Alpha”, and luckily, not all alpha characters are applicable: instead of trying to recognize 26 letters accurately, we only need to recognize 16, which helps a lot!

We won’t dig into the specifics of a mixed-initiative form dialog, as we have already done so in our mixed-initiative tutorial, but the gist is that this feature of VoiceXML allows us to fill multiple fields with a single utterance, and breaking up each alpha and numeric character into its own recognition field greatly cuts down on the disambiguation problems that can occur.

For the purposes of brevity, what we have below is a stripped-down version of our fully fleshed-out grammar, but you may download the full grammar and the mixed-initiative dialog right here; they contain many more inline notations.

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="canadianZip">

<rule id="canadianZip" scope="public">

<one-of>

<!-- ALL THREE FIELDS FILLED -->

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<!-- ONLY TWO FIELDS FILLED -->

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

</item>

<item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<!-- ONLY ONE FIELD FILLED  -->

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

</one-of>

</rule>

<rule id="alphaRule1" scope="public">

<one-of>

<item weight="1.0">

<one-of>

<item> ex</item>

<item> ax</item>

<item> x </item>

</one-of>

<tag>out.alphaSlot1="X"; </tag>

</item>

</one-of>

</rule>

<rule id="numRule" scope="public">

<one-of>

<item> one <tag>out.numSlot="1"; </tag>  </item>

</one-of>

</rule>

<rule id="alphaRule2" scope="public">

<one-of>

<item weight="1.0">

<one-of>

<item> ay</item>

</one-of>

<tag>out.alphaSlot2="A"; </tag>

</item>

</one-of>

</rule>

</grammar>

In brief, our top-level rule assumes that we can have any of the following entries:

"X1A"

"X"

"X1"

"XA"

"1"

"1A"

And in the event that we get one or two characters matched in our utterance, the VoiceXML mixed-initiative logic will then take over, and prompt the caller to fill in any “blanks” remaining.
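The hand-off can be pictured with a small Python sketch. It is not the recognizer or the VoiceXML form interpretation algorithm; it only mimics the bookkeeping, using the three slot names from the grammar (the function names are invented). A lone letter is treated as the first character, in line with the top-level rule.

```python
SLOTS = ("alphaSlot1", "numSlot", "alphaSlot2")


def interpret(chars: str) -> dict:
    """Assign each recognized character to a slot, in the spirit of the top-level rule."""
    slots = {}
    for ch in chars:
        if ch.isdigit():
            slots["numSlot"] = ch
        elif "numSlot" in slots or "alphaSlot1" in slots:
            # A letter after the digit, or a second letter, is the last character.
            slots["alphaSlot2"] = ch
        else:
            slots["alphaSlot1"] = ch
    return slots


def missing_slots(slots: dict) -> list:
    """List the slots the dialog still has to prompt for."""
    return [name for name in SLOTS if name not in slots]


for entry in ("X1A", "X1", "1A", "XA", "X", "1"):
    print(entry, interpret(entry), missing_slots(interpret(entry)))
```

For “1A”, for example, only “alphaSlot1” is left over, so only the first letter would need to be asked for again.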

A few things of note about the grammar defined above: in the event that we receive only a single alpha utterance, we assume that it is the first character, not the last. Additionally, when we construct a grammar that contains multiple slot returns, we must explicitly define the slot values all the way up the chain: if we didn’t define “out.[slotname]=rules.[rulename].[subslot]” within the top-level rule, the last slot value would overwrite all others, meaning that we would only get a value for “alphaSlot2” within the VoiceXML dialog. To illustrate, the snippet below for a top-level return would cause exactly that problem:

<item>
<ruleref uri="#alphaRule1"/>

<ruleref uri="#numRule"/>

<ruleref uri="#alphaRule2"/>

</item>

You’ll also see that each possibility for character recognition is specified within the top-level rule, so in the event that we get 1, 2 or 3 character strings, we can pipe the return value back to the VoiceXML, and let the mixed-initiative dialogs then access the sub-rules (alphaRule1/2 and numRule), individually as needed.

We also illustrated in brief how one can define multiple like-sounding utterance values that return the same interpretation value, and defined a few for our alphaRule1 entry simply to show how this can be done. The task of taking this framework and turning it into a grammar that satisfies any given project rests in the hands of you, the capable IVR developer.

=^)

Till next time,

Matthew Henry
Director of Customer Support
Voxeo Corporation


Accessing Web Services From VoiceXML

Thursday, May 8th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on May 6, 2008.


A few weeks ago, I posted about accessing web services from CCXML using PHP. This post will demonstrate how to do the same thing, only from VoiceXML. We’ll be using Voxeo Prophecy and PHP for this example. We’ll also be referring to the GreenPhone project — available free for download — for the sample code.

Before we dive in, it’s important to keep in mind that there are a number of different techniques for getting information from web services into a VoiceXML dialog. This is just one method; there are many others. Voxeo even has its own platform-specific way of accessing SOAP web services via JavaScript. Ultimately, the method you employ needs to be a good fit for the environment you’re working in and the requirements of your project.

Using the greenSoapClient Class

In the last post on this topic, I demonstrated how to use a simple PHP class as a way to access multiple SOAP-based web services from CCXML. This class forms the basis of our method for accessing web services from VoiceXML as well. However, in this instance, instead of using the CCXML <send/> element, we’ll use a VoiceXML subdialog.

Subdialogs in VoiceXML are typically used to create reusable dialog components for capturing common types of input, like a series of digits (e.g., credit card numbers, account numbers, etc). They can also be used to compartmentalize complex interactions with a caller and provide a simple interface for accessing results. By way of example, this is how the OSDMs from Nuance work, as well as the Targus service from Voxeo. We’ll borrow this approach to access a web service from StrikeIron that will send the details of an E85 or bio-diesel station to a cell phone via SMS.

Setting up our Subdialog

In order to send an SMS message with details on an E85 or bio-diesel station, we’ll need two things: the station details, and a cell phone number to send them to.

In order to send the details on a station from VoiceXML to PHP, we’ll pack it up in a pipe-delimited string called “detailsToSend” (I won’t go into too much detail about how this is done in this post — to learn more, refer to the GreenPhone Project code). The cell phone number we are sending to is obtained from the caller ID of the calling party, stored in a variable named “ani”. Details on how to access caller ID are given in a previous post.
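For readers who have not worked with delimited strings before, here is a minimal Python sketch of the packing and unpacking idea. It is not the GreenPhone code: the field values are made up, and the rule of replacing any stray pipe with a space is just one simple way to keep the delimiter unambiguous.

```python
def pack_details(fields):
    """Join field values into one pipe-delimited string."""
    # A pipe inside a value would be mistaken for a separator, so replace it.
    return "|".join(str(value).replace("|", " ") for value in fields)


def unpack_details(packed):
    """Split a pipe-delimited string back into its field values."""
    return packed.split("|")


# Hypothetical station details, for illustration only.
details_to_send = pack_details(["Main St Fuel", "123 Main St", "E85"])
print(details_to_send)
print(unpack_details(details_to_send))
```

A single string like this is convenient because it travels as one variable in the subdialog’s namelist.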

Our subdialog call will look like this:

<form id="sendDetails">
<catch event="error.badfetch">
<prompt>
There was a problem sending the station details to your phone.
<break strength="weak"/>
</prompt>
<goto next="#goodbye"/>
</catch>

<subdialog name="sendSMS" src="../php/sendStationDetails.php" namelist="ani detailsToSend">
<prompt>
Sending the station details to
<say-as interpret-as="telephone"><value expr="ani"/></say-as>
</prompt>
<filled>
<if cond="sendSMS.result==0">
<prompt>Your message has been sent.<break strength="weak"/></prompt>
<else/>
<prompt>
There was a problem sending the station details to your phone.
<break strength="weak"/>
</prompt>
</if>
<goto next="#goodbye"/>
</filled>
</subdialog>
</form>

We use the attributes on the <subdialog> element to give our subdialog a name (which we’ll use to access the results sent back from PHP), to specify where to send our variables, and to specify which variables to send. Note that a <subdialog> submits its namelist with HTTP GET unless you add method="post".

You’ll also notice that we have set up a handler here for an “error.badfetch” event. This is a good habit to get into whenever you set up a request to an external resource (like a PHP script). If the script isn’t there or has problems, an “error.badfetch” event will get returned and unless you specified a handler for this event, your day will not end well.

Additionally, we’ve set up logic in our filled block to inspect the result of the subdialog call. We access the result as a property of the subdialog, using the name we set up in the <subdialog> element and the dot notation (“.”) familiar to JavaScript.

<if cond="sendSMS.result==0">

… code logic goes here …

</if>

With this in mind, our PHP script needs to send back a variable called “result”. How do we do this? Let’s take a look at the PHP script:

A Simple Subdialog using PHP

The subdialog that we want to render is extremely simple — we only need to render enough VoiceXML to declare a variable called “result” and return it to the parent dialog. We’ll do this after we make our web service call to send the SMS message.

There are two pieces of information returned from the StrikeIron web service that we are interested in: a string that holds the response message from the service (i.e., “success”, “failure”, etc.) and a number indicating the outcome of the web service call.

We’ll take these two bits of information and assign them to PHP variables:

$result = $xml->soapHeader->ResponseInfo->ResponseCode;
$message = $xml->soapHeader->ResponseInfo->Response;

Now, we want to write out these variables in a simple VoiceXML subdialog:

<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
<block>
<log>*** SMS response message was: <?php echo $message; ?>. ***</log>
<var name="result" expr="<?php echo $result ?>"/>
<return namelist="result"/>
</block>
</form>
</vxml>

As discussed above, this creates just enough VoiceXML to instantiate a variable and return it to the parent dialog. For good measure, we’ll write out the web service string (contained in the PHP variable $message) as a log statement, in case it contains information we want to look at later.
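The same rendering step can be sketched outside PHP. The Python function below is a stand-in for the template above, written only to show the shape of the output; the function name is invented, and it assumes the result code is an integer. It also escapes the message text, since a response string containing “<” or “&” would otherwise produce a document the VoiceXML platform cannot parse.

```python
from xml.sax.saxutils import escape


def render_subdialog(result: int, message: str) -> str:
    """Build a minimal VoiceXML document that returns `result` to the parent dialog."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        '<form id="F_1">\n'
        "<block>\n"
        # escape() converts &, < and > so the log text stays well-formed XML.
        f"<log>*** SMS response message was: {escape(message)}. ***</log>\n"
        # int() guards against anything but a number ending up in the expression.
        f'<var name="result" expr="{int(result)}"/>\n'
        '<return namelist="result"/>\n'
        "</block>\n"
        "</form>\n"
        "</vxml>\n"
    )


print(render_subdialog(0, "success"))
```

Whatever language produces it, the parent dialog only ever sees the one returned variable.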

Why This Approach?

Using this technique for accessing web services from VoiceXML provides a couple of advantages. First, it allows us to completely separate the presentation layer (the VoiceXML) from the logic used to invoke the web service. This is a fairly standard design practice that makes creating the dialog much easier for a developer that does not necessarily know a whole lot about web services. With this approach, they don’t really need to — they only need to know that the subdialog call will return a variable called “result” whose value can be inspected to determine what to do next.

Additionally, because the parent dialog is just static VoiceXML, it can usually be cached for fast access, while the subdialog (which must be dynamic) is the only component sent from the web server to the VoiceXML platform each time a caller accesses the application. Careful design can yield additional caching opportunities that make your applications more efficient and less bandwidth intensive.

In the next post, we’ll explore one additional method for accessing web services from VoiceXML. Stay tuned…


A new series of guest posts/tutorials coming to this blog…

Sunday, April 27th, 2008

You’ll soon see a new series of tutorials coming to this weblog as guest posts from Mark Headd. Mark is a voice application developer and member of our developer community who is very proud of the fact that he’s been using our platform long enough to have a developer ID under 10,000! He writes over on his own weblog, Vox Populi, (more info on Mark on his About page) and recently kicked off a series of posts around how to use CCXML and VoiceXML with PHP and other languages. I asked Mark if he would grant permission for us to cross-post his articles here with attribution and he gladly granted that. Mark has a great writing style and some great tutorials and so we are thrilled to be bringing his writing to you. Stay tuned…


Want to learn about CCXML? A slide/audio tutorial is now online…

Thursday, April 3rd, 2008

Want to learn more about CCXML? Our online tutorial and documentation are one great way to learn, but here in this blog we’re also going to be bringing you other tutorials and information as we either find them or put them online ourselves.

Today I thought I’d make you aware of an “Introduction to CCXML” video posted by Moshe Yudkowsky (who I recently interviewed). It’s actually a tutorial session he did at the SpeechTek conference in August 2007 and in this video he has synced the audio to his slides. This is part 1 and runs about 85 minutes.


Part 2 is also now available and runs about 78 minutes. We appreciate Moshe making great information like this available to the public.

If you would like to experiment with CCXML yourself or try out any of the examples in these slides, you can go to www.voxeo.com/free and either sign up for our Evolution hosted platform or download our Prophecy software for your local computer.

And stay tuned for more videos coming your way in this blog soon…
