Posts Tagged ‘VoiceXML’

Certified Tech Tip: Alpha-Numeric voice recognition grammars - part two

Tuesday, May 20th, 2008

In our last entry to the tech-tips blog, we described the challenges inherent in capturing alphabetic or alpha-numeric entries from our callers, and outlined several ways to minimize the chance of mis-recognition when implementing input fields for these two categories of voice recognition. The long and short of that posting was that IVR developers should avoid this wherever possible, and instead try these alternatives:

* Pre-compiled Statistical Language Model grammars
* Leveraging TargusInfo services for advanced recognition accuracy

However, the IVR project requirements dictate what we can and can’t do as developers, so in some cases we have to whip up a user grammar that takes alpha or alpha-numeric input. As mentioned in our last blog entry, there are a few things we can do to stack the deck and squeeze more accuracy out of these grammars so that we don’t end up with frustrated callers, but the plain truth is that with today’s recognition technology we will never, ever be able to write a grammar that recognizes alphabetic characters with 100% accuracy. What we will do today is twofold:

(1) Craft an SRGS+SISR subgrammar for alphabetic and numeric characters

(2) Plug this grammar into a mixed-initiative form dialog that will minimize (but not eliminate!) the possibility of mis-recognition.

Developers who need such a grammar and dialog in their production-grade applications are advised to take this basic framework as a starting point, and then expand on it by:

(a) Testing carefully with a broad range of users, and fully fleshing out alternate utterance values for alphabetic characters

(b) Applying item weights to specific characters based on the probability of a given character versus another like-sounding character; this will depend greatly on the specific usage of the grammar

(c) Tracking results by using W3C-compliant utterance recording and logging all shadow variables, so that these results can be used to further tune and tweak our grammar for maximum accuracy

(d) Considering n-best post-processing as an additional confirmation step to ensure that the results we receive are indeed accurate
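Item (d) can be sketched in ECMAScript, the scripting language VoiceXML already embeds. This is a minimal sketch, not production code: it assumes the n-best list is an array shaped like VoiceXML’s application.lastresult$ entries (utterance, confidence, interpretation), that the maxnbest property has been raised above 1, and that allowedLetters is a hypothetical placeholder for whatever letters your own project accepts. It keeps the most confident candidate whose filled slots all pass a format check.

```javascript
// n-best post-processing sketch for the "Alpha Digit Alpha" format.
// Assumption: each candidate looks like an application.lastresult$ entry:
// { utterance, confidence, interpretation: { alphaSlot1, numSlot, alphaSlot2 } }

// Hypothetical placeholder letter set; replace with your project's letters.
var allowedLetters = "ABCEGHJKLMNPRSTVXY";

function isAllowedLetter(ch) {
  return typeof ch === "string" && ch.length === 1 && allowedLetters.indexOf(ch) !== -1;
}

function isDigit(ch) {
  return typeof ch === "string" && /^[0-9]$/.test(ch);
}

// Returns the highest-confidence candidate at or above minConfidence whose
// filled slots are all valid, or null if no candidate qualifies.
function pickBestCandidate(nbest, minConfidence) {
  var best = null;
  for (var i = 0; i < nbest.length; i++) {
    var c = nbest[i];
    var interp = c.interpretation || {};
    // Slots the caller did not speak are absent; only check the ones present.
    var ok =
      (interp.alphaSlot1 === undefined || isAllowedLetter(interp.alphaSlot1)) &&
      (interp.numSlot === undefined || isDigit(interp.numSlot)) &&
      (interp.alphaSlot2 === undefined || isAllowedLetter(interp.alphaSlot2));
    if (ok && c.confidence >= minConfidence &&
        (best === null || c.confidence > best.confidence)) {
      best = c;
    }
  }
  return best;
}
```

Returning null gives the dialog a clean signal to fall back to explicit confirmation or a re-prompt, rather than silently accepting a doubtful result.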

For today’s entry, let’s assume that we need to capture a three-character postal code, such as the first half of a Canadian postal code. Our predefined format for utterance values is “Alpha Digit Alpha”, and luckily, not all alpha characters are applicable: instead of trying to recognize 26 letters accurately, we only need to recognize 16, which helps a lot!

We won’t dig into the specifics of a mixed-initiative form dialog, as we have already done so in our mixed-initiative tutorial, but the gist is that this VoiceXML feature allows us to fill multiple fields with a single utterance, and breaking each alpha and numeric character into its own recognition field greatly cuts down on the disambiguation problems that can occur.

For the sake of brevity, what follows is a stripped-down version of our fully fleshed-out grammar, but you may download the full grammar and the mixed-initiative dialog right here; the download contains many more inline comments.

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="canadianZip" tag-format="semantics/1.0">

<rule id="canadianZip" scope="public">
  <one-of>
    <!-- ALL THREE FIELDS FILLED -->
    <item>
      <item>
        <ruleref uri="#alphaRule1"/>
        <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
      </item>
      <item>
        <ruleref uri="#numRule"/>
        <tag>out.numSlot=rules.numRule.numSlot;</tag>
      </item>
      <item>
        <ruleref uri="#alphaRule2"/>
        <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
      </item>
    </item>

    <!-- ONLY TWO FIELDS FILLED -->
    <item>
      <item>
        <ruleref uri="#alphaRule1"/>
        <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
      </item>
      <item>
        <ruleref uri="#numRule"/>
        <tag>out.numSlot=rules.numRule.numSlot;</tag>
      </item>
    </item>
    <item>
      <item>
        <ruleref uri="#numRule"/>
        <tag>out.numSlot=rules.numRule.numSlot;</tag>
      </item>
      <item>
        <ruleref uri="#alphaRule2"/>
        <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
      </item>
    </item>
    <item>
      <item>
        <ruleref uri="#alphaRule1"/>
        <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
      </item>
      <item>
        <ruleref uri="#alphaRule2"/>
        <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
      </item>
    </item>

    <!-- ONLY ONE FIELD FILLED: a lone letter is treated as the first character -->
    <item>
      <ruleref uri="#alphaRule1"/>
      <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
    </item>
    <item>
      <ruleref uri="#numRule"/>
      <tag>out.numSlot=rules.numRule.numSlot;</tag>
    </item>
  </one-of>
</rule>

<rule id="alphaRule1" scope="public">
  <one-of>
    <!-- several like-sounding utterances return the same interpretation -->
    <item weight="1.0">
      <one-of>
        <item>ex</item>
        <item>ax</item>
        <item>x</item>
      </one-of>
      <tag>out.alphaSlot1="X";</tag>
    </item>
  </one-of>
</rule>

<rule id="numRule" scope="public">
  <one-of>
    <item>one <tag>out.numSlot="1";</tag></item>
  </one-of>
</rule>

<rule id="alphaRule2" scope="public">
  <one-of>
    <item weight="1.0">
      <one-of>
        <item>ay</item>
      </one-of>
      <tag>out.alphaSlot2="A";</tag>
    </item>
  </one-of>
</rule>

</grammar>

In brief, our top-level rule assumes that we can have any of the following entries:

"X1A"

"X"

"X1"

"XA"

"1"

"1A"

And in the event that we get one or two characters matched in our utterance, the VoiceXML mixed-initiative logic will then take over, and prompt the caller to fill in any “blanks” remaining.
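The “fill in the blanks” step is handled by VoiceXML’s Form Interpretation Algorithm, which visits any field whose variable is still undefined. The sketch below models that selection in ECMAScript and assembles the final three characters once every slot is filled. It assumes hypothetical field names that match the grammar’s slot names (alphaSlot1, numSlot, alphaSlot2) and illustrates the behavior rather than how the interpreter implements it.

```javascript
// Field order in the hypothetical mixed-initiative form, first to last.
var FIELD_ORDER = ["alphaSlot1", "numSlot", "alphaSlot2"];

// Returns the fields that still need a value, in the order the form would
// prompt for them.
function remainingFields(interp) {
  var missing = [];
  for (var i = 0; i < FIELD_ORDER.length; i++) {
    if (interp[FIELD_ORDER[i]] === undefined) {
      missing.push(FIELD_ORDER[i]);
    }
  }
  return missing;
}

// Joins the three slots into a string such as "X1A", or returns null while
// any slot is still empty.
function assembleCode(interp) {
  if (remainingFields(interp).length > 0) {
    return null;
  }
  return interp.alphaSlot1 + interp.numSlot + interp.alphaSlot2;
}
```

After an utterance of “X1”, only alphaSlot2 remains, so the caller is prompted for the last letter alone.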

A few things are worth noting about the grammar above. First, if we receive only a single alpha utterance, we assume that it is the first character, not the last. Second, when a grammar returns multiple slots, we must explicitly propagate each slot value all the way up the chain: if we didn’t define “out.[slotname]=rules.[rulename].[subslot]” within the top-level rule, the last slot value would overwrite all the others, and the VoiceXML dialog would only receive a value for “alphaSlot2”. For example, a top-level item like the following, with no tags, produces exactly that result:

<item>
<ruleref uri="#alphaRule1"/>

<ruleref uri="#numRule"/>

<ruleref uri="#alphaRule2"/>

</item>
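To make the overwrite concrete, here is a toy ECMAScript model of the two top-level variants. It assumes the SISR default-assignment behavior described above (a rule with no tags takes the value of the last rule reference it matched) and merges slot objects to stand in for the explicit out.slot=rules.rule.slot tags; it is not a grammar interpreter.

```javascript
// Values each sub-rule would produce for the utterance "ex one ay".
var matched = [
  { rule: "alphaRule1", value: { alphaSlot1: "X" } },
  { rule: "numRule", value: { numSlot: "1" } },
  { rule: "alphaRule2", value: { alphaSlot2: "A" } }
];

// Top-level item WITHOUT tags: the latest rule reference's value becomes
// the whole result, so earlier slots are lost.
function evaluateWithoutTags(refs) {
  return refs[refs.length - 1].value;
}

// Top-level item WITH one out.slot=rules.rule.slot tag per reference:
// every slot is copied onto the same out object, so nothing is lost.
function evaluateWithTags(refs) {
  var out = {};
  for (var i = 0; i < refs.length; i++) {
    var value = refs[i].value;
    for (var key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        out[key] = value[key];
      }
    }
  }
  return out;
}
```

Only the second variant hands all three slots back to the VoiceXML dialog.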

You’ll also see that each possibility for character recognition is specified within the top-level rule, so in the event that we get 1, 2 or 3 character strings, we can pipe the return value back to the VoiceXML, and let the mixed-initiative dialogs then access the sub-rules (alphaRule1/2 and numRule), individually as needed.

We also showed in brief how to define multiple like-sounding utterance values that return the same interpretation value, doing so only for our alphaRule1 entry, simply to show how it can be done. The task of taking this framework and turning it into a grammar that satisfies any given project rests in the hands of you, the capable IVR developer.

=^)

Till next time,

Matthew Henry
Director of Customer Support
Voxeo Corporation


Accessing Web Services From VoiceXML

Thursday, May 8th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on May 6, 2008.


A few weeks ago, I posted about accessing web services from CCXML using PHP. This post will demonstrate how to do the same thing, only from VoiceXML. We’ll be using Voxeo Prophecy and PHP for this example. We’ll also be referring to the GreenPhone project — available free for download — for the sample code.

Before we dive in, it’s important to keep in mind that there are a number of different techniques for getting information from web services into a VoiceXML dialog. This is just one method; there are many others. Voxeo even has its own platform-specific way of accessing SOAP web services via JavaScript. Ultimately, the method you employ needs to be a good fit for the environment you’re working in and the requirements of your project.

Using the greenSoapClient Class

In the last post on this topic, I demonstrated how to use a simple PHP class as a way to access multiple SOAP-based web services from CCXML. This class forms the basis of our method for accessing web services from VoiceXML as well. However, in this instance, instead of using the CCXML <send/> element, we’ll use a VoiceXML subdialog.

Subdialogs in VoiceXML are typically used to create reusable dialog components for capturing common types of input, like a series of digits (e.g., credit card numbers, account numbers, etc). They can also be used to compartmentalize complex interactions with a caller and provide a simple interface for accessing results. By way of example, this is how the OSDMs from Nuance work, as well as the Targus service from Voxeo. We’ll borrow this approach to access a web service from StrikeIron that will send the details of an E85 or bio-diesel station to a cell phone via SMS.

Setting up our Subdialog

In order to send an SMS message with details on an E85 or bio-diesel station, we’ll need two things: the station details, and a cell phone number to send them to.

In order to send the details on a station from VoiceXML to PHP, we’ll pack it up in a pipe-delimited string called “detailsToSend” (I won’t go into too much detail about how this is done in this post — to learn more, refer to the GreenPhone Project code). The cell phone number we are sending to is obtained from the caller ID of the calling party, stored in a variable named “ani”. Details on how to access caller ID are given in a previous post.
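The packing step isn’t shown in this post, so here is a hedged ECMAScript sketch of one way to build such a pipe-delimited string before the subdialog call. The helper name and the station property names used with it (name, address, city, phone) are hypothetical and may not match the GreenPhone code. The function removes any literal “|” from values so the PHP side can split on the delimiter safely.

```javascript
// Packs the requested fields of a station record into "a|b|c" form.
// Missing or null values become empty strings; any "|" inside a value is
// replaced with a space so it can't be mistaken for a delimiter.
function buildDetailsToSend(station, fields) {
  var parts = [];
  for (var i = 0; i < fields.length; i++) {
    var value = station[fields[i]];
    var text = (value === undefined || value === null) ? "" : String(value);
    parts.push(text.replace(/\|/g, " "));
  }
  return parts.join("|");
}
```

Such a helper could be defined in a VoiceXML script element and called from an assign expr before the subdialog runs; on the PHP side, explode('|', ...) splits the string back into its fields.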

Our subdialog call will look like this:

<form id="sendDetails">
<catch event="error.badfetch">
<prompt>
There was a problem sending the station details to your phone.
<break strength="weak"/>
</prompt>
<goto next="#goodbye"/>
</catch>

<subdialog name="sendSMS" src="../php/sendStationDetails.php" method="post" namelist="ani detailsToSend">
<prompt>
Sending the station details to
<say-as interpret-as="telephone"><value expr="ani"/></say-as>
</prompt>
<filled>
<if cond="sendSMS.result==0">
<prompt>Your message has been sent.<break strength="weak"/></prompt>
<else/>
<prompt>
There was a problem sending the station details to your phone.
<break strength="weak"/>
</prompt>
</if>
<goto next="#goodbye"/>
</filled>
</subdialog>
</form>

We use the attributes on the <subdialog> element to give our subdialog a name (which we’ll use to access the results sent back from PHP), to specify where to POST our variables to and also to specify which variables to POST.

You’ll also notice that we have set up a handler here for an “error.badfetch” event. This is a good habit to get into whenever you set up a request to an external resource (like a PHP script). If the script isn’t there or has problems, an “error.badfetch” event will be thrown, and unless you have specified a handler for this event, your day will not end well.

Additionally, we’ve set up logic in our filled block to inspect the result of the subdialog call. We access the result as a property of the subdialog, using the name we set up in the <subdialog> element and the dot notation (“.”) familiar from JavaScript.

<if cond="sendSMS.result==0">
  <!-- code logic goes here -->
</if>

With this in mind, our PHP script needs to send back a variable called “result”. How do we do this? Let’s take a look at the PHP script:

A Simple Subdialog using PHP

The subdialog that we want to render is extremely simple — we only need to render enough VoiceXML to declare a variable called “result” and return it to the parent dialog. We’ll do this after we make our web service call to send the SMS message.

There are two pieces of information returned from the StrikeIron web service that we are interested in; a string that holds the response message from the service (i.e., “success”, “failure”, etc.) and a number indicating the outcome of the web service call.

We’ll take these two bits of information and assign them to PHP variables:

$result = $xml->soapHeader->ResponseInfo->ResponseCode;
$message = $xml->soapHeader->ResponseInfo->Response;

Now, we want to write out these variables in a simple VoiceXML subdialog:

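<?php echo '<?xml version="1.0" encoding="utf-8"?>'; ?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
<block>
<!-- <log> is executable content, so it belongs inside the <block>; escape the message so it can't break the XML -->
<log>*** SMS response message was: <?php echo htmlspecialchars((string) $message); ?>. ***</log>
<!-- cast to int so the expr is always a valid ECMAScript number -->
<var name="result" expr="<?php echo (int) $result; ?>"/>
<return namelist="result"/>
</block>
</form>
</vxml>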

As discussed above, this creates just enough VoiceXML to instantiate a variable and return it to the parent dialog. For good measure, we’ll write out the web service string (contained in the PHP variable $message) as a log statement, in case it contains information we want to look at later.

Why This Approach?

Using this technique for accessing web services from VoiceXML provides a couple of advantages. First, it allows us to completely separate the presentation layer (the VoiceXML) from the logic used to invoke the web service. This is a fairly standard design practice that makes creating the dialog much easier for a developer that does not necessarily know a whole lot about web services. With this approach, they don’t really need to — they only need to know that the subdialog call will return a variable called “result” whose value can be inspected to determine what to do next.

Additionally, because the parent dialog is just static VoiceXML, it may be cached for fast access, so the subdialog (which must be dynamic) is the only component sent from the web server to the VoiceXML platform each time a caller accesses the application. Careful design can yield additional caching opportunities that can make your applications more efficient and less bandwidth-intensive.

In the next post, we’ll explore one additional method for accessing web services from VoiceXML. Stay tuned…


JavaScript Trick for Voice Applications

Monday, April 28th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on April 25, 2008.


There are times when it is desirable to change the behavior of a VoiceXML application based on a specific setting. For example, the GreenPhone application that I have mentioned in several previous posts has a setting that can be used to control whether special audio files are played. I personally find these audio files funny and somewhat endearing; others may not. To control whether they are played, there is a variable in the application root document called (cleverly) playAudio.

<var name="playAudio" expr="true"/>

Its default setting is true, and it can be changed to false to prevent these files from playing. The typical method for checking a variable like this one to determine whether an audio file should be played looks something like this:

<if cond="playAudio">
  <audio src="myFile.wav"/>
</if>

There isn’t anything wrong with this, and since there isn’t a “cond” attribute on the <audio/> tag there aren’t very many good alternatives. There is one alternative method that I rather fancy that uses the JavaScript conditional operator to distill this to a single line of code:

<audio expr="playAudio ? 'myRealAudioFile.wav' : 'myFakeAudioFile.wav'"/>

This shortcut allows us to assign a value to the audio file reference via the “expr” attribute, instead of using an explicit URI to the location of an audio file. The way the operator behaves is to first evaluate the condition on the far left side — if it evaluates to true then the first expression is assigned as the URI of the audio file. If it evaluates to false, then the second expression is used.

The trick here is that the second expression resolves to a bogus audio file — it doesn’t exist. This will not cause a fatal error in your application, it will simply cause Prophecy not to play an audio file (it can’t because the file doesn’t exist).

The JavaScript conditional operator can come in very handy in CCXML as well. For example, there are times in CCXML where I want to use <dialogterminate/> to end a call, but I may not be certain which dialog a caller is in — the JavaScript conditional operator can come in handy here:

<dialogterminate dialogid="loggedIn ? voiceMailDialog : loginDialog"/>

Since the “dialogid” attribute is an expression, we can use the JavaScript conditional operator to check whether a caller has logged into a voice mail system to retrieve their voicemail. If their loggedIn status is true, we assume that they are in the voiceMailDialog and yank them from that. Otherwise, we assume they are in the first dialog and yank them from there.
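Both tricks rely on nothing more than standard ECMAScript, so the expressions can be checked outside the platform. The functions below are hypothetical wrappers around the exact conditional expressions used above, showing which value each branch yields.

```javascript
// Mirrors <audio expr="playAudio ? 'myRealAudioFile.wav' : 'myFakeAudioFile.wav'"/>
function audioSrc(playAudio) {
  return playAudio ? "myRealAudioFile.wav" : "myFakeAudioFile.wav";
}

// Mirrors <dialogterminate dialogid="loggedIn ? voiceMailDialog : loginDialog"/>
function dialogToTerminate(loggedIn, voiceMailDialog, loginDialog) {
  return loggedIn ? voiceMailDialog : loginDialog;
}
```

Note that the condition is tested for truthiness, not strict equality with true: a playAudio variable accidentally set to the string "false" would still select the real file, so keep such flags as genuine booleans.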

There are surely other ways to do these things, but in my humble opinion the JavaScript conditional operator deserves some attention as a powerful shortcut for doing things in CCXML or VoiceXML using the Voxeo Prophecy platform.


Accessing Web Services From CCXML

Monday, April 28th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on April 21, 2008.


This is the first in a series of posts that will highlight how to accomplish specific things using the Voxeo Prophecy platform. All of the examples that will be discussed draw directly from the GreenPhone project discussed in a previous post.

The first issue that will be discussed – accessing web services from CCXML using PHP.

One of the very cool things about Prophecy is that it comes bundled with the PHP scripting language. In fact, I have on occasion referred to PHP as Prophecy’s “embedded scripting language.” PHP 5 comes with an abundance of features that will be of interest to IVR developers – chief among them, the ability to create SOAP clients to interact with web services, and the ability to easily work with data in XML format using the SimpleXML extension.

If you’ve read my previous post on the GreenPhone Project, you will know that I am using a collection of web services from StrikeIron that includes a web service to provide information on U.S. area codes. If we pass this web service an area code, it will return the U.S. state that area code is in. Ultimately, what the GreenPhone application will do is look up E85 and bio-diesel stations by state. So when a person calls the application, we want to use their area code to look up what state they are in – thereby saving them the trouble of entering this information manually.

In CCXML, we can access the caller’s ANI via the Connection Object:

<transition state="initial" event="connection.alerting">
 <log expr="'*** Call is coming in.  Lookup area code information. ***'"/>
 <assign name="ani" expr="event$.connection.remote"/>
 <assign name="areacode" expr="ani.substring(0,3)"/>
 <send target="'php/areaCodeLookup.php'" name="'lookupEvent'" targettype="'basichttp'" namelist="areacode"/>
</transition>

This block of code shows how to set up a transition to access the ANI on an incoming call. When an incoming call is detected by Prophecy, the “connection.alerting” event is delivered and we have access to the Connection Object’s “remote” property, which exposes the telephone URL for the device that is calling into the platform. Note: in my previous post, I explained the process of setting up the Prophecy SIP phone to deliver a specific ANI. This is how we access the value that is set in the Prophecy SIP phone.

We assign the ANI value to a variable we have previously declared and (very cleverly) called “ani”, and then we grab the first 3 characters of this string (using the ECMAScript substring method) and assign them to another variable called “areacode”. We then pass the area code value to a PHP script that will interact with the StrikeIron area code web service.
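Because the remote property is described as a telephone URL, substring(0,3) only yields the area code when the platform delivers bare digits. A more defensive ECMAScript helper is sketched below; it assumes a North American (NANP) number, optionally wrapped in a tel: or sip: URI and optionally prefixed with the country code 1. The function name is hypothetical.

```javascript
// Defensive area-code extraction (sketch, NANP numbers only).
function areaCodeFromRemote(remote) {
  var s = String(remote);
  // For SIP URIs, keep only the user part (the text before "@"), so the
  // host address digits are not mixed into the number.
  var at = s.indexOf("@");
  if (at !== -1) {
    s = s.substring(0, at);
  }
  // Keep digits only; this drops "tel:", "sip:", "+", dashes and spaces.
  var digits = s.replace(/\D/g, "");
  // Strip a leading country code 1 so 11-digit numbers reduce to 10 digits.
  if (digits.length === 11 && digits.charAt(0) === "1") {
    digits = digits.substring(1);
  }
  // Anything other than a 10-digit number is treated as unusable.
  return digits.length === 10 ? digits.substring(0, 3) : null;
}
```

Defined in a CCXML script element, it could replace the substring call, e.g. `<assign name="areacode" expr="areaCodeFromRemote(ani)"/>`; a null result is a cue to skip the web service lookup entirely.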

Using the CCXML <send/> element in this fashion is identical to an HTTP GET with the areacode variable appended to the URL of the PHP script, like this:

http://myserver/php/areaCodeLookup.php?areacode=123

There are several possible outcomes of this HTTP request:

  1. Our PHP script was able to successfully interact with the StrikeIron web service and lookup the U.S. state information for the submitted area code;
  2. Our PHP script was able to successfully interact with the StrikeIron web service but was not able to lookup the state information for the submitted area code (bad area code);
  3. Something went wrong (an exception occurred) while trying to interact with the web service; or,
  4. Something really went wrong and our HTTP request resulted in a bad response from the server.

We need to set up handlers for each possible outcome – we won’t discuss them in detail until after we look more closely at the PHP components that are interacting with the StrikeIron web service, but to summarize what we’ll need, here they are:

<transition state="lookup" event="areaCodeLookupSuccess">
</transition>
<transition state="lookup" event="areaCodeLookupFailure">
</transition>
<transition state="lookup" event="error.send.failed">
</transition>

The first two handlers react to custom events that we will toss into the CCXML event stream (more on that shortly), and the last will take care of instances where we get an invalid response back from the server (e.g., a 404 response). Now let’s look at the PHP components that interact with the StrikeIron web services.

When the HTTP request from Prophecy that holds our area code information is received in PHP, we can access the submitted value by using the PHP $_REQUEST superglobal:

$areacode = (int) $_REQUEST['areacode'];

You’ll notice that we also typecast the value as a way of cleansing the input – as with any other kind of web application, never trust user input. Even though we’re not using the submitted information in a SQL query, this is a really good habit to get into. There are certainly other ways to achieve this, but type casting is simple and effective for our purposes.

The PHP version that comes bundled with Prophecy has support for PHP’s SOAP extension right out of the box. Since we’re going to be accessing several different web services over the course of one telephone call, I decided to set up a very simple class to handle all of the interactions with the StrikeIron web services.

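class greenSoapClient {
  private $client;
  private $headers;

  // $type is a key into the $WSDL array defined in common.php (e.g. "areacode").
  function __construct($type) {
    global $WSDL, $USER, $PSWD;
    // 'trace' => 1 lets us read the raw XML via __getLastResponse();
    // 'exceptions' => 0 returns SoapFault objects instead of throwing them.
    $this->client = new SoapClient($WSDL[$type], array('trace' => 1,
                                                       'exceptions' => 0));
    // StrikeIron expects the account credentials in a LicenseInfo SOAP header.
    $headerArray = array("RegisteredUser" => array("UserID" => $USER,
                                                   "Password" => $PSWD));
    $this->headers = new SoapHeader("http://ws.strikeiron.com",
                                    "LicenseInfo", $headerArray);
  }

  // Invokes SOAP method $name with $params and returns the raw XML response.
  // __soapCall (not __call) is the method that accepts input SOAP headers.
  function makeSoapCall($name, $params) {
    $this->client->__soapCall($name, array($params), NULL, $this->headers);
    return $this->client->__getLastResponse();
  }

  function __destruct() {
    unset($this->client);
  }
}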

This class has only three functions – a constructor, a destructor and a function to make the call to the SOAP method we want information from.

When we instantiate the greenSoapClient class, we pass in a key identifying the WSDL file for the service we want to invoke. In this case, that is the WSDL file for the U.S. Area Code Information Web Service. (Actually, the string “areacode” is used to look up the WSDL URL in a pre-established associative array holding the URLs of all of the WSDL files used by the GreenPhone application.)

$mySoapClient = new greenSoapClient("areacode");

Now that we have our area code information, and a shiny new greenSoapClient object to work with, we can make our SOAP call:

$param = array('AreaCode' => $areacode);
$response = $mySoapClient->makeSoapCall('GetAreaCode', $param);

The variable $response now holds the XML response that was returned from the web service. We’ll need to process this response in order to properly format the information we want to return to CCXML.

One of the very cool things about the Voxeo implementation of CCXML is that developers can toss custom events into the CCXML event stream using simple HTTP responses. Prophecy lets us send back a custom event, as well as any data that we want to access in CCXML as properties of that event. We do this by formatting our response as follows:

First line of body of HTTP response = custom event name.
Data to be returned to CCXML = name value pair appearing on successive lines of the HTTP body, one pair per line.
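The response format is simple enough to model in a few lines of ECMAScript. The sketch below builds a body in that format and parses it back into an event name plus properties, roughly what the CCXML handler later reads through event$. It illustrates the format described here and is not Prophecy’s actual parser; the function names are hypothetical.

```javascript
// Builds a body with the event name on the first line, then one
// name=value pair per line.
function buildEventBody(eventName, data) {
  var lines = [eventName];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      lines.push(key + "=" + data[key]);
    }
  }
  return lines.join("\n") + "\n";
}

// Reads such a body back into { name, props }. Values stay strings, and
// only the first "=" on a line separates the name from the value.
function parseEventBody(body) {
  var lines = body.split(/\r?\n/);
  var result = { name: lines[0], props: {} };
  for (var i = 1; i < lines.length; i++) {
    var eq = lines[i].indexOf("=");
    if (eq === -1) {
      continue; // skip blank or malformed lines in this sketch
    }
    result.props[lines[i].substring(0, eq)] = lines[i].substring(eq + 1);
  }
  return result;
}
```

In this model every value comes back as a string ("1", not 1), which is one reason a loose comparison such as count == 1 is the safer choice in the handler.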

The U.S. Area Code Information Web Service returns two pieces of information that we want to access in CCXML – a count of the number of locations identified for each area code (typically 1), and the name of the U.S. state that area code belongs to. A snippet of the raw response returned from the web service might look something like this (for the 610 area code):

<ServiceResult>
 <Count>1</Count>
 <AreaCodes>
  <AreaCodeInfo>
   <AreaCode>610</AreaCode>
   <Location>Pennsylvania</Location>
  </AreaCodeInfo>
 </AreaCodes>
</ServiceResult>

We want to format our raw XML response like so:

areaCodeLookupSuccess
count=1
location=Pennsylvania

The easiest way to do this in PHP is to use the SimpleXML extension:

$xml = new SimpleXmlElement($response);
$result = $xml->soapBody->GetAreaCodeResponse->GetAreaCodeResult;
$output = "areaCodeLookupSuccess\n";
$output .= "count=".$result->ServiceResult->Count."\n";
$output .= "location=".$result->ServiceResult->AreaCodes->AreaCodeInfo->Location."\n\n";

We take the response from the StrikeIron web service and use it to create a new SimpleXML object. We can then access the values we want and build our HTTP response.

How do we deliver our response once we’re done constructing it? We simply use the PHP “echo” language construct to write it out:

echo $output;

Now that we’ve returned our values to CCXML, how do we access them? For the answer to that, we need to go back to the handlers we set up previously, most importantly the handler for the custom “areaCodeLookupSuccess” event:

<transition state="lookup" event="areaCodeLookupSuccess">
 <assign name="count" expr="event$.count"/>
 <if cond="count == 1">
  <assign name="location" expr="event$.location"/>
  <assign name="stateCode" expr="getStateCode(event$.location)"/>
  <assign name="myState" expr="'accepting'"/>
  <accept connectionid="connection_id"/>
 <else/>
  <log expr="'*** Could not look up area code. ***'"/>
  <reject/>
 </if>
</transition>

When we write out our web service response in PHP, we can cause a custom event to drop into the CCXML event stream – the name of this event is the first line of the HTTP response we just constructed – areaCodeLookupSuccess.

We access the values we just returned to CCXML as properties of the areaCodeLookupSuccess event using the “event$.” vernacular. This allows us to assign these values to ECMAScript variables that we have previously declared. It also lets us decide how we want our application to react, based on certain conditions (e.g., if count = 0).

Similarly, our other event handlers can be used if we get an unexpected response from the web service: we could send back an “areaCodeLookupFailure” event. If something really bad happens, like an invalid response from the web server, we will get an “error.send.failed” event, so we’ll want to have a handler ready for that as well.

Now that you have a flavor for how to access web services using CCXML and PHP, we’ll look at two different techniques for returning information from a web service to VoiceXML. We’ll cover these two techniques in the next two posts. Stay tuned…


Earth Day Special Project: Project Green Phone

Sunday, April 27th, 2008

This is a guest post from Mark Headd, a voice application developer who was one of the first 10,000 users of our platform, and was originally published on his Vox Populi blog on April 17, 2008.


Earth Day 2008 is fast approaching, so I wanted to try and build something that would help the environment and also be a cool demonstration of telephone applications generally, and the Voxeo Prophecy platform in particular.

I decided to whip up a simple application that would allow a caller to search for E85 and Bio-diesel fuel stations in their state. Some of the specific goals that I had in mind when I got started were:

  • To make use of the Voxeo Prophecy platform, the premier VoiceXML/CCXML platform for building voice applications (at least in my opinion).
  • To code the application entirely in VoiceXML, CCXML, ECMAScript and PHP (that’s right, no database!).
  • To integrate with SOAP-based web services to obtain data on E85 and Bio-Diesel station locations, and to do other cool stuff like send an SMS message from VoiceXML.
  • To make use of interesting and unique audio files for prompts and to signal specific types of outcomes.

The fruits of one weekend of labor can be downloaded here. To set up and test this application, you will need the following:

  • An account with StrikeIron to use the web services that drive the GreenPhone application.
  • A copy of Voxeo Prophecy.
  • A good headset and microphone (to place test calls using Prophecy).
  • A cell phone (preferably one with a liberal text messaging contract).

Sign Up With StrikeIron:

Create an account with StrikeIron and sign up for the Super Data Pack Web Service. This is a collection of web services that allow for up to 10,000 hits / month at no charge (where are you going to get a better deal than that?). You’ll also want to sign up for the Global SMS Pro Web Service – this is the service that is used to send SMS messages from the GreenPhone application. Note – this service is priced quite differently than the Super Data Pack Web Service – only 10 free hits before you start paying. If you want to use this service for anything more than just testing out how to send an SMS message from Voxeo Prophecy, you’ll need to get your wallet out.

Make note of the user ID (email address) and password used to create your StrikeIron account – these will be needed momentarily.

Download and install Voxeo Prophecy:

Download and install the Voxeo Prophecy software. Follow all of the instructions for installing and obtaining a license – a two-port license (which will support 2 concurrent phone calls) is free. Right now, Prophecy only runs on Windows, but a Linux version is in the pipeline.

Download and Configure GreenPhone:

Download the GreenPhone application and extract it to a new directory under c:\{Prophecy install path}\www\. (For example, on my Windows machine I’ve extracted to c:\Program Files\Voxeo\www\GreenPhone\). You don’t have to run the GreenPhone application on the same machine as Prophecy – if you decide to deploy it on another machine, it must support PHP 5 – GreenPhone makes use of the PHP SOAP and SimpleXML extensions.

Once this is complete, navigate to the directory where you just extracted the GreenPhone application files. Go to the directory called “php”, and open the file called common.php. At the top of this file, enter the credentials from your StrikeIron account. Save and close the file.

Creating a Call Route for GreenPhone:

Open the Prophecy Management Console in your web browser (http://127.0.0.1:9995/mc.php) – the default user ID and password are admin/admin. Click on the “Call Routing” option on the left hand menu – this is where you will set up a call route to the GreenPhone application.

Pick one of the numbered route IDs (e.g., Route 1 ID) and make the following changes:

  • Change the route ID to green
  • Change the Route Type to CCXML W3C
  • Change the URL to http://127.0.0.1:9990/{GreenPhone Install Directory}/greenPhoneStart.xml
  • Scroll to the bottom of the page and click “Save Changes”

Making a test call:

Now that Prophecy is installed, fire up the SIP Phone that comes bundled with it; you should see the Prophecy icon in your system tray. Click on it, and select “SIP Phone” from the menu. When the SIP Phone launches, select Options. In the SIP Proxy / Registrar Options section, enter your cell phone number in the Local Username field (e.g., 2125551234). Click OK, and restart your SIP Phone. This last step allows your cell phone number to be delivered as the caller ID (or ANI) on the test call you are about to make, even though you’re initiating the call from a SIP phone.

GreenPhone is built to use ANI to look up E85 and Bio-Diesel stations in the caller’s home state. We do this by invoking the U.S. Area Code Information Web Service that is part of the StrikeIron Super Data Pack to determine which state a caller is calling from. There are additional web services in the StrikeIron Super Data Pack that we can invoke to locate Bio-Diesel and E85 stations; the methods invoked on these last two services require us to identify the state we want a listing of stations for.
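The first step of that lookup (ANI to area code to state) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not GreenPhone’s actual code, which is PHP and calls the StrikeIron service over SOAP. The three-entry `AREA_CODE_TO_STATE` table and the helper names are hypothetical stand-ins for the web service call.

```python
from typing import Optional

# Hypothetical, tiny lookup table for illustration only; GreenPhone itself
# asks the StrikeIron U.S. Area Code Information Web Service instead.
AREA_CODE_TO_STATE = {
    "212": "NY",  # New York City
    "407": "FL",  # Orlando area
    "857": "MA",  # Boston area
}


def area_code_from_ani(ani: str) -> str:
    """Return the 3-digit area code from a U.S. caller ID (ANI)."""
    digits = "".join(ch for ch in ani if ch.isdigit())
    # Drop a leading country code "1" if the ANI arrived in 11-digit form.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit U.S. number, got {ani!r}")
    return digits[:3]


def state_for_ani(ani: str) -> Optional[str]:
    """Map an ANI to a state code, or None if the area code is unknown."""
    return AREA_CODE_TO_STATE.get(area_code_from_ani(ani))


print(state_for_ani("2125551234"))  # NY
```

The state code is what the station-lookup services need, which is why the area-code call has to happen first.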

The caller’s ANI is also used to send the details on a particular E85 or Bio-Diesel station via text message to the caller’s phone – so if you enter your cell phone number in the Voxeo SIP Phone as described above, you can get details on a station that may be near you sent directly to your cell phone.

As an aside, you’ll notice that a single phone call can result in up to four web service invocations. I’m not really sure whether that’s “too many”, but there are probably some opportunities for caching, which I’ll discuss in the next couple of posts as I describe in more detail how to interact with web services via Voxeo Prophecy.

Now you are ready to place a test call. When your SIP Phone restarts, go to the field called Dial String and enter “sip:green@127.0.0.1” (without the quotes). Click dial and you are now interacting with the GreenPhone application!
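The route ID ties the two pieces of configuration together: “green” is both the route ID in the Management Console and the user part of the SIP dial string. The small Python sketch below builds both strings from that ID and the install directory name. The helper names are hypothetical, and the host and port defaults are simply the values used in this walkthrough.

```python
from urllib.parse import quote


def greenphone_start_url(install_dir: str, host: str = "127.0.0.1", port: int = 9990) -> str:
    """Start URL for the call route; quote() percent-encodes spaces and similar."""
    return f"http://{host}:{port}/{quote(install_dir)}/greenPhoneStart.xml"


def sip_dial_string(route_id: str, host: str = "127.0.0.1") -> str:
    """SIP URI that reaches the call route with the given route ID."""
    return f"sip:{route_id}@{host}"


print(greenphone_start_url("GreenPhone"))  # http://127.0.0.1:9990/GreenPhone/greenPhoneStart.xml
print(sip_dial_string("green"))            # sip:green@127.0.0.1
```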

You’ll notice (and hopefully enjoy) the unique sounds I’ve tried to use throughout the application. All of them were obtained from the FreeSound Project and modified with Audacity to conform to the Prophecy standard for audio files.

There are some obvious limitations to how this application currently works, and the VUI clearly needs some refinement (DTMF only at this point).

In the next several posts, I’ll point to this application to discuss examples of how to accomplish things in VoiceXML and CCXML using the Voxeo Prophecy Platform.

Have a happy Earth Day on 4/22!!

Creating dynamic voice apps that use Google’s App Engine, part 2

Friday, April 18th, 2008

Just a brief update on my piece about how to use voice with Google’s App Engine… I have now been invited into the preview of Google’s App Engine, so I’ve created a second application on our Evolution platform, this one pointing to my shiny new GAE app at:

http://voicexmltest2.appspot.com/

For now, it’s the same (lame) python code, but the app does have its own phone numbers:

Direct Local # (857) 362-8430
PIN Access (800) 289-5570 then PIN: 9996075378
PIN Access (407) 386-2174 then PIN: 9996075378
Skype VoIP +99000936 9996075378
FWD VoIP **86919996075378
SIP VoIP sip:9996075378@sip.voxeo.net

Why didn’t I simply modify the original Evolution application? Well, I could have, but I figured this way I can also still experiment with the AppDrop.com application as well.

The nice thing with a true Google App Engine account is that I can use “appcfg.py” to automagically update the files up on the App Engine site, which is very slick. (Thanks, Google, for the invite!) Next week, my intent is to work with the python code to get it actually manipulating the XML… stay tuned… (and check my original article if you would like to experiment with voice and Google App Engine, too).


Creating dynamic voice apps that use Google’s AppEngine (and Amazon’s EC2 via AppDrop), part 1

Tuesday, April 15th, 2008

Update 2008 Apr 18: I now have an App Engine account and have the app running there as well.


Could Google’s AppEngine, the buzz of the developer world right now, be used to create voice applications? That was the question I asked myself and the answer is something you can test yourself by calling one of these numbers (since we support all these forms of calling):

Assuming I haven’t changed the app since writing this post, you should simply hear some text “Your magic number is (some number)…” and then a pointer to this blog site. If you call the number again, you’ll hear the text with a different number. And again and again.

So what’s going on here? Well, you have:

Got all that? :-) Let’s take a look at how the pieces fit together.

Google’s AppEngine

Last week Google announced the release of their AppEngine developer tool that lets you run web applications on Google’s massive infrastructure. Basically, you develop your application (currently only in the python language), upload it to Google’s servers and let Google’s infrastructure take care of all the rest. In addition to their web server and distributed database, you also have access to using Google Accounts for usernames which means you don’t have to create your own usernames and passwords (although you are then obviously tied to Google for authentication).

This announcement was immediately the subject of a huge amount of buzz throughout the blogosphere and the web in general. Unfortunately, only the first 10,000 developers could get in to try it out. While Google let 10,000 more developers in yesterday, I’m still not one of them. However, I could download the AppEngine SDK and, dusting off my rusty python skills (a language I used to use a lot), put together a quick program that would dynamically generate a VoiceXML file.

If your browser will display an XML file as raw XML, you can see the result here:

http://voicexmltest.appdrop.com/

If your browser won’t display raw XML, here’s a screenshot of what it should look like:
[Screenshot: voicexmlmagicnumberex1.jpg]

Your number should be different because it is being randomly generated. If you refresh your browser you should now see a different number in the <prompt> text. I did all this locally using the AppEngine SDK, saw that it generated valid (and dynamic) VoiceXML, and then needed to figure out where to host this so that I could demonstrate the usage - without a Google AppEngine account.

Amazon’s EC2 and AppDrop.com

While I’ve been patiently waiting for Google to send me an email letting me into the AppEngine preview, I stumbled upon the news that Google’s App Engine had been ported to Amazon’s EC2 service. Developer Chris Anderson announced a new service called AppDrop.com which basically hosts a modified version of Google’s AppEngine code on Amazon’s EC2 service. It obviously doesn’t support all of Google’s services like the distributed database or Google Accounts for authentication, but it allows you to develop an app with the AppEngine SDK and then upload it to the AppDrop service. Philosophically, I found this an interesting demonstration that using the AppEngine SDK did not necessarily lock you in to Google’s platform.

More importantly to me, I could create a hosted AppDrop.com application now without yet having my Google AppEngine account (subject to the caveats that AppDrop.com is entirely a proof-of-concept, it may go away at any time, etc., etc.). So I did. Here are the steps:

  1. Download the modified Google AppEngine SDK from AppDrop.com. It’s essentially the same thing as the Google SDK but with modifications for the lack of support for Google’s database, accounts, etc.

  2. Create an app in the local SDK. (I just literally copied my files from the Google AppEngine SDK directory over into the AppDrop SDK directory.)
  3. Register an app on AppDrop.com. This was not entirely intuitive as nowhere on the main page does it tell you how to do this. However, if you go to the list of current applications, there is a “Make a new app” link that takes you through the process of creating a new account on the site and then registering the app.
  4. Create a gzipped tar file of your application to upload. The easiest way is to go into the directory containing your app on the command line and type “tar -cvzf <appname>.tar.gz *”. (In the actual Google environment this is taken care of for you by “appcfg.py”, but no such automated script is available for AppDrop.com yet, although they are working on one.)
  5. Upload the file via the AppDrop.com web interface. If it works you get a simple “Upload successful” page.
  6. Go to http://<appname>.appdrop.com/ to see the resulting app. For instance, http://voicexmltest.appdrop.com
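If you would rather script step 4 than type the tar command, Python’s standard `tarfile` module can produce an equivalent gzipped archive. This is a sketch under the assumption that AppDrop.com only needs the app’s files at the root of a `.tar.gz`, as the `tar -cvzf <appname>.tar.gz *` command produces; `make_app_archive` is a hypothetical helper name.

```python
import os
import tarfile
import tempfile


def make_app_archive(app_dir: str, app_name: str) -> str:
    """Create <app_name>.tar.gz inside app_dir, roughly like `tar -cvzf <app_name>.tar.gz *`."""
    archive_name = f"{app_name}.tar.gz"
    # Snapshot the directory first (the shell expands `*` before tar runs);
    # like `*`, this skips dotfiles.
    entries = sorted(e for e in os.listdir(app_dir) if not e.startswith("."))
    archive_path = os.path.join(app_dir, archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in entries:
            if entry == archive_name:
                continue  # never pack a stale copy of the archive into itself
            # arcname keeps paths relative, so files sit at the archive root.
            tar.add(os.path.join(app_dir, entry), arcname=entry)
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("app.yaml", "main.py"):
            with open(os.path.join(demo_dir, name), "w") as f:
                f.write("# placeholder\n")
        path = make_app_archive(demo_dir, "voicexmltest")
        with tarfile.open(path, "r:gz") as tar:
            print(tar.getnames())  # ['app.yaml', 'main.py']
```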

With that, I now had a publicly hosted app developed on the Google AppEngine SDK.

Voxeo’s Evolution platform

The last step was to provide the phone interface to this voice app and for that I obviously used our Evolution platform. Using my free developer account (which anyone can get), I basically followed the steps of our Quick Start Guide:

  1. Once logged into Evolution, I clicked on “Application Manager”.
  2. I clicked on “Add Application”.
  3. For the development platform, I chose “Prophecy 8.0 - VoiceXML 2.1”
  4. On the next screen, I completed the form as follows:
    1. Entered a name of the application.

    2. In the “Start URL 1” box, I entered the URL to my AppDrop.com application, including the trailing slash (very important!)
    3. Under “Application Phone Number” chose the region in which I wanted a phone number. (If you don’t do this, you can still access your application via SIP or Skype, but not direct-dial from the PSTN. And yes, for the free accounts we only give out US phone numbers.)
  5. I clicked the “Create Application” button and my application was done.

Here’s a screen shot of that part of the screen for my app:
[Screenshot: evolutionappdropexp.jpg]

Do note again that I included the trailing slash on “http://voicexmltest.appdrop.com/”.
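If you generate Start URLs from a script, a small normalizer keeps the trailing slash from being forgotten. The sketch below uses only `urllib.parse` from Python’s standard library; `with_trailing_slash` is a hypothetical helper. The post does not say why Evolution needs the slash, so the `urljoin` lines only illustrate one general reason slashes matter (relative-URL resolution), with `example.com` as a placeholder host.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Return url with its path ending in '/', leaving any query or fragment intact."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(with_trailing_slash("http://voicexmltest.appdrop.com"))
# http://voicexmltest.appdrop.com/

# Relative references resolve differently with and without the slash.
print(urljoin("http://example.com/app", "next.xml"))   # http://example.com/next.xml
print(urljoin("http://example.com/app/", "next.xml"))  # http://example.com/app/next.xml
```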

If I click on the “Phone Numbers” tab I can see the various numbers that I can use to call into this application:
[Screenshot: evolutionappdropexpnumbers.jpg]

That’s it! Assuming you are generating correct VoiceXML code, when you call into those numbers you should hear the voice app you created.

Show Me The Code!

So if you’ve read this far I imagine you probably want to see the actual code I wrote, eh?

Well, there are two files needed in a Google AppEngine SDK directory. First you need a file called “app.yaml” that simply provides the name of your app and the name of the main python file (here brilliantly called “main.py”):

application: voicexmltest
version: 1
runtime: python
api_version: 1

handlers:
- url: .*
  script: main.py

The second file is the “main.py” python file (which could be called anything as long as it matches what is in the app.yaml file). To be honest, my file is embarrassingly lame as a python app goes but as long as you recite “Remember this was a quick lunch-time hack as a proof of concept”, here it is:

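#!/usr/bin/env python
import random

def main():
  # CGI-style handler: HTTP headers first, then a blank line, then the body.
  print('Content-Type: text/xml')
  print('')
  print('''<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
    <prompt>
''')
  # The only dynamic part: a random number from 0 to 99.
  print("Hello! Your magic number today is %d. " % random.randrange(100))
  print('''
     To learn more about this application, please visit blogs dot voxeo dot
     com. Thank you for calling this app.
    </prompt>
    </block>
  </form>
</vxml>
''')

if __name__ == "__main__":
  main()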

Like I said, it’s a very simple app that really doesn’t even remotely begin to demonstrate the power of either python or VoiceXML. (Remember “quick proof-of-concept”.) For those not familiar with python or VoiceXML, here is what the main() function does:

  • prints out the first half of the VoiceXML file
  • prints out the dynamically generated part with a random number
  • prints out the second half of the VoiceXML file

Basically a glorified “Hello, world!” program with a random number thrown in. If you’re not familiar with the python construction I’m using here to print these large blocks of text, what I’m doing is putting three single apostrophes together to mark the beginning of the block of text and then three single apostrophes at the end to close off the block. It’s a great way to do simple hacks like this one. :-)
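Because this approach assembles the document as plain text, one cheap safeguard before pointing Evolution at the URL is to run the output through an XML parser. Below is a Python 3 sketch using the standard library’s `xml.etree.ElementTree`. The `magic_number_vxml` and `check_well_formed` names are hypothetical, and the template assumes the VoiceXML namespace `http://www.w3.org/2001/vxml` on the root element.

```python
import random
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def magic_number_vxml(number: int) -> str:
    """Fill a triple-quoted VoiceXML template, in the spirit of main.py."""
    return '''<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Hello! Your magic number today is %d.
        To learn more about this application, please visit blogs dot voxeo dot com.
        Thank you for calling this app.
      </prompt>
    </block>
  </form>
</vxml>
''' % number


def check_well_formed(doc: str) -> ET.Element:
    """Parse the document; raises ET.ParseError if it is not well-formed XML."""
    return ET.fromstring(doc.encode("utf-8"))


root = check_well_formed(magic_number_vxml(random.randrange(100)))
prompt = root.find("v:form/v:block/v:prompt", {"v": VXML_NS})
print(" ".join(prompt.text.split()))
```

Well-formedness is only a first check; it won’t catch VoiceXML that is valid XML but uses elements in the wrong place.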

Next Steps

So with the embarrassment of that python file out of the way and now that I’ve proven to myself that you can develop voice apps using Google’s AppEngine (at least via AppDrop), what’s next? Well, when I next get a chance to work on it, I’m going to do two things. First, I’ll do more with VoiceXML, since there’s so much more it can do. Maybe add in some speech recognition, multiple call paths, maybe even upload some small audio files for prompts. Perhaps I’ll play with some outbound dialling and call transfer. Or wrap the VoiceXML inside of a Call Control XML (CCXML) app.

Second, and perhaps more useful to demonstrate Google’s AppEngine, I want to change the python code to actually manipulate the XML rather than simply printing out the XML code as text. There’s a great amount that can be done with python and XML, and even a book on the subject (which happens to be on my bookshelf), so the resources are out there.
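As one possible direction (not the author’s code), here is the same magic-number document built as an element tree and then serialized, using only Python’s standard `xml.etree.ElementTree`. It assumes Python 3.8 or later for the `xml_declaration` argument; `build_magic_number_vxml` is a hypothetical name.

```python
import random
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_magic_number_vxml(number: int) -> bytes:
    """Build the VoiceXML document as elements instead of printing raw text."""
    ET.register_namespace("", VXML_NS)  # serialize VoiceXML tags without a prefix
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.1"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = (
        f"Hello! Your magic number today is {number}. "
        "To learn more about this application, please visit blogs dot voxeo dot com. "
        "Thank you for calling this app."
    )
    # xml_declaration=True emits the <?xml ...?> header at the top of the output.
    return ET.tostring(vxml, encoding="UTF-8", xml_declaration=True)


print(build_magic_number_vxml(random.randrange(100)).decode("utf-8"))
```

A side benefit of building the tree is that escaping is handled for you: if the prompt text ever contained `&` or `<`, the serializer would escape it instead of producing broken markup.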

In the meantime, though, I would be curious to see what any of you all can do with these pieces. What do you think about using Google’s App Engine to host applications in this way? If you can easily use a language like python to manipulate data and dynamically generate voice apps (using VoiceXML), what can you think of to do with it? What kind of apps will you write?

Do any of you want to give it a try? The steps are really quite simple:

  1. Download either the Google App Engine SDK, or, if like me you don’t have a Google App Engine account yet, download the AppDrop.com modified SDK.
  2. Write your python app locally (Google has instructions and examples) that generates valid VoiceXML (we have tutorials there).
  3. Upload the app to either Google or AppDrop.com.
  4. Create an Evolution account if you don’t have one and follow the steps above to create the app on our service.
  5. Call it up and see how it works.

If you do anything cool with it, please definitely do leave a comment as I’d love to check out what you do (and hey, maybe even spotlight it here on the blog if you are open to that).

With that, I’m going to turn back to experimenting with python and XML… (oh, and waiting for my App Engine invite… hint, hint, Google!)


Certified Tech Tip: Using SISR-formatted grammar returns with Prophecy 8

Monday, January 28th, 2008

I am happy to announce a new semi-regular addition to the Voxeo blog, where the Voxeo Support team will be adding VoiceXML, CallXML, and CCXML tips, tricks, and best practices for our developers, which we will christen “Certified Tech Tips”. The name has a nice ring to it and all, but this isn’t just for show: 100% of the technical support team are certified VoiceXML developers, and we are pretty proud of being the only provider who can make that claim.

As we devise some really inventive means of achieving project goals and cool functionality when coding in the framework of these various IVR markups, we thought that we might share some of these tips with the readers of the Voxeo Blog.

For those who haven’t interacted with the support team yet, a bit of introduction is in order. My name is Matthew Henry, and I serve as the Director of Customer Support here at Voxeo. I have been with the company since its inception (way back in the 20th century), and have been lucky enough to work with a sizable number of really talented IVR developers and engineers, which has allowed me to learn a lot and to build up a respectable code library for all things IVR. And now, it’s time for some payback.

=^)

As our maiden posting to the Voxeo blog, we will cover the topic of Semantic Interpretation for Speech Recognition (SISR)-formatted grammar returns when using the Prophecy 8 software. A lot of folks are used to using plain-old Nuance GSL grammars due to their ease of use and concise markup, but the drawback of this approach is pretty fundamental: as GSL is Nuance-specific, it isn’t guaranteed that every provider will support it. And those of us who have written complex grammars know that porting a grammar can be a tedious job to take on. For this reason, we always suggest that folks stick with a W3C standard when writing grammars: namely, the XML-based SRGS grammar format, using SISR syntax to pass our grammar interpretations back to the VoiceXML dialog. Most of the documentation on our site references the Nuance-specific return formatting, so today we will show you what a 100% W3C-compliant grammar looks like.

To start things off, let’s take a look at some GSL, and some SRGS with Nuance-specific returns for the sake of comparison:

Simple GSL

MYRULENAME [
  [utterance]      {<mySlotName "my return value">}
]

Simple SRGS with Nuance-returns

<?xml version="1.0"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US"
        root="MYRULENAME">
      <rule id="MYRULENAME">
        <one-of>
          <item>
            utterance
            <tag> <![CDATA[  <mySlotName "my return value"> ]]>  </tag>
          </item>
        </one-of>
      </rule>
    </grammar>

Simple SRGS with SISR returns

<?xml version="1.0"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US"
          root="MYRULENAME">
      <rule id="MYRULENAME">
        <one-of>
          <item>
            utterance
            <tag>$.mySlotName = "my return value"</tag>
          </item>
        </one-of>
      </rule>
    </grammar>

The differences in syntax are fairly self-evident in these cases. In the case of SISR, the “$.” prefix lets us name any slot that we want to return to our VoiceXML dialog, and an equals sign followed by a quoted interpretation value links that value to the slot.
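Because an SRGS grammar is plain XML, it can be sanity-checked with any XML parser; curly quotes picked up by copying a grammar out of a web page, for example, make it fail to parse. The Python sketch below (standard library `xml.etree.ElementTree`; `items_with_tags` is a hypothetical helper) parses the SISR example above and lists each item’s utterance next to its tag script. It only reads the grammar and does not evaluate the tags the way a speech platform would.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

GRAMMAR = """<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="MYRULENAME">
  <rule id="MYRULENAME">
    <one-of>
      <item>
        utterance
        <tag>$.mySlotName = "my return value"</tag>
      </item>
    </one-of>
  </rule>
</grammar>"""


def items_with_tags(grammar_xml: str):
    """Yield (utterance text, tag script) pairs for every <item> in the grammar."""
    root = ET.fromstring(grammar_xml.encode("utf-8"))
    for item in root.iter(f"{{{SRGS_NS}}}item"):
        words = (item.text or "").strip()
        tag = item.find(f"{{{SRGS_NS}}}tag")
        script = tag.text.strip() if tag is not None and tag.text else ""
        yield words, script


print(list(items_with_tags(GRAMMAR)))
# [('utterance', '$.mySlotName = "my return value"')]
```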

In addition, we can also specify a “generic” return where no slot name is given (which comes in handy for subgrammars) by putting $="my return value" within the <tag> element. If we want to get really fancy, we can even return multiple slots to the dialog by inserting a “;” delimiter between the slot/interpretation pairings. A sample multi-slot return with an “anonymous” slot also defined might look something like this:

<item>
    utterance
    <tag>
    $ = "my anonymous slot value";
    $.mySlotName1="my slot 1 return"; $.mySlotName2="my slot 2 return";
    </tag>
</item>
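To make the “;”-delimited structure concrete, here is a toy Python tokenizer that splits a tag like the one above into (target, value) pairs. It is deliberately not an SISR interpreter: real platforms evaluate tag contents as ECMAScript. It assumes every statement is a simple string assignment to `$` or `$.slotName`, and it splits naively on “;”, so values containing a semicolon would break it. `split_tag_assignments` is a hypothetical name.

```python
import re

# Matches `$ = "value"` or `$.slotName = "value"` (straight double quotes only).
ASSIGNMENT = re.compile(r'(\$(?:\.\w+)?)\s*=\s*"([^"]*)"')


def split_tag_assignments(tag_script: str):
    """Return (target, value) pairs for each ;-delimited assignment in a tag."""
    pairs = []
    for statement in tag_script.split(";"):  # naive split: see caveat above
        statement = statement.strip()
        if not statement:
            continue  # tolerate a trailing ";" and surrounding whitespace
        match = ASSIGNMENT.fullmatch(statement)
        if match is None:
            raise ValueError(f"not a simple string assignment: {statement!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


TAG = '''
    $ = "my anonymous slot value";
    $.mySlotName1="my slot 1 return"; $.mySlotName2="my slot 2 return";
'''
for target, value in split_tag_assignments(TAG):
    print(target, "->", value)
```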

As you can see, the SISR returns are much more concise, easier to read, and much more lightweight than Nuance-specific returns. And once you write a grammar using SRGS and SISR, any Certified Compliant VoiceXML platform will run it without any porting at all being required.

If you found this posting useful, then let us know! Mayhap we will dig deeper into this the next time, and whip out some more complex subgrammars to better illustrate the usage of SISR formatting within your IVR applications.

Till next time!
~Matthew Henry