Archive for the ‘Grammar’ Category

Developer Jam Session webinar – Sept 16 – Advanced Speech Grammar Mgmt with Nugram IDE

Friday, September 11th, 2009


Are you interested in developing advanced speech applications? Would you like to learn about new tools that help with the rapid development of grammars for speech apps? If so, join our free Developer Jam Session webinar this coming Wednesday, September 16, 2009:

Topic: Advanced Speech Grammar Management with the Professional Edition of NuGram IDE

Date: September 16th, 2009
Time: 8am PDT, 11am EDT, 5pm CEST

Speakers:
Tobias Goebel, Sr. Presales Consultant, Voxeo Germany
Dominique Boucher, Product Manager – NuGram, Nu Echo

Abstract: Following up on the introduction of NuGram IDE for grammar engineering and management back in September 2008, this session will provide an update on the latest developments around this grammar tool offering.

We will introduce the new professional edition, explain how it extends the free basic edition, give a demo of the tool, and show how it integrates with VoiceObjects Desktop for Eclipse. The basic edition will be bundled with VoiceObjects to provide an integrated solution for grammar engineering within the VoiceObjects service creation environment.

The jam session will close with a NuGram product roadmap and an outlook for 2009 and 2010.


If you can’t attend on Wednesday, the session will be archived on our Monthly Jam Sessions web page for later viewing.


Want to learn how Voxeo can help unlock your communications and deliver a better customer experience? Please contact us!

If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Adding Call Control to Voxeo Designer Applications via CCXML

Wednesday, November 26th, 2008


Lately, the Voxeo Designer platform has become increasingly popular. However, it can be somewhat limited in terms of call control. Customers who are used to a combination of CCXML and VoiceXML may be turned away by this idea. Well, fear not, as we can easily integrate CCXML and Designer now, adding that game-changing call control aspect that CCXML brings to the table. Why use a CCXML front end, you say? Well, lots of reasons. CCXML brings call control, conferencing, whisper dialogs, hold dialogs, as well as the ability to pass in parameters to your Designer application. Whether it’s “you have a call from John Smith, press 1 to accept this call,” or a simple repeating hold music dialog, it is now possible with Designer dialogs.

So now let’s take a look at how we put it all together. Let’s assume that you already have a basic Designer application established. At this point, we don’t care what it does. Now, we need to write a CCXML front end for it.

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns:voxeo="http://community.voxeo.com/xmlns/ccxml">
  <var name="myDialogID"/>
  <eventprocessor>

    <transition event="connection.alerting">
      <log expr="'Preparing to answer the call.'"/>
      <accept/>
    </transition>

    <transition event="connection.connected">
      <log expr="'Caller has connected.  Executing VoiceXML dialog now.'"/>
      <dialogstart src="'helloworld.vxml'" type="'application/voicexml+xml'" dialogid="myDialogID"/>
    </transition>

    <transition event="dialog.exit">
      <log expr="'The dialog is now complete.  Exiting application.'"/>
      <exit/>
    </transition>

    <transition event="error.*">
      <log expr="'An error has occurred (' + event$.reason + ').  Exiting application.'"/>
      <exit/>
    </transition>

  </eventprocessor>
</ccxml>

This simple CCXML code snippet accepts a user’s call and launches a VoiceXML dialog named helloworld.vxml. Now, all we need to do is modify the “src” attribute to reference the Designer application. How, you ask? Well, it’s a little tricky, but far from impossible. All we need to do is grab the Designer URL from the Application Debugger (or the Prophecy Log Viewer, for all the local installations out there). The full URLs will look something like the ones below, and you may need to copy the link to the clipboard (in Evolution, at least) in order to see the entire thing:

Evolution Designer:

http://evodesigner-speech-dev.voxeo.com/SpeechRuntime/route.speech?vr.application.id=XXXX (where XXXX is your specific Application ID)

Prophecy Designer:

http://127.0.0.1:9992/SpeechRuntime/route.speech?vr.application.id=X (where X is your specific Application ID)

We can then inject this into the <dialogstart> keeping everything else the same. The dialog “type” remains “application/voicexml+xml” since Designer is pure VoiceXML behind the scenes, coupled with a custom GUI.

Now let’s take a look at the finalized code, with our Designer dialog in place.

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns:voxeo="http://community.voxeo.com/xmlns/ccxml">
  <var name="myDialogID"/>
  <eventprocessor>

    <transition event="connection.alerting">
      <log expr="'Preparing to answer the call.'"/>
      <accept/>
    </transition>

    <transition event="connection.connected">
      <log expr="'Caller has connected.  Executing Designer dialog now.'"/>
      <dialogstart src="'http://127.0.0.1:9992/SpeechRuntime/route.speech?vr.application.id=4'" type="'application/voicexml+xml'" dialogid="myDialogID"/>
    </transition>

    <transition event="dialog.exit">
      <log expr="'The dialog is now complete.  Exiting application.'"/>
      <exit/>
    </transition>

    <transition event="error.*">
      <log expr="'An error has occurred (' + event$.reason + ').  Exiting application.'"/>
      <exit/>
    </transition>

  </eventprocessor>
</ccxml>

And there you have it — a basic application which launches a Designer dialog. Hopefully this will help shed some light on how developers can combine the utility of CCXML with the usability of Voxeo Designer.

Jeff Menkel VXML Certified Developer




Certified Tech Tip: Alpha-Numeric voice recognition grammars – part two

Tuesday, May 20th, 2008

In our last entry to the tech-tips blog, we detailed the challenges inherent in capturing alphabetical or alpha-numeric entries from our callers, and described several paths for minimizing the chance of mis-recognition when implementing input fields based on these two categories of voice recognition. The long and short of that posting was that IVR developers should refrain from attempting this wherever possible, and should instead try these alternatives:

  • Pre-compiled Statistical Language Model grammars

  • Leveraging TargusInfo services for advanced recognition accuracy

However, the IVR project requirements dictate what we can and can’t do as developers, so in some cases we have to try and whip out a user grammar that takes alpha or alpha-numeric input. As mentioned in our last blog entry, there are a few things we can do to stack the deck and squeeze more accuracy out of these grammars so that we don’t end up with frustrated callers, but the plain truth is that we will never, ever be able to write a grammar that accepts alphabetical characters with 100% accuracy using today’s recognition technology. What we will do today is twofold:

(1) Craft an SRGS+SISR subgrammar for alphabetical and numeric characters

(2) Plug this grammar into a mixed-initiative form dialog that will minimize (but not fully eliminate!) the possibility of mis-recognitions.

Those developers who need such a grammar and dialog within their production-grade applications are advised to take this basic framework as a starting point, and then expand on it by:

(a) Testing carefully with a broad range of users, and fully fleshing out alternate utterance values for alphabetic characters

(b) Applying item weighting to specific characters based on the probability of a given character versus another like-sounding character; this will depend greatly on the specific usage of the grammar

(c) Tracking results by using W3C-compliant utterance recording, and logging all shadow variables, so that these results can be used to further tune and tweak the grammar for maximum accuracy

(d) Using n-best post-processing as an additional confirmation step to ensure that the results we receive are indeed accurate
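As a rough illustration of point (b), SRGS allows each alternative inside a <one-of> to carry a “weight” attribute. The sketch below is not taken from the full grammar: the rule name, the spoken alternates, and the weight values are all assumptions chosen for illustration, and real weights should come out of tuning against recorded caller data. How strongly a weight influences recognition also varies from one ASR engine to another.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" mode="voice" root="weightedAlpha"
         tag-format="semantics/1.0">

  <!-- Hypothetical rule: favors "B" over the like-sounding "V".
       The weights are placeholders, not tuned values. -->
  <rule id="weightedAlpha" scope="public">
    <one-of>
      <item weight="2.0">
        <one-of>
          <item>b</item>
          <item>bee</item>
        </one-of>
        <tag>out.alphaSlot1="B";</tag>
      </item>
      <item weight="0.5">
        <one-of>
          <item>v</item>
          <item>vee</item>
        </one-of>
        <tag>out.alphaSlot1="V";</tag>
      </item>
    </one-of>
  </rule>

</grammar>
```

Keeping the weights on the outer items, rather than on each spoken alternate, means a letter's relative likelihood can be adjusted in one place no matter how many alternates it has.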

For today’s entry, let’s assume that we need to capture a three-character zip code of the kind used in Canadian locales. Our predefined format for utterance values is “Alpha Digit Alpha”, and luckily, not all alpha characters are applicable: instead of trying to recognize 26 letters accurately, we only need to recognize 16, which helps a lot!

We won’t dig into the specifics of a mixed-initiative form dialog, as we have already done so in our mixed-initiative tutorial, but the gist is that this feature of VoiceXML allows us to fill multiple fields with a single utterance, and breaking up each alpha and numeric character into its own recognition field greatly cuts down on the ambiguity problems that can occur.

For the purposes of brevity, what we have below is a stripped-down version of our fully fleshed-out grammar, but you may download the full grammar and the mixed-initiative dialog right here, which contain lots more inline notations.

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="canadianZip" tag-format="semantics/1.0">

  <rule id="canadianZip" scope="public">
    <one-of>

      <!-- ALL THREE FIELDS FILLED -->
      <item>
        <item>
          <ruleref uri="#alphaRule1"/>
          <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
        </item>
        <item>
          <ruleref uri="#numRule"/>
          <tag>out.numSlot=rules.numRule.numSlot;</tag>
        </item>
        <item>
          <ruleref uri="#alphaRule2"/>
          <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
        </item>
      </item>

      <!-- ONLY TWO FIELDS FILLED -->
      <item>
        <item>
          <ruleref uri="#alphaRule1"/>
          <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
        </item>
        <item>
          <ruleref uri="#numRule"/>
          <tag>out.numSlot=rules.numRule.numSlot;</tag>
        </item>
      </item>
      <item>
        <item>
          <ruleref uri="#numRule"/>
          <tag>out.numSlot=rules.numRule.numSlot;</tag>
        </item>
        <item>
          <ruleref uri="#alphaRule2"/>
          <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
        </item>
      </item>
      <item>
        <item>
          <ruleref uri="#alphaRule1"/>
          <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
        </item>
        <item>
          <ruleref uri="#alphaRule2"/>
          <tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>
        </item>
      </item>

      <!-- ONLY ONE FIELD FILLED -->
      <item>
        <!-- a lone letter is treated as the first character -->
        <ruleref uri="#alphaRule1"/>
        <tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>
      </item>
      <item>
        <ruleref uri="#numRule"/>
        <tag>out.numSlot=rules.numRule.numSlot;</tag>
      </item>

    </one-of>
  </rule>

  <rule id="alphaRule1" scope="public">
    <one-of>
      <item weight="1.0">
        <one-of>
          <item> ex </item>
          <item> ax </item>
          <item> x </item>
        </one-of>
        <tag>out.alphaSlot1="X";</tag>
      </item>
    </one-of>
  </rule>

  <rule id="numRule" scope="public">
    <one-of>
      <item> one <tag>out.numSlot="1";</tag> </item>
    </one-of>
  </rule>

  <rule id="alphaRule2" scope="public">
    <one-of>
      <item weight="1.0">
        <one-of>
          <item> ay </item>
        </one-of>
        <tag>out.alphaSlot2="A";</tag>
      </item>
    </one-of>
  </rule>

</grammar>

In brief, our top-level rule assumes that we can have any of the following entries:

"X1A"

"X"

"X1"

"XA"

"1"

"1A"

And in the event that we get one or two characters matched in our utterance, the VoiceXML mixed-initiative logic will then take over, and prompt the caller to fill in any “blanks” remaining.
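To make that hand-off a little more concrete, here is a minimal sketch of what such a mixed-initiative form could look like. It is not the downloadable dialog: the file name canadianZip.grxml, the form id, and the prompt wording are assumptions for illustration. The field names deliberately match the slot names returned by the grammar, which is what lets a single utterance fill several fields at once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="zipForm">

    <!-- Form-level grammar: one utterance may fill one, two, or all
         three fields. The file name is an assumption. -->
    <grammar src="canadianZip.grxml#canadianZip" type="application/srgs+xml"/>

    <!-- Opening prompt; once an utterance fills any field below,
         this item is not visited again. -->
    <initial name="start">
      <prompt>Please say the three characters of your postal code.</prompt>
    </initial>

    <!-- Each field is only prompted for if it is still empty. -->
    <field name="alphaSlot1">
      <grammar src="canadianZip.grxml#alphaRule1" type="application/srgs+xml"/>
      <prompt>What is the first letter?</prompt>
    </field>

    <field name="numSlot">
      <grammar src="canadianZip.grxml#numRule" type="application/srgs+xml"/>
      <prompt>What is the digit?</prompt>
    </field>

    <field name="alphaSlot2">
      <grammar src="canadianZip.grxml#alphaRule2" type="application/srgs+xml"/>
      <prompt>What is the last letter?</prompt>
    </field>

    <!-- Runs once all three fields hold a value. -->
    <filled mode="all" namelist="alphaSlot1 numSlot alphaSlot2">
      <log expr="'*** ZIP = ' + alphaSlot1 + numSlot + alphaSlot2"/>
    </filled>

  </form>
</vxml>
```

A production dialog would add nomatch and noinput handling plus a confirmation step; those are left out here to keep the slot-filling flow visible.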

A few things of note about the grammar defined above: in the event that we receive only a single alpha utterance, we will assume that it is the first character, not the last. Additionally, when we construct a grammar that contains multiple slot returns, we must explicitly define the slot values all the way up the chain: if we didn’t define the “out.[slotname]=rules.[rulename].[subslot]” assignment within the context of the top-level rule, the last slot value would overwrite all others, meaning that we would only get a value for “alphaSlot2” within the VoiceXML dialog. To illustrate, the following top-level item, which omits those assignments, would produce exactly that unwanted result:

<item> 
<ruleref uri="#alphaRule1"/>

<ruleref uri="#numRule"/>

<ruleref uri="#alphaRule2"/>

</item>

You’ll also see that each possibility for character recognition is specified within the top-level rule, so in the event that we get 1, 2 or 3 character strings, we can pipe the return value back to the VoiceXML, and let the mixed-initiative dialogs then access the sub-rules (alphaRule1/2 and numRule), individually as needed.

We also illustrated in brief how one can define multiple like-sounding utterance values that return the same interpretation value, and defined several such alternates (“ex”, “ax”, and “x”) for our alphaRule1 entry simply to show how this can be done. The task of taking this framework and turning it into a grammar that satisfies any given project rests in the hands of you, the capable IVR developer.

=^)

Till next time,

Matthew Henry Director of Customer Support Voxeo Corporation





Certified Tech Tip: Alpha-Numeric voice recognition grammars – Part One

Monday, May 5th, 2008


Quite often, the question of how a developer should construct alphabetical “spell-out” grammars, or how one can best create an alpha-numeric recognition grammar, is posed to the support team at Voxeo. Many a posting to our VoiceXML developer forums has touched on this subject, but until now we haven’t really delved into it in enough detail to explain exactly why this is such a challenge.

“Alphabetical recognition is a challenge?”, you ask? You bet it is, if you want to get any semblance of accurate recognition results. And when we throw alpha characters, and maybe some numeric characters, into the same utterance string, then we are really looking at a difficult grammar to get tuned to a point where it is usable.

So what’s the big deal, anyhow? The inherent problems with spelled input recognition are best illustrated by a simple anecdote:

Imagine that you are at a restaurant on a busy Friday evening, and waiting for your table. While in the lobby, there are people chatting, children cavorting about, and harried workers trying to seat the flood of diners. At the same time, your friends who are joining you for dinner call to say that they are lost, and ask for directions to the restaurant. Amidst all the background chatter, glasses clinking and the rest of the noisy distractions, how many times do you have to repeat “From I-95, get off on exit 76B, and then take a left at Montana street” before your buddy is able to accurately understand what you are saying? In this worst case scenario, at best you may have to repeat yourself only once. Even if the restaurant was dead empty, and as silent as a tomb, the chance of your pal mishearing “exit 76B” as “exit 763” or something similar is not just plausible, but highly likely.

The root of the problem with alpha grammars, and even more so with alpha-numeric grammars, is the staggering potential for confusion between like-sounding matches: “B” sounds like “C”, sounds like “Z”, sounds like “E”, sounds like “three”, and “M” sounds like “N” sounds like “ten”... you get the picture. And this is for a *single character match* only. To further illustrate the challenges that we face, consider the fact that a 1-character alphabet grammar has only 26 possible results, but a 7-character grammar would have over eight BILLION possibilities. As you can imagine, the number of possible results for an alpha grammar of arbitrary length is simply staggering.
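For anyone who wants to check the arithmetic: with N possible symbols per position and a string of length L, the number of distinct strings is N to the power L. The third figure below is an added illustration that assumes every position may hold any of the 26 letters or 10 digits, which is the alpha-numeric worst case rather than a number quoted above.

```latex
\[
26^{1} = 26, \qquad
26^{7} = 8{,}031{,}810{,}176 \approx 8.0 \times 10^{9}, \qquad
36^{7} = 78{,}364{,}164{,}096 \approx 7.8 \times 10^{10}
\]
```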

Suggestions for alphabetical voice reco: Alternative Options

Firstly, constructing a user-defined alphabet grammar is something that we don’t really recommend attempting for “spell anything” applications, as the plain, unvarnished truth is that today’s voice recognition technology is simply not up to the task. To be certain, ASR technology has improved dramatically over the past few years, but not so much as to allow us to spell, or say, just any old utterance and expect accurate match results. In a lot of cases, a Statistical Language Model grammar will do the job, assuming that you expect your callers to provide certain types of input, such as a first name, a city name, or a state name.

While this isn’t the time or place to cover SLM grammars in depth, a brief summary should explain the strengths of these pre-compiled, pre-tuned grammars. SLM grammars in the context of spelling are essentially designed to fill in the blanks when we have partial input, using predetermined logic that is tailored to the input context/category. For instance, assume that we have an SLM firstname grammar active (note that these are available when using the Prophecy + Nuance platform on the evolution.voxeo.com portal), and our spelled utterance from the caller reads like what we have below, where unrecognizable utterance fragments are represented by a question mark:

“C O R ? E L I ? S”

Using the pre-tuned logic that is part of the SLM grammar, the ASR will determine that there are no firstname matches that read as “Coraelias”, or “Corbelibs”, etc. It will decide that the only first name that matches this pattern, where some fragments of the utterance are missing, is “Cornelius”. This is the gist of how SLM grammars work, and if your project allows you to use somewhat narrower categories for any utterance you want to recognize, then using a predefined SLM grammar, or even crafting your own SLM grammar, is a better way to go than trying to make a flat-file alphabetical SRGS file.

One of the common tasks for alphabetical grammars seems to be the capture of names, or street addresses, and if this is the case, there is a very accurate add-on service that can handle this task rather nicely. The TargusInfo feature allows developers to access one, or both of these two services:

* Name & Address lookups based on Caller ID

* Pre-tuned name & full address grammars

These services are remarkable in terms of Caller ID-to-Address accuracy, and the name/address grammars are top-notch, and quite acceptable for full scale, enterprise deployments as well. The only caveats are that this service is limited to the United States, and that there is a per-transaction fee to use it in a production capacity. However, we can honor developer requests to test drive this service by allowing 30 or so hits to this service at no charge. Developers interested in this service can log in to their evolution.voxeo.com accounts, and create an account ticket requesting access to this service to see just how good it is. And trust me on this one: You’ll be mightily impressed, and more importantly, so will your callers.

If you gotta do it…

In the event that the SLM grammar option, or the TargusInfo option, won’t fit the bill for your IVR project, then you may well be forced to try and craft a flat-file Alpha Grammar using W3C-compliant SRGS/SISR syntaxes. If you do fall into this category, we can give you some advice on doing so, with the full disclaimer that Results May Vary, and that 100% recognition accuracy using this methodology is Science Fiction, at least for the time being.

* Start small by testing one-character strings so that you can tune and tweak utterance values in the grammar.

* Track user utterance, and confidence scores via “lastresult$” shadow variables for post testing analysis, and as a basis for what needs to be tuned.

* Leverage the VXML 2.1 utterance recording via the “recordutterance” setting, and save off all user recording data for post-call analysis.

* Flesh out utterance values by phonetically sounding them out: For instance, “a” could be represented by:

a ay eh

* Try to get as broad a user base as possible for testing, else you run the risk of tuning your grammar to a small subset of user speech patterns. If you have but a single grammar tester who happens to have a Deep South accent, then the tuned grammar will likely not be much good to callers in New York, or our friends in the UK.

* After each round of changes that you apply to your grammars, test them thoroughly, analyze the results, and then test them again. Then test once more just to be sure of your results.

* Careful use of grammar weighting can really save the day for like-sounding characters. The chance of a user utterance of “E” is much higher than one of “Z”, but be very careful when applying weights, as it is possible to go overboard and weight your grammar too hard in favor of one particular letter, which will then skew your recognition results and accuracy.

* Consider using n-best post-processing when overall recognition confidence scores are below a certain threshold: It’s much better to take the extra step to get confirmed accuracy than to assume wrongly.

* For utterance strings that are static in length, implementing a mixed initiative dialog can be an excellent tactic to cut down on the ambiguity that skyrockets as the string length grows. This can be a tricky project to get right, but it is one that is well worth the effort in development.
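Several of the suggestions above (tracking “lastresult$” values, utterance recording, and n-best post-processing) can be combined in one small test dialog. The sketch below is only a starting point under stated assumptions: the grammar file alpha.grxml, the form and field names, the n-best size of 3, and the 0.6 confidence threshold are all placeholders rather than recommended values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

  <!-- VoiceXML 2.1: keep a recording of each recognized utterance. -->
  <property name="recordutterance" value="true"/>
  <!-- Ask for up to three n-best hypotheses (assumed value). -->
  <property name="maxnbest" value="3"/>

  <form id="alphaTest">
    <field name="letter">
      <!-- alpha.grxml is a hypothetical single-character grammar. -->
      <grammar src="alpha.grxml" type="application/srgs+xml"/>
      <prompt>Please say one letter.</prompt>

      <filled>
        <!-- Collect every hypothesis and its confidence for tuning. -->
        <var name="nbestSummary" expr="''"/>
        <script>
          <![CDATA[
            for (var i = 0; i < application.lastresult$.length; i++) {
              nbestSummary += application.lastresult$[i].utterance
                            + ' (' + application.lastresult$[i].confidence + ') ';
            }
          ]]>
        </script>
        <log expr="'*** N-BEST = ' + nbestSummary"/>

        <!-- 0.6 is an arbitrary threshold chosen for illustration. -->
        <if cond="application.lastresult$[0].confidence &lt; 0.6">
          <log expr="'*** LOW CONFIDENCE, CONFIRM WITH CALLER'"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

With “recordutterance” enabled, the audio of the last utterance is exposed as application.lastresult$.recording, which can then be submitted to a server for post-call analysis.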

Next TechTip: In the next certified tech tip from the Voxeo support team, we will illustrate our last suggestion in detail. That’s right, we will take on the task of posting and dissecting a mixed initiative dialog, and the associated alphanumeric grammar that could accept Canadian zip code input. As we stated before in no-nonsense terms, this is possibly one of the hardest things, if not *the* hardest thing, that a developer can attempt to do reliably, but as you are well aware, the Voxeo team is quite fearless, and doesn’t respect the concept of “impossible”.

Till next time,

Matthew Henry Director of Customer Support Voxeo Corporation

Useful Links

* Statistical Language Model Grammars

* Nuance Grammar Developers Guide

* Mixed-Initiative dialog tutorial

* SRGS Grammar Specification: Grammar weighting

* SISR Grammar Specification

* VXML 2.0 specification: The LastResult array

* VXML 2.1 specification: Utterance Recording




Certified Tech Tip: Multi-slot SISR subgrammars with Prophecy 8

Monday, February 11th, 2008


For our last certified tech-tip, we explored the older SISR-formatted returns that one can use when designing a recognition grammar for a VoiceXML application. For this week, we will tackle two different things related to SISR grammars when using the Prophecy 8 software:

1 – How to use the newer SISR “out” grammar return syntax

2 – How we can craft a subgrammar returning multiple slot values

It would seem as if the second item is pretty elementary, but it does bear a little bit of illustration, especially in terms of how the values in the sub-rules will bubble up to the top-level return. For the sake of simplicity, we will do a much-simplified month/day grammar that contains a single entry. Once you grasp the syntax, you can easily flesh this out more fully to include all possible months & days, or even overhaul it into a first-name & last-name grammar.

Let’s take a peek at the grammar file itself, and then look at the relevant working parts:

<?xml version="1.0"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar mode="voice" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="TOPLEVEL" tag-format="semantics/1.0">

  <rule id="TOPLEVEL">
    <one-of>
      <item>
        <item>
          <ruleref uri="#MONTH"/> <tag>out.monthslot=rules.MONTH.monthsubslot;</tag>
        </item>
        <item>
          <ruleref uri="#DAY"/> <tag>out.dayslot=rules.DAY.daysubslot;</tag>
        </item>
        <tag>out.yearslot="2008";</tag>
      </item>
    </one-of>
  </rule>

  <rule id="MONTH">
    <one-of>
      <item> january <tag> out.monthsubslot="January ";</tag> </item>
    </one-of>
  </rule>

  <rule id="DAY">
    <one-of>
      <item> first <tag> out.daysubslot="first";</tag> </item>
    </one-of>
  </rule>

</grammar>

The grammar above consists of two sub-rules titled “MONTH” and “DAY”, and a single top-level rule titled “TOPLEVEL”. The sub-rules specify a month slot and a day slot respectively, and these slots will bubble up to the top level when we invoke the syntax that we have below:

<ruleref uri="#SUBRULENAME"/> <tag>out.slotname=rules.SUBRULENAME.subslot;</tag>

And we then reference these various slots within the VoiceXML as follows:

<log expr="'*** SLOT RESULT = ' + lastresult$.interpretation.slotname"/>

We also threw in a quick example of using the “out” syntax in a more generic manner for our “yearslot” value. In this case, it simply allows us to return a year value back to the VoiceXML from the top-level rule, as opposed to having to reference the sub-rule values. This is added in to show how a “flat-file” grammar with no subgrammars can return a slot value using the newer SISR syntax, which follows this format:

<tag>out.slotname="slot value";</tag>

So when using our month/day grammar above, one might still be unclear on how we get at all these slot values within the VoiceXML dialog. Once recognition has occurred, we would specify something like this:

<log expr="'*** YEAR RESULT = ' + lastresult$.interpretation.yearslot"/>

<log expr="'*** MONTH RESULT = ' + lastresult$.interpretation.monthslot"/>

<log expr="'*** DAY RESULT = ' + lastresult$.interpretation.dayslot"/>
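For context, the log statements above would typically sit inside the <filled> block of the field that loads the grammar. The following is a minimal sketch of such a field; the file name monthday.grxml, the form and field names, and the prompt text are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="dateForm">
    <field name="date">
      <!-- monthday.grxml is an assumed file name for the month/day grammar. -->
      <grammar src="monthday.grxml" type="application/srgs+xml"/>
      <prompt>Please say a month and a day.</prompt>

      <!-- Runs after a successful recognition; each slot set in the
           top-level rule is a property of the interpretation object. -->
      <filled>
        <log expr="'*** YEAR RESULT = ' + lastresult$.interpretation.yearslot"/>
        <log expr="'*** MONTH RESULT = ' + lastresult$.interpretation.monthslot"/>
        <log expr="'*** DAY RESULT = ' + lastresult$.interpretation.dayslot"/>
      </filled>
    </field>
  </form>
</vxml>
```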

Eventually, we will get around to creating some fully fleshed-out additions to our VoiceXML documentation on the subject of SISR grammars, but until then, we will post any cool tricks and tips here for the edification of our developers. As always, if there’s anyone who’d like to see a posting or techtip on a particular subject, just drop us a line, and we would be happy to accommodate.

Next TechTip: Using SSML markup within a CallXML 3.0 application. Stay tuned to the blog; this next one is really cool.

~Matthew Henry




Certified Tech Tip: Using SISR-formatted grammar returns with Prophecy 8

Monday, January 28th, 2008

I am happy to announce a new semi-regular addition to the Voxeo blog, where the Voxeo Support team will be adding VoiceXML, CallXML, and CCXML tips, tricks, and best practices for our developers, which we will christen “Certified Tech Tips”. The name has a nice ring to it and all, but this isn’t just for show: 100% of the technical support team are certified VoiceXML developers, and we are pretty proud of being the only provider who holds these standards.

As we devise some really inventive means of achieving project goals & cool functionality when coding in the framework of these various IVR markups, we thought that we might share some of these tips with the readers of the Voxeo Blog.

For those who haven’t interacted with the support team yet, a bit of introduction is in order. My name is Matthew Henry, and I serve as the Director of Customer Support here at Voxeo. I have been with the company since its inception (way back in the 20th Century), and have been lucky enough to work with a sizable number of really talented IVR developers and engineers, which has allowed me to learn a lot, and has also allowed me to build up a respectable code library for all things IVR. And now, it’s time for some payback.

=^)

As our maiden posting to the Voxeo blog, we will cover the topic of Semantic Interpretation for Speech Recognition (SISR) formatted grammar returns when using the Prophecy 8 software. A lot of folks are used to using plain-old Nuance GSL grammars due to its ease of use and concise markup, but the drawback of this approach is pretty fundamental: as GSL is Nuance-specific, it isn’t guaranteed that every provider will support it. And those of us who have written complex grammars know that porting a grammar can be a tedious job to take on. For this reason, we always suggest that folks stick with a W3C standard when writing grammars, namely the SRGS XML-based grammar format, which leverages the SISR syntax to pass grammar interpretations back to the VoiceXML dialog. Most of the documentation on our site references the Nuance-specific return formatting, and today we will show you what a 100% W3C-compliant grammar looks like.

To start things off, let’s take a look at some GSL, and some SRGS with Nuance-specific returns for the sake of comparison:

Simple GSL

MYRULENAME [
[utterance]      {<mySlotName  "my return value">}
]

Simple SRGS with Nuance-returns

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         root="MYRULENAME">
  <rule id="MYRULENAME">
    <one-of>
      <item>
        utterance
        <tag> <![CDATA[ <mySlotName "my return value"> ]]> </tag>
      </item>
    </one-of>
  </rule>
</grammar>

Simple SRGS with SISR returns

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         root="MYRULENAME">
  <rule id="MYRULENAME">
    <one-of>
      <item>
        utterance
        <tag>$.mySlotName = "my return value"</tag>
      </item>
    </one-of>
  </rule>
</grammar>

The differences in syntax are fairly self-evident in these cases. In the case of SISR, the “$.” prefix allows us to specify any slotname that we will return to our VoiceXML dialog, and specifying a quoted interpretation value preceded by an ‘equals’ sign links the value to this slot.

In addition, we can also specify a “generic” return where no slotname is specified (which comes in handy for subgrammars) by putting $="my return value" within the <tag> element. If we want to get really fancy, we can even specify multiple slots to return back to the dialog by inserting a “;” delimiter between the slot/interpretation pairings. A sample multislot return with an “anonymous” slot also defined might look something like this:

<item>
    utterance
    <tag>
    $ = "my anonymous slot value";
    $.mySlotName1="my slot 1 return";
    $.mySlotName2="my slot 2 return";
    </tag>
</item>

As you can see, the SISR returns are more concise, easier to read, and much more lightweight than Nuance-specific returns. And once you write a grammar using SRGS and SISR, any Certified Compliant VoiceXML platform will run it without any porting being required.

If you found this posting useful, then let us know! Mayhap we will dig deeper into this the next time, and whip out some more complex subgrammars to better illustrate the usage of SISR formatting within your IVR applications.

Till next time!
~Matthew Henry

