Posts Tagged ‘grammars’

Certified Tech Tip: Alpha-Numeric voice recognition grammars – part two

Tuesday, May 20th, 2008

In our last entry to the tech-tips blog, we detailed the challenges inherent in capturing alphabetical or alpha-numeric entries from our callers, and outlined several paths for minimizing the chance of mis-recognition when implementing input fields based on these two categories of voice recognition. The long and short of that posting was that IVR developers should refrain from attempting this wherever possible, and instead try these alternatives:

  • Pre-compiled Statistical Language Model grammars

  • Leveraging TargusInfo services for advanced recognition accuracy

However, the IVR project requirements dictate what we can and can’t do as developers, so in some cases we have to try and whip out a user grammar that takes alpha or alpha-numeric input. As mentioned in our last blog entry, there are a few things we can do to stack the deck and squeeze more accuracy out of these grammars so that we don’t end up with frustrated callers, but the plain truth is that we will never, ever be able to write a grammar for alphabetical characters that is 100% accurate using today’s recognition technology. What we will do today is twofold:

(1) Craft an SRGS+SISR subgrammar for alphabetical and numeric characters

(2) Plug this grammar into a mixed-initiative form dialog that will minimize (but not fully eliminate!) the possibility of mis-recognitions.

Those developers who need such a grammar and dialog within their production-grade applications are advised to take this basic framework as a starting point, and then expand on it as follows:

(a) Test carefully with a broad range of users, and fully flesh out alternate utterance values for alphabetic characters

(b) Apply item weighting to specific characters based on the probability of a given character versus another like-sounding character – this will depend greatly on the specific usage of the grammar

(c) Track results by using W3C-compliant utterance recording and logging all shadow variables, so that these results can be used to further tune and tweak our grammar for maximum accuracy

(d) Consider using n-best post-processing as an additional confirmation step to ensure that the results we receive are indeed accurate
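As a rough sketch of points (c) and (d), the VoiceXML below asks the recognizer for several candidate results and logs them alongside the field's shadow variables. It assumes a VoiceXML 2.1 platform and assumes the grammar from this post is saved as canadianZip.grxml; the form id, variable name, and log labels are made up for illustration, and how the log output is retrieved depends on your platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="captureDigit">
    <!-- Ask the recognizer for up to three candidate results instead of one. -->
    <property name="maxnbest" value="3"/>
    <!-- VoiceXML 2.1: keep the caller's audio so it can be reviewed later. -->
    <property name="recordutterance" value="true"/>
    <!-- Hypothetical variable that collects a readable summary for logging. -->
    <var name="nbestSummary" expr="''"/>

    <field name="numSlot">
      <!-- canadianZip.grxml is an assumed file name, not one given in this post. -->
      <grammar src="canadianZip.grxml#numRule" type="application/srgs+xml"/>
      <prompt>Please say the digit.</prompt>
      <filled>
        <script><![CDATA[
          // application.lastresult$ is an array ordered from best to worst match.
          for (var i = 0; i < application.lastresult$.length; i++) {
            var r = application.lastresult$[i];
            nbestSummary += '[' + i + '] utterance=' + r.utterance +
                            ' confidence=' + r.confidence + '; ';
          }
        ]]></script>
        <log label="nbest" expr="nbestSummary"/>
        <log label="shadow"
             expr="'utterance=' + numSlot$.utterance + ' confidence=' + numSlot$.confidence"/>
      </filled>
    </field>
  </form>
</vxml>
```

If the top two candidates come back with similar confidence values, that is a reasonable cue to add an explicit confirmation step rather than accepting the first result outright.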

For today’s entry, let’s assume that we need to capture a three-character zip code of the kind prevalent in Canadian locales. Our predefined format for utterance values is “Alpha Digit Alpha”, and luckily, not all alpha characters are applicable: instead of trying to recognize 26 letters accurately, we only need to recognize 16, which helps a lot!

We won’t dig into the specifics of a mixed-initiative form dialog, as we have already done so in our mixed-initiative tutorial, but the gist is that this feature of VoiceXML allows us to fill multiple fields with a single utterance, and breaking up each alpha and numeric character into its own recognition field greatly cuts down on the disambiguation problems that can occur.

For the sake of brevity, what we have below is a stripped-down version of our fully fleshed-out grammar, but you may download the full grammar and the mixed-initiative dialog right here, both of which contain lots more inline notations.

<?xml version="1.0"?>

<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="canadianZip">

<rule id="canadianZip" scope="public">

<one-of>

<!-- ALL THREE FIELDS FILLED -->

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<!-- ONLY TWO FIELDS FILLED -->

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

</item>

<item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<item>

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#alphaRule2"/>

<tag>out.alphaSlot2=rules.alphaRule2.alphaSlot2;</tag>

</item>

</item>

<!-- ONLY ONE FIELD FILLED  -->

<item>

<ruleref uri="#alphaRule1"/>

<tag>out.alphaSlot1=rules.alphaRule1.alphaSlot1;</tag>

</item>

<item>

<ruleref uri="#numRule"/>

<tag>out.numSlot=rules.numRule.numSlot;</tag>

</item>

</one-of>

</rule>

<rule id="alphaRule1" scope="public">

<one-of>

<item weight="1.0">

<one-of>

<item> ex</item>

<item> ax</item>

<item> x </item>

</one-of>

<tag>out.alphaSlot1="X"; </tag>

</item>

</one-of>

</rule>

<rule id="numRule" scope="public">

<one-of>

<item> one <tag>out.numSlot="1"; </tag>  </item>

</one-of>

</rule>

<rule id="alphaRule2" scope="public">

<one-of>

<item weight="1.0">

<one-of>

<item> ay</item>

</one-of>

<tag>out.alphaSlot2="A"; </tag>

</item>

</one-of>

</rule>

</grammar>

In brief, our top-level rule assumes that we can have any of the following entries:

"X1A"

"X"

"X1"

"XA"

"1"

"1A"

And in the event that we get one or two characters matched in our utterance, the VoiceXML mixed-initiative logic will then take over, and prompt the caller to fill in any “blanks” remaining.

A few things of note about the grammar defined above: in the event that we receive only a single alpha utterance, we will assume that it is the first character, not the last. Additionally, when we construct a grammar that contains multiple slot returns, we must explicitly define the slot values all the way up the chain: if we didn’t define “out.[slotname]=rules.[rulename].[subslot]” within the context of the top-level rule, the last slot value would overwrite all others, meaning that we would only get a value for “alphaSlot2” within the VoiceXML dialog. To illustrate, the below snippet for a top-level return would cause exactly this problem:

<item> 
<ruleref uri="#alphaRule1"/>

<ruleref uri="#numRule"/>

<ruleref uri="#alphaRule2"/>

</item>

You’ll also see that each possibility for character recognition is specified within the top-level rule, so in the event that we get 1, 2 or 3 character strings, we can pipe the return value back to the VoiceXML, and let the mixed-initiative dialogs then access the sub-rules (alphaRule1/2 and numRule), individually as needed.
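To make the dialog side a little more concrete, here is a minimal sketch of a mixed-initiative form that could sit on top of this grammar. It is not the downloadable dialog mentioned earlier: the file name canadianZip.grxml, the form id, and the prompt wording are all assumptions made for illustration. The field names deliberately match the slot names returned by the grammar so that a single utterance can fill several fields at once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="getZip">
    <!-- Form-level grammar: the top-level rule may fill one, two, or three fields. -->
    <grammar src="canadianZip.grxml#canadianZip" type="application/srgs+xml"/>

    <initial name="start">
      <prompt>Please say the first three characters of your postal code.</prompt>
      <nomatch count="2">
        <prompt>Let's take it one character at a time.</prompt>
        <!-- Marks the initial item as done so the fields below prompt individually. -->
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Each field falls back to its own public sub-rule when still unfilled. -->
    <field name="alphaSlot1">
      <grammar src="canadianZip.grxml#alphaRule1" type="application/srgs+xml"/>
      <prompt>What is the first letter?</prompt>
    </field>

    <field name="numSlot">
      <grammar src="canadianZip.grxml#numRule" type="application/srgs+xml"/>
      <prompt>What is the digit?</prompt>
    </field>

    <field name="alphaSlot2">
      <grammar src="canadianZip.grxml#alphaRule2" type="application/srgs+xml"/>
      <prompt>What is the last letter?</prompt>
    </field>

    <!-- Runs once all three fields have values, however they were collected. -->
    <filled mode="all" namelist="alphaSlot1 numSlot alphaSlot2">
      <prompt>
        I heard <value expr="alphaSlot1 + ' ' + numSlot + ' ' + alphaSlot2"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```

The sub-rules are declared with scope="public" in the grammar precisely so that field-level references like these can reach them from outside the grammar file.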

We also illustrated in brief how one can define multiple like-sounding utterance values that return the same interpretation value, and defined three such alternatives (“ex”, “ax”, and “x”) for our alphaRule1 entry simply to show how this can be done. The task of taking this framework and turning it into a grammar that satisfies any given project rests in the hands of you, the capable IVR developer.
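As one possible next step, the fragment below sketches how item weighting might be layered onto alphaRule2 for two rhyming letters. It is meant to replace that rule inside the grammar above rather than stand alone. The weights and the alternate spellings are placeholder guesses, not tuned values; real numbers should come from testing against your own callers.

```xml
<rule id="alphaRule2" scope="public">
  <one-of>
    <!-- Assumed weight: "A" is treated as the more likely letter in this position. -->
    <item weight="2.0">
      <one-of>
        <item>ay</item>
        <item>a</item>
      </one-of>
      <tag>out.alphaSlot2="A";</tag>
    </item>
    <!-- Assumed weight: "J" rhymes with "A", so it is biased downward here. -->
    <item weight="0.5">
      <one-of>
        <item>jay</item>
        <item>j</item>
      </one-of>
      <tag>out.alphaSlot2="J";</tag>
    </item>
  </one-of>
</rule>
```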

=^)

Till next time,

Matthew Henry Director of Customer Support Voxeo Corporation



Want to learn how Voxeo can help unlock your communications and deliver a better customer experience? Please contact us!

If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Certified Tech Tip: Using SISR-formatted grammar returns with Prophecy 8

Monday, January 28th, 2008

I am happy to announce a new semi-regular addition to the Voxeo blog, where the Voxeo Support team will share VoiceXML, CallXML, and CCXML tips, tricks, and best practices for our developers, which we will christen “Certified Tech Tips”. The name has a nice ring to it and all, but this isn’t just for show: 100% of the technical support team are certified VoiceXML developers, and we are pretty proud of being the only provider who holds to these standards.

As we devise some really inventive means of achieving project goals and cool functionality when coding in the framework of these various IVR markups, we thought that we might share some of these tips with the readers of the Voxeo Blog.

For those who haven’t interacted with the support team yet, a bit of introduction is in order. My name is Matthew Henry, and I serve as the Director of Customer Support here at Voxeo. I have been with the company since its inception (way back in the 20th Century), and have been lucky enough to work with a sizable number of really talented IVR developers and engineers, which has allowed me to learn a lot, and has also allowed me to build up a respectable code library for all things IVR. And now, it’s time for some payback.

=^)

As our maiden posting to the Voxeo blog, we will cover the topic of Semantic Interpretation for Speech Recognition (SISR) formatted grammar returns when using the Prophecy 8 software. A lot of folks are used to plain-old Nuance GSL grammars due to their ease of use and concise markup, but the drawback of this approach is pretty fundamental: as GSL is Nuance-specific, it isn’t guaranteed that every provider will support it. And those of us who have written complex grammars know that porting a grammar can be a tedious job to take on. For this reason, we always suggest that folks stick with a W3C standard when writing grammars, namely the SRGS XML-based grammar format, which leverages the SISR syntax to pass our grammar interpretations back to the VoiceXML dialog. Most of the documentation on our site references the Nuance-specific return formatting, so today we will show you what a 100% W3C-compliant grammar looks like.

To start things off, let’s take a look at some GSL, and some SRGS with Nuance-specific returns for the sake of comparison:

Simple GSL

MYRULENAME [
[utterance]      {<mySlotName  "my return value">}
]

Simple SRGS with Nuance-returns

<?xml version="1.0"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
        xml:lang="en-US" root="MYRULENAME">
      <rule id="MYRULENAME">
        <one-of>
          <item>
            utterance
            <tag><![CDATA[ <mySlotName "my return value"> ]]></tag>
          </item>
        </one-of>
      </rule>
    </grammar>

Simple SRGS with SISR returns

<?xml version="1.0"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
        xml:lang="en-US" root="MYRULENAME">
      <rule id="MYRULENAME">
        <one-of>
          <item>
            utterance
            <tag>$.mySlotName = "my return value"</tag>
          </item>
        </one-of>
      </rule>
    </grammar>

The differences in syntax are fairly self-evident in these cases. In the case of SISR, the “$.” prefix allows us to specify any slotname that we will return to our VoiceXML dialog, and specifying a quoted interpretation value preceded by an ‘equals’ sign links the value to this slot.

In addition, we can also specify a “generic” return where no slotname is specified (which comes in handy for subgrammars) by putting $="my return value" within the <tag> element. If we want to get really fancy, we can even specify multiple slots to return back to the dialog by inserting a “;” delimiter between the slot/interpretation pairings. A sample multislot return with an “anonymous” slot also defined might look something like this:

<item> 
    utterance 
    <tag>
    $ = "my anonymous slot value"; 
    $.mySlotName1="my slot 1 return";	 $.mySlotName2="my slot 2 return";
    </tag> 
</item>

As you can see, the SISR returns are more concise, easier to read, and much more lightweight than Nuance-specific returns. And once you write a grammar using SRGS and SISR, any Certified Compliant VoiceXML platform will run it without any porting being required.
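For completeness, here is a minimal sketch of how a VoiceXML field might pick up the value returned by the simple SISR grammar shown earlier. It assumes that grammar has been saved as myGrammar.grxml (a made-up file name), and the form id and prompt wording are likewise placeholders. The field is named after the slot so that the returned value lands in it directly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="sisrDemo">
    <!-- The field name matches the slot name set in the grammar's tag. -->
    <field name="mySlotName">
      <!-- myGrammar.grxml is an assumed file name for the SISR grammar above. -->
      <grammar src="myGrammar.grxml" type="application/srgs+xml"/>
      <prompt>Please say your utterance.</prompt>
      <filled>
        <!-- mySlotName now holds the interpretation, e.g. "my return value". -->
        <prompt>The grammar returned <value expr="mySlotName"/>.</prompt>
        <!-- Shadow variables expose what was actually heard and how confidently. -->
        <log expr="'heard=' + mySlotName$.utterance + ' confidence=' + mySlotName$.confidence"/>
      </filled>
    </field>
  </form>
</vxml>
```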

If you found this posting useful, then let us know! Mayhap we will dig deeper into this the next time, and whip out some more complex subgrammars to better illustrate the usage of SISR formatting within your IVR applications.

Till next time!
~Matthew Henry

