
Slides: Comparative ASR Evaluation (Dan Burnett, SpeechTEK NY 2010)

Friday, August 13th, 2010

Have you ever wanted to compare automatic speech recognition (ASR) engines?  That was the topic of a three-hour tutorial given by Voxeo’s Dan Burnett at SpeechTEK last week in New York. Dan gave a hands-on session as part of “SpeechTEK University” where students received an in-depth lesson in how to compare and evaluate ASR engines.  The session description was:

9:00 AM – 12:00 PM – STKU4 – Performing a Comparative ASR Evaluation – Dan Burnett, Director of Speech Technologies

There are a variety of choices for speech recognition engines, especially for languages other than English. Although many companies are comfortable performing business case analysis across speech recognition engine vendors, it is often difficult to know how to compare accuracy. In this hands-on, in-depth course you will learn how to select an appropriate evaluation data set, why transcription is important and how to do it properly, how to perform the evaluation, and how to analyze and interpret the results.

 

Dan’s slides are available for viewing and download:

Obviously you need some of the sample files to work through the lab exercises, but the rest of the slides do cover what Dan discussed.


Want to learn how Voxeo can help unlock your communications and deliver a better customer experience? Please contact us!

If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Slides: Mashups: Integrate Multiple Web Services Into Your App

Friday, August 13th, 2010

At SpeechTEK last week in New York, Voxeo’s Dan Burnett was part of a panel session on the topic of “speech mashups.” The session description was:

10:15 AM – 11:00 AM - C101 – Mashups: Integrate Multiple Web Services into Your App - Dan Burnett, Director of Speech Technologies

Learn how speech technologies from recognition to transcription and beyond, now available in the cloud, are being accessed to create multimodal applications. Learn how new licensing models are emerging; novel partnerships are being formed; and the role of vendors, operators, and developers is being redefined. See demonstrations of some new service opportunities made possible with speech mashups. Discover how easy it is to create your own mashups that use speech and how they compare to what you can do with VoiceXML.

Dan talked about mashups and then demonstrated a mashup using Tropo and the Ruby programming language. His slides are available for viewing or download:




Slides available for Voxeo CTO RJ Auburn’s JavaOne talk “Taking a SIP of Java”

Thursday, June 4th, 2009

This week Voxeo CTO RJ Auburn spoke at the JavaOne conference on the topic of “Taking a SIP of Java”. RJ’s slides are now available on SlideShare at:

http://www.slideshare.net/voxeo/javaone-a-sip-of-java-rj-auburn

And the presentation, also embedded below, looks to be a classic RJ kind of talk… fun, lively, interesting… and also with some code. I don’t know if there were any recordings made, but if there were I’ll update the article with a link. Enjoy the talk!




Video from eComm: Voxeo CTO RJ Auburn on Building Voice Mashups using Tropo.com

Monday, May 18th, 2009

eComm organizer Lee Dryburgh recently made available the video of Voxeo CTO RJ Auburn’s talk at eComm in March. The talk was titled “Building Voice Mashups using SIP Servlets” but really focused on all the cool things you can do with Tropo.com (Tropo is built on top of SIP Servlets). You can view it here, and the slides are embedded below the video:

RJ’s slides are available here:




Code Walk: Listening to Identi.ca (OSCON 2008 Demo #2: VoiceXML)

Tuesday, January 27th, 2009

Back in the summer of 2008, I gave a presentation at O’Reilly’s Open Source Convention (OSCON) about “Mashing Up Voice and the Web Using Open Source and XML” where I talked primarily about integrating voice with the Identi.ca microblogging service. While I made the slides from that talk available previously, I made only one of my mashup demos available here on this Voxeo Developer’s Corner blog (Demo #1: Is Twitter Down?). I want to change that and start making more of the demos available.

In my Demo #2, Listening to Identi.ca, I created a VoiceXML application that does the following:

  • Asks the caller if they want to hear:
    • the latest Identi.ca message from the people they follow
    • the latest reply to them
    • the latest public Identi.ca message
  • Uses speech recognition to interpret the result
  • Retrieves the requested information from Identi.ca
  • Relays the information to the caller

Now there is the caveat that this demo is hardcoded to a single identi.ca user (namely me – identi.ca/danyork). You can try it out yourself by calling any of these numbers:

If you would like to try out the code below with your own Identi.ca account, all you need to do is create a free developer account on our Evolution developer portal, create the VoiceXML file and assign it a phone number. (Step-by-step instructions are available.) You can also download a free copy of our Prophecy software, install it on a local system and set up this code there.


With that, let’s jump into the code. The full VoiceXML source code is available down below as something you can copy and paste, but right now I’m going to walk through the pieces of the code.

First we have the standard start of a VoiceXML file and the definition of a variable that is going to be used to store the results of the request to Identi.ca:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>

Next we start the <form> element and begin with defining a JavaScript function that is going to retrieve the text we want from the XML data sent back to us from Identi.ca:

  <form id="F1">
    <script>
      <![CDATA[
          function GetData(d,t,n)  {       
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          } 
      ]]>
    </script>

I can’t claim credit for the JavaScript – it was something I found in one of the VoiceXML tutorials we have available. Basically it is searching the received XML for tags of type “t” and then going to the “n”th tag and retrieving the data from there.
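To see what the function does without a VoiceXML platform, here is a minimal sketch that can be run in Node.js. The `makeDoc` helper and the sample element list are hypothetical stand-ins for the DOM object that the `<data>` element provides; they implement only the three members that GetData touches.

```javascript
// Hypothetical stand-in for the DOM object that <data> provides.
// It supports only getElementsByTagName, item, and firstChild.data.
function makeDoc(elements) {
  // elements: array of { tag: string, text: string } in document order
  return {
    getElementsByTagName: function (t) {
      var matches = elements.filter(function (e) { return e.tag === t; });
      return {
        length: matches.length,
        item: function (n) {
          var e = matches[n];
          return e ? { firstChild: { data: e.text } } : null;
        }
      };
    }
  };
}

// Same function as in the VoiceXML <script> block.
function GetData(d, t, n) {
  return d.getElementsByTagName(t).item(n).firstChild.data;
}

// Made-up sample content, not the actual Identi.ca feed.
var doc = makeDoc([
  { tag: 'title', text: 'Feed title' },
  { tag: 'title', text: 'First notice' },
  { tag: 'dc:creator', text: 'someuser' }
]);

console.log(GetData(doc, 'title', 1));       // second <title> in the document
console.log(GetData(doc, 'dc:creator', 0));  // first <dc:creator>
```

Note that positions are counted from zero across the whole document, not within a single parent element.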

Now we start with a field in the VXML form. Note that I use an audio file that I had previously recorded:

    <field name="Choice">
        <prompt bargein="false">
           <audio src="../audio/identicachoice.wav"/>
        </prompt>

I could have just as easily used Text-To-Speech (TTS) to do the same thing:

    <field name="Choice">
        <prompt bargein="false">
           Welcome to the listen to identi.ca demo. To hear your 
           latest message please say "friends". To hear the latest
           reply please say "replies". To hear the latest public 
           message please say "public".
        </prompt>

Now I define the “grammar” which is the list of the words that I will accept and that the speech recognition engine will listen for:

        <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="MYRULE">
         <rule id="MYRULE">
          <one-of>
            <item>friends</item>
            <item>replies</item>
            <item>public</item>
          </one-of>
         </rule>
        </grammar>

To finish off the field, I’m going to catch two error cases where either the caller said nothing or did not say one of the three words in the grammar:

        <noinput>
          I did not hear anything.  Please try again.
          <reprompt/>
        </noinput>
        <nomatch>
          I did not recognize that word.  Please try again.
          <reprompt/>
        </nomatch>
    </field>

With the <field> defined, I move on to define what happens once acceptable input has been received by using the <filled> element. Note that I am using the Choice name that was defined in the field above.

First I am going to check if the caller said “friends” and if so I am going to use the <data> element to make a web call out to the Identi.ca site. The results of the web call are stored in the MyData variable which is then referenced in the <prompt> element:

    <filled namelist="Choice">
     <if cond="Choice == 'friends'">
      <data name="MyData" src="http://identi.ca/danyork/all/rss?limit=1"/>
      <prompt>
        Your last notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>

Note that I am using the previously defined GetData JavaScript function to walk the XML tree twice: first to get the person sending the Identi.ca notice and second to get the contents of the notice. Now to make this work, I did have to look at the XML sent back by Identi.ca and figure out which were the appropriate tags and position numbers to use.
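One way to find those position numbers without counting tags by hand is to list every match for a tag name along with its index. The sketch below is only an illustration: `listTagPositions` and `makeDoc` are hypothetical helpers, and the sample feed layout is made up rather than copied from Identi.ca.

```javascript
// Hypothetical stand-in for the DOM object that <data> provides.
function makeDoc(elements) {
  return {
    getElementsByTagName: function (t) {
      var matches = elements.filter(function (e) { return e.tag === t; });
      return {
        length: matches.length,
        item: function (n) {
          var e = matches[n];
          return e ? { firstChild: { data: e.text } } : null;
        }
      };
    }
  };
}

// List every element named t with its position, so the right index
// for GetData can be read off directly.
function listTagPositions(d, t) {
  var nodes = d.getElementsByTagName(t);
  var out = [];
  for (var i = 0; i < nodes.length; i++) {
    var child = nodes.item(i).firstChild;
    out.push(i + ': ' + (child ? child.data : '(empty)'));
  }
  return out;
}

// Assumed layout: two channel-level titles come before the notice title.
var feed = makeDoc([
  { tag: 'title', text: 'Channel title' },
  { tag: 'title', text: 'Channel image title' },
  { tag: 'dc:creator', text: 'someuser' },
  { tag: 'title', text: 'Text of the latest notice' }
]);

var positions = listTagPositions(feed, 'title');
console.log(positions.join('\n'));
```

In a layout like this one the notice text would sit at position 2, because the earlier titles belong to the feed itself rather than to the notice.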

I next do the same thing for ‘replies’ and ‘public’:

     <elseif cond="Choice == 'replies'"/>
      <data name="MyData" src="http://identi.ca/danyork/replies/rss?limit=1"/>
      <prompt>
        Your last reply is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The reply is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'public'"/>
      <data name="MyData" src="http://identi.ca/rss?limit=1"/>
      <prompt>
        The last public notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',1)"/>.
      </prompt>
     </if>

You will note that I explicitly tested for ‘public’ although I really didn’t need to do so. The grammar only allowed three options, so if it was not one of the first two it would naturally be ‘public’. I could have just used an <else/> here.
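The three branches differ only in the feed URL and the position of the title, so the same decision can be sketched as a lookup with a default. This is plain JavaScript for illustration, not something the demo uses; the function name `feedFor` and the `titleIndex` field are hypothetical, while the URL patterns and index values are the ones from the VoiceXML.

```javascript
// Map the recognized word to the feed to fetch and the <title> position
// to read. Anything that is not 'friends' or 'replies' falls through to
// the public timeline, which is what an <else/> branch would do.
function feedFor(choice, user) {
  var feeds = {
    friends: { src: 'http://identi.ca/' + user + '/all/rss?limit=1', titleIndex: 2 },
    replies: { src: 'http://identi.ca/' + user + '/replies/rss?limit=1', titleIndex: 2 }
  };
  if (Object.prototype.hasOwnProperty.call(feeds, choice)) {
    return feeds[choice];
  }
  return { src: 'http://identi.ca/rss?limit=1', titleIndex: 1 };
}

console.log(feedFor('friends', 'danyork').src);
console.log(feedFor('public', 'danyork').src);
```

The default is safe here only because the grammar limits the caller to three words; with a larger grammar an explicit test would be the better choice.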

Finally I just thank the caller with a final prompt and end the various elements to close off the file:

      <prompt>
        That is all. Thank you for calling in.
      </prompt>
    </filled>
  </form>
</vxml> 

FULL SOURCE CODE

For those who want to see the entire source code or copy/paste the code, here it is. You’ll note that I have here the TTS version since you won’t have access to the audio file I made. If you’d like to use a recorded prompt, you can use the code I had above.

Obviously wherever you see “danyork“, you can substitute your Identi.ca user name or that of whomever you want to hear the messages from.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>
  <form id="F1">
    <script>
      <![CDATA[
          function GetData(d,t,n)  {       
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          } 
      ]]>
    </script>
    <field name="Choice">
        <prompt bargein="false">
           Welcome to the listen to identi.ca demo. To hear your 
           latest message please say "friends". To hear the latest
           reply please say "replies". To hear the latest public 
           message please say "public".
        </prompt>
        <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="MYRULE">
         <rule id="MYRULE">
          <one-of>
            <item>friends</item>
            <item>replies</item>
            <item>public</item>
          </one-of>
         </rule>
        </grammar>
        <noinput>
          I did not hear anything.  Please try again.
          <reprompt/>
        </noinput>
        <nomatch>
          I did not recognize that word.  Please try again.
          <reprompt/>
        </nomatch>
    </field>
    <filled namelist="Choice">
     <if cond="Choice == 'friends'">
      <data name="MyData" src="http://identi.ca/danyork/all/rss?limit=1"/>
      <prompt>
        Your last notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'replies'"/>
      <data name="MyData" src="http://identi.ca/danyork/replies/rss?limit=1"/>
      <prompt>
        Your last reply is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The reply is: <value expr="GetData(MyData,'title',2)"/>.
      </prompt>
     <elseif cond="Choice == 'public'"/>
      <data name="MyData" src="http://identi.ca/rss?limit=1"/>
      <prompt>
        The last public notice is from <value expr="GetData(MyData,'dc:creator',0)"/>.
        The notice is: <value expr="GetData(MyData,'title',1)"/>.
      </prompt>
     </if>
      <prompt>
        That is all. Thank you for calling in.
      </prompt>
    </filled>
  </form>
</vxml> 

I hope you found this tutorial useful and please feel free to leave your comments, suggestions or questions here. (Including if you think of a better way for me to write my VXML code.)

Meanwhile, as I said before, if you would like to try out this code with your own Identi.ca account, all you need to do is create a free developer account on our Evolution developer portal, create the VoiceXML file and assign it a phone number. (Step-by-step instructions are available.)

Also, if you extend this app and do something interesting with it (for instance, allowing the caller to choose between different Identi.ca accounts) and would be open to sharing what you’ve done, please feel free to email me. I’d love to post some follow-up posts that show what else you can do with VoiceXML and services like Identi.ca.

P.S. Because Identi.ca uses the same style of RESTful API as Twitter, this script can be modified to work with Twitter by simply changing the web call in the <data> element to be for the Twitter API. If I recall correctly, I also had to figure out what tag name and item number were necessary for the GetData function as the XML data returned was different between Identi.ca and Twitter.




Code Walk: OSCON 2008 Demo #1: Is Twitter Down? (VXML & JavaScript)

Monday, September 29th, 2008

As I mentioned previously over on our Voxeo Talks blog, I’m going to walk through here in this blog several of the demonstrations that I did at the O’Reilly Open Source Convention (OSCON) back in July. The slide deck is embedded in the previous post, and I’m going to jump right to slide 48 where I began Demo #1: Is Twitter Down?

This was just a short little demo that was designed to show how VoiceXML can be used with embedded JavaScript/ECMAScript. It performs three basic steps:

  • Connects to www.istwitterdown.com
  • Uses JavaScript to parse the result
  • Relays result to caller using Text-To-Speech

If you’d like to hear it, you can actually call it – but do note the Big Caveat noted farther down in the article – at either of these numbers:

WALKING THE CODE

So let’s take a walk through the code. The first part is simply the initial VoiceXML starting statements, a declaration of a variable called “MyData” and the start of a form called “F1”:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>
  <form id="F1">

The next part is a JavaScript function that uses the DOM “getElementsByTagName” method to walk the HTML of the page that is loaded in:

    <script>
      <![CDATA[
          function GetData(d,t,n)  {       
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          } 
      ]]>
    </script>

Now I can’t claim credit for this code – I really just modified the example provided in the VoiceXML documentation for the <data> element. The way to think about it is this – the system loads the entire document requested by the <data> element into memory as an XML tree (even though it is in fact HTML). The ‘getElementsByTagName’ function then walks the XML tree to find the tag requested.

Let’s look at how I called the function to see how this makes sense:

    <block>
      <data name="MyData" src="http://www.istwitterdown.com/"/>
      <assign name="status" expr="GetData(MyData,'a',0)"/>

Now the first line uses the <data> command in VoiceXML to retrieve the contents of the web page at www.istwitterdown.com and assign that to a variable called “MyData”.

The second line calls the JavaScript function above with the “MyData” variable and says that I am interested in all the <a> tags and specifically the first <a> tag (numbered “0”). To figure out what tag I wanted, I honestly didn’t do anything overly brilliant. I just went to www.istwitterdown.com, did a “View Page Source” in my browser and then looked for where the word “No” appeared in the body text. This site happens to have a very simple page structure, so finding the node to use was trivial:

<body>
<h1><a href="link removed">No</a></h1>

So the site had the content I wanted (“No” or “Yes”) right in the very first <a> tag. Filling in the contents of the getElementsByTagName call with the variables sent to the JavaScript function, it looked like this:

d.getElementsByTagName('a').item(0).firstChild.data

Which translates to wanting the “data” of the first <a> tag in the document. In this case, “No” or “Yes”.

This data is then assigned to the VoiceXML variable “status”.

The rest of the VoiceXML app is then simply branching on what the “status” variable is. If it is “No”, then Twitter is up. (A bit of a double negative kind of thing going on.) If it is anything else, the site is down:

      <if cond="status=='No'">
          <prompt>Twitter is currently up.  Yea!</prompt>
      <else/>
          <prompt>Twitter is currently down. </prompt>
      </if>
    </block> 
   </form>
</vxml> 

I then close off the block, the form and the VXML file. Note that I could have checked to see if status=='Yes' but I just made the assumption that if it was not “No” then there was a problem with Twitter.

THE BIG CAVEAT

Which does bring me to the big caveat of this application – because I was relying on some other web site (istwitterdown.com in this case) I made a big assumption that the external site would work!

It didn’t.

When I was out at OSCON, I was delighted by the fact that at one point Twitter was actually down! However, throughout all the time that Twitter was down and despite multiple browser hard refreshes, the www.istwitterdown.com site always said “No”, meaning Twitter was up.

Oops.

Now, the site did work in the past. I know because I used it successfully when Twitter was going up and down earlier this year. However, sometime along the way whatever logic the folks at the site were using to detect that Twitter was down seemed to stop working. Or, at least it stopped working on the day that I was testing it. Maybe it’s working now… I don’t know… the good news is that Twitter hasn’t been down much at all in recent weeks.

Anyway, it’s a good lesson that if you rely on external sites, you do need to ensure that the external site is in fact giving you the correct data that you want.
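A small part of that lesson can be put into code. The sketch below is not what the demo does: `GetDataSafe`, `interpretStatus` and `makeDoc` are hypothetical names, and the idea is simply to treat anything other than an exact “No” or “Yes” as unknown instead of assuming Twitter is down.

```javascript
// Hypothetical stand-in for the DOM object that <data> provides.
function makeDoc(elements) {
  return {
    getElementsByTagName: function (t) {
      var matches = elements.filter(function (e) { return e.tag === t; });
      return {
        length: matches.length,
        item: function (n) {
          var e = matches[n];
          return e ? { firstChild: { data: e.text } } : null;
        }
      };
    }
  };
}

// Like GetData, but returns null instead of throwing when the nth <t>
// element (or its text) is missing, for example after a page redesign.
function GetDataSafe(d, t, n) {
  var node = d.getElementsByTagName(t).item(n);
  if (!node || !node.firstChild) {
    return null;
  }
  return node.firstChild.data;
}

// The site answers the question "Is Twitter down?", so "No" means up.
function interpretStatus(text) {
  if (text === 'No') { return 'up'; }
  if (text === 'Yes') { return 'down'; }
  return 'unknown';
}

var page = makeDoc([{ tag: 'a', text: 'No' }]);
var changedPage = makeDoc([]);

console.log(interpretStatus(GetDataSafe(page, 'a', 0)));        // first <a> says "No"
console.log(interpretStatus(GetDataSafe(changedPage, 'a', 0))); // no <a> at all
```

A check like this only catches missing or unexpected content. It cannot catch a site that confidently reports the wrong answer, which is what happened at OSCON.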

WRAPPING UP

I hope this little example gave you a useful glimpse into how VoiceXML’s <data> tag can be used to pull in web pages and how JavaScript can be used to parse those web pages. More information can be found on the VoiceXML documentation page for the data element. I would encourage you to read both the main documentation page as well as the comments to that page.

In my next blog posts walking through my subsequent OSCON 2008 demos, I’ll explore what else can be done with the data element and other tools. Stay tuned…


THE WHOLE CODE

For those of you who want to do a straight copy-and-paste to play with the code, here it is in its entirety:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
  <var name="MyData"/>
  <form id="F1">
    <script>
      <![CDATA[
          function GetData(d,t,n)  {       
            return (d.getElementsByTagName(t).item(n).firstChild.data);
          } 
      ]]>
    </script>
    <block>
      <data name="MyData" src="http://www.istwitterdown.com/"/>
      <assign name="status" expr="GetData(MyData,'a',0)"/>
      <if cond="status=='No'">
          <prompt>Twitter is currently up.  Yea!</prompt>
      <else/>
          <prompt>Twitter is currently down. </prompt>
      </if>
    </block> 
   </form>
</vxml> 



OSCON – Greetings… and voice mashups with identi.ca

Wednesday, July 23rd, 2008

Greetings from the floor of O’Reilly’s Open Source Convention (OSCON) in Portland, OR…

[Seesmic video: “OSCON – Greetings… and a teaser about my talk on voice mashing up with identi.ca”, http://www.seesmic.com/video/zQOEEEUNAE]

As I said in the video, I’ll be speaking today at 5:20pm US Pacific time on the topic of “Mashing up Voice and the Web using Open Standards and XML” where I’ll be showing how you can use VoiceXML and CCXML to do interesting connections to “web 2.0” sites. Needing a site to use for my examples, I decided to do this round of demos with identi.ca, the new open source microblogging site (like Twitter, only open source).

Once my talk is over, I’ll post the slide deck here in this blog and also the code to the demos I’ve written. I’m looking forward to it… it should be fun! (If you’re here at OSCON, I will apparently be in room D137 at 5:20pm.)

P.S. You can also follow me on identi.ca at identi.ca/danyork

