Archive for October, 2008

RFC 5359 provides VERY detailed SIP service examples

Monday, October 27th, 2008

Have you ever wished you could see examples of exactly how the call flows are supposed to go between endpoints in a SIP communication session? Down to the individual SIP packets sent back and forth?

Well, now you can, courtesy of the newly issued RFC 5359, SIP Service Examples. The RFC walks through a whole series of call flows in detail, all the way down to the packet level. Here’s the abstract:

This document gives examples of Session Initiation Protocol (SIP) services. This covers most features offered in so-called IP Centrex offerings from local exchange carriers and PBX (Private Branch Exchange) features. Most of the services shown in this document are implemented in the SIP user agents, although some require the assistance of a SIP proxy. Some require some extensions to SIP including the REFER, SUBSCRIBE, and NOTIFY methods and the Replaces and Join header fields. These features are not intended to be an exhaustive set, but rather show implementations of common features likely to be implemented on SIP IP telephones in a business environment.

And here is the list from the table of contents of the specific examples:

   2. Service Examples ................................................6
      2.1. Call Hold ..................................................6
      2.2. Consultation Hold .........................................19
      2.3. Music on Hold .............................................38
      2.4. Transfer - Unattended .....................................50
      2.5. Transfer - Attended .......................................58
      2.6. Transfer - Instant Messaging ..............................71
      2.7. Call Forwarding Unconditional .............................77
      2.8. Call Forwarding - Busy ....................................84
      2.9. Call Forwarding - No Answer ...............................92
      2.10. 3-Way Conference - Third Party Is Added .................101
      2.11. 3-Way Conference - Third Party Joins ....................107
      2.12. Find-Me .................................................113
      2.13. Call Management (Incoming Call Screening) ...............125
      2.14. Call Management (Outgoing Call Screening) ...............132
      2.15. Call Park ...............................................135
      2.16. Call Pickup .............................................147
      2.17. Automatic Redial ........................................154
      2.18. Click to Dial ...........................................163

Obviously there are many more call flows out there, but these are a sample that can give you a real sense of how the SIP communication is supposed to go. Great to see this out there for all the people new to SIP.



If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


PLS – what is it, what “gap” does it fill?

Friday, October 17th, 2008

I have often been asked what PLS (the W3C Pronunciation Lexicon Specification) is and why it exists, so I thought it would be worth reviewing why web-based standards are a good thing for the voice/speech industry and then going into PLS and how it fits.

 

Why are today’s voice standards web-based?

W3C began creating voice markup languages in 1999, when it started the work that would eventually lead to VoiceXML 2.0 and 2.1. Two groups were interested in this: W3C itself and the speech recognition and synthesis industry.

W3C was interested because of a strong desire for the content of the Web to be accessible to “anyone, anywhere, anytime, using any device” (see W3C’s Ubiquitous Web Domain). Because of the broad success of HTML as a language for creating visual user interfaces, it seemed logical to extend that notion to the creation of auditory user interfaces (voice interfaces) that would work well with the various languages developed by W3C for representing content.

The speech recognition and synthesis industry (voice industry) was also interested in standardization. Before VoiceXML and its related languages, including PLS, each vendor of speech recognition and synthesis technology had its own proprietary interface for controlling the recognizer or synthesizer. This slowed overall adoption of voice technologies because a) authors of voice applications had to learn multiple APIs and b) the differences from one API to another made it difficult to switch vendors.

One of the most amazing benefits from the creation of VoiceXML and its related markup languages was the introduction of the web model of programming. Just like with HTML and the World Wide Web, application files could be distributed around the world. Just like with HTML, where there is a visual browser that runs on your desktop computer that converts the HTML into text for you to read and buttons for you to click, for VoiceXML there is a voice browser that turns VoiceXML pages into spoken text and something that listens to what you say. The primary implementation difference is that the voice browser lives in a computer network rather than on your desktop, and it is accessed via the phone. Because of this XML-based language (VoiceXML) and the web development model, companies adopting voice technology could now make use of their existing web infrastructure for document caching, integration with business logic and back-end databases, and server reliability and availability, not to mention the growing number of programmers familiar with the web programming model and markup languages such as XML and HTML.

 

Where does PLS fit?

Let’s start from the top down. VoiceXML is a markup language for developing voice applications. VoiceXML makes use of speech recognition and speech synthesis.
Before going further, I need to briefly explain how a speech recognizer works:

The speech recognizer makes use of a grammar, a lexicon (or dictionary), and acoustic models. The grammar is a file that lists what words to listen for, in what order — for example, “I am flying from Boston”.
The lexicon (or dictionary) is a file that describes how each legal word is pronounced – that’s how it knows that “B o s t o n” is pronounced “Boston” and not “Poughkeepsie”.
The acoustic models describe the mapping between pronunciation symbols and the actual sounds that we hear — one model for “ae”, one for “k”, one for “uh”, and so on.
So when a speech recognizer listens to someone speaking, it uses all three of these pieces of information to convert the sounds the person makes into a set of pronunciation symbols, from those symbols to words, and from the words to a sentence. While acoustic models are a closely guarded secret that differentiates one speech recognition vendor from another, the other two pieces of information are a bit easier to standardize.
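
As a rough illustration of the last two stages (symbols to words, then words to a sentence), here is a toy sketch in Python. It assumes the acoustic models have already turned the audio into pronunciation symbols. The symbol names, the `LEXICON` and `GRAMMAR` tables, and the function names are all made up for this example; a real recognizer works with probabilities rather than exact matches.

```python
# Toy lexicon: pronunciation symbols -> word (symbols invented for illustration).
LEXICON = {
    ("ay",): "I",
    ("ae", "m"): "am",
    ("f", "l", "ay", "ih", "ng"): "flying",
    ("f", "r", "ah", "m"): "from",
    ("b", "aa", "s", "t", "ah", "n"): "Boston",
}

# Toy grammar: the only word sequences the recognizer may return.
GRAMMAR = {("I", "am", "flying", "from", "Boston")}


def symbols_to_words(symbols, start=0):
    """Yield every way to split the symbol sequence into lexicon words."""
    if start == len(symbols):
        yield []
        return
    for end in range(start + 1, len(symbols) + 1):
        word = LEXICON.get(tuple(symbols[start:end]))
        if word is not None:
            for rest in symbols_to_words(symbols, end):
                yield [word] + rest


def recognize(symbols):
    """Return the first word sequence that the grammar accepts, or None."""
    for words in symbols_to_words(symbols):
        if tuple(words) in GRAMMAR:
            return " ".join(words)
    return None


heard = "ay ae m f l ay ih ng f r ah m b aa s t ah n".split()
print(recognize(heard))
```

A lone “Boston” is a valid word in the lexicon but is rejected here because the toy grammar only allows the full sentence.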

W3C already has a standard for specifying grammars, called the Speech Recognition Grammar Specification, or SRGS. The new Pronunciation Lexicon Specification “fills the gap” by providing a standard way to create pronunciation dictionaries.
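
To give a feel for what an SRGS grammar looks like, here is a small sketch that embeds an XML-form grammar in Python and lists the sentences it accepts. The grammar content, the lexicon URI, and the helper names (`tag`, `expand`, `sentences`) are assumptions made for this example. The expander only understands `<rule>`, `<item>`, `<one-of>`, and local `<ruleref>` elements, so it is nowhere near a complete SRGS processor.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Illustrative grammar; the lexicon URI is a placeholder.
SRGS_DOC = """<grammar version="1.0" xml:lang="en-US" mode="voice" root="flight"
         xmlns="http://www.w3.org/2001/06/grammar">
  <lexicon uri="http://www.example.com/cities.pls"/>
  <rule id="flight">
    I am flying from <ruleref uri="#city"/>
  </rule>
  <rule id="city">
    <one-of>
      <item>Boston</item>
      <item>Poughkeepsie</item>
    </one-of>
  </rule>
</grammar>
"""


def tag(name):
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{SRGS_NS}}}{name}"


def expand(element, rules):
    """Return every sentence (as a string) that this element can match."""
    if element.tag == tag("ruleref"):
        return expand(rules[element.get("uri").lstrip("#")], rules)
    if element.tag == tag("one-of"):
        return [s for item in element for s in expand(item, rules)]
    # <rule> and <item>: join text and child expansions in document order.
    results = [(element.text or "").split()]
    for child in element:
        results = [head + alt.split() for head in results for alt in expand(child, rules)]
        results = [head + (child.tail or "").split() for head in results]
    return [" ".join(words) for words in results]


def sentences(srgs_text):
    """List the sentences accepted by the grammar's root rule."""
    root = ET.fromstring(srgs_text)
    rules = {rule.get("id"): rule for rule in root.findall(tag("rule"))}
    return expand(rules[root.get("root")], rules)


for sentence in sentences(SRGS_DOC):
    print(sentence)
```

The `<lexicon>` element is where a grammar points at a pronunciation dictionary, which is the hook that PLS standardizes.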

 

Why were pronunciation dictionaries non-standard?

I alluded to this above. Since acoustic models were (and still are) private, before there was a standard way to specify grammars, each speech recognition vendor had its own dictionary format: its own language for specifying how words were to be pronounced. Often vendors used different pronunciation symbol sets, since each vendor’s symbol set was designed to match its private set of acoustic models. For example, the vowel sound in “cat” could be represented using the symbol “ae” or “aaa”, or anything else a vendor wanted.

There’s another reason too.
Speech synthesizers use pronunciation lexicons as well, but with slightly different formats. In brief, here’s how a synthesizer works:

A speech synthesizer converts written text into sounds to be spoken. To do this it uses an SSML document and one or more dictionaries (lexicons). The Speech Synthesis Markup Language (SSML) is a language that allows an author to change how text is spoken — for example, by marking some text as sentences and some as paragraphs, by telling the synthesizer when to change voices, or even by telling the synthesizer exactly how to pronounce a certain word. The lexicon documents used by a synthesizer, just like for a speech recognizer, describe how words are to be pronounced.

So there were these two reasons for differing pronunciation dictionaries (lexicons): different vendors used different pronunciation symbols, and recognizers and synthesizers used slightly different formats.
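
To make the synthesizer side concrete, here is a small sketch of an SSML document that marks a paragraph and sentences, switches voice, spells out one pronunciation inline, and points at an external lexicon. The document text, the lexicon URI, the IPA string for “Boston”, and the two helper functions are illustrative assumptions, not taken from any particular product.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Illustrative SSML; the lexicon URI is a placeholder.
SSML_DOC = """<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <lexicon uri="http://www.example.com/names.pls"/>
  <p>
    <s>Your flight leaves from <phoneme alphabet="ipa" ph="ˈbɔːstən">Boston</phoneme>.</s>
    <s><voice gender="female">Have a pleasant trip.</voice></s>
  </p>
</speak>
"""


def lexicon_uris(ssml_text):
    """Return the URIs of all external lexicons the document refers to."""
    root = ET.fromstring(ssml_text)
    return [el.get("uri") for el in root.iter(f"{{{SSML_NS}}}lexicon")]


def inline_pronunciations(ssml_text):
    """Map each word wrapped in <phoneme> to the pronunciation given for it."""
    root = ET.fromstring(ssml_text)
    return {el.text: el.get("ph") for el in root.iter(f"{{{SSML_NS}}}phoneme")}


print(lexicon_uris(SSML_DOC))
print(inline_pronunciations(SSML_DOC))
```

The inline `<phoneme>` element covers one occurrence of one word, while the `<lexicon>` reference lets a single dictionary cover every document that points to it.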

 

How does PLS help?

Above I described why there were differences in the pronunciation formats before PLS. Here is what PLS does:

  • First, it provides a single, standard XML-based language for describing pronunciations, both for speech recognizers and for speech synthesizers.
  • Second, it requires support for IPA, the International Phonetic Alphabet, a standard symbol set for representing the pronunciations of all the languages of the world.

With PLS it is now possible to write one lexicon document that can be used by any speech recognizer and/or any speech synthesizer that supports it. One document for all of your pronunciations, independent of your voice technology vendor.

Although it’s still new at this point, I believe this specification will be widely supported in a couple of years.




Pronunciation Lexicon Specification reaches Recommendation Status

Thursday, October 16th, 2008

On Tuesday W3C released the Recommendation for the Pronunciation Lexicon Specification (PLS). “Recommendation” is the final step in the W3C standards process.

This specification defines a new markup language that is used to represent pronunciation dictionaries.
In a PLS document, each written word has one or more pronunciations defined for it. An SSML document can reference a PLS document to indicate how certain words should be pronounced.
An SRGS (W3C grammar format) document can reference a PLS document to indicate what pronunciations to listen for to match certain words.

Here is an example:

<lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>

In this example there are two spellings each for “judgment” and “fiancé”. Each entry gives its pronunciation (in <phoneme>) in the International Phonetic Alphabet; “fiancé” lists two alternative pronunciations.
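
As a rough sketch of how an application might read such a lexicon, the Python below parses the same entries and builds a spelling-to-pronunciations table. The function name `load_lexicon` is an assumption for this example. The sketch declares the PLS namespace on `<lexicon>`, which the parser needs in order to match the element names.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

PLS_DOC = """<lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(xml_text):
    """Map every spelling (grapheme) to the pronunciations of its lexeme."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(xml_text)
    pronunciations = {}
    for lexeme in root.findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        for grapheme in lexeme.findall("pls:grapheme", ns):
            pronunciations.setdefault(grapheme.text, []).extend(phonemes)
    return pronunciations


lexicon = load_lexicon(PLS_DOC)
for spelling, phonemes in lexicon.items():
    print(spelling, phonemes)
```

Both spellings of a word end up with the same pronunciation list, which is the point of grouping them in one `<lexeme>`.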

For more info on the specification, see the press release.




IETF dives into “Green” with RECIPE (Reducing Energy Consumption with Internet Protocols Exploration)

Wednesday, October 15th, 2008

Over in the Internet Engineering Task Force (IETF), Henning Schulzrinne has dived into the whole area of “green”/environmental issues around energy with the creation of a new discussion list called RECIPE (Reducing Energy Consumption with Internet Protocols Exploration). Henning describes the goals and purpose in his message announcing the list:

Based on some very preliminary discussions in Dublin, I’ve set up a new discussion list to talk about the intersection of Internet protocols and energy management. The goal is NOT how to make protocols, routers or servers more energy-efficient, but rather how to use Internet (application) protocols to better manage energy consumers and (local) producers. There has been a fair amount of work in this area, but mostly focused on lower layers, such as ZigBee. The initial goal of the discussion is to identify whether there is a need for work here or not. I’m also in discussion with a major local utility.

The discussion will take place at recipe@ietf.org, with subscription details at http://www.ietf.org/mailman/listinfo/recipe

A bit more detail is below:

In the next few years, the demands on the electric grid will change substantially. New power sources, such as wind and solar, deliver varying amounts of power based on the time of day, while new consumers, such as plug-in hybrids, impose additional demands. Local generators, such as small-scale solar and wind turbines, can produce additional energy. Grid control can better match energy supply and demand, and flatten peak usage by deferring non-time-critical demands to low-usage times. For example, an office building can use low-cost off-peak energy to produce ice, which is then melted during the day to provide air conditioning. In the home, dishwashers and washing machines can defer their operation. There has even been discussion of using plug-in hybrids as energy storage devices that charge their batteries at night and release energy to the power grid during the peak usage periods.

In addition, end users need to be able to determine easily what devices are consuming how much energy. For example, energy monitoring may alert a homeowner that a hot water pipe is leaking or that an AC air vent has been disconnected.

All of these new usages demand a much smarter grid that interacts with power consumers and producers at the edges of the grid. With near-universal broadband and wireless data network deployments, this is becoming quite feasible. Given the diversity of consumer and industrial products that need to be controlled, we need standardized, light-weight protocols that can provide core control functions.

More information and specific ideas can be found in Henning’s message. It’s a commendable effort and it will be interesting to see what comes out of the work. Like all IETF efforts, the RECIPE list is open to all those who are interested – simply join the mailing list to participate.





How simple is SIMPLE?

Tuesday, October 14th, 2008

SIMPLE stands for SIP for Instant Messaging and Presence Leveraging Extensions, a set of standards developed by the IETF to support instant messaging (IM) and presence over SIP.

SIMPLE brings simple concepts for:

  • How to subscribe to, publish, and be notified of presence information using the Session Initiation Protocol (SIP)
  • How to do instant messaging in page mode and session mode using SIP
  • How to describe different kinds of presence information using the extensible Presence Information Data Format (PIDF)
  • How to manage advanced presence and configuration functions, such as resource lists, using the XML Configuration Access Protocol (XCAP)
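
To show what the presence information in the third bullet can look like, here is a minimal sketch that builds a basic PIDF document in Python. The function name `build_pidf`, the entity, the tuple id, and the contact address are assumptions made for this example; real presence documents usually carry more detail, and this is not how any particular server produces them.

```python
import xml.etree.ElementTree as ET

PIDF_NS = "urn:ietf:params:xml:ns:pidf"


def build_pidf(entity, tuple_id, is_open, contact=None):
    """Build a minimal PIDF document with one tuple and a basic open/closed status."""
    ET.register_namespace("", PIDF_NS)  # serialize PIDF as the default namespace
    presence = ET.Element(f"{{{PIDF_NS}}}presence", {"entity": entity})
    tup = ET.SubElement(presence, f"{{{PIDF_NS}}}tuple", {"id": tuple_id})
    status = ET.SubElement(tup, f"{{{PIDF_NS}}}status")
    basic = ET.SubElement(status, f"{{{PIDF_NS}}}basic")
    basic.text = "open" if is_open else "closed"
    if contact:
        ET.SubElement(tup, f"{{{PIDF_NS}}}contact").text = contact
    return ET.tostring(presence, encoding="unicode")


# Hypothetical user; the addresses are placeholders.
print(build_pidf("pres:alice@example.com", "t1", True, "sip:alice@example.com"))
```

A document like this is what a user agent would typically place in the body of a SIP PUBLISH request, and what watchers would receive in NOTIFY requests, with the content type application/pidf+xml.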

SIPoint Server by Voxeo is a SIMPLE-compliant presence and XCAP server with a built-in SIP registrar and proxy. Try it out to make your presence solution simpler.




Speaking of JSRs

Wednesday, October 8th, 2008

The Micromethod acquisition brings the Voxeo developer community a new set of programming interfaces: Java-based APIs for call control and, potentially, media control.

These APIs are based on specifications defined by the Java Community Process (JCP). The JCP is the process by which the Java community develops specifications for Java technologies such as APIs, languages, and virtual machines.

A specification typically starts as a Java Specification Request (JSR) submitted by one or more JCP members. Once accepted by the JCP Executive Committee, the JSR is assigned a number, such as JSR 289. The JSR is then developed within the community in the following phases.

JSR timeline

Here is a list of all JSRs that have been developed so far.

SIPMethod Application Server is a SIP application server based on JSR 116 (SIP Servlet 1.0) and JSR 289 (SIP Servlet 1.1). I will talk more about how to develop SIP Servlet-based applications in future blog posts.

