Upcoming Jam Session on Voice Apps for Web Developers

March 15th, 2010 by Sabine Winterkamp

We would like to invite you to our upcoming Jam Session, which is scheduled for Thursday, March 18, 2010.

Adam Kalsey will show you how to use web technologies and tools to create user interfaces that operate outside of the browser window. You will learn which channels you can tie into, how to do so, and what’s involved in adding voice and real-time text to new or existing Web applications.

Join us for this session on March 18, 2010.

It is scheduled for 11:00 AM Eastern, 8:00 AM Pacific, 4:00 PM Central European time.


If you found this post interesting or helpful, please consider either subscribing via RSS, becoming a fan on Facebook, or following us on Twitter.


Weekend link dump

February 14th, 2010 by Adam Kalsey

Time to clear my bookmarks out. Here’s some stuff you might enjoy.

  • Mark Headd has been playing with CouchDB and shows how to install it and use it with Tropo. Mark’s detailed instructions on installing CouchDB should help you get it up and running quickly. Not familiar with CouchDB? It’s a database that is accessed over RESTful HTTP and stores its data as JSON.
  • Mark also wrote up how to use user agent sniffing to deploy both text and voice apps on Prophecy, showing VoiceXML developers how to easily add SMS and IM to their applications.
  • Thanks to Silicon Valet, you can manage your Google Calendar from your phone, powered by Tropo. Hear your appointments read to you and create voice notes for later. The creator, Ted Gilchrist, is also working on Talk-o-Gram, a platform for exchanging short voice notes with your Gmail contacts.
  • Dominique Boucher looks at the rise of API-oriented voice services (including Tropo) in Back to Basics.
  • Ian Mercer has created the ultimate home automation platform. It detects when someone’s in a part of the house and adjusts lighting, heat, and more to match. A web-based management console shows a log of the house activities and allows you to manage the house from your browser. You have to watch the demo to believe it (it uses Silverlight and takes a while to load, but it’s worth the wait). But what if he forgets to turn off the oven when he leaves the house? He can call it and turn it off. Every appliance is connected to the phone.
  • Aslam Bari created the SalesForce Pinger to allow your application to send instant messages to your SalesForce users when certain events are triggered. Alert everyone that a sale just closed or that there’s a new lead available. Install the Pinger package from SalesForce and then add some simple code to your triggers: BotService.sendMessage('We won this opportunity');




Using Biometrics in your Voice Applications

February 10th, 2010 by Adam Kalsey

How do you authenticate your callers before giving them access to confidential information? What if your application could recognize the caller’s voice? Voxeo has partnered with four of the leading voice biometrics suppliers to make implementing this technology in your VoiceXML application easier.

We’ve written a how-to guide for each biometrics vendor showing the steps required to get set up with their platform, and we’ve put together sample code for integrating biometrics into your VoiceXML app.

The intent of these guides and the trial accounts that our partners are offering is to introduce developers to voice biometrics on Voxeo’s platforms and demonstrate voice verification services with each vendor.

With each vendor, the general process is to apply for a developer or trial account and then use your account information in the sample VoiceXML applications that we’ll give you. You’re welcome to explore the documentation from each vendor to create more complex cases and to try biometrics in your own applications.

There are two steps that your application will need to perform: enrollment and verification. Enrollment sets up a user in the biometrics platform and stores their voiceprint for future identification. Verification is the step performed when you want to check a caller against a previously stored voiceprint.

The sample application we provide here is a simple use case. A caller calls in and our application uses their caller ID as the account number. We’ll start enrollment, and if the caller is successfully enrolled, we’ll start verification against this new voiceprint, asking for their password.  Obviously your real biometrics application can be much more complex. For instance, normally you would store the enrollment status of the caller and only start enrollment if they hadn’t previously enrolled. But this simple demo application should give you an idea of the basic steps required to add a similar biometrics feature from each vendor.

Each of the included examples uses a similar process for connecting your application to the biometrics service. The call is processed by your VoiceXML application and you either send data to a remote server using the <data> element or you transfer control of the call to a subdialog hosted on the remote server.

For some vendors, your caller’s voice is recorded on your server and then the voice file is transmitted to the biometrics server. The biometrics server makes its decision and passes the result back to you, allowing you to notify your caller. All interaction passes through your application.

biometrics-passthrough

Other vendors use subdialogs to record and process the voiceprints with your caller’s voice transmitted directly to the biometrics server. You choose when to hand off control of the call and the biometrics server gives it back to you when it’s done.

biometrics-subdialog

The end result is the same, however, and your caller won’t notice the difference.
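
To make the two styles concrete, here is a rough sketch of each as a VoiceXML form. It is not taken from any vendor’s guide: the biometrics.example.com URLs, the form ids, and the accountId, voiceSample, and status names are all placeholders I made up, and a real vendor will define its own endpoints and parameters. The two forms are alternatives, so a real application would use one or the other.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">

  <!-- Style 1: record the caller locally, then post the audio with <data> -->
  <form id="passthrough">
    <!-- Use the caller ID as the account number, as in the sample application -->
    <var name="accountId" expr="session.connection.remote.uri"/>

    <record name="voiceSample" beep="true" maxtime="10s" finalsilence="2s">
      Please say your password after the beep.
    </record>

    <block>
      <!-- Hypothetical endpoint; the response must be an XML document -->
      <data name="verifyResult"
            src="http://biometrics.example.com/verify"
            method="post"
            enctype="multipart/form-data"
            namelist="accountId voiceSample"/>
      <prompt>Thank you. Your voice sample has been submitted.</prompt>
    </block>
  </form>

  <!-- Style 2: hand the call to a subdialog hosted by the vendor -->
  <form id="handoff">
    <var name="accountId" expr="session.connection.remote.uri"/>

    <subdialog name="verification"
               src="http://biometrics.example.com/verify.vxml"
               namelist="accountId">
      <filled>
        <!-- "status" is an assumed name; it is whatever the remote dialog returns -->
        <if cond="verification.status == 'verified'">
          <prompt>Your voice has been verified.</prompt>
        <else/>
          <prompt>Sorry, we could not verify your voice.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>

</vxml>
```

In the first form your application owns the recording and decides what to tell the caller. In the second, the vendor’s dialog does the talking and your application only sees what comes back when control returns.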

To try biometrics in your application, visit the Voxeo Biometrics page for details on each vendor and a brief guide on how to get started.




Tropo Web API Jam Session

January 21st, 2010 by Adam Kalsey

Join us for our next technical jam session.

Topic: Tropo Web API – Leveraging the cloud to add real-time communications to your Web Apps

Date: February 24, 2010

Time: 8am Pacific, 11am Eastern, 5pm European

Speakers: Jason Goecke, VP of Innovation, Voxeo Labs

Jose de Castro, Chief Architect, Voxeo Labs

Abstract:

In this developer jam session, we will present the newly released Tropo Web API. Tropo makes it easy to quickly add voice, instant messaging (IM), and SMS to your applications with the programming languages and tools you already know, through a web services API and JSON.

We will cover how the API works and provide examples of how to use it to add robust speech recognition, text to speech, transcription, conferencing, instant messaging, and SMS to your applications.

Join us for this Jam Session on February 24, 2010





Processing Input (VoiceXML for Web Developers)

January 4th, 2010 by Adam Kalsey

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Today I’m continuing the development of our application for the fictional Strato Pizza. Previously, I asked the caller for their pizza topping preference and their phone number, using both speech recognition and touch tone input. Today I’m going to do something with that input, and repeat the order to the customer.

Within VoiceXML, I can access the values of any fields with <value expr="fieldName$.utterance"/>. This code will return the matched value from my grammar.

Since I want to simply repeat the order and the phone number, I’m going to add a <block> element to my existing form. Inside the block, I’ll add a <prompt> element with the text I want to speak.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza.
      </prompt>
    </block>

When the VoiceXML browser reaches this block, it will speak my text, substituting whatever the caller said in response to the field named topping for topping$.utterance. If the caller asked for ham, the spoken text will be the same as if my prompt had said, “You ordered ham on your pizza.”

You can use multiple value expressions in a single prompt. I also want to tell the customer that they’ll get a call if there’s a problem with their order. I’ll repeat their phone number to them. Then I’ll thank them for their order and hang up.

    <block>
      <prompt>
        You ordered <value expr="topping$.utterance"/> on your pizza. If we have any questions we will call you at <value expr="phone$.utterance"/>. Thank you for your order.
      </prompt>
    </block>

Remember that for the phone number field, I allowed the caller to use either voice or touch tone input with a built in grammar like so:

    <field name="phone" type="phone">
      Please say or enter your phone number.
    </field>

When I access this value with <value expr="phone$.utterance"/> it doesn’t matter if the caller used voice or DTMF input. The grammar gives the same result. So when I read back the phone number, they’ll hear the digits of their phone number spoken back to them.
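
If you ever do need to know how the caller answered, the field carries a few other standard shadow variables next to utterance. The block below is only a sketch that I haven’t added to the Strato Pizza application. It assumes the phone field defined above and uses phone$.inputmode and phone$.confidence, which are defined by VoiceXML 2.0.

```xml
<block>
  <prompt>
    We will call you at <value expr="phone$.utterance"/>.
  </prompt>

  <!-- inputmode is either "voice" or "dtmf" -->
  <if cond="phone$.inputmode == 'dtmf'">
    <prompt>You entered that number on your keypad.</prompt>
  <else/>
    <prompt>You spoke that number.</prompt>
  </if>

  <!-- confidence runs from 0.0 to 1.0; write it to the log for debugging -->
  <log>Phone confidence: <value expr="phone$.confidence"/></log>
</block>
```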

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.




Collecting touch tone input (VoiceXML for Web Developers)

December 22nd, 2009 by Adam Kalsey

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Yesterday, I added the ability for my fictional Strato Pizza order taking application to ask the user what topping they’d like on their pizza. Now I need to ask them for a phone number, in case Strato is out of a topping and needs to call them.

When putting in a phone number, a lot of callers are comfortable with punching in their number on their phone keypads, while others would prefer to simply speak their number. I want my application to behave in the way that’s most comfortable for the caller, so I’m going to handle both methods of input.

First I create my field and validation code:

  <field name="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>
  </field>

I’m doing something a little different with the UI here when someone doesn’t enter or say anything. Instead of giving an error message and replaying the prompt, I’m simply replaying the prompt. In the case of a phone number where we’re accepting DTMF and voice input, saying “I didn’t hear that” seems a little silly. Just asking for the caller’s phone number a second time should suffice.

For a grammar, I could create a grammar consisting of every digit…

<grammar type="text/gsl">
  [one two three four five six seven eight nine zero]
</grammar>

… and to make it work with touch-tone input, add a grammar for DTMF digits …

<grammar type="text/gsl">
  [dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-9 dtmf-0]
</grammar>

… but that will only accept a single digit. Now what? I could try to create a grammar that captures every possible combination of digits. For a ten digit phone number, that means I’d have a grammar with ten billion words in it. That doesn’t sound very practical. Or I could ask the user for every digit of their phone number, one digit at a time. Hardly usable. The easiest way to accomplish this is to use a special built-in grammar provided by VoiceXML that accepts a group of digits.

To use this built-in grammar, I simply add a type attribute to my <field> element and tell it the field is intended to hold digits.

<field name="phone" type="digits">

Now the caller can say or key in any number of digits. Since this is a phone number, I don’t want the caller telling me his phone number is “six” so I want to add some restrictions to that. Strato is in the United States, so the caller should enter at least 7 digits and no more than 10.

<field name="phone" type="digits?minlength=7;maxlength=10">

But what if the caller has an extension number to add? I could ask them a separate question to find out if they have an extension. Or I could use a different built-in grammar, one actually designed for phone numbers that already recognizes any 10 digit phone number, including extensions.

<field name="phone" type="phone">

You can see a list of all built-in grammars and different ways of including them in the Built-In Grammar Types VoiceXML documentation.
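
To give a feel for a couple of the other built-in types, here is a small sketch that is not part of the Strato Pizza application. The zip and confirmed field names are made up for illustration. The digits?length=5 and boolean?y=1;n=2 forms follow the parameter syntax in the VoiceXML 2.0 specification, but check your platform’s documentation for what it supports.

```xml
<form>
  <!-- Exactly five digits, spoken or keyed in -->
  <field name="zip" type="digits?length=5">
    Please say or enter your five digit zip code.
  </field>

  <!-- A yes or no question; 1 means yes and 2 means no on the keypad -->
  <field name="confirmed" type="boolean?y=1;n=2">
    Is that correct? Say yes or press 1. Say no or press 2.
    <filled>
      <if cond="confirmed">
        <prompt>Great, thanks.</prompt>
      <else/>
        <!-- Clear both fields so the form asks for the zip code again -->
        <clear namelist="zip confirmed"/>
      </if>
    </filled>
  </field>
</form>
```

A boolean field comes back as a JavaScript true or false, which is why the condition can test the field name directly.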

Because I’m using a built-in grammar for the phone number, I don’t need an additional grammar here. This means my complete field definition looks like this:

  <field name="phone" type="phone">
    Please say or enter your phone number.

    <noinput>
      <reprompt/>
    </noinput>

    <nomatch>
      I didn't understand that. Please try again.
      <reprompt/>
    </nomatch>
  </field>

This XML snippet will be put into my existing form element, right after the toppings field definition.

You can get the code for this example and all other examples from Voxeo’s GitHub account. At GitHub, you can fork or download the VoiceXML application thus far.

Next up, I’ll take the user’s input and do something with it.



VoiceXML for Web Developers: Collecting Input

December 21st, 2009 by Adam Kalsey

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you want to follow along with these examples, you should create a free VoiceXML hosting account in Evolution. Complete instructions were in the first installment of the series.

Last time out, I created a simple Hello World VoiceXML app that simply answers an incoming call and speaks some text. Now what if we want to add some interactivity and let the caller talk to the application?

Unlike some of the telephony services out there, Voxeo performs speech recognition. Our engine allows someone to punch buttons on their touch tone keypad (known as DTMF, for Dual Tone Multi-Frequency) or to speak to the application using natural language. Why ask your customers to listen to a menu of pizza toppings and remember which number to press when you can just let them say the names of the toppings?

Throughout this series, I’m building an application for Strato Pizza, a fictional pizza chain. In this installment, I’ll ask the caller which topping they’d like. For now, I’m only letting them order a one-topping pizza. Then I’m going to hang up.

The first step in adding either voice recognition or DTMF input is to add an input field to your document. In HTML if you want your user to give you information you use input tags inside a form tag. In VoiceXML you use <field> elements inside a <form> element. Fields have names and just like HTML, you can use those field names to get the values input by the caller. The field name must be a valid JavaScript variable name (so no spaces or dots in the name), and cannot start with an underscore (“_”) or end in a dollar sign (“$”).

Here’s what my form field looks like for asking the caller for their pizza topping.

<form>
  <field name="topping">
    What topping would you like on your pizza?
  </field>
</form>

In my first application, I used <prompt> to speak the text and had to put that prompt element inside a <block> element. Here, I don’t need a block element, because form fields can live directly inside forms. I also don’t need to use prompt – the contents of my field will be spoken to the user and then the application will wait for their response.

For speech recognition to work, I need to provide a list of what the caller is going to say using a grammar. These grammars allow the speech recognition engine to pick out what the user said. Essentially I’m training the recognition engine.

A grammar can have a list of single words, can allow compound words (like “extra cheese”), and can even have synonyms so it understands that Ham and Canadian Bacon are the same thing.

Grammars go inside the body of a <grammar> element. Because you might be using reserved XML characters in your grammar, it’s a good idea to place this inside a CDATA section. The attribute type specifies the MIME type of the grammar file and is required. Grammar file? That sounds like I can use an external file for my grammars. I’ll look into external files in a later installment of this series. For now, I’m using an inline grammar with a type of text/gsl.

<grammar type="text/gsl">
  <![CDATA[
    ;Lines starting with a semicolon are comments.
    ;Match one of the enclosed terms
    [
      ;Terms are separated by a space
      pepperoni olives sausage anchovies

      ;They can also be on separate lines.
      ; Each line is recognized as a separate term
      onions
      peppers

      ;Parentheses require all of the enclosed terms
      ;to be matched. A logical AND
      (extra cheese) (roasted garlic)

      ;Square brackets are the same as OR
      [mushrooms portobello]

      ;You can mix AND & OR together
      [ham (canadian bacon)]
    ]

  ]]>
</grammar>

This grammar applies only to the pizza toppings field, so I’m putting the grammar element inside the “topping” field. There are other places it can go, but I’ll show those in a later installment. Putting these together, you get:

<form>
  <field name="topping">
    What topping would you like on your pizza?

    <grammar type="text/gsl">
      <![CDATA[
        ;Lines starting with a semicolon are comments.
        ;Match one of the enclosed terms
        [
          ;Terms are separated by a space
          pepperoni olives sausage anchovies

          ;They can also be on separate lines.
          ; Each line is recognized as a separate term
          onions
          peppers

          ;Parentheses require all of the enclosed terms
          ;to be matched. A logical AND
          (extra cheese) (roasted garlic)

          ;Square brackets are the same as OR
          [mushrooms portobello]

          ;You can mix AND & OR together
          [ham (canadian bacon)]
        ]
      ]]>
    </grammar>
  </field>

</form>

Now when someone calls, they can speak their topping and the application will understand it – as long as their topping fits within the grammar I’ve defined. There are single-word toppings like “pepperoni” and “onions” as well as multiple-word toppings like “extra cheese.” Because I’ve put parentheses around “extra cheese” the recognizer won’t match if the caller says simply “cheese”. Callers have a tendency to say things you might not expect, like asking for “canadian bacon” instead of just “ham”, so the grammar can handle synonyms as well.

What if a caller asks for a topping that Strato Pizza doesn’t offer? If Barbara calls up Strato and asks for her favorite potato pizza, my application should know what to do with her request.

On a web form, you generally perform some validation on your form submissions to make sure the user said what you expected them to say. In VoiceXML, I can use the <nomatch> element as a trigger for the caller saying something that doesn’t match the grammar I supplied. Inside the nomatch element, I add a <reprompt/> element to replay the question.

<!-- The caller said something that was not defined in our grammar -->
<nomatch>
  I did not recognize that topping. Please try again.
  <reprompt/>
</nomatch>

In a voice application, I have another type of validation to perform. One that doesn’t happen on the web. In a web form, I can present the user with a form and wait all day for them to fill it out and hit the submit button. But in a voice application, after I ask the caller a question, if they don’t respond, I probably want to ask them again. For this, I can use the <noinput> element to determine what to do when a caller is silent in response to a question. In my noinput I’m going to ask the question again using the reprompt element.

<!-- The caller was silent, restart the field -->
<noinput>
  I did not hear anything.  Please try again.
  <reprompt/>
</noinput>

These two validation elements go inside the form field, just like my grammar did. So now my field looks like this:

<form>
  <field name="topping">
    What topping would you like on your pizza?

    <grammar type="text/gsl">
      <![CDATA[
        ;Lines starting with a semicolon are comments.
        ;Match one of the enclosed terms
        [
          ;Terms are separated by a space
          pepperoni olives sausage anchovies

          ;They can also be on separate lines.
          ; Each line is recognized as a separate term
          onions
          peppers

          ;Parentheses require all of the enclosed terms
          ;to be matched. A logical AND
          (extra cheese) (roasted garlic)

          ;Square brackets are the same as OR
          [mushrooms portobello]

          ;You can mix AND & OR together
          [ham (canadian bacon)]
        ]
      ]]>
    </grammar>
    <!-- The caller was silent, restart the field -->
    <noinput>
      I did not hear anything.  Please try again.
      <reprompt/>
    </noinput>

    <!-- The caller said something that was not defined in our grammar -->
    <nomatch>
      I did not recognize that topping. Please try again.
      <reprompt/>
    </nomatch>
  </field>
</form>
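
One refinement you might consider, which I’m not using in the Strato Pizza application: VoiceXML event handlers accept a count attribute, so the response can change as a caller stays silent a second or third time. The sketch below shortens the grammar to three toppings to keep it readable, and the wording of the prompts is only a suggestion.

```xml
<form>
  <field name="topping">
    What topping would you like on your pizza?

    <!-- Shortened grammar for this sketch only -->
    <grammar type="text/gsl">
      <![CDATA[
        [pepperoni onions peppers]
      ]]>
    </grammar>

    <!-- First silence: just ask again -->
    <noinput count="1">
      <reprompt/>
    </noinput>

    <!-- Second silence: offer a hint, then ask again -->
    <noinput count="2">
      I did not hear anything. You can say a topping such as pepperoni or onions.
      <reprompt/>
    </noinput>

    <!-- Third silence: give up politely and end the session -->
    <noinput count="3">
      Sorry, we could not take your order. Goodbye.
      <exit/>
    </noinput>
  </field>
</form>
```

The same count attribute works on nomatch, so a caller who keeps asking for an unavailable topping could be read the list of toppings on the second try.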

Now my application is able to find out what sort of pizza a caller would like and can handle mistakes, distracted callers, and toppings I don’t have. Adding this to my greeting from the last post, I have:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

<form>
    <block>
    <prompt>
      Thanks for calling Strato Pizza.
    </prompt>
    </block>

    <field name="topping">
      What topping would you like on your pizza?

      <grammar type="text/gsl">
        <![CDATA[
          ;Lines starting with a semicolon are comments.
          ;Match one of the enclosed terms
          [
            ;Terms are separated by a space
            pepperoni olives sausage anchovies

            ;They can also be on separate lines.
            ; Each line is recognized as a separate term
            onions
            peppers

            ;Parentheses require all of the enclosed terms
            ;to be matched. A logical AND
            (extra cheese) (roasted garlic)

            ;Square brackets are the same as OR
            [mushrooms portobello]

            ;You can mix AND & OR together
            [ham (canadian bacon)]
          ]
        ]]>
      </grammar>
      <!-- The caller was silent, restart the field -->
      <noinput>
        I did not hear anything.  Please try again.
        <reprompt/>
      </noinput>

      <!-- The caller said something that was not defined in our grammar -->
      <nomatch>
        I did not recognize that topping. Please try again.
        <reprompt/>
      </nomatch>
    </field>
  </form>

</vxml>

The next requirement for my application is to collect the caller’s phone number so Strato can call if there’s a problem with the order. I’ll take a look at that tomorrow in my next blog post.




VoiceXML for web developers: Hello World

December 17th, 2009 by Adam Kalsey

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

If you missed it, in the first installment of this series I created an application on Evolution and assigned it some phone numbers. For the rest of the series, I’ll be using that application to test my VoiceXML apps. If you want to follow along, go create your own Evolution account.

I’m going to start simple with my first application – just answer and speak some text, then hang up. This way we can get a look at the syntax needed for VoiceXML. Throughout this series, I’ll be building an application for Strato Pizza, a fictional pizza chain. My application here is simply a greeting played when someone calls the chain’s phone number.

As the name implies, VoiceXML is written in XML. So I start with an XML declaration and tell the browser what character encoding to use, just like any other XML document. Then I create a <vxml> element that will hold the application.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

</vxml>

Inside this element I need a couple of structural elements. <form> is a container that separates different areas of input and output, sort of like different HTML forms and pages. <block> is a container that allows you to conditionally execute code. Although I’m not creating separate inputs and outputs or trying to conditionally execute code, these elements are still needed, since the next elements I’m going to create are required to be inside a <block> and a block must be inside a <form>. Since I’m not using them for anything, I don’t have to worry about any attributes right now.

Now my VoiceXML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>

    </block>
  </form>
</vxml>

Great, now the basic structure is in place and I can put in the meat of the application. All I want to do is say something and hang up, so my application is pretty simple. I can say something by using a <prompt> element and the VoiceXML browser will perform text to speech and say whatever I typed.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >
  <form>
    <block>
    <prompt>
      Thanks for calling Strato Pizza.
    </prompt>
    </block>
  </form>
</vxml>

That’s it. The whole document. I upload the document to my web server at the URL that I configured my application with in Evolution. When I call this application using the Skype number supplied in Evolution, a text to speech (TTS) engine speaks my text.




VoiceXML for Web developers: Introduction

December 15th, 2009 by Adam Kalsey

This post is part of a series exploring voice applications and VoiceXML through the eyes of a web developer. For the rest of the series, see the index.

I’ll admit it. Before joining Voxeo, I wasn’t much of a voice guy. I’m a web guy. I was pretty sure that voice applications were created through witchcraft. Turns out, there’s no magic involved, just some standards and markup languages. If you can create a web app, you can create a voice app. Voxeo has some great developer documentation and detailed tutorials available through Evolution, our developer portal. Over the next few weeks, I’ll be walking through some examples as I learn, from the perspective of a web developer, VoiceXML, CCXML, and Voxeo’s own CallXML.

I’ll start with VoiceXML. VoiceXML is a W3C standard, just like HTML is. Like HTML, your code is executed in a browser, but instead of a visual browser on a computer screen, in this case it’s a voice browser that you use over the telephone. To test out any of the samples I’m going to create, I’m going to need a VoiceXML browser attached to the telephone network. Voxeo provides developers with free accounts and a phone number so you can build and test your app. You’ll also need a web server to host your XML file, but Voxeo will provide some hosting space for you for free if you’d like.

Go over to Evolution and create an account. Then go to the Application Manager.

App Manager

Create a new application and call it anything you’d like. Then decide how you want your app to work. For now, I’m only using voice, so I don’t need text messaging. I can always add it in the future if I change my mind.

Creating an app

I need to tell Evolution where my VoiceXML file is by providing a URL for it. Since I’m going to create a Hello World application and host it on my own server, I’m putting in the URL I intend to use for my VoiceXML file. Again, I can change this later if I decide on a different file name or path.

Creating an app, step 2

After I create my app, I have a new tab at the top of the page that gives me some phone numbers I can use to call my application.

app created

Clicking on that tab reveals a local number, a toll-free number, and numbers to call from Skype, SIP, and iNum providers. I can also add a dedicated local number if I’d like. Since I’m going to test with Skype, I don’t need a local number, but if you’re testing from your phone, grab one.

contact numbers

And that’s it. I now have a VoiceXML browser hooked up to the telephone network that I can use to test my application. In my next post a couple of days from now, I’ll create my first app.



Tracking IM interaction with Google Analytics

December 4th, 2009 by Adam Kalsey

About a month ago Google announced better support for mobile web sites in Google Analytics. One of the improvements is a method for tracking visits with server side code for mobile browsers and apps that don’t support JavaScript.

Using this new mobile support, you can now add Google Analytics to your IM bots.

Google Analytics Screenshot

First download the client libraries for Google Analytics Mobile. At the time of this writing, they have libraries for PHP, Perl, JSP and ASPX included in the download. There’s a PDF file in the download that tells you how to get your mobile tracking ID. It’s just your Google Analytics account ID, but with the UA- replaced with MO-. If your Analytics ID is UA-123456-1 then your mobile ID will be MO-123456-1.

The client library includes a file that is loaded from an image tag on your mobile site and the code to build the URL for the image tag. Since we’re working in an environment that doesn’t support image tags, we’re going to build the URL and then open it with an HTTP client.

I’m going to use the PHP library, so if you’re using a different language, you’ll need to adapt these instructions a bit.

Most of the things that you’d track in web reports don’t exist in an IM session, so to make our reports more useful, we’re going to set some of the headers and environment variables manually. This is something you’ll probably want to tweak to get useful reports. I’m going to set the User-Agent to the IM network name, build a unique user ID cookie for tracking repeat visits, and set the query string to the contents of the IM message. The request URL will remain as your bot’s URL – the one IMified posts your messages to.

The PHP library has a snippet of code (snippet1.php) that’s supposed to be placed at the top of our PHP file. Place that at the top of your bot’s code. Google’s instructions also include a second snippet that’s to be placed at the bottom of the PHP page, but since we’re not using the image tags, you don’t need to do that.

You’ll need to edit the snippet after you add it, since it contains a bug. The PHP and Perl libraries don’t properly set the query string. The ASPX and JSP libraries don’t have this bug. In the PHP snippet you just added, find the lines that say …

  if (!empty($path)) {
    $url .= "&utmp=" . urlencode($path);
  }

… and change it to read …

  if (!empty($path)) {
    if (!empty($query)) {
      $path .= "?$query";
    }
    $url .= "&utmp=" . urlencode($path);
  }
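
To see what the patched lines produce, here is the same logic pulled out into a standalone function you can run on its own. The function name `buildUtmpParam()` is made up for this sketch; in Google's snippet the code sits inline and uses the snippet's own variables.

```php
<?php
// Illustrative stand-in for the patched lines, wrapped in a function so it
// can run by itself. buildUtmpParam() is a made-up name for this sketch.
function buildUtmpParam($path, $query) {
    $param = '';
    if (!empty($path)) {
        // Append the query string to the path before encoding it.
        if (!empty($query)) {
            $path .= "?$query";
        }
        $param = "&utmp=" . urlencode($path);
    }
    return $param;
}

echo buildUtmpParam('/bot.php', 'hello world'), "\n";
// &utmp=%2Fbot.php%3Fhello+world
```

One thing to be aware of: PHP's `empty()` treats the string "0" as empty, so a message consisting of just 0 would not be appended.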

Now right before the snippet you added, we’ll create a user-agent string with the network name and a cookie value that matches the format that Google sets when they create a tracking cookie. We also need to override the server environment variables for the query string so that Analytics can track the message contents.

  // Create a User-Agent string for the IM network.
  $UA = 'IM '. $_POST['network'];
  // Create a user ID cookie based on the IM user's name and network
  $cookie = "0x" . substr(md5($_POST['user'] .'@'. $_POST['network']), 0, 16);
  // Yes, overwriting this server variable does feel dirty. But it works.
  // Set the query string to the contents of the message.
  $_SERVER["QUERY_STRING"] = $_POST['msg'];
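
If you plan to reuse the visitor ID logic in several bots, it may be handy as a function. The sketch below follows the same recipe as the code above ("0x" plus the first 16 hex digits of an MD5 hash) and adds a fallback for missing POST fields. The name `imVisitorId()` and the 'unknown' default are assumptions for illustration.

```php
<?php
// Sketch: the visitor ID logic as a function, with fallbacks for missing
// POST fields. imVisitorId() and the 'unknown' default are assumptions.
function imVisitorId(array $post) {
    $user = isset($post['user']) ? $post['user'] : 'unknown';
    $network = isset($post['network']) ? $post['network'] : 'unknown';
    // "0x" plus the first 16 hex digits of the MD5 hash of user@network.
    return '0x' . substr(md5($user . '@' . $network), 0, 16);
}

$id = imVisitorId(array('user' => 'alice', 'network' => 'jabber'));
echo $id, "\n";
```

Because the ID is a hash of the user name and network, the same person gets the same ID on every message, which is what lets Analytics count repeat visits.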

Now right after the snippet from the PHP library, we need to build the tracking URL and call it from PHP. The library snippet includes a function, googleAnalyticsGetImageUrl(), that builds a portion of the URL for you: it returns the filename and query string. If we were sticking this in an IMG tag, that would be sufficient, but since we’re going to load the tracking URL manually, we need to construct a fully-qualified base URL in the form http://example.com/some/path/ to put before the generated tracking URL.

You can easily hard-code this base URL. Just upload the ga.php library to your server and then hard-code the path like so:

  $url = 'http://example.com/my/path/'. googleAnalyticsGetImageUrl(); 

For my purposes, however, I’m going to put ga.php in the same directory as my bot’s code and then construct the base URL automatically from that path. This allows me to move my bot’s code without having to edit the base URL. It also allows me to easily turn this code into an include file that I can just drop into any bot to add tracking.

Here’s the code I’m using to build the URL:

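  // HTTPS is unset on plain HTTP requests, so test it with empty() first.
  $protocol = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off') ? 'https' : 'http';
  // HTTP_HOST already carries any non-default port, so no port is appended.
  $path = dirname($_SERVER['REQUEST_URI']);
  // dirname() returns '/' for a top-level script; blank it to avoid a double slash.
  $path = ($path == '/') ? '' : $path;
  $url = $protocol .'://'. $_SERVER['HTTP_HOST'] . $path .'/'. 
         googleAnalyticsGetImageUrl();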
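
If you want to check the base URL logic without deploying the bot, one option is to put it in a function that takes a `$_SERVER`-style array instead of reading the globals. This is a sketch under a couple of assumptions: `trackerBaseUrl()` is a hypothetical name, and `HTTP_HOST` is assumed to already include any non-default port, which is how the Host header normally arrives.

```php
<?php
// Sketch of the base-URL logic as a testable function. trackerBaseUrl() is a
// hypothetical name; it takes a $_SERVER-style array instead of reading globals.
function trackerBaseUrl(array $server) {
    $https = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $protocol = $https ? 'https' : 'http';
    // Drop any query string before taking the directory part of the request.
    $requestPath = parse_url($server['REQUEST_URI'], PHP_URL_PATH);
    $dir = dirname($requestPath);
    // dirname() yields '/' (or '\' on Windows) for a script at the web root.
    if ($dir === '/' || $dir === '\\' || $dir === '.') {
        $dir = '';
    }
    return $protocol . '://' . $server['HTTP_HOST'] . $dir . '/';
}

echo trackerBaseUrl(array(
    'HTTP_HOST' => 'example.com',
    'REQUEST_URI' => '/bots/weather/bot.php?debug=1',
)), "\n"; // http://example.com/bots/weather/
```

In the bot you would call it as `trackerBaseUrl($_SERVER) . googleAnalyticsGetImageUrl()`.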

Now that the URL is built, we need to send a GET request to it. This GET request will set the cookie and the user-agent then load the URL just as if it were an image being called from a web browser. The image itself never gets displayed, but the fact that we load it means the tracking data is sent off to Google’s servers for your Analytics account.

  $options = array(
    "http" => array(
        "method" => "GET",
        "user_agent" => $UA,
        "header" => "Accept-Language: " . 
                    $_SERVER["HTTP_ACCEPT_LANGUAGE"] . "\r\n".
                    "Cookie: __utmmobile=" . $cookie . "\r\n"
        )
  );
  $data = file_get_contents($url, false, stream_context_create($options));
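
Since the tracking call happens while your bot is handling a message, you may not want a slow or failed request to hold up the reply. The sketch below builds the same stream context options in a function, adds a timeout, and only sends Accept-Language when there is a value for it. The function names and the 2-second timeout are assumptions for illustration, not something Google's library provides.

```php
<?php
// Sketch: build the stream context options in a function and keep a slow or
// failed tracking call from breaking the bot. Names and the 2-second timeout
// are assumptions for illustration.
function trackingRequestOptions($userAgent, $cookie, $language = '') {
    $header = 'Cookie: __utmmobile=' . $cookie . "\r\n";
    // Only send Accept-Language when there is a value to send.
    if ($language !== '') {
        $header = 'Accept-Language: ' . $language . "\r\n" . $header;
    }
    return array(
        'http' => array(
            'method' => 'GET',
            'user_agent' => $userAgent,
            'header' => $header,
            'timeout' => 2, // seconds
        ),
    );
}

function sendTrackingRequest($url, array $options) {
    // @ suppresses the warning on failure; the return value reports success.
    $data = @file_get_contents($url, false, stream_context_create($options));
    return $data !== false;
}

$options = trackingRequestOptions('IM jabber', '0x0123456789abcdef');
print_r($options);
```

With this in place, the bot would call `sendTrackingRequest($url, $options)` and carry on with its reply whether or not the call succeeded.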

Now any time someone sends a message to your bot, the interaction will be sent to Google Analytics and will show up in your reports alongside your web traffic. This mechanism can be used for things besides IM – you can inject “visits” into Analytics from any server-side interaction.

Think about what you’d like to track with your bot and change the variables to suit your needs. Maybe you have a menu-based bot and you want to track which menu item is being chosen. Setting the menu’s name or path as the URL could be a good option for that. For a command-based bot, you could put just the command name in the query string.
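
For the command-based case, one possible approach is to reduce each incoming message to its first word before storing it in the query string, so reports group by command rather than by full message. This is a sketch; `commandName()` is a hypothetical helper, and it assumes the command is the first word of the message.

```php
<?php
// Sketch: reduce an incoming message to its command name so reports group
// by command. commandName() is a hypothetical helper.
function commandName($message) {
    // Split on whitespace and take the first word, lowercased.
    $words = preg_split('/\s+/', trim($message));
    return strtolower($words[0]);
}

echo commandName('  Weather 95814 '), "\n"; // weather
```

You would then set `$_SERVER["QUERY_STRING"] = commandName($_POST['msg']);` in place of the full message.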

