Simplifying Kinect for Windows 1.x Skeleton Drawing

KinectTools is an abstraction over the code that handles the Kinect v1 sensors, Skeleton data, and drawing the skeleton. It exposes Brush and Pen properties for drawing so you have control over the look of your skeleton. It can also place a PNG file over the head as you move around, providing hours of jocularity.

If you've done any work with the Kinect for Windows 1.x SDK, you've probably already created an abstraction such as this. But if you haven't, here's a nice one for you.

What's cool about this is that it uses the term Body which is what SDK 2.0 calls a Skeleton. I've also written this abstraction for SDK 2.0 (only in pre-release) so using this will get you ready for the future. The next version of the GesturePak Recorder and sample code uses this abstraction as well.

Here's a very simple WPF app that uses the KinectTools SimpleBodyEngine class to draw the skeleton in real time, put an image of my head on top of it, and turn the background of the window red if you move your right hand out in front of your body 1/3 of a meter.

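<Window x:Class="SimpleBodyEngineTest.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="600" Width="800">
    <StackPanel>
        <StackPanel Orientation="Horizontal">
            <TextBlock Text="Sensor: "/>
            <TextBlock Text="{Binding SensorStatusName}" />
        </StackPanel>
        <Image x:Name="BodyImage" Source="{Binding ImageSource}" Stretch="None" />
    </StackPanel>
</Window>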

using System;
using System.Windows;
using System.Windows.Media;
using Microsoft.Kinect;
using KinectTools;

namespace SimpleBodyEngineTest
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        // This object makes handling Kinect Skeleton objects easy!
        SimpleBodyEngine BodyEngine = new SimpleBodyEngine();

        public MainWindow()
        {
            InitializeComponent();

            // event handlers
            this.Closing += MainWindow_Closing;
            BodyEngine.BodyTracked += BodyEngine_BodyTracked;

            // put Carl's head on top of the Skeleton
            BodyEngine.HeadImageUri = new Uri("carl.png", UriKind.Relative);
            // bind the XAML to the SimpleBodyEngine
            this.DataContext = BodyEngine;
        }

        /// <summary>
        /// This event fires when a *tracked* skeleton frame is available
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        void BodyEngine_BodyTracked(object sender, BodyTrackedEventArgs e)
        {
            // Get the Z position of the hand and spine
            var hand = e.Body.Joints[JointType.HandRight].Position.Z;
            var spine = e.Body.Joints[JointType.Spine].Position.Z;

            // if the hand is in front of the spine...
            if (Math.Abs(hand - spine) > .3)
                // turn the background red
                Background = Brushes.Red;
            else
                Background = Brushes.White;
        }

        void MainWindow_Closing(object sender, System.ComponentModel.CancelEventArgs e)
        {
            // clean-up code goes here (the body is not shown in this listing)
        }
    }
}

Forget about the hundreds of lines of code to draw the Skeleton. If you just want to handle the data, read this blog post I wrote on the basics of Skeleton tracking. This code is so simple. Put up an image and bind it to a property. Create a new SimpleBodyEngine object and make it the DataContext. Done.
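
One note on the hand check in the listing above: Math.Abs also fires when the hand is more than 0.3 meters behind the spine. Kinect reports Z as the distance from the sensor in meters, so a hand held out in front has a smaller Z than the spine. Here is a minimal sketch of a signed version of that test. The HandCheck class and IsHandInFront method are hypothetical names for illustration and are not part of KinectTools.

```csharp
using System;

// Hypothetical helper, not part of KinectTools.
public static class HandCheck
{
    // Z is the distance from the sensor in meters, so "in front of the body"
    // means the hand's Z is smaller than the spine's Z by more than the threshold.
    public static bool IsHandInFront(float handZ, float spineZ, float threshold = 0.3f)
    {
        return (spineZ - handZ) > threshold;
    }

    public static void Main()
    {
        // hand 0.4 m closer to the sensor than the spine
        Console.WriteLine(IsHandInFront(1.6f, 2.0f));
        // hand 0.4 m farther from the sensor than the spine
        Console.WriteLine(IsHandInFront(2.4f, 2.0f));
    }
}
```

In the BodyTracked handler you would pass the two Position.Z values straight into a check like this.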

Download the code here and enjoy.


Simplifying Speech Recognition with .NET

Over the many years I've been using .NET's Speech.Recognition features, I've learned a few things. Now I've encapsulated all that goodness into one class, SpeechTools.SpeechListener.

A big problem with Speech Recognition is false positives. You only want the computer to interpret your speech when you are speaking to it. How does it know that you're talking to your friend rather than issuing a command?

The answer comes from our friend Gene Roddenberry, creator of Star Trek. Any time Kirk wanted to talk to the computer he'd first say the word "Computer." That little prompt is enough to "wake-up" the computer and listen for the next phrase. To let Kirk know that it was listening, the computer would make a bleepy noise. We can do the same.

Another thing we can do is determine whether the word or phrase was spoken by itself or as part of a larger phrase or sentence. You only want the computer to respond to a command when the command is spoken by itself. If you speak the command or phrase as part of a longer sentence it should be ignored.

Above all, the code for speech recognition should be much easier than it is. If I just want to recognize a set of phrases or commands, it shouldn't require hours of learning about grammars and builders and all that jazz.

SpeechListener simplifies all of that. Take a look at this demo app window:

SpeechTools Demo

The app is ready to test without modification. Just press the "Start Listening" button.

By default, our wake up command is the word "Computer," but it can be anything you like. Say the wake up command by itself. SpeechTools plays the Star Trek Computer wake up wave file (provided) and fires a WakeUp event for your convenience.

At this point it is listening for any of the given phrases. Say "Is it lunch time yet?" and a Recognized event will fire, passing the actual SpeechRecognizedEventArgs object from the API. To recognize another word, repeat the whole process, starting with the wake up command.

Now, check out the code. First the XAML:
    <Window x:Class="SpeechToolsDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech Tools Demo" Height="350" Width="525">
        <StackPanel>
            <StackPanel x:Name="SettingsPanel" HorizontalAlignment="Left">
                <CheckBox x:Name="ListenForWakeUpCommandCheckBox"
                    IsChecked="True">Listen for Wake-up Command</CheckBox>

                <StackPanel Margin="10,10,0,0" Orientation="Horizontal"
                    IsEnabled="{Binding ElementName=ListenForWakeUpCommandCheckBox, 
                    Path=IsChecked}" >
                    <TextBlock FontSize="14"
                        Text="Wake-up Command:" />
                    <TextBox x:Name="WakeUpCommandTextBox"
                        Text="Computer" />
                </StackPanel>

                <TextBlock Margin="10,20,0,0"
                    Text="Enter words or phrases to recognize, one per each line:" />
                <TextBox x:Name="PhrasesTextBox"
                    AcceptsReturn="True" />
            </StackPanel>

            <Button x:Name="ListenButton"
                Content=" Start Listening " />
            <TextBlock x:Name="HeardTextBlock"
                FontSize="16" />
        </StackPanel>
    </Window>

Fairly straight ahead here. Now for the wonderful part. The Code:
        using System;
        using System.Windows;

        namespace SpeechToolsDemo
        {
            /// <summary>
            /// Interaction logic for MainWindow.xaml
            /// </summary>
            public partial class MainWindow : Window
            {
                SpeechTools.SpeechListener listener = null;
                // set properties: Build Action = None, 
                // Copy to Output Directory = Copy Always
                string WakeUpWavFile = "computer.wav";

                public MainWindow()
                {
                    InitializeComponent();
                    listener = new SpeechTools.SpeechListener();
                    listener.SpeechRecognized += listener_SpeechRecognized;
                    listener.WakeUp += listener_WakeUp;
                    // seed the Phrases. You can change them, of course!
                    this.PhrasesTextBox.Text = "This is cool\n" + 
                        "Is it lunch time yet?\n" +
                        "Let's Party";
                    this.ListenButton.Click += ListenButton_Click;
                }

                void listener_WakeUp(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // This event fires when you speak the wake-up command
                }

                void listener_SpeechRecognized(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // Fires when a phrase is recognized
                    HeardTextBlock.Text = DateTime.Now.ToLongTimeString() + ": " + e.Result.Text;
                }

                void ListenButton_Click(object sender, RoutedEventArgs e)
                {
                    if (ListenButton.Content.ToString() == " Start Listening ")
                    {
                        // use a wake up command for added accuracy
                        if (ListenForWakeUpCommandCheckBox.IsChecked == true)
                            listener.WakeUpOnKeyPhrase(WakeUpCommandTextBox.Text, 
                                true, WakeUpWavFile);
                        // set the phrases to listen for and start listening
                        listener.Phrases = PhrasesTextBox.Text;
                        listener.StartListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = false;
                        ListenButton.Content = " Stop Listening ";
                    }
                    else
                    {
                        listener.StopListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = true;
                        ListenButton.Content = " Start Listening ";
                    }
                }
            }
        }

You don't have to use a wake up command, of course, but if you want to, just call WakeUpOnKeyPhrase passing the phrase. If you want SpeechListener to play a WAV file when it "wakes up" - a nice little extra touch - pass true for the second argument (PlayWaveFileOnWakeUp) and pass the wave file name as the third parameter. If you don't want to play a wav file just pass false and an empty string.

The Phrases property takes a CRLF delimited string of phrases and internally creates a grammar from it. Just set Phrases to the words and phrases you want it to recognize.
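
To give a feel for what happens to that string, here is a rough sketch of splitting a line-delimited block of text into the array of phrases a grammar needs. This is an assumption about the general approach, not the actual SpeechListener source. PhraseParser and SplitPhrases are hypothetical names, and the sketch accepts both CRLF and bare LF line endings.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not the SpeechTools implementation.
public static class PhraseParser
{
    // Split on CRLF or LF, trim each line, and drop empty lines.
    public static string[] SplitPhrases(string phrases)
    {
        if (string.IsNullOrWhiteSpace(phrases))
            return new string[0];

        return phrases
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        string text = "This is cool\r\nIs it lunch time yet?\n\nLet's Party ";
        foreach (string phrase in SplitPhrases(text))
            Console.WriteLine("[" + phrase + "]");
    }
}
```

The resulting array is the kind of input the CreateGrammar method shown later in this post takes.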

Finally, call StartListening().

If you called WakeUpOnKeyPhrase prior to StartListening, nothing happens until you utter the Wake up command, at which point the WakeUp event fires. Now SpeechListener is waiting for you to speak one of the phrases. It will either fire the SpeechRecognized event or nothing at all, after which you'll have to speak the Wake Up command again to repeat the process. This pattern continues until you call StopListening().

If you are not using Wake Up command, the SpeechRecognized event will continue to fire until you call StopListening().
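
The wake-up flow is really a tiny two-state machine: asleep until the wake-up command is heard, awake for exactly one phrase, then asleep again. Here is a minimal sketch of that logic with no speech engine involved. WakeWordGate and its members are hypothetical names for illustration; SpeechListener does this for you internally, and this sketch assumes Hear is only called with phrases that were spoken by themselves.

```csharp
using System;

// Hypothetical illustration of the wake-up pattern, not the SpeechTools source.
public class WakeWordGate
{
    private readonly string wakeWord;   // null or empty: no wake-up command in use
    private bool awake;

    public event Action WakeUp;
    public event Action<string> Recognized;

    public WakeWordGate(string wakeWord)
    {
        this.wakeWord = wakeWord;
    }

    private bool UsesWakeWord
    {
        get { return !string.IsNullOrEmpty(wakeWord); }
    }

    // Feed in each phrase the recognizer reports.
    public void Hear(string phrase)
    {
        if (UsesWakeWord && !awake)
        {
            // While asleep, only the wake-up command does anything.
            if (string.Equals(phrase, wakeWord, StringComparison.OrdinalIgnoreCase))
            {
                awake = true;
                if (WakeUp != null) WakeUp();
            }
            return;
        }

        if (Recognized != null) Recognized(phrase);
        awake = false;   // one phrase per wake-up; has no effect without a wake word
    }
}

public static class WakeWordDemo
{
    public static void Main()
    {
        var gate = new WakeWordGate("Computer");
        gate.WakeUp += () => Console.WriteLine("(bleep) listening...");
        gate.Recognized += p => Console.WriteLine("Recognized: " + p);

        gate.Hear("Let's Party");   // ignored: not awake yet
        gate.Hear("Computer");      // wakes up
        gate.Hear("Let's Party");   // recognized, then back to sleep
        gate.Hear("This is cool");  // ignored again
    }
}
```

Passing null or an empty string as the wake word gives the second behavior described above: every phrase raises Recognized.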

If you want more fine-grained access to the properties and events, just access the public RecognitionEngine field, which exposes the SpeechRecognitionEngine object used internally. You can party on all the events if you like.

How it works - Sentence Detection

Before we can determine if a word or phrase is part of a sentence, we have to create a Grammar that allows for wild cards (undefined speech) on either side of the phrase. Here's the code that I use. I create two extra GrammarBuilder objects containing wild cards. One goes before the choices, and another one goes after. This is important, and you'll see why in a minute.
public Grammar CreateGrammar(string[] phrases)
{
    // first, put the phrases in a choices object
    var choices = new Choices(phrases);

    // create a grammar builder to prepend our choices
    var beforeBuilder = new GrammarBuilder();
    // append a wildcard (unknown speech)
    beforeBuilder.AppendWildcard();
    // create a semantic key from the builder
    var beforeKey = new SemanticResultKey("beforeKey", beforeBuilder);

    // do the same three steps to create a "wild card" to follow our choices
    var afterBuilder = new GrammarBuilder();
    afterBuilder.AppendWildcard();
    var afterKey = new SemanticResultKey("afterKey", afterBuilder);

    // create the main grammar builder
    var builder = new GrammarBuilder();
    builder.Culture = RecognitionEngine.RecognizerInfo.Culture;
    // wildcard, then the choices, then another wildcard
    builder.Append(beforeKey);
    builder.Append(choices);
    builder.Append(afterKey);

    // create a new grammar from the final builder
    return new Grammar(builder);
}

The function IsPartOfSentence determines if a RecognitionResult is part of a sentence by checking the Words collection. The word "..." denotes a wild card (undefined or unknown speech). So, if the word "..." is in the Words collection, we can safely ignore the result because the phrase was spoken in the context of a bigger phrase.
public bool IsPartOfSentence(RecognitionResult result)
{
    foreach (var word in result.Words)
    {
        if (word.Text == "...")
            return true;
    }
    return false;
}

The rest of the code is fairly straight ahead, except for one thing that drives me nuts about Speech Recognition. Typically, if you want to interact using Speech, that means you will say something, the PC will respond (typically with the Speech.Synthesis.SpeechSynthesizer) and then you want it to start listening again for more commands or phrases which may be different from the last ones depending on what you want to do.

But here's the thing. When you recognize speech asynchronously you handle an event. If you want to change up what you're doing, you have to get out of this thread to let the calling code complete. Fact of life, but still a PITA. So, to get around this I implement an old favorite pattern of starting a 1 millisecond timer just one time. It's quick and easy and it works without ceremony. To get access to the SpeechRecognizedEventArgs parameter, I just stuff it in the timer object's tag.
System.Windows.Threading.DispatcherTimer GetOutOfThisMethodTimer;

public SpeechListener()
{
    // other initialization code here
    GetOutOfThisMethodTimer = new System.Windows.Threading.DispatcherTimer();
    GetOutOfThisMethodTimer.Interval = TimeSpan.FromMilliseconds(1);
    GetOutOfThisMethodTimer.Tick += GetOutOfThisMethodTimer_Tick;
}

void SpeechRecognitionEngine_SpeechRecognized(object sender, 
   System.Speech.Recognition.SpeechRecognizedEventArgs e)
{
    // stash the event args and let this handler return right away
    GetOutOfThisMethodTimer.Tag = e;
    GetOutOfThisMethodTimer.Start();
}

void GetOutOfThisMethodTimer_Tick(object sender, EventArgs e)
{
    // fire just one time
    GetOutOfThisMethodTimer.Stop();
    var obj = GetOutOfThisMethodTimer.Tag;
    GetOutOfThisMethodTimer.Tag = null;

    if (obj == null)
        return;

    var args = (SpeechRecognizedEventArgs)obj;
    // process args here, outside of the recognizer's event handler
}

Download the code here and enjoy!

- Carl

GesturePak 2.0 Alpha

New Kinect for Windows and SDK are Coming!

When the Microsoft Kinect for Windows team sent all of its MVPs (myself included) the new Kinect Sensor and access to the Developer Preview edition of the Kinect For Windows SDK v2, it didn't take me long to refactor the GesturePak Matcher to work with the new sensor.

The difference is amazing. This new sensor is so much more accurate, so much faster (lower latency) and can see you in practically no light. Clearly, there will be a demand for robust and easy gesture recognition software.


I wrote GesturePak to make it easier for developers and end users to create and recognize gestures using Kinect for Windows. You essentially "record" a gesture using movements, the data is saved to an xml file, and you can then load those files in your code and use the aforementioned GesturePak Matcher to tell you (in real-time) if your user has made any of those gestures.

GesturePak 2.0

GesturePak 1.0 was fun to write. It works, but it's a little clunky. The device itself is frustrating to use because of lighting restrictions, tracking problems, jitters, and all that. The biggest issue I had was that if the Kinect stopped tracking you for whatever reason, it took a long time to re-establish communication. This major limitation forced me into a design where you really couldn't walk away from tracking to edit the gesture parameters. Everything had to be done with speech. Naming your gesture had to be done by waving your hands over a huge keyboard to select letters. Because of this, you had to break down a gesture into "Poses", a set of "snapshots" which are matched in series to make a gesture.

For version 2.0 I wanted to take advantage of the power in the device to make the whole experience more pleasant. Now you can simply record yourself performing the gesture from beginning to end, and then sit down at the machine to complete the editing process.

Video: Carl shows you how to create and test a gesture in about 2 minutes.

To make the new GesturePak Recorder/Editor app, I started with the Body demo that came with the Developer Preview SDK, and using partial classes I added the supporting code in another C# file. I also wrote a tester app that loads all of the gesture files in your Documents\GesturePak folder and tells you when you have a match.

GesturePak File Format v2

The xml file format in v2 is vastly different, easier to read, and makes more sense. Here is a sample from a new gesture file:

As you can see, all the Tracking properties are now in the Gesture class, which is where they make sense. A new feature lets you track the state of the hands (open, closed, etc.), allowing for even more interactive gestures.


In v1 you had to "pose" for a "snapshot," a single array of data that describes the location of all of your joints in 3D space. A Gesture occurs when GesturePak sees a person match two or more of these "poses" over time in the right order.

Since v2 doesn't require you to "pose" at all (you simply perform the gesture) it makes more sense to call these individual snapshots "frames" like frames of a movie. That's what they are, really.

Let's look at how frames are expressed in the new Gesture file format:

  <Frame Name="Frame 1" Match="False" xml:lang="en">
	  <X Value="-0.002527997" />
	  <Y Value="-0.262233" />
	  <Z Value="2.130034" />
	  <X Value="-0.005012542" />
	  <Y Value="0.08136677" />
	  <Z Value="2.171184" />
We still save all the raw data in the XML file, but it's strongly typed. That makes it easy to find the exact value you're looking for. This is a big improvement over v1. The hand state is also tracked. Importantly, GesturePak 2.0 will load gesture files created with v1.

Recording a Gesture

My goal is to simplify. No GestureMouse. No naming gestures with "the force."

The Recorder/Editor now has two modes, Live and Edit. In Live mode you can record and test your gestures. In Edit mode you do everything else. The app starts in Edit mode. To record a gesture, you first press the "Live Mode" button and then the "Record Gesture" button.

Simply stand in front of the Kinect and wait until it's tracking you. Now say "Start Recording." A red light will appear on the screen. Make your gesture from start to finish and say "OK. Stop." The light goes off, and a "Save As..." dialog pops up. GesturePak has just recorded every frame that happened when that light was on. It could be hundreds of frames.

Walk up to the app, save the file, and sit down. Your real time skeleton goes away and the gesture you just recorded comes up on the screen in Edit mode. Use the mouse wheel to "scrub" through every frame of the gesture, or press the Animate button to watch your stick figure repeat your gesture.

Editing your Gesture

Editing is extremely easy and only takes a few moments once you get the hang of it. Changes are saved automatically.

Select joints to track
Simply mouse click or touch a joint to toggle tracking. Tracked joints appear white and not-tracked joints appear green. As you hover over the joints with a mouse the joint name is displayed above the skeleton. Changes are automatically saved.

Trim the gesture
You can optionally trim out unnecessary frames with the "Trim Start" and "Trim End" buttons. To trim the leading frames, scrub to the first frame that you'd like the gesture animation to start with, and then click "Trim Start." Changes are saved automatically. Similarly, scrub to a frame after the gesture has been completed and click "Trim End." Now you've at least thinned out the data a little. Note that trimming does not define the frames that will be matched (for that, read on), it just cuts out unwanted frames. The animation should be smooth from start to finish.

Pick the frames for GesturePak to match against
It may seem like you have all the data you need to let GesturePak do its thing. The fact is, there's too much data. If we required the matcher to match every single frame there would be a higher chance of the gesture not being recognized. So, scrub through the frames and pick the smallest number of frames necessary to define this gesture. You can select or de-select a frame by clicking the "Match" button.

Test it!
To test it, just click the "Live Mode" button, step back, and perform the gesture. You'll hear the word "Match" when you do it correctly (assuming that your volume is up and your headphones aren't plugged in and all that). Of course, if you look at the code, it's just playing a WAVE file when it gets a match. Your code can do anything you want it to do.

Other Parameters
Just as with GesturePak 1.0, you can opt to track the X, Y, and Z axes individually. By default Z is not tracked. You should only track it if it matters to the gesture, otherwise it could lead to inaccurate results.

You can also track the hand state. Think of hand state like another joint. If you track the right hand state, and in your gesture you open and raise your right hand, close it, move it to the left, and open it, you can match that action precisely. The first frame to match is where your hand is up and open. The second frame would be where your hand is closed. The third frame would be where your hand has moved but is still closed, and the fourth would be when your hand opens again.

The FudgeFactor is what we called "Accuracy" in GesturePak 1.0. This is the amount of variance that's allowed when we sum the X, Y, and Z positions and compare them to poses in the gesture. A larger number means more false positives. A smaller number means more accuracy is required to trigger a gesture.
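
To make the idea concrete, here is a small sketch of a tolerance check along those lines: add up the differences on the tracked axes and compare the total to the fudge factor. This is my reading of the description above, not the actual GesturePak matching code. Point3, JointMatcher, and IsMatch are hypothetical names, and the real matcher may combine the axes differently.

```csharp
using System;

// Hypothetical types for illustration; not the GesturePak API.
public struct Point3
{
    public double X, Y, Z;

    public Point3(double x, double y, double z)
    {
        X = x; Y = y; Z = z;
    }
}

public static class JointMatcher
{
    // Sum the absolute differences on each tracked axis and allow
    // up to fudgeFactor of total variance. Z is not tracked by default.
    public static bool IsMatch(Point3 recorded, Point3 live, double fudgeFactor,
        bool trackX = true, bool trackY = true, bool trackZ = false)
    {
        double variance = 0;
        if (trackX) variance += Math.Abs(recorded.X - live.X);
        if (trackY) variance += Math.Abs(recorded.Y - live.Y);
        if (trackZ) variance += Math.Abs(recorded.Z - live.Z);
        return variance <= fudgeFactor;
    }

    public static void Main()
    {
        var recorded = new Point3(0.0, 0.0, 2.0);
        var live = new Point3(0.05, 0.05, 3.0);

        // Z is ignored, so only the small X and Y differences count
        Console.WriteLine(IsMatch(recorded, live, 0.2));
        // with Z tracked, the one meter depth difference breaks the match
        Console.WriteLine(IsMatch(recorded, live, 0.2, trackZ: true));
    }
}
```

This also shows why tracking Z when it doesn't matter hurts: any drift toward or away from the sensor eats into the same tolerance budget.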

The Max Duration is a millisecond value that defines the maximum amount of time allowed on a particular frame. If a timeout occurs before the next frame in the gesture is matched, the gesture will not be matched and you'll have to start over at the first frame. By default, you get half a second at each frame. This ensures that gestures are done intentionally. You can, of course, adjust this value.
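
Here is a sketch of how a timeout like that can be enforced while stepping through the frames of a gesture. Again, this illustrates the rule as described and is not the GesturePak source. FrameSequenceTracker is a hypothetical name, and the caller supplies a function that says whether the live data matches a given frame index.

```csharp
using System;

// Hypothetical illustration of the Max Duration rule; not the GesturePak API.
public class FrameSequenceTracker
{
    private readonly int frameCount;
    private readonly TimeSpan maxDuration;
    private int nextFrame;            // index of the frame we are waiting to match
    private DateTime lastMatchTime;   // when the previous frame was matched

    public FrameSequenceTracker(int frameCount, TimeSpan maxDuration)
    {
        this.frameCount = frameCount;
        this.maxDuration = maxDuration;
    }

    // Call once per sensor frame. Returns true when the last frame of the
    // gesture has been matched in order and in time.
    public bool Update(Func<int, bool> matchesFrame, DateTime now)
    {
        // Took too long to reach the next frame: start over at the first frame.
        if (nextFrame > 0 && now - lastMatchTime > maxDuration)
            nextFrame = 0;

        if (!matchesFrame(nextFrame))
            return false;

        lastMatchTime = now;
        nextFrame++;

        if (nextFrame < frameCount)
            return false;

        nextFrame = 0;   // gesture complete; get ready for the next one
        return true;
    }
}

public static class TrackerDemo
{
    public static void Main()
    {
        var tracker = new FrameSequenceTracker(2, TimeSpan.FromMilliseconds(500));
        DateTime t0 = DateTime.UtcNow;

        // frame 0 matched, then frame 1 matched 200 ms later
        Console.WriteLine(tracker.Update(i => i == 0, t0));
        Console.WriteLine(tracker.Update(i => i == 1, t0.AddMilliseconds(200)));
    }
}
```

The half-second default mentioned above corresponds to TimeSpan.FromMilliseconds(500) in this sketch.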

Using the GestureMatcher in your code

The Kinect for Windows sensor can track up to 6 people (Body objects) at once! So, in this case, you'd create an array of 6 GestureMatcher objects that will correspond to those Body objects.

    private GestureMatcher[] matchers = new GestureMatcher[6];
Load a list of gestures you want to track into each GestureMatcher. Here's a method that loads all of the gestures in your GesturePak folder:

    void loadGestures()
    {
        string GPpath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments)  
            + "\\gesturepak";
        var files = System.IO.Directory.GetFiles(GPpath, "*.xml");
        var gestures = new List<Gesture>();
        foreach (string file in files)
            gestures.Add(new Gesture(file));
        for (int i = 0; i < matchers.Length; i++)
            matchers[i] = new GestureMatcher(gestures);
    }

When dealing with Body objects, you use a BodyFrameReader, handling a FrameArrived event. Here's the bare minimum code for handling Bodies, associating them with a GestureMatcher, and checking for Matches. There is code missing here, of course, but it's to show you how easily GesturePak fits into the existing framework of code surrounding the handling of real-time Body data from the Kinect sensor. For actual working code, look at the GesturePakGestureTester app.

    void reader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
        BodyFrameReference frameReference = e.FrameReference;
        try
        {
            BodyFrame frame = frameReference.AcquireFrame();
            if (frame != null)
            {
                // BodyFrame is IDisposable
                using (frame)
                {
                    // This is required by the SDK to acquire the actual Body position data into the bodies array
                    // bodies[] is an array defined elsewhere
                    frame.GetAndRefreshBodyData(bodies);

                    for (int i = 0; i < bodies.Length; i++)
                    {
                        Body body = bodies[i];
                        GestureMatcher matcher = matchers[i];

                        if (body.IsTracked)
                        {
                            // associate this body with its matcher if it's not already
                            if (matcher.Body == null || matcher.Body != body)
                                matcher.Body = body;

                            // draw body frame here if you like

                            // Do we have a match on this gesture with this Body?
                            Gesture match = matcher.GetMatch();
                            if (match != null)
                            {
                                // match is the gesture that was matched!
                            }
                        }
                    }
                }
            }
        }
        catch (Exception)
        {
            // ignore if the frame is no longer available
        }
    }

Source will be included in v2

The price has not been set, but I plan to ship the C# source with GesturePak 2.0 at some level. You will be free to modify it for your own apps and use it however you like. You will get the source code to the API, the recorder/editor, and the tester app. The recorder/editor can be modified and included in your own app if you want to give your end-users the ability to create their own gestures. If you have code to contribute back to GesturePak, I would welcome it!

Get the bits!

Do you have the Kinect for Windows Developer Preview SDK and the Kinect Sensor v2? Would you like to take GesturePak 2.0 for a test run? Send me an email with the subject GesturePak 2.0 Alpha and I'll gladly send you the latest bits. I only ask that you are serious about it, and send me feedback, good and bad.

C# to get your local IP address when online.


This quick function returns your local IPv4 address, or an empty string if you're not connected or no IP address is found. Requires a "using System.Net;" statement.

        /// <summary>
        /// returns the first local IP address that's connected to the network
        /// only if it's connected to the network.
        /// </summary>
        /// <returns></returns>
        private string getLocalIPAddress()
        {
            string ret = "";
            // Are we connected to the network?
            if (System.Net.NetworkInformation.NetworkInterface.GetIsNetworkAvailable())
            {
                // get a list of local addresses
                var addrs = Dns.GetHostAddresses(Dns.GetHostName());
                foreach (IPAddress ip in addrs)
                {
                    // is this an IPv4 address?
                    if (ip.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork)
                    {
                        // that's the one
                        ret = ip.ToString();
                        break;
                    }
                }
            }
            return ret;
        }
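
If you'd like to test the address-picking logic without touching the network, you can pull it out into a helper that works on any list of addresses. This is just a sketch of that refactoring. AddressPicker and FirstIPv4 are hypothetical names, and the sample addresses are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper that isolates the "first IPv4 address" rule.
public static class AddressPicker
{
    // Returns the first IPv4 address in the list, or "" if there is none.
    public static string FirstIPv4(IEnumerable<IPAddress> addresses)
    {
        IPAddress ip = addresses.FirstOrDefault(
            a => a.AddressFamily == AddressFamily.InterNetwork);
        return ip == null ? "" : ip.ToString();
    }

    public static void Main()
    {
        var sample = new[]
        {
            IPAddress.Parse("fe80::1"),        // IPv6, skipped
            IPAddress.Parse("192.168.1.20"),   // first IPv4
            IPAddress.Parse("10.0.0.5")
        };
        Console.WriteLine(FirstIPv4(sample)); // prints 192.168.1.20
    }
}
```

In real use you would pass in Dns.GetHostAddresses(Dns.GetHostName()) after checking GetIsNetworkAvailable(), just like the function above.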




Simple way to avoid caching in JavaScript

This is a tried and true way to avoid caching when you make a call to a static url which may return a different result each time. Simply append an argument to the url with a random or otherwise unique string as the value.

Consider this JavaScript function:

        function getUniqueString() {
            var d = new Date();
            return d.getTime().toString();
        }

This will return a different string every time that you call it, as long as the calls are at least a millisecond apart.

Now, suppose you are calling a url that looks like this:

        var url = "";

It may work the way you expect in one browser but not in another.  Just append an argument to it like so:

       var url = "" + getUniqueString();

The url might look like this:

Well done!
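
The same trick works from C#, the language used in the rest of these posts. Here is a minimal sketch that also handles a url that already has a query string. CacheBuster, AppendUniqueArg, the "nocache" parameter name, and the example.com urls are all made up for illustration. The sketch uses a GUID rather than a timestamp so that two calls in the same millisecond still differ, and it does not handle urls containing a # fragment.

```csharp
using System;

// Hypothetical helper for illustration.
public static class CacheBuster
{
    // Appends a unique argument so the url never matches a cached copy.
    public static string AppendUniqueArg(string url, string paramName = "nocache")
    {
        // use & if the url already has arguments, otherwise start them with ?
        string separator = url.Contains("?") ? "&" : "?";
        string unique = Guid.NewGuid().ToString("N");
        return url + separator + paramName + "=" + unique;
    }

    public static void Main()
    {
        Console.WriteLine(AppendUniqueArg("http://example.com/data.json"));
        Console.WriteLine(AppendUniqueArg("http://example.com/data.json?id=7"));
    }
}
```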