Friday, January 10, 2014

Simplifying Speech Recognition with .NET

Over the many years I've been using .NET's System.Speech.Recognition features, I've learned a few things. Now I've encapsulated all that goodness into one class, SpeechTools.SpeechListener.

A big problem with Speech Recognition is false positives. You only want the computer to interpret your speech when you are speaking to it. How does it know that you're talking to your friend rather than issuing a command?

The answer comes from our friend Gene Roddenberry, creator of Star Trek. Any time Kirk wanted to talk to the computer he'd first say the word "Computer." That little prompt is enough to "wake up" the computer so that it listens for the next phrase. To let Kirk know that it was listening, the computer would make a bleepy noise. We can do the same.

Another thing we can do is determine whether the word or phrase was spoken by itself or as part of a larger phrase or sentence. You only want the computer to respond to a command when the command is spoken by itself. If you speak the command or phrase as part of a longer sentence, it should be ignored.

Above all, the code for speech recognition should be much easier than it is. If I just want to recognize a set of phrases or commands, it shouldn't require hours of learning about grammars and builders and all that jazz.

SpeechListener simplifies all of that. Take a look at this demo app window:

[Screenshot: SpeechTools Demo]

The app is ready to test without modification. Just press the "Start Listening" button.

By default, our wake up command is the word "Computer," but it can be anything you like. Say the wake up command by itself. SpeechTools plays the Star Trek Computer wake up wave file (provided) and fires a WakeUp event for your convenience.

At this point it is listening for any of the given phrases. Say "Is it lunch time yet?" and a SpeechRecognized event will fire, passing the actual SpeechRecognizedEventArgs object from the API. To recognize another phrase, repeat the whole process, starting with the wake up command.

Now, check out the code. First the XAML:
    <Window x:Class="SpeechToolsDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech Tools Demo" Height="350" Width="525">
        <StackPanel>
            <StackPanel x:Name="SettingsPanel" HorizontalAlignment="Left">
                <CheckBox x:Name="ListenForWakeUpCommandCheckBox"
                    Margin="10,10,0,0"
                    FontSize="14"
                    IsChecked="True">Listen for Wake-up Command</CheckBox>

                <StackPanel Margin="10,10,0,0" Orientation="Horizontal"
                    IsEnabled="{Binding ElementName=ListenForWakeUpCommandCheckBox, 
                    Path=IsChecked}" >
                    <TextBlock FontSize="14"
                        Text="Wake-up Command:" />
                    <TextBox x:Name="WakeUpCommandTextBox"
                        Margin="10,0,0,0"
                        Width="200"
                        FontSize="14"
                        Text="Computer" />
                </StackPanel>

                <TextBlock Margin="10,20,0,0"
                    FontSize="14"
                    Text="Enter words or phrases to recognize, one per each line:" />
                <TextBox x:Name="PhrasesTextBox"
                    Margin="10,10,0,0"
                    FontSize="14"
                    Width="450"
                    Height="130"
                    VerticalScrollBarVisibility="Visible"
                    HorizontalScrollBarVisibility="Visible"
                    TextWrapping="NoWrap"
                    SpellCheck.IsEnabled="True"
                    AcceptsReturn="True" />
            </StackPanel>

            <Button x:Name="ListenButton"
                HorizontalAlignment="Left"
                Margin="10,10,0,0"
                FontSize="14"
                Width="100"
                Content=" Start Listening " />
            <TextBlock x:Name="HeardTextBlock"
                Margin="10,10,0,0"
                FontSize="16" />
        </StackPanel>
    </Window>
	
Fairly straight ahead here. Now for the wonderful part, the code:
        using System;
        using System.Windows;

        namespace SpeechToolsDemo
        {
            /// <summary>
            /// Interaction logic for MainWindow.xaml
            /// </summary>
            public partial class MainWindow : Window
            {
                SpeechTools.SpeechListener listener = null;
                // set properties: Build Action = None, 
                // Copy to Output Directory = Copy Always
                string WakeUpWavFile = "computer.wav";

                public MainWindow()
                {
                    InitializeComponent();
                    listener = new SpeechTools.SpeechListener();
                    listener.SpeechRecognized += listener_SpeechRecognized;
                    listener.WakeUp += listener_WakeUp;
                    // seed the Phrases. You can change them, of course!
                    this.PhrasesTextBox.Text = "This is cool\n" + 
                        "Is it lunch time yet?\n" +
                        "Let's Party";
                    this.ListenButton.Click += ListenButton_Click;
                }

                void listener_WakeUp(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // This event fires when you speak the wake-up command
                }

                void listener_SpeechRecognized(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // Fires when a phrase is recognized
                    HeardTextBlock.Text = DateTime.Now.ToLongTimeString() + ": " + e.Result.Text;
                }

                void ListenButton_Click(object sender, RoutedEventArgs e)
                {
                    if (ListenButton.Content.ToString() == " Start Listening ")
                    {
                        // use a wake up command for added accuracy
                        if (ListenForWakeUpCommandCheckBox.IsChecked == true)
                            listener.WakeUpOnKeyPhrase(WakeUpCommandTextBox.Text, 
                                true, WakeUpWavFile);
                        // set the phrases to listen for and start listening
                        listener.Phrases = PhrasesTextBox.Text;
                        listener.StartListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = false;
                        ListenButton.Content = " Stop Listening ";
                    }
                    else
                    {
                        listener.StopListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = true;
                        ListenButton.Content = " Start Listening ";
                    }
                }
            }
        }

You don't have to use a wake up command, of course, but if you want to, just call WakeUpOnKeyPhrase, passing the phrase as the first argument. If you want SpeechListener to play a WAV file when it "wakes up" (a nice little extra touch), pass true for the second argument (PlayWaveFileOnWakeUp) and the WAV file name as the third. If you don't want to play a WAV file, just pass false and an empty string.

The Phrases property takes a CRLF-delimited string of phrases, one phrase per line, and internally creates a grammar from it. (The demo seeds the text box with "\n" separators.) Just set Phrases to the words and phrases you want it to recognize.
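If you build the phrase list in code rather than in a text box, it can help to normalize it first. Here is a minimal sketch of that idea. PhraseParser and SplitPhrases are hypothetical names of my own, not part of SpeechTools, and this is not how SpeechListener parses the string internally; it just shows one way to turn multi-line text into clean phrases, accepting CRLF, LF, or CR line breaks.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of SpeechTools): splits multi-line text
// into individual phrases, one per line.
public static class PhraseParser
{
    public static string[] SplitPhrases(string phrases)
    {
        if (string.IsNullOrWhiteSpace(phrases))
            return new string[0];

        return phrases
            // accept CRLF, LF, or CR as the line break
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries)
            // drop leading/trailing whitespace on each line
            .Select(p => p.Trim())
            // drop lines that contained only whitespace
            .Where(p => p.Length > 0)
            .ToArray();
    }
}
```

To hand the cleaned-up list back to the listener, you could join it again, for example: listener.Phrases = string.Join("\r\n", PhraseParser.SplitPhrases(PhrasesTextBox.Text));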

Finally, call StartListening().

If you called WakeUpOnKeyPhrase prior to StartListening, nothing happens until you utter the wake up command, at which point the WakeUp event fires. Now SpeechListener is waiting for you to speak one of the phrases. It will either fire the SpeechRecognized event or do nothing at all, after which you'll have to speak the wake up command again to repeat the process. This pattern continues until you call StopListening().

If you are not using a wake up command, the SpeechRecognized event will continue to fire for each recognized phrase until you call StopListening().
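The two listening modes described above boil down to a tiny state machine. The sketch below models just that flow with plain strings so you can see the transitions. WakeUpFlow and Hear are hypothetical names for illustration only; this is not SpeechListener's actual implementation, and it assumes each utterance passed in was already recognized as spoken by itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the wake-up flow (illustration only).
public class WakeUpFlow
{
    private readonly string wakeUpPhrase;      // null means "no wake up command"
    private readonly HashSet<string> phrases;
    private bool awake;

    public event Action WakeUp;
    public event Action<string> SpeechRecognized;

    public WakeUpFlow(string wakeUpPhrase, IEnumerable<string> phrases)
    {
        this.wakeUpPhrase = wakeUpPhrase;
        this.phrases = new HashSet<string>(phrases, StringComparer.OrdinalIgnoreCase);
        // without a wake up command we are always listening for phrases
        awake = wakeUpPhrase == null;
    }

    // Feed one utterance into the flow.
    public void Hear(string utterance)
    {
        if (!awake)
        {
            // asleep: only the wake up command does anything
            if (string.Equals(utterance, wakeUpPhrase, StringComparison.OrdinalIgnoreCase))
            {
                awake = true;
                WakeUp?.Invoke();
            }
            return;
        }

        if (phrases.Contains(utterance))
            SpeechRecognized?.Invoke(utterance);

        // with a wake up command, you get one attempt per wake up,
        // whether or not it matched a phrase
        if (wakeUpPhrase != null)
            awake = false;
    }
}
```

With a wake up phrase of "Computer", calling Hear("Let's Party") does nothing until Hear("Computer") has raised WakeUp. After that, one phrase is accepted and the flow goes back to sleep.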

If you want more fine-grained access to the properties and events, just access the public RecognitionEngine field, which exposes the SpeechRecognitionEngine object used internally. You can party on all the events if you like.

How it works - Sentence Detection

Before we can determine if a word or phrase is part of a sentence, we have to create a Grammar that allows for wild cards (undefined speech) on either side of the phrase. Here's the code that I use. I create two extra GrammarBuilder objects containing wild cards. One goes before the choices, and another one goes after. This is important, and you'll see why in a minute.
    public Grammar CreateGrammar(string[] phrases)
    {
        // first, put the phrases in a choices object
        var choices = new Choices(phrases);

        // create a grammar builder to prepend our choices
        var beforeBuilder = new GrammarBuilder();
        // append a wildcard (unknown speech)
        beforeBuilder.AppendWildcard();
        // create a semantic key from the builder
        // (note: the key is not appended below; the plain builder is)
        var beforeKey = new SemanticResultKey("beforeKey", beforeBuilder);

        // do the same three steps to create a "wild card" to follow our choices
        var afterBuilder = new GrammarBuilder();
        afterBuilder.AppendWildcard();
        var afterKey = new SemanticResultKey("afterKey", afterBuilder);

        // create the main grammar builder: wildcard, choices, wildcard
        var builder = new GrammarBuilder();
        builder.Culture = RecognitionEngine.RecognizerInfo.Culture;
        builder.Append(beforeBuilder);
        builder.Append(choices);
        builder.Append(afterBuilder);

        // create a new grammar from the final builder
        return new Grammar(builder);
    }
The function IsPartOfSentence determines if a RecognitionResult is part of a sentence by checking the Words collection. The word "..." denotes a wild card (undefined or unknown speech). So, if the word "..." is in the Words collection, we can safely ignore the result because the phrase was spoken in the context of a bigger phrase.
    public bool IsPartOfSentence(RecognitionResult result)
    {
        foreach (var word in result.Words)
        {
            if (word.Text == "...")
                return true;
        }
        return false;
    }
The rest of the code is fairly straight ahead, except for one thing that drives me nuts about Speech Recognition. Typically, if you want to interact using Speech, that means you will say something, the PC will respond (typically with the Speech.Synthesis.SpeechSynthesizer) and then you want it to start listening again for more commands or phrases which may be different from the last ones depending on what you want to do.

But here's the thing. When you recognize speech asynchronously you handle an event. If you want to change up what you're doing, you have to get out of the event handler to let the calling code complete. Fact of life, but still a PITA. So, to get around this I implement an old favorite pattern of starting a 1 millisecond timer just one time. It's quick and easy and it works without ceremony. To get access to the SpeechRecognizedEventArgs parameter, I just stuff it in the timer object's Tag property.
    System.Windows.Threading.DispatcherTimer GetOutOfThisMethodTimer;

    public SpeechListener()
    {
        // other initialization code here
        GetOutOfThisMethodTimer = new System.Windows.Threading.DispatcherTimer();
        GetOutOfThisMethodTimer.Interval = TimeSpan.FromMilliseconds(1);
        GetOutOfThisMethodTimer.Tick += GetOutOfThisMethodTimer_Tick;
    }

    void SpeechRecognitionEngine_SpeechRecognized(object sender, 
       System.Speech.Recognition.SpeechRecognizedEventArgs e)
    {
        // stash the event args and return right away; the timer's Tick
        // handler picks them up after this handler has completed
        GetOutOfThisMethodTimer.Tag = e;
        GetOutOfThisMethodTimer.Start();
    }

    void GetOutOfThisMethodTimer_Tick(object sender, EventArgs e)
    {
        // one-shot: stop the timer and take the stashed args
        GetOutOfThisMethodTimer.Stop();
        var obj = GetOutOfThisMethodTimer.Tag;
        GetOutOfThisMethodTimer.Tag = null;

        if (obj == null)
        {
            StartListening();
            return;
        }

        var args = (System.Speech.Recognition.SpeechRecognizedEventArgs)obj;
        // (the rest of the handler is not shown here)
    }

Download the code here and enjoy!

- Carl

Reader Comments (3)

Brilliant - if only the Kinect team had used this I would be so much happier with it. I can't tell you the number of times I have been watching Netflix, Hulu, etc. and the show has stopped because I had kinect plugged in and the dialog had the word stop (or even something similar) in it.

January 22, 2014 | Unregistered Commenter Theron

This article save me a lot of time

November 3, 2014 | Unregistered Commenter Auto

You're awesome! After days of research, I have finally found exactly what I wanted: an accurate speech recognition library that works offline and only looks for what I wanted. Other examples based on .Net's speech.recognition suck. Even if i say like sadfasdfsagfd they all came up with one of the commands on list, but this returns only when I really say them. Thanks a lot!

July 25, 2015 | Unregistered Commenter Muhsin Fatih