Carl Franklin - Blog - Simplifying Speech Recognition with .NET

Simplifying Speech Recognition with .NET

Friday, January 10, 2014 at 03:20PM

Over the many years I've been using .NET's System.Speech.Recognition features, I've learned a few things. Now I've encapsulated all that goodness into one class, SpeechTools.SpeechListener.

A big problem with Speech Recognition is false positives. You only want the computer to interpret your speech when you are speaking to it. How does it know that you're talking to your friend rather than issuing a command?

The answer comes from our friend Gene Roddenberry, creator of Star Trek. Any time Kirk wanted to talk to the computer he'd first say the word "Computer." That little prompt is enough to "wake-up" the computer and listen for the next phrase. To let Kirk know that it was listening, the computer would make a bleepy noise. We can do the same.

Another thing we can do is determine whether the word or phrase was spoken by itself or as part of a larger phrase or sentence. You only want the computer to respond to a command when the command is spoken by itself. If you speak the command or phrase as part of a longer sentence, it should be ignored.

Above all, the code for speech recognition should be much easier than it is. If I just want to recognize a set of phrases or commands, it shouldn't require hours of learning about grammars and builders and all that jazz.

SpeechListener simplifies all of that. Take a look at this demo app window:

[Screenshot: the SpeechTools Demo window]

The app is ready to test without modification. Just press the "Start Listening" button.

By default, our wake-up command is the word "Computer," but it can be anything you like. Say the wake-up command by itself. SpeechTools plays the Star Trek computer wake-up WAV file (provided) and fires a WakeUp event for your convenience.

At this point it is listening for any of the given phrases. Say "Is it lunch time yet?" and the SpeechRecognized event will fire, passing the actual SpeechRecognizedEventArgs object from the API. To recognize another phrase, repeat the whole process, starting with the wake-up command.

Now, check out the code. First the XAML:

    <Window x:Class="SpeechToolsDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech Tools Demo" Height="350" Width="525">
        <StackPanel>
            <StackPanel x:Name="SettingsPanel" HorizontalAlignment="Left">
                <CheckBox x:Name="ListenForWakeUpCommandCheckBox"
                    Margin="10,10,0,0"
                    FontSize="14"
                    IsChecked="True">Listen for Wake-up Command</CheckBox>

                <StackPanel Margin="10,10,0,0" Orientation="Horizontal"
                    IsEnabled="{Binding ElementName=ListenForWakeUpCommandCheckBox, 
                    Path=IsChecked}" >
                    <TextBlock FontSize="14"
                        Text="Wake-up Command:" />
                    <TextBox x:Name="WakeUpCommandTextBox"
                        Margin="10,0,0,0"
                        Width="200"
                        FontSize="14"
                        Text="Computer" />
                </StackPanel>

                <TextBlock Margin="10,20,0,0"
                    FontSize="14"
                    Text="Enter words or phrases to recognize, one per line:" />
                <TextBox x:Name="PhrasesTextBox"
                    Margin="10,10,0,0"
                    FontSize="14"
                    Width="450"
                    Height="130"
                    VerticalScrollBarVisibility="Visible"
                    HorizontalScrollBarVisibility="Visible"
                    TextWrapping="NoWrap"
                    SpellCheck.IsEnabled="True"
                    AcceptsReturn="True" />
            </StackPanel>

            <Button x:Name="ListenButton"
                HorizontalAlignment="Left"
                Margin="10,10,0,0"
                FontSize="14"
                Width="100"
                Content=" Start Listening " />
            <TextBlock x:Name="HeardTextBlock"
                Margin="10,10,0,0"
                FontSize="16" />
        </StackPanel>
    </Window>

Fairly straightforward. Now for the wonderful part, the code:

        using System;
        using System.Windows;

        namespace SpeechToolsDemo
        {
            /// <summary>
            /// Interaction logic for MainWindow.xaml
            /// </summary>
            public partial class MainWindow : Window
            {
                SpeechTools.SpeechListener listener = null;
                // computer.wav is part of the project; set its properties to
                // Build Action = None, Copy to Output Directory = Copy Always
                string WakeUpWavFile = "computer.wav";

                public MainWindow()
                {
                    InitializeComponent();
                    listener = new SpeechTools.SpeechListener();
                    listener.SpeechRecognized += listener_SpeechRecognized;
                    listener.WakeUp += listener_WakeUp;
                    // seed the Phrases. You can change them, of course!
                    this.PhrasesTextBox.Text = "This is cool\n" + 
                        "Is it lunch time yet?\n" +
                        "Let's Party";
                    this.ListenButton.Click += ListenButton_Click;
                }

                void listener_WakeUp(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // This event fires when you speak the wake-up command
                }

                void listener_SpeechRecognized(object sender, 
                    System.Speech.Recognition.SpeechRecognizedEventArgs e)
                {
                    // Fires when a phrase is recognized
                    HeardTextBlock.Text = DateTime.Now.ToLongTimeString() + ": " + e.Result.Text;
                }

                void ListenButton_Click(object sender, RoutedEventArgs e)
                {
                    if (ListenButton.Content.ToString() == " Start Listening ")
                    {
                        // use a wake up command for added accuracy
                        if (ListenForWakeUpCommandCheckBox.IsChecked == true)
                            listener.WakeUpOnKeyPhrase(WakeUpCommandTextBox.Text, 
                                true, WakeUpWavFile);
                        // set the phrases to listen for and start listening
                        listener.Phrases = PhrasesTextBox.Text;
                        listener.StartListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = false;
                        ListenButton.Content = " Stop Listening ";
                    }
                    else
                    {
                        listener.StopListening();
                        // UI stuff
                        SettingsPanel.IsEnabled = true;
                        ListenButton.Content = " Start Listening ";
                    }
                }
            }
        }

You don't have to use a wake-up command, of course, but if you want to, just call WakeUpOnKeyPhrase and pass the phrase. If you want SpeechListener to play a WAV file when it "wakes up" (a nice little extra touch), pass true for the second argument (PlayWaveFileOnWakeUp) and the WAV file name as the third. If you don't want to play a WAV file, just pass false and an empty string.

The Phrases property takes a CRLF-delimited string of phrases (one phrase per line) and internally creates a grammar from it. Just set Phrases to the words and phrases you want it to recognize.
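The post doesn't show how SpeechListener parses that string, so here is a minimal sketch, under stated assumptions, of turning multi-line text into the string[] that a Choices object accepts. PhraseParser and SplitPhrases are hypothetical names, not part of SpeechTools. The sketch accepts bare LF as well as CRLF, because the demo seeds the text box with "\n" separators, and it drops blank lines.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of SpeechTools): converts the multi-line
// text assigned to Phrases into an array suitable for new Choices(string[]).
public static class PhraseParser
{
    public static string[] SplitPhrases(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return Array.Empty<string>();

        return text
            // "\r\n" is listed first so a CRLF pair counts as one separator;
            // bare "\n" (as in the demo's seed string) and "\r" also work.
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => line.Trim())    // ignore stray spaces around a phrase
            .Where(line => line.Length > 0) // skip empty lines
            .ToArray();
    }
}
```

For example, the demo's seed text yields three phrases: "This is cool", "Is it lunch time yet?" and "Let's Party".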

Finally, call StartListening().

If you called WakeUpOnKeyPhrase before StartListening, nothing happens until you utter the wake-up command, at which point the WakeUp event fires. Now SpeechListener waits for you to speak one of the phrases. It will either fire the SpeechRecognized event or do nothing at all; either way, you'll have to speak the wake-up command again to repeat the process. This pattern continues until you call StopListening().

If you are not using a wake-up command, the SpeechRecognized event will continue to fire for each recognized phrase until you call StopListening().
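The wake-up flow described above amounts to a small state machine. Below is a minimal sketch of that behavior operating on plain recognized text, so it can run without a microphone. WakeUpGate, OnUtterance and the case-insensitive matching are illustrative assumptions, not SpeechListener's actual implementation; in a real listener the input would come from the engine's SpeechRecognized event.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the wake-up flow (not SpeechTools code). It works on
// recognized text only, so it can be exercised without System.Speech.
public sealed class WakeUpGate
{
    private readonly string wakeUpPhrase;   // null means "no wake-up command"
    private readonly HashSet<string> phrases;
    private bool awake;

    public event Action WakeUp;
    public event Action<string> Recognized;

    public WakeUpGate(IEnumerable<string> phrases, string wakeUpPhrase = null)
    {
        this.phrases = new HashSet<string>(phrases, StringComparer.OrdinalIgnoreCase);
        this.wakeUpPhrase = wakeUpPhrase;
    }

    // Call once per utterance the recognizer reports.
    public void OnUtterance(string text)
    {
        if (wakeUpPhrase == null)
        {
            // No wake-up command: every known phrase raises Recognized.
            if (phrases.Contains(text)) Recognized?.Invoke(text);
            return;
        }

        if (!awake)
        {
            // Asleep: only the wake-up command has any effect.
            if (string.Equals(text, wakeUpPhrase, StringComparison.OrdinalIgnoreCase))
            {
                awake = true;
                WakeUp?.Invoke();
            }
            return;
        }

        // Awake: consider exactly one utterance, then go back to sleep,
        // whether or not it matched a phrase.
        awake = false;
        if (phrases.Contains(text)) Recognized?.Invoke(text);
    }
}
```

Going back to sleep after every utterance, matched or not, is what keeps ordinary conversation from triggering a second command.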

If you want more fine-grained access to the properties and events, just access the public RecognitionEngine field, which exposes the SpeechRecognitionEngine object used internally. You can party on all the events if you like.

How it works - Sentence Detection

Before we can determine whether a word or phrase is part of a sentence, we have to create a Grammar that allows for wildcards (undefined speech) on either side of the phrase. Here's the code I use. I create two extra GrammarBuilder objects containing wildcards: one goes before the choices, and the other goes after. This is important, and you'll see why in a minute.

    using System.Speech.Recognition;

    // Member of SpeechListener; RecognitionEngine is its SpeechRecognitionEngine field.
    public Grammar CreateGrammar(string[] phrases)
    {
        // first, put the phrases in a Choices object
        var choices = new Choices(phrases);

        // a builder holding a wildcard (unknown speech) to go before the choices
        var beforeBuilder = new GrammarBuilder();
        beforeBuilder.AppendWildcard();

        // and another wildcard to follow the choices
        var afterBuilder = new GrammarBuilder();
        afterBuilder.AppendWildcard();

        // the main builder: wildcard + choices + wildcard,
        // using the same culture as the recognizer
        var builder = new GrammarBuilder();
        builder.Culture = RecognitionEngine.RecognizerInfo.Culture;
        builder.Append(beforeBuilder);
        builder.Append(choices);
        builder.Append(afterBuilder);

        // create a new grammar from the final builder
        return new Grammar(builder);
    }

The function IsPartOfSentence determines whether a RecognitionResult came from a longer sentence by checking its Words collection. The word "..." denotes a wildcard (undefined or unknown speech). So if "..." appears in the Words collection, we can safely ignore the result, because the phrase was spoken in the context of a bigger sentence.

    public bool IsPartOfSentence(RecognitionResult result)
    {
        foreach (var word in result.Words)
        {
            if (word.Text == "...")
                return true;
        }
        return false;
    }
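Because each recognized word's Text is just a string, the same rule can be checked without a recognizer, which makes it easy to unit-test. A minimal sketch, assuming a hypothetical SentenceFilter class that receives the word texts (for example, result.Words.Select(w => w.Text)):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, engine-free version of IsPartOfSentence (not SpeechTools code).
public static class SentenceFilter
{
    // Text the recognizer reports for speech matched by a wildcard.
    public const string Wildcard = "...";

    // True if any word is the wildcard, meaning the phrase was embedded in
    // other speech and the result should be ignored.
    public static bool IsPartOfSentence(IEnumerable<string> words) =>
        words.Any(word => word == Wildcard);
}
```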

The rest of the code is fairly straightforward, except for one thing that drives me nuts about speech recognition. Typically, interacting by speech means you say something, the PC responds (typically with System.Speech.Synthesis.SpeechSynthesizer), and then you want it to start listening again for more commands or phrases, which may differ from the last ones depending on what you want to do.

But here's the thing. When you recognize speech asynchronously, you handle an event. If you want to change up what you're doing, you first have to get out of that event handler and let the calling code complete. Fact of life, but still a PITA. To get around this, I use an old favorite pattern: start a 1-millisecond timer and let it fire just once. It's quick and easy, and it works without ceremony. To get access to the SpeechRecognizedEventArgs parameter, I just stuff it in the timer's Tag property.

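    using System;
    using System.Speech.Recognition;
    using System.Windows.Threading;

    DispatcherTimer GetOutOfThisMethodTimer;

    public SpeechListener()
    {
        // other initialization code here
        // one-shot timer used to run code after the engine's event handler returns
        GetOutOfThisMethodTimer = new DispatcherTimer();
        GetOutOfThisMethodTimer.Interval = TimeSpan.FromMilliseconds(1);
        GetOutOfThisMethodTimer.Tick += GetOutOfThisMethodTimer_Tick;
    }

    void SpeechRecognitionEngine_SpeechRecognized(object sender,
        SpeechRecognizedEventArgs e)
    {
        // stash the event args and return immediately
        GetOutOfThisMethodTimer.Tag = e;
        GetOutOfThisMethodTimer.Start();
    }

    void GetOutOfThisMethodTimer_Tick(object sender, EventArgs e)
    {
        // stop first so the timer fires only once
        GetOutOfThisMethodTimer.Stop();
        var obj = GetOutOfThisMethodTimer.Tag;
        GetOutOfThisMethodTimer.Tag = null;

        // a null Tag means there is no recognition result to process:
        // just start listening again
        if (obj == null)
        {
            StartListening();
            return;
        }

        var args = (SpeechRecognizedEventArgs)obj;
        // handle the recognition result here (e.g. wake-up and sentence checks),
        // now that the engine's SpeechRecognized handler has returned
    }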

Download the code here and enjoy!

- Carl

Reader Comments (3)

Brilliant - if only the Kinect team had used this I would be so much happier with it. I can't tell you the number of times I have been watching Netflix, Hulu, etc. and the show has stopped because I had kinect plugged in and the dialog had the word stop (or even something similar) in it.

January 22, 2014

Theron

This article save me a lot of time

November 3, 2014

Auto

You're awesome! After days of research, I have finally found exactly what I wanted: an accurate speech recognition library that works offline and only looks for what I wanted. Other examples based on .Net's speech.recognition suck. Even if i say like sadfasdfsagfd they all came up with one of the commands on list, but this returns only when I really say them. Thanks a lot!

July 25, 2015

Muhsin Fatih
