Simplifying Speech Recognition with .NET
Carl Franklin | Posted on Friday, January 10, 2014 at 03:20PM
Over the many years I've been using .NET's System.Speech.Recognition features, I've learned a few things. Now I've encapsulated all that goodness into one class, SpeechTools.SpeechListener.
A big problem with Speech Recognition is false positives. You only want the computer to interpret your speech when you are speaking to it. How does it know that you're talking to your friend rather than issuing a command?
The answer comes from our friend Gene Roddenberry, creator of Star Trek. Any time Kirk wanted to talk to the computer he'd first say the word "Computer." That little prompt is enough to "wake-up" the computer and listen for the next phrase. To let Kirk know that it was listening, the computer would make a bleepy noise. We can do the same.
Another thing we can do is determine whether the word or phrase was spoken by itself or as part of a larger phrase or sentence. You only want the computer to respond to a command when the command is spoken by itself. If you speak the command or phrase as part of a longer sentence, it should be ignored.
Above all, the code for speech recognition should be much easier than it is. If I just want to recognize a set of phrases or commands, it shouldn't require hours of learning about grammars and builders and all that jazz.
SpeechListener simplifies all of that. Take a look at this demo app window:
The app is ready to test without modification. Just press the "Start Listening" button.
By default, our wake up command is the word "Computer," but it can be anything you like. Say the wake up command by itself. SpeechTools plays the Star Trek Computer wake up wave file (provided) and fires a WakeUp event for your convenience.
At this point it is listening for any of the given phrases. Say "Is it lunch time yet?" and the SpeechRecognized event will fire, passing the actual SpeechRecognizedEventArgs object from the API. To recognize another phrase, repeat the whole process, starting with the wake up command.
Now, check out the code. First the XAML:
    <Window x:Class="SpeechToolsDemo.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="Speech Tools Demo" Height="350" Width="525">
        <StackPanel>
            <StackPanel x:Name="SettingsPanel" HorizontalAlignment="Left">
                <CheckBox x:Name="ListenForWakeUpCommandCheckBox"
                          Margin="10,10,0,0" FontSize="14"
                          IsChecked="True">Listen for Wake-up Command</CheckBox>
                <StackPanel Margin="10,10,0,0" Orientation="Horizontal"
                            IsEnabled="{Binding ElementName=ListenForWakeUpCommandCheckBox, Path=IsChecked}" >
                    <TextBlock FontSize="14" Text="Wake-up Command:" />
                    <TextBox x:Name="WakeUpCommandTextBox" Margin="10,0,0,0"
                             Width="200" FontSize="14" Text="Computer" />
                </StackPanel>
                <TextBlock Margin="10,20,0,0" FontSize="14"
                           Text="Enter words or phrases to recognize, one per each line:" />
                <TextBox x:Name="PhrasesTextBox" Margin="10,10,0,0" FontSize="14"
                         Width="450" Height="130"
                         VerticalScrollBarVisibility="Visible"
                         HorizontalScrollBarVisibility="Visible"
                         TextWrapping="NoWrap" SpellCheck.IsEnabled="True"
                         AcceptsReturn="True" />
            </StackPanel>
            <Button x:Name="ListenButton" HorizontalAlignment="Left"
                    Margin="10,10,0,0" FontSize="14" Width="100"
                    Content=" Start Listening " />
            <TextBlock x:Name="HeardTextBlock" Margin="10,10,0,0" FontSize="16" />
        </StackPanel>
    </Window>

Fairly straight ahead here. Now for the wonderful part. The Code:
    using System;
    using System.Windows;

    namespace SpeechToolsDemo
    {
        /// <summary>
        /// Interaction logic for MainWindow.xaml
        /// </summary>
        public partial class MainWindow : Window
        {
            SpeechTools.SpeechListener listener = null;

            // set properties: Build Action = None,
            // Copy to Output Directory = Copy Always
            string WakeUpWavFile = "computer.wav";

            public MainWindow()
            {
                InitializeComponent();

                listener = new SpeechTools.SpeechListener();
                listener.SpeechRecognized += listener_SpeechRecognized;
                listener.WakeUp += listener_WakeUp;

                // seed the Phrases. You can change them, of course!
                this.PhrasesTextBox.Text = "This is cool\n" +
                    "Is it lunch time yet?\n" +
                    "Let's Party";

                this.ListenButton.Click += ListenButton_Click;
            }

            void listener_WakeUp(object sender,
                System.Speech.Recognition.SpeechRecognizedEventArgs e)
            {
                // This event fires when you speak the wake-up command
            }

            void listener_SpeechRecognized(object sender,
                System.Speech.Recognition.SpeechRecognizedEventArgs e)
            {
                // Fires when a phrase is recognized
                HeardTextBlock.Text = DateTime.Now.ToLongTimeString() + ": " + e.Result.Text;
            }

            void ListenButton_Click(object sender, RoutedEventArgs e)
            {
                if (ListenButton.Content.ToString() == " Start Listening ")
                {
                    // use a wake up command for added accuracy
                    if (ListenForWakeUpCommandCheckBox.IsChecked == true)
                        listener.WakeUpOnKeyPhrase(WakeUpCommandTextBox.Text,
                            true, WakeUpWavFile);

                    // set the phrases to listen for and start listening
                    listener.Phrases = PhrasesTextBox.Text;
                    listener.StartListening();

                    // UI stuff
                    SettingsPanel.IsEnabled = false;
                    ListenButton.Content = " Stop Listening ";
                }
                else
                {
                    listener.StopListening();

                    // UI stuff
                    SettingsPanel.IsEnabled = true;
                    ListenButton.Content = " Start Listening ";
                }
            }
        }
    }
You don't have to use a wake up command, of course, but if you want to, just call WakeUpOnKeyPhrase, passing the phrase as the first argument. If you want SpeechListener to play a WAV file when it "wakes up" - a nice little extra touch - pass true for the second argument (PlayWaveFileOnWakeUp) and pass the WAV file name as the third argument. If you don't want to play a WAV file, just pass false and an empty string.
The Phrases property takes a CRLF-delimited string of phrases, one phrase per line, and internally creates a grammar from it. Just set Phrases to the words and phrases you want it to recognize.
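To make the "one phrase per line" idea concrete, here is a minimal sketch of how a line-delimited string could be turned into the string array that a grammar builder wants. PhraseParser and its Split method are hypothetical names of my own for illustration; they are not part of SpeechTools, and SpeechListener may well do this differently inside. The sketch assumes you want to tolerate bare LF and CR line endings as well as CRLF, and to skip blank lines.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of SpeechTools.
public static class PhraseParser
{
    // Splits a line-delimited string into individual phrases.
    // Accepts CRLF, LF or CR line endings, trims each phrase,
    // and drops empty lines.
    public static string[] Split(string phrases)
    {
        if (string.IsNullOrEmpty(phrases))
            return new string[0];

        return phrases
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
    }
}
```

An array produced this way is the kind of input that a Choices object accepts in its constructor.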
Finally, call StartListening().
If you called WakeUpOnKeyPhrase prior to StartListening, nothing happens until you utter the wake-up command, at which point the WakeUp event fires. Now SpeechListener is waiting for you to speak one of the phrases. It will either fire the SpeechRecognized event or nothing at all, after which you'll have to speak the wake-up command again to repeat the process. This pattern continues until you call StopListening().
If you are not using a wake-up command, the SpeechRecognized event will fire every time a phrase is recognized until you call StopListening().
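The wake-up cycle described above boils down to a tiny two-state machine. The sketch below models only that logic, with plain strings standing in for recognized speech. WakeUpGate and Hear are hypothetical names made up for this illustration, and the exact, case-sensitive match on commands is my simplification; this is not how SpeechListener is actually written.

```csharp
using System;

// Hypothetical illustration of the wake-up cycle; not SpeechListener's real code.
public class WakeUpGate
{
    private readonly string wakePhrase;
    private bool awake;

    public WakeUpGate(string wakePhrase)
    {
        this.wakePhrase = wakePhrase;
    }

    // Returns "WakeUp" when the wake-up command is heard while asleep,
    // "SpeechRecognized" when a known command is heard while awake,
    // or null when the input is ignored.
    public string Hear(string phrase, string[] commands)
    {
        if (!awake)
        {
            if (string.Equals(phrase, wakePhrase, StringComparison.OrdinalIgnoreCase))
            {
                awake = true;
                return "WakeUp";
            }
            return null;
        }

        // One attempt per wake-up: go back to sleep whether or not
        // the phrase matched a command.
        awake = false;
        return Array.IndexOf(commands, phrase) >= 0 ? "SpeechRecognized" : null;
    }
}
```

Feeding it "Let's Party" before the wake-up command returns null; feeding it the wake-up command and then "Let's Party" returns "WakeUp" followed by "SpeechRecognized".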
If you want more fine-grained access to the properties and events, just access the public RecognitionEngine field, which exposes the SpeechRecognitionEngine object used internally. You can party on all the events if you like.
How it works - Sentence Detection
Before we can determine if a word or phrase is part of a sentence, we have to create a Grammar that allows for wild cards (undefined speech) on either side of the phrase. Here's the code that I use. I create two extra GrammarBuilder objects containing wild cards. One goes before the choices, and another one goes after. This is important, and you'll see why in a minute.
    public Grammar CreateGrammar(string[] phrases)
    {
        Grammar g;

        // first, put the phrases in a choices object
        var choices = new Choices(phrases);

        // create a grammar builder to prepend our choices
        var beforeBuilder = new GrammarBuilder();
        // append a wildcard (unknown speech)
        beforeBuilder.AppendWildcard();
        // create a semantic key from the builder
        var beforeKey = new SemanticResultKey("beforeKey", beforeBuilder);

        // do the same three steps to create a "wild card" to follow our choices
        var afterBuilder = new GrammarBuilder();
        afterBuilder.AppendWildcard();
        var afterKey = new SemanticResultKey("afterKey", afterBuilder);

        // create the main grammar builder
        var builder = new GrammarBuilder();
        builder.Culture = RecognitionEngine.RecognizerInfo.Culture;
        builder.Append(beforeBuilder);
        builder.Append(choices);
        builder.Append(afterBuilder);

        // create a new grammar from the final builder
        return new Grammar(builder);
    }

The function IsPartOfSentence determines if a RecognitionResult is part of a sentence by checking the Words collection. The word "..." denotes a wild card (undefined or unknown speech). So, if the word "..." is in the Words collection, we can safely ignore the result because the phrase was spoken in the context of a bigger phrase.
    public bool IsPartOfSentence(RecognitionResult result)
    {
        foreach (var word in result.Words)
        {
            if (word.Text == "...")
                return true;
        }
        return false;
    }

The rest of the code is fairly straight ahead, except for one thing that drives me nuts about Speech Recognition. Typically, if you want to interact using Speech, that means you will say something, the PC will respond (typically with the Speech.Synthesis.SpeechSynthesizer) and then you want it to start listening again for more commands or phrases which may be different from the last ones depending on what you want to do.
But here's the thing. When you recognize speech asynchronously you handle an event. If you want to change up what you're doing, you have to get out of the event handler to let the calling code complete. Fact of life, but still a PITA. So, to get around this I implement an old favorite pattern of starting a 1 millisecond timer just one time. It's quick and easy and it works without ceremony. To get access to the SpeechRecognizedEventArgs parameter, I just stuff it in the timer object's Tag property.
    System.Windows.Threading.DispatcherTimer GetOutOfThisMethodTimer;

    public SpeechListener()
    {
        // other initialization code here
        GetOutOfThisMethodTimer = new System.Windows.Threading.DispatcherTimer();
        GetOutOfThisMethodTimer.Interval = TimeSpan.FromMilliseconds(1);
        GetOutOfThisMethodTimer.Tick += GetOutOfThisMethodTimer_Tick;
    }

    void SpeechRecognitionEngine_SpeechRecognized(object sender,
        System.Speech.Recognition.SpeechRecognizedEventArgs e)
    {
        GetOutOfThisMethodTimer.Tag = e;
        GetOutOfThisMethodTimer.Start();
    }

    void GetOutOfThisMethodTimer_Tick(object sender, EventArgs e)
    {
        GetOutOfThisMethodTimer.Stop();
        var obj = GetOutOfThisMethodTimer.Tag;
        GetOutOfThisMethodTimer.Tag = null;
        if (obj == null)
        {
            StartListening();
            return;
        }
        var args = (SpeechRecognizedEventArgs)obj;
    }
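If you want to play with the one-shot timer idea outside of a WPF app, here is a rough sketch of the same "get out of this method" trick. Note the swap: it uses System.Threading.Timer rather than the DispatcherTimer that SpeechListener uses, so the deferred work runs on a thread-pool thread, not the UI thread. In a WPF app you would have to marshal back to the UI thread before touching controls. Deferred and RunOnce are hypothetical names of my own, not part of SpeechTools.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of SpeechTools.
public static class Deferred
{
    // Schedules 'work' to run once, about 1 ms after the caller returns,
    // passing 'state' through (much like stuffing a value in a Tag).
    // The work runs on a thread-pool thread.
    public static void RunOnce(Action<object> work, object state)
    {
        Timer timer = null;

        // Create the timer disabled so 'timer' is assigned before it can fire.
        timer = new Timer(s =>
        {
            timer.Dispose(); // one shot only
            work(s);
        }, state, Timeout.Infinite, Timeout.Infinite);

        // Now arm it: fire once after 1 ms, never repeat.
        timer.Change(1, Timeout.Infinite);
    }
}
```

The design point is the same as with the DispatcherTimer: the event handler returns right away, and the follow-up work happens a moment later with the event data handed along.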
Download the code here and enjoy!
- Carl
Reader Comments (3)
Brilliant - if only the Kinect team had used this I would be so much happier with it. I can't tell you the number of times I have been watching Netflix, Hulu, etc. and the show has stopped because I had kinect plugged in and the dialog had the word stop (or even something similar) in it.
This article save me a lot of time
You're awesome! After days of research, I have finally found exactly what I wanted: an accurate speech recognition library that works offline and only looks for what I wanted. Other examples based on .Net's speech.recognition suck. Even if i say like sadfasdfsagfd they all came up with one of the commands on list, but this returns only when I really say them. Thanks a lot!