Over the many years I've been using .NET's Speech.Recognition features, I've learned
a few things. Now I've encapsulated all that goodness into one class, SpeechTools.SpeechListener.
A big problem with Speech Recognition is false positives. You only want the computer
to interpret your speech when you are speaking to it. How does it know that you're
talking to your friend rather than issuing a command?
The answer comes from our friend Gene Roddenberry, creator of Star Trek. Any time
Kirk wanted to talk to the computer he'd first say the word "Computer." That little
prompt is enough to "wake up" the computer so that it listens for the next phrase. To let
Kirk know that it was listening, the computer would make a bleepy noise. We can do the
same.
Another thing we can do is determine whether the word or phrase was spoken by itself
or as part of a larger phrase or sentence. You only want the computer to respond to
a command when the command is spoken by itself. If you speak the command or phrase as
part of a longer sentence, it should be ignored.
Above all, the code for speech recognition should be much easier than it is. If I just
want to recognize a set of phrases or commands, it shouldn't require hours of learning
about grammars and builders and all that jazz.
SpeechListener simplifies all of that. Take a look at this demo app window:
The app is ready to test without modification. Just press the "Start Listening" button.
By default, our wake up command is the word "Computer," but it can be anything you like.
Say the wake up command by itself. SpeechTools plays the Star Trek Computer wake up wave
file (provided) and fires a WakeUp event for your convenience.
At this point it is listening for any of the given phrases. Say "Is it lunch time yet?"
and a Recognized event will fire, passing the actual SpeechRecognizedEventArgs object from
the API. To recognize another word, repeat the whole process, starting with the wake up command.
Now, check out the code. First the XAML:
<Window x:Class="SpeechToolsDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech Tools Demo" Height="350" Width="525">
    <StackPanel>
        <StackPanel x:Name="SettingsPanel" HorizontalAlignment="Left">
            <CheckBox x:Name="ListenForWakeUpCommandCheckBox"
                      Margin="10,10,0,0" FontSize="14"
                      IsChecked="True">Listen for Wake-up Command</CheckBox>
            <StackPanel Margin="10,10,0,0" Orientation="Horizontal"
                        IsEnabled="{Binding ElementName=ListenForWakeUpCommandCheckBox, Path=IsChecked}">
                <TextBlock FontSize="14" Text="Wake-up Command:" />
                <TextBox x:Name="WakeUpCommandTextBox" Margin="10,0,0,0"
                         Width="200" FontSize="14" Text="Computer" />
            </StackPanel>
            <TextBlock Margin="10,20,0,0" FontSize="14"
                       Text="Enter words or phrases to recognize, one per each line:" />
            <TextBox x:Name="PhrasesTextBox" Margin="10,10,0,0" FontSize="14"
                     Width="450" Height="130"
                     VerticalScrollBarVisibility="Visible"
                     HorizontalScrollBarVisibility="Visible"
                     TextWrapping="NoWrap" SpellCheck.IsEnabled="True"
                     AcceptsReturn="True" />
        </StackPanel>
        <Button x:Name="ListenButton" HorizontalAlignment="Left"
                Margin="10,10,0,0" FontSize="14" Width="100"
                Content=" Start Listening " />
        <TextBlock x:Name="HeardTextBlock" Margin="10,10,0,0" FontSize="16" />
    </StackPanel>
</Window>
Fairly straight ahead here. Now for the wonderful part. The Code:
using System;
using System.Windows;

namespace SpeechToolsDemo
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        SpeechTools.SpeechListener listener = null;

        // set properties: Build Action = None,
        // Copy to Output Directory = Copy Always
        string WakeUpWavFile = "computer.wav";

        public MainWindow()
        {
            InitializeComponent();

            listener = new SpeechTools.SpeechListener();
            listener.SpeechRecognized += listener_SpeechRecognized;
            listener.WakeUp += listener_WakeUp;

            // seed the Phrases. You can change them, of course!
            this.PhrasesTextBox.Text = "This is cool\n" +
                                       "Is it lunch time yet?\n" +
                                       "Let's Party";

            this.ListenButton.Click += ListenButton_Click;
        }

        void listener_WakeUp(object sender,
            System.Speech.Recognition.SpeechRecognizedEventArgs e)
        {
            // This event fires when you speak the wake-up command
        }

        void listener_SpeechRecognized(object sender,
            System.Speech.Recognition.SpeechRecognizedEventArgs e)
        {
            // Fires when a phrase is recognized
            HeardTextBlock.Text = DateTime.Now.ToLongTimeString() + ": " + e.Result.Text;
        }

        void ListenButton_Click(object sender, RoutedEventArgs e)
        {
            if (ListenButton.Content.ToString() == " Start Listening ")
            {
                // use a wake up command for added accuracy
                if (ListenForWakeUpCommandCheckBox.IsChecked == true)
                    listener.WakeUpOnKeyPhrase(WakeUpCommandTextBox.Text,
                        true, WakeUpWavFile);

                // set the phrases to listen for and start listening
                listener.Phrases = PhrasesTextBox.Text;
                listener.StartListening();

                // UI stuff
                SettingsPanel.IsEnabled = false;
                ListenButton.Content = " Stop Listening ";
            }
            else
            {
                listener.StopListening();

                // UI stuff
                SettingsPanel.IsEnabled = true;
                ListenButton.Content = " Start Listening ";
            }
        }
    }
}
You don't have to use a wake up command, of course, but if you want to, just call WakeUpOnKeyPhrase
passing the phrase. If you want SpeechListener to play a WAV file when it "wakes up" - a nice little
extra touch - pass true for the second argument (PlayWaveFileOnWakeUp) and pass the wave file name
as the third parameter. If you don't want to play a wav file just pass false and an empty string.
The Phrases property takes a CRLF-delimited string of phrases and internally creates a grammar from it.
Just set Phrases to the words and phrases you want it to recognize.
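SpeechListener's own code for this step isn't shown here, so what follows is only a rough sketch of how a line-delimited string could be turned into the array of phrases a grammar needs. PhraseParser and ParsePhrases are hypothetical names, not part of SpeechTools. Accepting a bare "\n" as well as CRLF (the demo seeds its text box with "\n") and dropping blank lines are both assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of SpeechTools.
public static class PhraseParser
{
    // Splits a line-delimited string into trimmed, non-empty phrases.
    public static string[] ParsePhrases(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return new string[0];

        return text
            // Accept both CRLF and bare LF line endings (an assumption).
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

An array like this has the shape that the CreateGrammar(string[] phrases) method in the "How it works" section expects.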
Finally, call StartListening().
If you called WakeUpOnKeyPhrase prior to StartListening, nothing happens until you utter the wake-up
command, at which point the WakeUp event fires. Now SpeechListener is waiting for you to speak one
of the phrases. It will either fire the SpeechRecognized event or nothing at all, after which you'll
have to speak the wake-up command again to repeat the process. This pattern continues until you call
StopListening().
If you are not using a wake-up command, the SpeechRecognized event fires every time a phrase is
recognized, until you call StopListening().
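The flow just described boils down to a tiny two-state machine. Here's a rough model of it in plain C#, with no speech engine involved. WakeUpFlow and its Hear method are hypothetical names invented for this illustration. This is not SpeechListener's actual code, and the case-insensitive matching is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the wake-up flow, not SpeechListener's implementation.
public class WakeUpFlow
{
    private readonly string wakeUpCommand;     // null means "no wake-up command"
    private readonly HashSet<string> phrases;
    private bool awake;

    public event Action WakeUp;
    public event Action<string> SpeechRecognized;

    public WakeUpFlow(string wakeUpCommand, IEnumerable<string> phrases)
    {
        this.wakeUpCommand = wakeUpCommand;
        this.phrases = new HashSet<string>(phrases, StringComparer.OrdinalIgnoreCase);
        awake = wakeUpCommand == null;         // always awake without a wake-up command
    }

    // Feed in one utterance that was spoken by itself.
    public void Hear(string utterance)
    {
        if (!awake)
        {
            // Asleep: only the wake-up command has any effect.
            if (string.Equals(utterance, wakeUpCommand, StringComparison.OrdinalIgnoreCase))
            {
                awake = true;
                if (WakeUp != null) WakeUp();
            }
            return;
        }

        // Awake: fire SpeechRecognized for a known phrase, or nothing at all.
        if (phrases.Contains(utterance) && SpeechRecognized != null)
            SpeechRecognized(utterance);

        // With a wake-up command, go back to sleep either way.
        if (wakeUpCommand != null)
            awake = false;
    }
}
```

Passing null as the wake-up command models the second case: the object stays awake and raises SpeechRecognized for every known phrase.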
If you want more fine-grained access to the properties and events, just access the public
RecognitionEngine field, which exposes the SpeechRecognitionEngine object used internally.
You can party on all the events if you like.
How it works - Sentence Detection
Before we can determine if a word or phrase is part of a sentence, we have to create a
Grammar that allows for wild cards (undefined speech) on either side of the phrase.
Here's the code that I use. I create two extra GrammarBuilder objects containing wild
cards. One goes before the choices, and another one goes after. This is important, and
you'll see why in a minute.
public Grammar CreateGrammar(string[] phrases)
{
    Grammar g;

    // first, put the phrases in a choices object
    var choices = new Choices(phrases);

    // create a grammar builder to prepend our choices
    var beforeBuilder = new GrammarBuilder();
    // append a wildcard (unknown speech)
    beforeBuilder.AppendWildcard();
    // create a semantic key from the builder
    var beforeKey = new SemanticResultKey("beforeKey", beforeBuilder);

    // do the same three steps to create a "wild card" to follow our choices
    var afterBuilder = new GrammarBuilder();
    afterBuilder.AppendWildcard();
    var afterKey = new SemanticResultKey("afterKey", afterBuilder);

    // create the main grammar builder
    var builder = new GrammarBuilder();
    builder.Culture = RecognitionEngine.RecognizerInfo.Culture;
    builder.Append(beforeBuilder);
    builder.Append(choices);
    builder.Append(afterBuilder);

    // create a new grammar from the final builder
    return new Grammar(builder);
}
The function IsPartOfSentence determines if a RecognitionResult is part of a sentence
by checking the Words collection. The word "..." denotes a wild card (undefined or unknown speech). So, if the word "..." is in the Words collection, we can safely ignore the result because the phrase was spoken in the context of a longer sentence.
public bool IsPartOfSentence(RecognitionResult result)
{
    foreach (var word in result.Words)
    {
        if (word.Text == "...")
            return true;
    }
    return false;
}
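To see the check in isolation, here's a small sketch that runs the same "..." test over plain strings, so you can try it without a microphone. SentenceCheck and ContainsWildcard are hypothetical names. The sample word lists in the comments are illustrative assumptions about what a result might contain, not captured output from the recognizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-alone version of the wildcard check.
public static class SentenceCheck
{
    // True if any recognized word is the wildcard placeholder "...".
    public static bool ContainsWildcard(IEnumerable<string> wordTexts)
    {
        return wordTexts.Any(text => text == "...");
    }
}

// Illustrative inputs (assumed shapes):
//   { "Computer" }               -> false, the phrase was spoken by itself
//   { "...", "Computer", "..." } -> true, surrounded by unknown speech
```

With a real RecognitionResult you would feed it result.Words.Select(w => w.Text), and skip the result whenever the method returns true.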
The rest of the code is fairly straight ahead, except for one thing that drives me nuts about
Speech Recognition. Typically, if you want to interact using Speech, that means you will say
something, the PC will respond (typically with the Speech.Synthesis.SpeechSynthesizer) and then you want it to start listening again for more commands or phrases which may be different
from the last ones depending on what you want to do.
But here's the thing. When you recognize speech asynchronously you handle an event.
If you want to change up what you're doing, you have to return from the event handler first to let the
calling code complete. Fact of life, but still a PITA. So, to get around this I implement
an old favorite pattern of starting a 1 millisecond timer just one time. It's quick and easy
and it works without ceremony. To get access to the SpeechRecognizedEventArgs parameter,
I just stuff it in the timer object's Tag property.
System.Windows.Threading.DispatcherTimer GetOutOfThisMethodTimer;

public SpeechListener()
{
    // other initialization code here
    GetOutOfThisMethodTimer = new System.Windows.Threading.DispatcherTimer();
    GetOutOfThisMethodTimer.Interval = TimeSpan.FromMilliseconds(1);
    GetOutOfThisMethodTimer.Tick += GetOutOfThisMethodTimer_Tick;
}

void SpeechRecognitionEngine_SpeechRecognized(object sender,
    System.Speech.Recognition.SpeechRecognizedEventArgs e)
{
    GetOutOfThisMethodTimer.Tag = e;
    GetOutOfThisMethodTimer.Start();
}

void GetOutOfThisMethodTimer_Tick(object sender, EventArgs e)
{
    GetOutOfThisMethodTimer.Stop();
    var obj = GetOutOfThisMethodTimer.Tag;
    GetOutOfThisMethodTimer.Tag = null;
    if (obj == null)
    {
        StartListening();
        return;
    }
    var args = (SpeechRecognizedEventArgs)obj;
}
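If you want to reuse the same one-shot timer trick elsewhere, it can be wrapped in a small helper. This is only a sketch: Deferred and RunSoon are hypothetical names, and instead of the Tag property it captures whatever state it needs in a closure, which is a variation on the code above rather than what SpeechListener does. It assumes a WPF project (a reference to WindowsBase) and a thread whose Dispatcher is pumping messages.

```csharp
using System;
using System.Windows.Threading;

// Hypothetical helper, not part of SpeechTools.
public static class Deferred
{
    // Runs the action once, shortly after the current handler has returned.
    public static void RunSoon(Action action)
    {
        // The timer is bound to the calling thread's Dispatcher.
        var timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(1) };

        EventHandler onTick = null;
        onTick = (sender, e) =>
        {
            // Stop first so the timer fires only one time.
            timer.Stop();
            timer.Tick -= onTick;
            action();
        };

        timer.Tick += onTick;
        timer.Start();
    }
}
```

Inside a SpeechRecognized handler you could then write something like Deferred.RunSoon(() => HandleResult(e)), where HandleResult is a stand-in for whatever you want to do once the handler has returned.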