Wednesday, June 13, 2012

Automatically Adjust Input Gain During Speech Recognition

When we use speech recognition on a Windows 7 machine, the last thing most of us think about is the record level of the microphone. If it's too low, recognition will be inaccurate because the system can't hear you properly. If it's too high, you overdrive the amplifier and, again, accuracy suffers.
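
To make "overdrive" concrete: a 16-bit PCM sample can only hold values from -32768 to 32767. Anything a too-hot input stage pushes beyond that range gets flattened (clipped), and the recognizer hears a distorted waveform. The following is a small, self-contained illustration of that effect. It is not part of the sample below, and the names ClippingDemo and ApplyGain are hypothetical.

```csharp
using System;

static class ClippingDemo
{
    // Applies a linear gain to 16-bit PCM samples, saturating at the Int16
    // limits the way an overdriven input clips, and counts how many samples
    // had to be clipped.
    public static short[] ApplyGain(short[] samples, double gain, out int clipped)
    {
        var result = new short[samples.Length];
        clipped = 0;
        for (int i = 0; i < samples.Length; i++)
        {
            double scaled = samples[i] * gain;
            if (scaled > short.MaxValue) { scaled = short.MaxValue; clipped++; }
            else if (scaled < short.MinValue) { scaled = short.MinValue; clipped++; }
            result[i] = (short)Math.Round(scaled);
        }
        return result;
    }

    static void Main()
    {
        short[] speech = { 1000, 12000, -20000, 30000 };
        int clipped;

        ApplyGain(speech, 0.5, out clipped);   // modest gain: waveform intact
        Console.WriteLine(clipped);            // 0

        ApplyGain(speech, 2.0, out clipped);   // too much gain: peaks flattened
        Console.WriteLine(clipped);            // 2
    }
}
```

With too little gain, the opposite problem occurs. The speech sits so close to the noise floor that the recognizer has little to work with.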

If you are lucky enough to be a .NET developer, you can use a neat little trick to adjust the input gain automatically.

Here's the XAML for a simple window that shows an audio level meter, a slider for manually adjusting the input gain, and a text box for the recognized text:

<Window x:Class="AudioGainTestCS.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525">
    <Grid Height="273" Width="427">
        <ProgressBar Height="37" HorizontalAlignment="Left" Margin="12,12,0,0" Name="ProgressBar1" VerticalAlignment="Top" Width="403" />
        <Slider Height="26" HorizontalAlignment="Left" Margin="12,55,0,0" Name="Slider1" VerticalAlignment="Top" Width="403" Orientation="Horizontal" Maximum="99" />
        <TextBox Height="175" HorizontalAlignment="Left" FontSize="30" TextWrapping="Wrap" Margin="14,86,0,0" Name="TextBox1" VerticalAlignment="Top" Width="401" />
    </Grid>
</Window>


C# Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;

namespace AudioGainTestCS
{
    /// <summary>
    /// This application automatically adjusts the record level (input volume)
    /// as you use speech recognition. A mismatched record level can be 
    /// disastrous if you're going to use speech recognition. Note that for
    /// this demo, I'm using a DictationGrammar, but it works with any grammar.
    /// 
    /// The key is the WaveLibMixer.dll, which you can find on CodeProject. 
    /// This library lets you get and set levels and other controls associated with
    /// audio devices.
    /// 
    /// The SpeechRecognitionEngine raises the AudioLevelUpdated event at regular
    /// intervals, passing in the level of the audio at the input device. When you
    /// are not speaking, this level is close to zero. When you are shouting, it
    /// approaches 100.
    /// 
    /// Given the volume of speech over time and the ability to control the record level
    /// dynamically, well... it's pretty darned easy to automatically adjust the
    /// microphone for the speaker.
    /// 
    /// Don't get caught without this nice little tool!
    /// 
    /// Carl
    /// </summary>
    public partial class MainWindow : Window
    {
        // -- Add a reference to System.Speech
        private System.Speech.Recognition.SpeechRecognitionEngine speech = new System.Speech.Recognition.SpeechRecognitionEngine();
    
        // -- Add a reference to WaveLibMixer.dll, which you can find at:
        //    http://www.codeproject.com/Articles/11695/Audio-Library-Part-I-Windows-Mixer-Control
        private WaveLib.AudioMixer.MixerLine audioLine;
    
        private Int32 peakLevel;

        public MainWindow()
        {
            InitializeComponent();
            this.Loaded+=new RoutedEventHandler(MainWindow_Loaded);
            speech.AudioLevelUpdated+=new EventHandler<System.Speech.Recognition.AudioLevelUpdatedEventArgs>(speech_AudioLevelUpdated);
            speech.SpeechRecognized+=new EventHandler<System.Speech.Recognition.SpeechRecognizedEventArgs>(speech_SpeechRecognized);
            Slider1.ValueChanged += new RoutedPropertyChangedEventHandler<double>(Slider1_ValueChanged);
        }

   
        private void MainWindow_Loaded(object sender, System.Windows.RoutedEventArgs e) {
            // -- Create a new input mixer object to control an audio input device
            WaveLib.AudioMixer.Mixer audioMixer = new WaveLib.AudioMixer.Mixer(WaveLib.AudioMixer.MixerType.Recording);
            // -- Open the default recording device
            audioMixer.DeviceId = audioMixer.DeviceIdDefault;
            // -- Does the mixer have lines? It ought to...
            if ((audioMixer.Lines.Count > 0)) {
                // -- Select the first line, which is usually the Master Volume
                audioLine = audioMixer.Lines[0];
                // -- Does that line have a Volume control? It really ought to...
                if (audioLine.ContainsVolume) {
                    // -- Set the input volume (record level) to 50%
                    audioLine.Volume = audioLine.VolumeMax / 2;
                    // -- Set up the slider min/max and value based on the volume
                    Slider1.Minimum = audioLine.VolumeMin;
                    Slider1.Maximum = audioLine.VolumeMax;
                    Slider1.Value = audioLine.Volume;
                }
                else {
                    // -- Jeez... lame.
                    Slider1.IsEnabled = false;
                }
            }

            // -- The progress bar is used as an audio meter.
            //    The speech recognition engine gives us a level from 0 to 100,
            //    so we set the ProgressBar min and max accordingly.
            ProgressBar1.Minimum = 0;
            ProgressBar1.Maximum = 100;
            // -- Set the speech recognizer to use the default audio input device
            speech.SetInputToDefaultAudioDevice();
            // -- Load a DictationGrammar so the recognizer transcribes free-form
            //    speech rather than listening for specific commands.
            speech.LoadGrammar(new System.Speech.Recognition.DictationGrammar());
            // -- Start recognizing
            speech.RecognizeAsync(System.Speech.Recognition.RecognizeMode.Multiple);
        }
    
        private void speech_AudioLevelUpdated(object sender, System.Speech.Recognition.AudioLevelUpdatedEventArgs e) {
            // -- This handler runs at regular intervals while the engine is listening.
            //    e.AudioLevel is a value from 0 to 100 representing the
            //    volume of the talker's voice in near real time.
            // -- Setting the progress bar value to this level creates an audio meter.
            ProgressBar1.Value = e.AudioLevel;
            // -- If the level is above 70, knock the slider down a bit,
            //    which in turn drops the record level. The step of 1000 assumes
            //    the 0 to 65535 range typical of Windows mixer volume controls.
            if (e.AudioLevel > 70) {
                Slider1.Value -= 1000;
            }
            // -- Keep track of the peak level for this sentence.
            if (e.AudioLevel > peakLevel) {
                peakLevel = e.AudioLevel;
            }
        }
    
        private void speech_SpeechRecognized(object sender, System.Speech.Recognition.SpeechRecognizedEventArgs e) {
            // -- This event handler fires after the talker speaks a sentence.
            //    e.Result.Text contains the text that they have spoken.
            // -- In this case, I'm capitalizing the sentence, adding a period at the end, 
            //    and displaying the text in a text box.
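            //    Guard against an empty result so Substring can't throw.
            string text = e.Result.Text;
            if (!string.IsNullOrEmpty(text)) {
                TextBox1.Text = char.ToUpper(text[0]) + text.Substring(1) + ".";
            }
            // -- If the loudest moment of that sentence stayed below 20,
            //    the talker is too quiet, so raise the record level.
            if (peakLevel < 20) {
                Slider1.Value += 3000;
            }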
            // -- Reset the peak level
            peakLevel = 0;
        }

        void Slider1_ValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
        {
            // -- When the value of the slider changes, 
            //    set the audio input volume (the record level) to that value.
            //    This can happen when the user moves the slider, or some code
            //    sets the Slider1.Value 
            if (audioLine != null)
            {
                audioLine.Volume = (int)Slider1.Value;
            }
        }    
    }
}
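
The adjustment rule itself is small enough to test without a microphone or the mixer library. The sketch below pulls the two handlers' logic into a hypothetical GainController class. On every level update, it steps the volume down when the level exceeds 70. At the end of each phrase, it steps the volume up if the peak stayed below 20. The thresholds and step sizes are copied from the sample. The 0 to 65535 range is an assumption about the mixer's VolumeMin/VolumeMax, and the clamping stands in for the way the WPF Slider keeps its Value between Minimum and Maximum.

```csharp
using System;

// Hypothetical, hardware-free model of the sample's gain heuristic.
class GainController
{
    private readonly int min, max, stepDown, stepUp;
    private int peak;

    public int Volume { get; private set; }

    public GainController(int min, int max, int initial, int stepDown = 1000, int stepUp = 3000)
    {
        this.min = min;
        this.max = max;
        this.stepDown = stepDown;
        this.stepUp = stepUp;
        Volume = Clamp(initial);
    }

    // Mirrors speech_AudioLevelUpdated: called for every level update (0..100).
    public void OnAudioLevel(int level)
    {
        if (level > 70) Volume = Clamp(Volume - stepDown);
        if (level > peak) peak = level;
    }

    // Mirrors speech_SpeechRecognized: called once per recognized phrase.
    public void OnPhraseEnd()
    {
        if (peak < 20) Volume = Clamp(Volume + stepUp);
        peak = 0;
    }

    // Keeps the volume inside the mixer's range, like the Slider's coercion.
    private int Clamp(int v)
    {
        return Math.Max(min, Math.Min(max, v));
    }
}

static class Program
{
    static void Main()
    {
        var gain = new GainController(0, 65535, 65535 / 2);   // starts at 32767

        // A loud phrase: three updates above 70 each pull the volume down.
        foreach (int level in new[] { 40, 85, 90, 75, 30 }) gain.OnAudioLevel(level);
        gain.OnPhraseEnd();
        Console.WriteLine(gain.Volume);   // 29767

        // A quiet phrase: the peak stays under 20, so the volume goes up once.
        foreach (int level in new[] { 5, 12, 8 }) gain.OnAudioLevel(level);
        gain.OnPhraseEnd();
        Console.WriteLine(gain.Volume);   // 32767
    }
}
```

Because the "too loud" check fires on every level update while the "too quiet" check fires only once per phrase, the rule backs off quickly from shouting but raises the gain gradually.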
