Speech Recognition Guide
How to optimize speech in DRE, or troubleshoot any recognition issue
No microphone signal
- Verify that your microphone is plugged into the audio device
- Make sure the audio device is plugged into the PC and turned on
- In DRE, go to Sound -> Input and verify the Input is correct
- If it is, try switching to another device and back again
Yellow Input Meter
If the Input Level Meter is yellow, the Input Mode in Sound -> Input -> Input Mode is set to either Push To Talk (PTT) or Hotword.
- Make sure a key or controller button is bound to PTT. Map it in Sound -> Input -> Push To Talk (available when Input Mode is Push To Talk)
- Press the PTT key or button and check that the Input Meter changes to purple
- In Hotword mode, the Input Meter animates when you speak into the microphone, but it stays yellow until you say the hotword. By default this is DRE or Ok, DRE
- You can set your own hotwords in Sound -> Input -> Input Mode -> Hotwords
Missing speech recognizer
Train your speech profile
It’s important to train the Windows speech profile.
It’s not just about improving recognition accuracy. Without a trained profile, DRE may not hear you at all.
The Signal-to-Noise Ratio (SNR) is the difference between your speech volume and the background noise level, both measured in dB.
SNR = Signal - Noise
When the noise in your room is loud enough to mask your speech, you have a bad SNR.
A good SNR is above 15 dB.
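As a minimal sketch of the arithmetic above, the following Python snippet subtracts a noise level from a speech level and compares the result to the 15 dB guideline. The function names and the example readings are hypothetical and are not part of DRE.

```python
def snr_db(signal_db: float, noise_db: float) -> float:
    """Signal-to-noise ratio in dB: speech level minus noise level."""
    return signal_db - noise_db


def is_good_snr(snr: float, threshold_db: float = 15.0) -> bool:
    """True when the SNR is above the 15 dB guideline from this guide."""
    return snr > threshold_db


# Hypothetical readings: speech at -20 dB, room noise at -45 dB.
example_snr = snr_db(-20.0, -45.0)
print(example_snr, is_good_snr(example_snr))
```

With these made-up readings, the speech sits 25 dB above the noise, which clears the guideline.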
Check your current Signal-To-Noise Ratio
- Sound -> Recognition -> Speech Recognition Performance Graph
- Speak a couple of commands like Hello DRE, What time is it
- Hover over some of the colored dots
- Inspect the Signal to Noise value in the upper left purple window
Fix a low Signal-To-Noise Ratio
- Remove noise from your environment if possible, or rotate the mic away from the noise sources
- Position the mic closer to your mouth
- Rotate the mic so it’s not picking up your breath
While the DRE speech recognizer works out what you said, it also reports how certain it is by producing a confidence value, let’s say 76%.
This is where the Minimum Confidence levels in DRE come in: they set how low the confidence of a recognition can be before it is rejected. In our example, a Minimum Confidence level of 82% would reject the recognition, as 76% is 6 percentage points below the required level.
Shorter sentences often need a higher minimum confidence level, because short bursts of noise may be picked up as speech. The longer your input sentence is, the lower the required confidence, as the recognizer has more data to work with. This is why there are two sliders: Minimum Confidence (Short) and Minimum Confidence (Long).
Use the PTT Offset to adjust the minimum confidence required when speaking with Push To Talk. When you use PTT, DRE can be certain you are actually speaking, so a lower confidence level can be enough to validate speech.
Use the Dynamic Grammar Offset to raise the confidence level required for commands that have a dynamic tag in the phrase, such as a driver name, car number, or current position. Since driver names can be fairly obscure, requiring a slightly higher Minimum Confidence level here makes sense.
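The interplay of these settings can be sketched roughly in Python. This is an illustration only: it assumes the offsets are simply added to the base level, that short and long phrases are separated by a character-count cutoff, and that all values are whole percentages. The class, the function names, the default values, and the cutoff are hypothetical; DRE's actual calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceSettings:
    # All values are whole percentages; the defaults are made up for illustration.
    min_short: int = 80          # Minimum Confidence (Short)
    min_long: int = 70           # Minimum Confidence (Long)
    ptt_offset: int = -10        # PTT Offset (lowers the requirement)
    dynamic_offset: int = 5      # Dynamic Grammar Offset (raises the requirement)
    long_phrase_chars: int = 15  # hypothetical short/long cutoff


def required_confidence(phrase: str, s: ConfidenceSettings,
                        using_ptt: bool = False,
                        has_dynamic_tag: bool = False) -> int:
    """Minimum confidence (in %) a recognition needs to be accepted."""
    required = s.min_long if len(phrase) >= s.long_phrase_chars else s.min_short
    if using_ptt:
        required += s.ptt_offset
    if has_dynamic_tag:
        required += s.dynamic_offset
    # Keep the result inside the valid percentage range.
    return max(0, min(100, required))


def is_accepted(confidence: int, phrase: str, s: ConfidenceSettings,
                using_ptt: bool = False, has_dynamic_tag: bool = False) -> bool:
    """True when the recognizer's confidence meets the required minimum."""
    return confidence >= required_confidence(phrase, s, using_ptt, has_dynamic_tag)


settings = ConfidenceSettings()
print(is_accepted(76, "go back", settings))                  # short phrase, no PTT
print(is_accepted(76, "go back", settings, using_ptt=True))  # same phrase with PTT
```

Under these assumed defaults, a 76% recognition of a short phrase fails the 80% requirement but passes once the PTT Offset lowers it to 70%.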
Check your confidence
- See the Sound -> Recognition -> Speech Recognition Performance Graph
- Inspect the colored dots in the graph and especially their horizontal positioning in relation to the green squared area, which indicates the area of recognized speech
If you find speech that you actually said outside the green area, hover over the colored dots and note whether they are left or right of the green area:
Left of the green area
The phrases were rejected because their confidence was too low.
Lower the Minimum Confidence sliders to include future speech like this.
Right of the green area
The phrases were approved, but their confidence was well above the Minimum Confidence level.
Increase the Minimum Confidence sliders: speech like this will still be accepted, while more low-confidence noise gets rejected.
Calibrate with the Wizard
Using the Get Started Wizard in Settings -> General -> Wizard can help set the Minimum Confidence values required for your environment.
Sometimes noise is picked up as speech by the speech recognizer. This is called a false positive (FP) because it was falsely approved.
To counter this, one strategy is to reject all recognitions with slow pacing.
In one case, a slow-paced FP was recognized as the command phrase go back.
The input phrase took 1.8 seconds and had an average character duration of about 300ms. Obviously, no one would speak this sentence that slowly in reality, so DRE rejects any speech slower than about 150ms per character.
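The pacing check can be sketched in a few lines of Python. To reproduce the 300ms figure from the example, this sketch assumes spaces are not counted as characters (go back then has 6 characters); how DRE actually counts characters is not stated here, and the function names are hypothetical.

```python
def average_char_duration_ms(phrase: str, duration_s: float) -> float:
    """Average time per non-space character, in milliseconds."""
    # Assumption: whitespace is ignored when counting characters.
    chars = sum(1 for c in phrase if not c.isspace())
    if chars == 0:
        raise ValueError("phrase contains no characters")
    return duration_s * 1000.0 / chars


def is_too_slow(phrase: str, duration_s: float,
                max_char_duration_ms: float = 150.0) -> bool:
    """True when the phrase was paced slower than the allowed maximum."""
    return average_char_duration_ms(phrase, duration_s) > max_char_duration_ms


# The false positive from the example: "go back" spread over 1.8 seconds.
print(average_char_duration_ms("go back", 1.8))
print(is_too_slow("go back", 1.8))
```

At roughly 300ms per character, the example phrase is about twice as slow as the 150ms limit, so it would be rejected.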
Check your Character Durations
The green area represents the approved speech, so we want all phrases you did say to fall inside this square.
If some colored dots for phrases you did speak are above the green area, try increasing the Maximum Character Duration sliders, so that future speech like this is covered by the green area.
Conversely, if all your speech is in the lower part of the green area, or if false positives fall inside the green area, lower the Maximum Character Duration sliders.
Adjust your Maximum Character Duration
Drag the two sliders in Sound -> Recognition -> Character Duration to adjust the limit to just above your normal pace.
Experiment with the levels and make sure to keep (Short) higher than (Long), as short phrases normally are slower-paced than longer ones.
Also, try Settings -> General -> Wizard to auto-adjust to your pace.
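If you prefer to pick the slider values by hand, one possible rule of thumb is to take the slowest pace you observed for your own phrases and add a small margin. The Python sketch below illustrates that idea only; the 10% margin, the function name, and the sample values are assumptions, not something DRE prescribes.

```python
def suggest_max_char_duration_ms(samples_ms: list[float],
                                 margin: float = 0.10) -> float:
    """Suggest a limit just above the slowest observed per-character pace."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return round(max(samples_ms) * (1.0 + margin), 1)


# Hypothetical per-character durations read off the performance graph.
short_phrases = [110.0, 125.0, 140.0]
long_phrases = [80.0, 95.0, 100.0]

print(suggest_max_char_duration_ms(short_phrases))  # candidate for (Short)
print(suggest_max_char_duration_ms(long_phrases))   # candidate for (Long)
```

With these sample values, the (Short) suggestion comes out higher than the (Long) one, in line with the advice above.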