The Myths about Speaker Dependent Voice
Solutions
by Kevin Sperry, Business Consultant, CTGTALK Voice Solutions
For voice solutions that
give workers task direction and accept their spoken commands as
input, recognition quality is everything. Any speech recognition
system faces significant challenges, including:
- The variety of ways the same word or phrase is said by
different users
- Changes in the user's voice within a day or from day to
day
- Loud background noise from fans, loudspeakers, radios,
conveyors, forklifts, sirens, etc.
- Changes in that background noise from one area of the facility
to another
- Acoustic diversity of industrial and manufacturing
environments.
What You Will Learn in This Report
This report addresses some common misconceptions about speaker
dependent systems. It reveals five myths about speaker dependent
solutions you should be aware of before you purchase a voice
solution. It will help guide your evaluation and selection
processes in a way that best fits your business, your customers,
and your long-term success and competitive advantage.
Myth 1: Speaker dependent solutions are the only way to achieve
excellent speech recognition.
Fact: Speaker independent solutions, when
implemented correctly, provide excellent voice recognition in noisy
industrial environments and have significant advantages over their
speaker dependent counterparts, including better recognition
quality and improved productivity.
- Not so long ago, speaker independent solutions were poor at
recognizing users in industrial environments. However, through a
combination of faster processors, improved speaker independent
speech recognition engines, better microphones, and more
sophisticated software noise elimination techniques, speaker
independent systems now provide excellent speech recognition. In
fact, the best-of-breed speaker independent systems have improved
to the point where users' speech is recognized approximately 99.6%
of the time. Workers get the immediate ease of use that comes with
speaker independent systems, so they can focus on their tasks
without worrying whether the technology is working.
- Speaker dependent training does benefit the small percentage of
users who have unique ways of saying a word or command. A speaker
independent system lets you address this issue by training
specific words for this small group instead of making the entire
user group train their words.
- Not all speaker independent solutions can handle the variety of
speakers and background noises. Many have yet to achieve the right
combination of factors to provide consistent, excellent speech
recognition. Do your research when investigating speaker
independent solutions. Before you buy, talk to users of the system
you are considering.
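The idea of training specific words for only a few users can be
pictured as a shared vocabulary with a thin layer of per-user
overrides. The Python sketch below is illustrative only. The class,
its method names, the user IDs, and the placeholder strings standing
in for acoustic models are all hypothetical and do not describe any
vendor's engine.

```python
from dataclasses import dataclass, field


@dataclass
class HybridVocabulary:
    """Shared word models plus optional per-user overrides (hypothetical design)."""

    # word -> identifier of the shared, speaker independent model (placeholder strings)
    shared: dict
    # user id -> {word -> identifier of that user's trained template}
    overrides: dict = field(default_factory=dict)

    def train_word(self, user_id, word, template_id):
        """Record a user-specific template for one word only."""
        if word not in self.shared:
            raise KeyError(f"{word!r} is not in the vocabulary")
        self.overrides.setdefault(user_id, {})[word] = template_id

    def model_for(self, user_id, word):
        """Prefer the user's own template; otherwise fall back to the shared model."""
        return self.overrides.get(user_id, {}).get(word, self.shared[word])


# Only one user trains one word; everyone else keeps the shared models.
vocab = HybridVocabulary(shared={"five": "shared-five", "nine": "shared-nine"})
vocab.train_word("user-17", "nine", "user-17-nine")
```

In this sketch, only "user-17" spends time training, and only on
the word "nine". Every other user and every other word falls back
to the shared model with no training step.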
Myth 2: New entrants to the voice enablement market use speaker
independent engines only because they are inexpensive and easier
to develop.
Fact: Yes, speaker independent systems are
easier to develop and cost the user less. That is a good thing for
developers and users. Why build a voice solution around an old
technology when the new technology is better?
- In reality, as companies have moved into the speech market, they
have realized there is no need to develop the basic voice engine,
because engines are available from several well-established
providers. By using a commercially available speech engine, a
company can focus on adapting that engine to its customers' needs.
As advances in recognition technology are made, the speech engine
manufacturer is equipped to implement them quickly so that
existing clients can move the enhancements to the field without
delay.
- No speech provider has developed a speaker dependent engine in
the last six to ten years. The existing speaker independent engines
are just that good.
Myth 3: Training speaker dependent systems is trivial and easy
to do.
Fact: Each user of a speaker dependent solution
has to train the solution to their voice. Speaker dependent
providers downplay this training, but it may take anywhere from 20
to 40 minutes to train their systems to a specific user's voice.
Also, because users are new to the system, they may speak
differently during training than they do when actually performing
their tasks. This causes recognition errors and can frequently
force a user to train the system a second, third, or fourth time.
- As the user's voice changes from day to day and from the start
of a shift to the end, the user may have to retrain words in their
voice template again. Any training or retraining of words is lost
productivity.
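To put the training time in perspective, the short Python sketch
below totals the minutes spent training. The 20 to 40 minute range
and the possibility of up to four training sessions come from the
discussion above. The workforce of 100 users and the function name
are assumptions chosen only for illustration.

```python
def training_minutes(users, minutes_per_session, sessions_per_user=1):
    """Total minutes spent training voice templates across all users."""
    return users * minutes_per_session * sessions_per_user


WORKERS = 100  # hypothetical workforce size, not a figure from this report

# Best case: every user trains once, at 20 minutes per session.
best_case = training_minutes(WORKERS, 20)

# Worst case: every user trains four times, at 40 minutes per session.
worst_case = training_minutes(WORKERS, 40, sessions_per_user=4)

print(f"{best_case / 60:.1f} to {worst_case / 60:.1f} hours of training time")
```

Under these assumptions, initial training alone costs roughly 33 to
267 worker hours, before any later retraining is counted.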
Myth 4: Speaker dependent solutions allow you to provide any
responses you want and still achieve good recognition.
Fact: When user responses are too short, they
can cause what are called "inserts." An insert occurs when the
speaker dependent engine returns what it takes to be a valid
response when what it actually heard was a horn beep, a gate
squeak, or a PA system.
- Users frequently need to train a number as "fiver" instead of
"five" or "niner" instead of "nine" to get adequate recognition.
- Speaker independent systems can suffer from the same behavior.
That is why you need to pick a speech provider that has a solution
built and tested to work in noisy and frequently changing
environments. What works in a conference room may not work next to
a noisy conveyor.
- The system should adapt automatically when the background sound
changes significantly, without requiring configuration changes or
user retraining.
Myth 5: Once a speaker dependent solution is trained, it
remains trained.
Fact: A user's voice is not always the same.
The voice can become tired at the end of a hard day and cause
recognition issues for a speaker dependent solution. A person who
has a cold sounds very different from that same person without the
cold. The voice of a person who comes to work the day after
coaching their son's Little League game can be significantly
different from their normal voice. All of these can cause problems
for a speaker dependent solution.
- Speaker dependent solutions have also been known to need
retraining when a user starts working in a different part of the
warehouse with very different background noise, for example when
moving from floor pick, where there is little noise, to
pick-to-belt, where there is a noisy conveyor.
- When a speaker dependent solution has been trained improperly,
the usual remedies are either to ignore the "inserts" or to retrain
voice templates in the hope that retraining resolves the issue.
These solutions rely heavily on noise-cancelling microphones, a
woefully inadequate "fix" when noise comes from all directions
(especially from behind the user). The burden is on the user to
train templates, retrain words, and test for background noise
changes. This retraining and trial-and-error remediation can
significantly reduce productivity.
- With speaker dependent solutions, templates for each user must
be stored and managed, adding to the overhead and cost of the
solution. They require IT resources, a server, and user time. These
activities and requirements all take away from the return on
investment (ROI) expected from a voice solution.
Moving from the Myths to the Facts
- Speaker dependent solutions used to be the only way to go for
industrial environments. That has changed.
- Speaker independent solutions now provide speech recognition
accuracy that has set a new bar in the industry, without the
problems inherent in speaker dependent solutions. At least one
speaker dependent provider now offers a speaker independent
solution as well.
- Today's best-of-breed speaker independent solutions offer the
least hassle for your users, the best voice recognition in noisy or
changing environments, and the best ROI in voice productivity.