Superior Voice Recognition: Essential to Best-in-Class
Voice Solutions
by Kevin Sperry, Business Consultant, CTGTALK Voice Solutions
For voice solutions that give workers task direction and accept
their spoken words as input, recognition quality is everything. The
challenges for any speech recognition system include:
- The variety of ways the same word or phrase is said by
different users
- Changes in the user's voice from day to day, for example when
they have a cold or a raspy voice
- Changes in the user's voice throughout the day, from
end-of-the-day fatigue or after lifting a heavy item
- Loud background noise from fans, loudspeakers, radios,
conveyors, sirens, etc.
- Changes in background noise from one area of the workplace to
another
- Acoustic diversity of industrial and manufacturing
environments
Much has been said about the merits of 'speaker-dependent' vs.
'speaker-independent' solutions in the market. Both types of
solutions have to face the same problems:
- 'Inserts' (noises misrecognized as spoken words) caused by
pallet drops, gate squeaks, or PA system announcements
- Cross-talk, where the microphone picks up the voices of users
standing next to each other
- Reduced recognition accuracy around very noisy areas like fans,
belts, and industrial equipment
Speaker-dependent systems typically either ignore the inserts or
attempt to address the issue by retraining the voice templates.
These systems rely heavily on noise-cancelling microphones, which
perform inadequately when noise comes from all directions
(especially from behind the user). With this type of system, the
burden is on the user to train the templates, retrain words, and
test for background noise changes. All of these tasks take time
away from productive work. In addition, templates for each user
must be stored and managed, adding to the overhead and cost of the
solution.
With speaker-independent systems, users often have to contact
the vendor for special adjustments that adapt the system to the
environment or the speaker, or simply accept poor recognition
quality. Whether speaker-dependent or -independent, both types of
system can require constant manual adjustments that are
time-consuming, frustrate users, and reduce the return on
investment (ROI) of the voice system.
Choosing the Best Solution for Your Business
The reality is that any voice solution needs special processing
techniques to provide excellent recognition: techniques that
enable it to adapt automatically to changes in the environment and
in the user's voice. The solution has to both remove noise and
clean up the speech signal so that the recognition engine hears
only the user's speech. Unfortunately, most systems on the market
today, both speaker-dependent and -independent, completely lack
the ability to adapt.
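To make the idea of automatic adaptation concrete, here is a
minimal sketch of one common building block: a running estimate of
the background noise level that follows the worker from a quiet
area to a loud one. It is an illustration only, not the method of
any particular product. The class name AdaptiveNoiseGate, its
parameters, and the default values are all assumptions chosen for
this example.

```python
class AdaptiveNoiseGate:
    """Illustrative sketch: flags audio frames that rise above a
    running noise-floor estimate. All names and defaults are
    hypothetical, not taken from any vendor's system."""

    def __init__(self, margin_db=6.0, fast_rate=0.2, slow_rate=0.05,
                 initial_floor=1e-4):
        self.margin_db = margin_db      # how far above the floor counts as speech
        self.fast_rate = fast_rate      # adaptation rate on frames judged to be noise
        self.slow_rate = slow_rate      # adaptation rate on frames judged to be speech
        self.noise_floor = initial_floor

    @staticmethod
    def frame_energy(frame):
        """Mean squared amplitude of one frame of samples."""
        if not frame:
            raise ValueError("frame must contain at least one sample")
        return sum(s * s for s in frame) / len(frame)

    def is_speech(self, frame):
        """Return True if the frame is louder than the adaptive threshold."""
        energy = self.frame_energy(frame)
        threshold = self.noise_floor * 10 ** (self.margin_db / 10)
        speech = energy > threshold
        # Track the noise level quickly on noise frames. On frames judged
        # to be speech, still adapt slowly, so that a sustained rise in
        # background noise is eventually absorbed into the floor instead
        # of being reported as speech forever.
        rate = self.slow_rate if speech else self.fast_rate
        self.noise_floor += rate * (energy - self.noise_floor)
        return speech
```

In this sketch, a worker who walks from a quiet aisle to a spot
next to a conveyor would see a few frames of steady conveyor noise
flagged as speech before the floor catches up. After that, only
sounds well above the new noise level pass the gate. A production
system would need far more than this, such as spectral noise
removal and handling of sudden sounds like pallet drops.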
As your company looks into selecting the best voice solution to
meet your business needs, you need to ensure that the system can
adapt to your particular surroundings. While both speaker-dependent
and -independent systems have advantages and disadvantages, the
real issue is the ability of the system to recognize the voice of
the user who is speaking. A successful system uses environment
remodeling to automatically detect and adapt to constantly changing
environments and to each user's pronunciation. The voice solution you
select should be able to handle industrial noise from any direction
with near-perfect recognition accuracy. And it should be proven and
already in use by a large number of companies in many diverse
environments.
Most important, the system needs to work well for your
business.