Speech vendors shout for standards.html
poehlman1 at home.com
Mon Feb 4 10:21:54 EST 2002
this might be easier to read:
Speech vendors shout for standards
February 1, 2002 5:41 pm PT
THE BATTLE FOR standards is set to escalate next week when a collection of industry leaders submits to the World Wide Web Consortium (W3C) a proposed framework for delivering combined graphics and speech on handheld devices.
The VoiceXML Forum, headed by IBM, Nuance, Oracle, and Lucent, will announce a proposal for a multimodal technology standard at the Telephony Voice User Interface Conference in Scottsdale, Ariz.
Meanwhile, Microsoft will counter with its own news, using the same conference to announce the addition of another major speech vendor to its SALT (Speech Application Language Tags) Forum. The as-yet-unnamed vendor intends to rewrite its components to work with Microsoft's speech platform.
The announcement will follow the addition of 18 new members to the SALT Forum, whose specification is proposed as an alternative to VXML's multimodal solution.
New members of the SALT Forum include Compaq and Siemens Enterprise
Networks. Founding members include Cisco, Comverse, Intel, Microsoft,
Philips, and SpeechWorks.
Beyond the question of which industry-backed consortium has the best multimodal solution, larger issues are at play, according to industry observers, some of whom preferred not to go on record for fear of antagonizing Microsoft.
"The Microsoft partnership announcement is about a major company redoing their technology for Microsoft's .Net strategy," said one industry insider who asked not to be attributed.
Microsoft's speech platform will encourage developers to be .Net-compliant by tightly connecting its SAPI (Speech Application Programming Interface) Version 5.1 with SALT and Visual Studio. Currently, only Microsoft's own speech engine is SAPI 5.1-compliant. Ultimately, Microsoft appears to be making speech a greater part of its .Net plans.
"The SALT component in Visual Studio is in progress. The alpha is out, beta by middle of the year," said James Mastan, group product manager for Microsoft's .Net Speech Technologies in Redmond, Wash.
As Microsoft upgrades SAPI to be SALT-compliant, independent software developers wanting to create applications for the Microsoft platform will have to be SAPI 5.1- and SALT-compliant as well.
However, SALT is still in its early stages. The first proposed specification is not expected until later this year. Most mainstream speech developers are currently creating VoiceXML speech applications built on Java and the J2EE (Java 2 Enterprise Edition) environment, and running on BEA, IBM, and other platforms.
This week General Magic and InterVoice-Brite announced a partnership to develop Interactive Voice Response (IVR) enterprise solutions using General Magic's VXML technology.
"There is a whole infrastructure being created on J2EE and IBM's WebSphere, BEA, open source J2EE Web servers like JBoss/Tomcat, and on Solaris," said Bill Meisel, president of TMA Associates in Tarzana, Calif.
Using J2EE, Web developers can deliver what is called "dynamic VXML."
The power of dynamic VXML on a J2EE platform is its ability to access a database using voice control and deliver responses customized to the specific caller.
For example, if a salesperson called and used voice to access customer files, that event would trigger access only to that salesperson's customers.
A business customer ordering office supplies might be identified by phone number, and a set of customized voice prompts and answers would be generated based on past orders.
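To make the idea of dynamic VXML concrete, here is a minimal sketch of a server-side routine that builds a VoiceXML 2.0 page whose prompt depends on the caller's phone number. It assumes a hypothetical in-memory map standing in for the customer database, and hypothetical names (DynamicVxml, buildPage, the sample phone numbers). In a real J2EE deployment this logic would more likely sit inside a servlet or JSP querying a database; it is written as plain Java here only so it is self-contained.

```java
import java.util.List;
import java.util.Map;

public class DynamicVxml {
    // Hypothetical stand-in for a customer database keyed by caller phone number.
    private static final Map<String, List<String>> PAST_ORDERS = Map.of(
        "+15555550100", List.of("printer paper", "toner"),
        "+15555550199", List.of("staples"));

    // Escape the characters that are special in XML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a VoiceXML 2.0 page whose prompt is customized to the caller's past orders.
    public static String buildPage(String callerNumber) {
        List<String> orders = PAST_ORDERS.getOrDefault(callerNumber, List.of());
        String promptText;
        if (orders.isEmpty()) {
            // Unknown caller: fall back to a generic greeting.
            promptText = "Welcome. What would you like to order?";
        } else {
            // Known caller: offer to repeat previous orders.
            promptText = "Welcome back. Would you like to reorder "
                + String.join(" or ", orders) + "?";
        }
        return "<?xml version=\"1.0\"?>\n"
            + "<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n"
            + "  <form>\n"
            + "    <block><prompt>" + escape(promptText) + "</prompt></block>\n"
            + "  </form>\n"
            + "</vxml>\n";
    }

    public static void main(String[] args) {
        // Print the page a known business customer would receive.
        System.out.print(buildPage("+15555550100"));
    }
}
```

The page here only speaks a prompt and then ends; a working ordering application would add a field with a speech grammar to collect the caller's answer.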
Until recently, Microsoft offered only a simple set of speech APIs (SAPI). Now, through acquisition and internal development, it has its own engine, which it is giving away to developers royalty free, said Peter Mcgregor, an independent software vendor creating speech products.
Microsoft redeveloped SAPI in Version 5.1 to run on its new speech engine, while simultaneously proposing SALT as an alternative to VXML. Wrapping it all up in a marketing context, Microsoft's Mastan called the company's speech offering a "platform," a term it had not previously used.
He indicated the next step may be to offer Microsoft's speech platform as part of its application server.
"We haven't decided on our go-to-market configuration, but you can think of it as a ... that we would sell like any other .Net component. It's a ... with a bundled set of components made available to build applications on top of these components," Mastan said.
He would not say whether Microsoft's speech platform would be bundled free into its application server in the same way as its Mobile Information Server.
However, becoming Microsoft speech platform-compliant may be a small price to pay for what he gets in return, according to one independent software vendor.
"I don't have to pay royalties for the Microsoft engine, which can save me as much as $6 to $7 per package in fees," said Mcgregor, a speech developer using the Microsoft platform for development.
"Microsoft finally has a good engine. As good as Dragon and IBM," he said.
Meanwhile, competing speech engines from the likes of IBM and Nuance are not SAPI 5.1-compliant. If you want the free engine, you need to be compliant with SAPI 5.1 and SALT, the developer notes.
"Everybody has to play catch up with Microsoft," Mcgregor said.
Which specification is better, SALT, not due to be released until sometime later this year, or VXML, whose Version 2 is now out for review, is a question only developers can settle. Each side claims the other's specification is deficient.
Microsoft's X.D. Huang, general manager of Microsoft .Net Speech Technologies, said that no matter what the VoiceXML Forum claims, VXML will never be a good platform for multimodal speech.
"VXML is just not technically good enough, and it doesn't matter what you do. You can beat a dead horse for a long time, but no matter how you beat it, it's still dead," Huang said.
IBM's William S. "Ozzie" Osborne, general manager of IBM Voice Systems in Somers, N.Y., has a different point of view.
"I hope that we get to one standard. Multiple standards fragment the marketplace and create a diversion. I would like to see us get to a standard that is industry wide and not proprietary. What we are proposing to the W3C, using VXML for speech and XHTML for graphics in a single program, is easier than SALT without having to have the industry redo everything they have done," Osborne said.
However, according to conference organizer Bill Meisel of TMA Associates, the debate could be overstated.