eSpeak -- features wish list

Fri Apr 14 07:20:47 EDT 2006

Hello Jonathan,

I'm happy that people like eSpeak so much; it seems to be very
good technology. I'm going to add the config script for Speech
Dispatcher to the official distribution in the next release. You
asked which features you could add to make it more usable
for accessibility purposes.

I'm the main developer of Speech Dispatcher, a project that
tries to unify the access of free software accessibility tools
to speech synthesis engines.

Basically, what we want to do right now is to split Speech
Dispatcher into two parts: message dispatching (prioritization etc.)
and TTS API (access to synthesizers). For that purpose, we
developed a requirements document for the API, which also
more or less defines the capabilities we expect from the
synthesizers. You might want to look at the requirements document:
http://lists.freedesktop.org/archives/accessibility/2006-March/000078.html

It is still a draft and there will be some changes to it,
but the section about SSML covers the synthesis settings
that users want or would like to have.

Of course I'm posting the link to this document merely as
a potential guideline for you. This API will be implemented by some
layer above the engine drivers and missing MUST HAVE and SHOULD HAVE
capabilities can still be emulated either in the engine drivers or in
the covering layer.

This API is being worked on by Brailcom (Speech Dispatcher), KDE and
Gnome. In fact, KDE is going to use Speech Dispatcher soon.

The things that would most help currently are:
	1) Be able to return audio data rather than playing it itself.
	(This would enable us to write a native driver for Dispatcher
	or TTS API, which could be a good improvement. It would also
	instantly solve the audio problems.)
	2) Settings for signalling punctuation and capital letters.
	(See the TTS API requirements draft above, section SSML. This
	doesn't mean this functionality needs to be implemented with
	SSML or embedded markup. It can be a static setting passed to
	the binary, e.g. espeak --punctuation="all".)
	3) Some way of communication other than running the binary
	again for each message (which is more CPU expensive). See for
	example how Flite works with Dispatcher (linking a library) or
	how Festival works (provides a TCP/IP interface).
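
To make the three wishes above more concrete, here is a rough sketch in
Python of the shape such an interface could take. It is only an
illustration: FakeSynthesizer, spell_punctuation and the "none"/"all"
modes are hypothetical names, not part of eSpeak, Speech Dispatcher or
the TTS API draft, and the "engine" just emits silence. The point is that
the object stays loaded between messages, applies a punctuation setting,
and hands back audio data instead of playing it.

```python
import io
import wave

# All names below are hypothetical; none of this is eSpeak or
# Speech Dispatcher API.
PUNCTUATION_NAMES = {
    ".": "dot",
    ",": "comma",
    "!": "exclamation mark",
    "?": "question mark",
}


def spell_punctuation(text, mode):
    """Emulate a punctuation setting in a layer above the engine.

    Mode "none" leaves the text as it is; mode "all" replaces each
    known punctuation character with its spoken name.
    """
    if mode == "none":
        return text
    if mode != "all":
        raise ValueError("unsupported punctuation mode: %r" % mode)
    out = []
    for ch in text:
        if ch in PUNCTUATION_NAMES:
            out.append(" " + PUNCTUATION_NAMES[ch] + " ")
        else:
            out.append(ch)
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join("".join(out).split())


class FakeSynthesizer:
    """Stand-in engine that stays loaded and returns audio data."""

    def __init__(self, sample_rate=22050, punctuation="none"):
        self.sample_rate = sample_rate
        self.punctuation = punctuation

    def synthesize(self, text):
        """Return WAV data as bytes instead of playing it.

        A real engine would produce speech; this stand-in emits 10 ms
        of 16-bit mono silence per character so that the data flow can
        be exercised.
        """
        spoken = spell_punctuation(text, self.punctuation)
        frames = b"\x00\x00" * (len(spoken) * self.sample_rate // 100)
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.sample_rate)
            wav.writeframes(frames)
        return buf.getvalue()


if __name__ == "__main__":
    synth = FakeSynthesizer(punctuation="all")  # created once
    for message in ["Hello, world!", "Is this usable?"]:
        audio = synth.synthesize(message)  # reused for every message
        print(len(audio), "bytes of audio for", repr(message))
```

With something of this shape, the caller decides where the audio goes
(its own audio output, a file, a network socket), and no new process
has to be started for each message.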

I hope I didn't scare you too much :) Of course these are wishes,
and some of them are rather long-term ones. I think you have done
great work!

Thank you,
Hynek Hanke