Inflection in speech synthesis (was: Re: Some issues with the ibmtts output module of speech-dispatcher)

Tue Jun 6 11:19:21 EDT 2006

Willem van der Walt wrote on Tue 06. 06. 2006 at 15:08 +0200:
> Espeak has the potential to be the gnu software speech engine.

Frankly, it does not. I sincerely hope no one takes offence at this!
eSpeak is very good as a lightweight English synthesizer, and perhaps
other languages will be developed too. It is also an incredible step
forward in combining diphone and formant synthesis. However, it has no
extensibility mechanism, and it lacks much of the other essential
functionality needed to be THE GNU software speech synthesis framework.

I suspect you do not understand Festival if you are comparing
Festival and eSpeak. Perhaps you are comparing only the voices you
happen to use, not the whole framework.

But don't get me wrong. I'm happy eSpeak exists, and I'm working with
Jonathan on the necessary improvements so that we can get better
support in Speech Dispatcher and implement some new capabilities.

> Festival is big and unresponsive.

What exactly do you mean by big?

1) It is difficult to install.

In that case, please contact your distribution maintainers about it. In
Debian, it is currently as easy to install as any other package.

2) It takes too much space on the harddrive.

IMHO this is completely irrelevant given today's hardware.

3) Something else?

Please explain.

What do you mean by unresponsive?

1) Echoing letters is slow.

Please use the new versions of speechd-up and speech-dispatcher. Letters
are now cached and no longer go through Festival at all.

2) Reading sentences, especially longer ones is slow.

Part of the delay is the time the synthesis itself takes, including
SSML parsing etc. This definitely belongs on the to-do list and should
be worked on in the future. At the same time, computers are quickly
getting faster, so this will become less of an issue. It would take a
stronger argument to convince me that the design of Festival is broken
in principle.

Another part of it (roughly 50% of the total time it takes to get audio
out of the speakers) is caused by a bug in the communication mechanism
proposed by the Festival authors and used in Speech Dispatcher. We have
already identified the issue and are working on it.

With regards,
Hynek Hanke