eSpeak - number fixes and UTF-8

Jonathan Duddington jsd at clara.co.uk
Thu Apr 27 20:37:01 EDT 2006


In article <20060427234050.GA12289 at taylor.homelinux.net>,
   Lorenzo Taylor <lorenzo at taylor.homelinux.net> wrote:

> There is still a slight oddity with big numbers though.  The number
> 1000000000 is now spoken as 1 thousand and million and, and 1000000
> is spoken as 1 million and. :)

> The same happens if I put in the commas.

That's because the change that made numbers without commas work simply
adds the commas before the number goes through the translation rules
:-)
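
To illustrate that preprocessing step, here is a minimal sketch in
Python. It is not eSpeak's actual code; the function name add_commas is
made up for this example, and it assumes the input is a plain string of
ASCII digits.

```python
def add_commas(digits: str) -> str:
    """Insert a comma before each group of three digits, counting from the right."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    groups = []
    # Peel off three digits at a time from the right-hand end.
    while len(digits) > 3:
        groups.append(digits[-3:])
        digits = digits[:-3]
    groups.append(digits)  # whatever is left is the leading group
    return ",".join(reversed(groups))


if __name__ == "__main__":
    print(add_commas("1000000000"))
    print(add_commas("1000025"))
```

Once the commas are in place, "1000025" and "1,000,025" go through the
same rules, which is why both forms showed the same extra "and".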

> And while writing this message I just found out that 1,000,025 is
> spoken as 1 million and and 25.

That's the same problem as 1000000.

> I am thinking that 1 thousand million is probably OK as opposed to 1
> billion,

I could do "billion" if it's necessary, but yes, "thousand million"
seems OK.

> Sorry, try as I might, I haven't quite familiarized myself enough
> with the rules file to try to make these fixes myself.

The number section of the rules is rather complicated.  It would
probably be easier to just write the algorithm in the program code
rather than trying to do it with rules. But perhaps it's correct now?
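
As a rough idea of what "the algorithm in the program code" could look
like, here is a sketch in Python. It is only an assumption about one
possible approach, not anything taken from eSpeak: the function names
are invented, it produces English words rather than phonemes, it uses
"thousand million" instead of "billion", and it only covers 0 up to
999,999,999,999.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n: int) -> str:
    """Words for 1..999, with "and" after the hundreds."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest:
        if hundreds:
            parts.append("and")
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + (" " + ONES[ones] if ones else ""))
    return " ".join(parts)


def below_million(n: int) -> str:
    """Words for 1..999999."""
    parts = []
    thousands, rest = divmod(n, 1000)
    if thousands:
        parts.append(below_thousand(thousands) + " thousand")
    if rest:
        # "and" only joins a final group that has no hundreds of its own.
        if thousands and rest < 100:
            parts.append("and")
        parts.append(below_thousand(rest))
    return " ".join(parts)


def say_number(n: int) -> str:
    """Words for 0..999,999,999,999 using "thousand million"."""
    if not 0 <= n < 10**12:
        raise ValueError("number out of range for this sketch")
    if n == 0:
        return "zero"
    parts = []
    millions, rest = divmod(n, 10**6)
    if millions:
        parts.append(below_million(millions) + " million")
    if rest:
        if millions and rest < 100:
            parts.append("and")
        parts.append(below_million(rest))
    return " ".join(parts)


if __name__ == "__main__":
    for value in (1000000, 1000025, 1000000000):
        print(value, "->", say_number(value))
```

Because "and" is only added when a non-zero group actually follows, a
round number such as 1000000 cannot end in a stray "and".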

I've fixed the extra "and" problem now and uploaded:
test-1.09h-linux.zip

If you do:
  speak -X "1,000,025"

it lists all the rules that match for each "word", where "word" in this
case is each group of numbers which is separated by commas.  The number
on the left of each line is the rule's "score". The highest scoring
rule is used.

The change was to add a line:
        ,_)  0 (00+
after the line:
        ,_)  0 (DD        a2nd
in the numbers section ( .group 9 )

The original rule said: a "0" between a comma and two more digits is
pronounced "and". The '_' matches a "word boundary". [a2] is a weak
variant of the [a] phoneme.

The new rule says: if these two digits are two more zeroes, then don't
say anything.  The "+" is to increase the score of this rule so that it
takes precedence over the original one.  Usually a specific match (i.e.
'0' rather than 'any digit') gets a higher score, but with numbers that
caused some other problems, so a specific number and "any digit" score
the same.
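
The interaction between the two rules can be sketched as a toy model in
Python. Everything here is an assumption made for illustration: the
Rule class, the scoring (one point per post-context character, whether
it is a specific digit or "D", plus a bonus standing in for "+"), and
the simplified pre-context check are not eSpeak's real implementation.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    post: str       # post-context: "D" means any digit, other chars are literal
    phonemes: str   # phonemes to produce ("" means say nothing)
    bonus: int = 0  # stands in for the "+" that raises a rule's score


# Toy versions of the two rules for a "0" that follows a comma.
RULES = [
    Rule(post="DD", phonemes="a2nd"),       # ,_)  0 (DD   a2nd
    Rule(post="00", phonemes="", bonus=1),  # ,_)  0 (00+
]


def post_matches(pattern: str, following: str) -> bool:
    """True if the text after the '0' fits the post-context pattern."""
    if len(following) < len(pattern):
        return False
    return all(c.isdigit() if p == "D" else p == c
               for p, c in zip(pattern, following))


def score(rule: Rule) -> int:
    # Assumed scoring: a specific digit and "D" count the same, so only
    # the bonus separates the two rules when both match.
    return len(rule.post) + rule.bonus


def leading_zero_phonemes(text: str, i: int):
    """Phonemes for the '0' at text[i], which must directly follow a comma."""
    if i == 0 or text[i] != "0" or text[i - 1] != ",":
        raise ValueError("position is not a '0' directly after a comma")
    matches = [r for r in RULES if post_matches(r.post, text[i + 1:])]
    if not matches:
        return None
    return max(matches, key=score).phonemes  # highest score wins


if __name__ == "__main__":
    print(repr(leading_zero_phonemes("1,000,025", 2)))  # group "000"
    print(repr(leading_zero_phonemes("1,000,025", 6)))  # group "025"
```

In this model the "000" group matches both rules and the bonus makes
the silent one win, while "025" only matches the original rule and
still gets its "and".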




