Several problems with the unicode support (was [patch 0/3] speakup: support 16bit unicode screen reading)

Tue Mar 14 18:56:56 EDT 2017

Zahari Yurukov, on Wed. 15 Mar 2017 00:47:53 +0200, wrote:
> > > And now you lose them in English, too.
> > 
> > I don't understand this. Is there perhaps yet another bug that wasn't
> > fixed or reported?
> > 
> No, I mean: if you use an English voice but don't use direct mode,
> don't you want the Unicode characters spoken?

It's not really a matter of "you don't want", but of "I don't think we want
to implement that". Unicode is awfully big; I don't think we want to
include the pronunciation of 65536 glyphs in the kernel :)

> It's worth noting that I sent that letter just before I saw you'd
> sent a patch,

Sure, I understand that :) I was just afraid that some bug was perhaps
being overlooked.

> So I don't have other complaints about unicode reading.

Ok, cool :)

> > We'd have to think and code a bit about this. The kernel actually uses
> > UCS-2 encoding, while people will probably prefer to feed the internal
> > messages as UTF-8 strings. But one has to know whether it's UTF-8 or
> > some 8-bit character set that is being used. That question is actually
> > related to pasting, for which we need to know the same :)
> 
> Well, a byte order mark might be useful here. Or, if there's no BOM,
> maybe assume UTF-8?

That wouldn't let people just run echo "foobar" > /sys/something,
while the kernel already knows whether the console is in UTF-8 mode.
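As an illustration of the "BOM, else assume UTF-8" idea, here is a minimal C sketch. The names guess_encoding(), is_valid_utf8() and enum input_encoding are hypothetical, not speakup code, and it assumes that anything which is not well-formed UTF-8 is some unknown 8-bit charset. As said above, a plain echo never writes a BOM, and short 8-bit strings can happen to be valid UTF-8, so the console's own UTF-8 mode would be a more reliable signal than this heuristic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of text written to a speakup /sys file. */
enum input_encoding { ENC_UTF8, ENC_8BIT };

/* Return true if buf[0..len) is well-formed UTF-8: rejects overlong
 * forms, UTF-16 surrogates and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false;             /* stray continuation or bad lead */

        if (i + n >= len)
            return false;              /* sequence truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return false;              /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += n + 1;
    }
    return true;
}

/* Heuristic: a UTF-8 BOM (EF BB BF) means UTF-8; otherwise valid UTF-8
 * is assumed to be UTF-8 and anything else some 8-bit charset. */
static enum input_encoding guess_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    return is_valid_utf8(buf, len) ? ENC_UTF8 : ENC_8BIT;
}
```

For example, "caf\xE9" (Latin-1) comes out as 8-bit while "caf\xC3\xA9" comes out as UTF-8. Pure ASCII is valid either way, which is harmless since ASCII is a common subset of UTF-8 and most 8-bit charsets.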

> How did you know the ASCII encodings until now?

That's the trick: you just didn't :) Speakup didn't care which
8-bit encoding was used and just sent the bytes to the softsynth. As
long as the characters you write to /sys and what espeakup eats are
encoded the same way, there is no issue. But now that we have in-kernel
UCS-2 encoding, we have to know.
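Once the encoding is known, the input still has to be turned into 16-bit units. A minimal C sketch, assuming the 8-bit case is ISO-8859-1 (where each byte is its own code point) and that anything UCS-2 cannot hold becomes U+FFFD; utf8_to_ucs2() and latin1_to_ucs2() are hypothetical helpers, not existing kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define UCS2_REPLACEMENT 0xFFFD  /* U+FFFD REPLACEMENT CHARACTER */

/* Hypothetical: 8-bit text to UCS-2, ASSUMING ISO-8859-1, whose bytes map
 * 1:1 onto U+0000..U+00FF. Other 8-bit charsets need a lookup table. */
static size_t latin1_to_ucs2(const unsigned char *in, size_t len, uint16_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
    return len;
}

/* Hypothetical: UTF-8 to UCS-2. Malformed sequences, overlong forms,
 * surrogates and code points outside the BMP become U+FFFD.
 * out needs room for len units; returns the number of units written. */
static size_t utf8_to_ucs2(const unsigned char *in, size_t len, uint16_t *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned char c = in[i];
        size_t n;      /* continuation bytes expected */
        uint32_t cp;

        if (c < 0x80)                { n = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else { out[o++] = UCS2_REPLACEMENT; i++; continue; }

        size_t k = 1;
        while (k <= n && i + k < len && (in[i + k] & 0xC0) == 0x80) {
            cp = (cp << 6) | (in[i + k] & 0x3F);
            k++;
        }
        if (k <= n) {            /* truncated or interrupted sequence */
            out[o++] = UCS2_REPLACEMENT;
            i += k;              /* skip lead byte and consumed continuations */
            continue;
        }
        i += n + 1;
        /* 4-byte sequences are either overlong or beyond U+FFFF. */
        if (n == 3 || (n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            out[o++] = UCS2_REPLACEMENT;
        else
            out[o++] = (uint16_t)cp;
    }
    return o;
}
```

The same byte 0xE9 gives U+00E9 through latin1_to_ucs2() but is an incomplete sequence for utf8_to_ucs2(), which is why the two paths cannot be merged without knowing the encoding.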

> > espeak doesn't speak spaces unless explicitly told to do so :)
> > 
> Yes, that works. Thanks.

Good :)

Samuel