Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))
Michael Everson
everson at evertype.com
Mon Dec 25 11:59:35 CET 2006
At 13:46 +0900 2006-12-25, Martin Duerst wrote:
> >No, John, I am not talking about binary
> >sorting. I'm talking about one letter that would
> >have to be in two places at the same time.
> >
> >Consider a list of personal names in Kurdish.
> >Some in Latin, some in Cyrillic. You want to
> >sort them. It's a single language. There is only
> >one W available. How will you sort the names
> >beginning with W? They will all interfile, Latin
> >and Cyrillic, at Latin W.
>
>I consider this a bad example. There is quite
>some chance that people would prefer a mixed
>list, so that they don't have to look up a name
>in two places if they don't know if it's written
>in Cyrillic or Latin.
I don't think you have thought this through.
Russian Kurds consider Aa Ee Oo Öö Schwa Qq Ww to
be Cyrillic letters, and the behaviour they will
get in a monolingual multiscript sort will be
that Latin Aa Ee Oo Öö Schwa Qq Ww will sort as
Latin, Cyrillic Aa Ee Oo Öö Schwa Qq will sort as
Cyrillic, but as there is not a Cyrillic Ww yet
all those names will be interfiled with Latin.
There is NO chance that people would prefer a
mixed list for that letter.
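
To make the interfiling concrete, here is a rough sketch in Python. The
names are made-up sample data, and plain code point order is used only
as a stand-in for a real collation; the point is just that any ordering
keyed on character identity has one W to work with, so the
Cyrillic-script name spelled with Latin W lands in the Latin W group.

```python
# Made-up sample names: Latin-script and Cyrillic-script Kurdish.
# "Wәзир" is written in Cyrillic but has to borrow Latin W (U+0057),
# because no separate Cyrillic W is assumed to be available.
names = ["Азад", "Welat", "Wәзир", "Azad", "Wan", "Шерко"]

# Code point order is only a stand-in for a real collation here.
ordered = sorted(names)

for name in ordered:
    print(name)
# Azad, Wan, Welat, Wәзир, Азад, Шерко
```

The Cyrillic-script name ends up between the Latin names and the rest of
the Cyrillic ones, which is the interfiling described above.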
>But more fundamentally, if there are one or two 'foreign script'
>letters in context (i.e. you don't have any Kurds named "Mr. W"),
>the actual sorting is still possible: Just check all the Ws, and
>the characters around them, and change the Ws surrounded by
>Cyrillic characters to some internally assigned "Cyrillic W" code.
Well, that's a hack, now, isn't it? I wonder just
how many companies are going to want to build
that kind of thing into their OS.
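
For reference, the workaround Martin describes might look roughly like
the sketch below. Everything in it is an assumption made for
illustration: the private-use code points standing in for an "internally
assigned Cyrillic W", the helper names, and the simplification of
deciding per word rather than per neighbouring character.

```python
import unicodedata

# Hypothetical internal stand-ins for "Cyrillic W" (private-use code points).
INTERNAL_CYRILLIC_W = {"W": "\uE000", "w": "\uE001"}


def is_cyrillic(ch):
    """True if the character's Unicode name marks it as Cyrillic."""
    return unicodedata.name(ch, "").startswith("CYRILLIC")


def remap_w(word):
    """Swap Latin W/w for the internal code when every other letter is Cyrillic."""
    others = [c for c in word if c.isalpha() and c not in INTERNAL_CYRILLIC_W]
    # A bare "W" has no context to decide from, so it is left alone.
    if others and all(is_cyrillic(c) for c in others):
        return "".join(INTERNAL_CYRILLIC_W.get(c, c) for c in word)
    return word


# Made-up sample names; remap_w is used as the sort key only.
names = ["Welat", "Wәзир", "W"]
keys = [remap_w(n) for n in names]
```

With this key, "Welat" and the bare "W" are unchanged, while "Wәзир"
gets the internal code and no longer sorts at Latin W. Every sorting
component would need the same special case, and the "Mr. W" case stays
undecidable.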
--
Michael Everson * http://www.evertype.com