Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Mon Dec 25 05:46:59 CET 2006

At 04:18 06/12/25, Michael Everson wrote:
>At 14:06 -0500 2006-12-24, John C Klensin wrote:
>
>>This would seem reasonable, except for the number of times we have been told that block structure, and the ordering of characters within a block, have nothing to do with collation sequences.

Yes, indeed. But imagine having only one code for the digit 0
and the uppercase letter O, and trying to do a lexicographic sort
in which all the numbers come before all the letters. Whatever
the codepoints are, it just won't work.

>No, John, I am not talking about binary sorting. I'm talking about one letter that would have to be in two places at the same time.
>
>Consider a list of personal names in Kurdish. Some in Latin, some in Cyrillic. You want to sort them. It's a single language. There is only one W available. How will you sort the names beginning with W? They will all interfile, Latin and Cyrillic, at Latin W.

I consider this a bad example. There is a good chance that
people would prefer a mixed list, so that they don't have to
look up a name in two places when they don't know whether it's
written in Cyrillic or Latin.

But more fundamentally, if there are only one or two 'foreign
script' letters and they appear in context (i.e. you don't have
any Kurds named "Mr. W"), the actual sorting is still possible:
just check all the Ws and the characters around them, and change
the Ws surrounded by Cyrillic characters to some internally
assigned "Cyrillic W" code.
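
As a rough illustration of that remapping step, here is a minimal
Python sketch. It is only one possible reading of the idea: the
names INTERNAL_CYRILLIC_W, is_cyrillic and remap_w are made up for
this example, the Private Use codepoints U+E000/U+E001 are an
arbitrary choice for the "internally assigned" code, and sorting by
raw codepoints stands in for whatever real collation would be used.

```python
import unicodedata

# Hypothetical internal codes, taken from the Private Use Area.
# They are an assumption of this sketch, not real assignments.
INTERNAL_CYRILLIC_W = {"W": "\uE000", "w": "\uE001"}


def is_cyrillic(ch):
    """Return True if the character's Unicode name marks it as Cyrillic."""
    return unicodedata.name(ch, "").startswith("CYRILLIC")


def remap_w(text):
    """Replace each Latin W/w that has a Cyrillic neighbour.

    A W with no Cyrillic character directly before or after it
    (including a lone "W") is left unchanged.
    """
    out = []
    for i, ch in enumerate(text):
        if ch in INTERNAL_CYRILLIC_W:
            # Up to one character on each side of the W.
            neighbours = text[max(i - 1, 0):i] + text[i + 1:i + 2]
            if any(is_cyrillic(n) for n in neighbours):
                ch = INTERNAL_CYRILLIC_W[ch]
        out.append(ch)
    return "".join(out)


# Made-up sample data: Latin W followed by Cyrillic letters, an
# all-Latin name, and an all-Cyrillic name.
names = ["Wезир", "Welat", "Азад"]

# The remapped string is used only as the sort key, so the stored
# names are not modified.
print(sorted(names, key=remap_w))
```

With this key, the W in the first name no longer sorts at Latin W,
so that name is not interfiled with "Welat". Where exactly the
internal code should sort relative to the other Cyrillic letters
is a collation decision that this sketch does not try to settle.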

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp