Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Michael Everson everson at evertype.com
Mon Dec 25 11:59:35 CET 2006


At 13:46 +0900 2006-12-25, Martin Duerst wrote:

>  >No, John, I am not talking about binary sorting. I'm talking about
>  >one letter that would have to be in two places at the same time.
>  >
>  >Consider a list of personal names in Kurdish. Some in Latin, some
>  >in Cyrillic. You want to sort them. It's a single language. There
>  >is only one W available. How will you sort the names beginning
>  >with W? They will all interfile, Latin and Cyrillic, at Latin W.
>
>I consider this a bad example. There is quite some chance that people
>would prefer a mixed list, so that they don't have to look up a name
>in two places if they don't know if it's written in Cyrillic or Latin.

I don't think you have thought this through. Russian Kurds consider
Aa Ee Oo Öö Schwa Qq Ww to be Cyrillic letters. The behaviour they
will get in a monolingual multiscript sort is that Latin Aa Ee Oo Öö
Schwa Qq Ww will sort as Latin, and Cyrillic Aa Ee Oo Öö Schwa Qq will
sort as Cyrillic, but as there is no Cyrillic Ww yet, all the Cyrillic
names beginning with W will be interfiled with the Latin ones. There
is NO chance that people would prefer a mixed list for that letter.
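
To make the interfiling concrete, here is a rough Python sketch. The
names are made-up sample strings, and script_of and toy_key are
hypothetical helpers that guess the script from the Unicode character
name and then sort by code point. A real implementation would use a
tailored collation, so treat this only as an illustration of the
effect, not of how any particular system sorts.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Rough script guess from the character name (an assumption for
    this sketch, not a lookup of the Unicode Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"

def toy_key(name: str):
    # Group by the script of the first letter (Latin first), then
    # order by code point within each group.
    order = {"Latin": 0, "Cyrillic": 1, "Other": 2}
    return (order[script_of(name[0])], name)

# Hypothetical sample data. The third name is Cyrillic text that has
# to borrow Latin W (U+0057) because no Cyrillic W is available.
names = ["Welat", "Азад", "Wан", "Берфин", "Azad"]

result = sorted(names, key=toy_key)
print(result)
# ['Azad', 'Welat', 'Wан', 'Азад', 'Берфин']
```

The Cyrillic-script name with the borrowed W lands in the Latin group,
while every other Cyrillic name sorts with Cyrillic.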

>But more fundamentally, if there are one or two 'foreign script'
>letters in context (i.e. you don't have any Kurds named "Mr. W"),
>the actual sorting is still possible: Just check all the Ws, and
>the characters around them, and change the Ws surrounded by
>Cyrillic characters to some internally assigned "Cyrillic W" code.

Well, that's a hack, now, isn't it? I wonder just 
how many companies are going to want to build 
that kind of thing into their OS.
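
For reference, the context check described in the quoted paragraph
might look roughly like this in Python. Everything here is an
assumption made for illustration: the private-use code points U+E000
and U+E001 stand in for the "internally assigned" Cyrillic W, "the
characters around them" is read as "the other letters in the same
space-separated word", and remap_w is a hypothetical helper name.

```python
import unicodedata

# Hypothetical internal stand-ins (private-use code points) for a
# "Cyrillic W"; these are not assigned characters.
INTERNAL_CYR_W = {"W": "\uE000", "w": "\uE001"}

def is_cyrillic(ch: str) -> bool:
    # Rough test based on the character name (sketch only).
    return unicodedata.name(ch, "").startswith("CYRILLIC")

def remap_w(text: str) -> str:
    """Replace W/w with the internal code when every other letter in
    the same word is Cyrillic. Words with no other letters are left
    alone, since there is no context to decide from."""
    words = []
    for word in text.split(" "):
        others = [c for c in word if c.isalpha() and c not in INTERNAL_CYR_W]
        if others and all(is_cyrillic(c) for c in others):
            word = "".join(INTERNAL_CYR_W.get(c, c) for c in word)
        words.append(word)
    return " ".join(words)
```

A word such as "Wан" has its W remapped, "Welat" is left untouched,
and a bare "W" (the "Mr. W" case) cannot be decided and stays Latin.
The remapping would have to run before every sort and be undone before
display or interchange.
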
-- 
Michael Everson * http://www.evertype.com

