Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Michael Everson everson at evertype.com
Tue Dec 26 16:48:19 CET 2006


At 10:54 +0900 2006-12-26, Martin Duerst wrote:

>I don't think you understood what I meant. What I meant was the 
>following: Assuming that in Kurdish orthography, e.g. Latin 'A' 
>corresponds to Cyrillic 'A', Latin 'R' corresponds to Cyrillic 'P', 
>Latin 'S' corresponds to Cyrillic 'C', and so on, the mixed sorting 
>that I'm proposing is to sort all Latin 'A's with all Cyrillic 'A's, 
>all Latin 'R's with Cyrillic 'P's, all Latin 'S's with Cyrillic 
>'C's, and so on.

This causes a visual sea-sickness which people really do not prefer. 
I have many, many, many books with multilingual indices, and in them 
scripts are universally split up. There is, in Greece, some currency 
given to an Ellenolatiniki ordering, which is really very arbitrary, 
based not on preexisting native practice, but on the old Latin/Greek 
ASCII fonts (so PSI and C are on the same key and sort together).

>That way, a user can go directly from pronunciation to an entry in 
>the list without having to look in two places (one in the Latin part 
>of the list and one in the Cyrillic part of the list).

Yes, well, people who use alphabets don't like this. It is confusing. 
(I know that the Japanese do interfile kana and kanji. But 
Japanese as ever is the splendid exception.)

A mixed-script sort would be a special case, not a default. I have 
samples of Kurdish-Russian dictionaries with Cyrillic orthography as 
well as Kurdish-Russian dictionaries with Latin orthography. And 
Arabic orthography. I really cannot imagine a scenario where people 
would want these to be mixed.

>In terms of the sorting algorithm, the script difference is 
>relegated to a level e.g. similar to a case difference (the standard 
>example for this would be Japanese Katakana and Hiragana). To what 
>extent this is feasible depends on how clear and stable the 
>correspondence between the letters or letter combinations in the 
>two scripts are. I do not know the situation with respect to 
>Kurdish.

It would not be entirely 1-to-1.
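
For readers who want to see what the proposed multi-level sort would 
look like, here is a minimal sketch in Python. It is not from either 
correspondent. The three-letter table CYR_TO_LATIN and the function 
mixed_key are hypothetical names, the table covers only the three 
pairs mentioned in the quoted text, and it assumes a strict 1-to-1 
correspondence, which (as noted above) real Kurdish orthography would 
not fully provide. The primary level compares letters after mapping 
Cyrillic to Latin; the script difference only breaks ties.

```python
# Hypothetical, deliberately incomplete correspondence table (assumption):
# Cyrillic a (U+0430) ~ Latin a, Cyrillic er (U+0440) ~ Latin r,
# Cyrillic es (U+0441) ~ Latin s.
CYR_TO_LATIN = {"\u0430": "a", "\u0440": "r", "\u0441": "s"}


def mixed_key(word):
    """Two-level key: letters first, script only as a tie-breaker."""
    lower = word.lower()
    # Primary level: every letter is compared by its Latin counterpart.
    primary = tuple(CYR_TO_LATIN.get(ch, ch) for ch in lower)
    # Secondary level: 0 for a letter kept as-is, 1 for a mapped Cyrillic one.
    secondary = tuple(1 if ch in CYR_TO_LATIN else 0 for ch in lower)
    return (primary, secondary)


# "sar" and "ras" in Latin, then the same two words spelled in Cyrillic.
words = ["sar", "\u0440\u0430\u0441", "ras", "\u0441\u0430\u0440"]
print(sorted(words, key=mixed_key))
```

With this key the Latin and Cyrillic spellings of the same word land 
next to each other, Latin first, instead of in two separate blocks. 
Any letter or digraph without a clean counterpart would need an extra 
rule, which is where the scheme stops being simple.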

>>>But more fundamentally, if there are one or two 'foreign script'
>>>letters in context (i.e. you don't have any Kurds named "Mr. W"),
>>>the actual sorting is still possible: Just check all the Ws, and
>>>the characters around them, and change the Ws surrounded by
>>>Cyrillic characters to some internally assigned "Cyrillic W" code.
>>
>>Well, that's a hack, now, isn't it?
>
>Well, yes, you can call it that.
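
The workaround discussed in the quoted exchange can be sketched 
roughly as follows in Python. This is only an illustration under 
stated assumptions: the private-use code points U+E000 and U+E001 
stand in for the "internally assigned" code, the names PLACEHOLDER, 
is_cyrillic and tag_cyrillic_w are made up for this example, and 
"surrounded by" is read as "has a Cyrillic letter immediately before 
or after".

```python
import unicodedata

# Hypothetical internal codes (private-use area) for a "Cyrillic W".
PLACEHOLDER = {"W": "\ue000", "w": "\ue001"}


def is_cyrillic(ch):
    """True if the character's Unicode name marks it as Cyrillic."""
    return unicodedata.name(ch, "").startswith("CYRILLIC")


def tag_cyrillic_w(text):
    """Replace each Latin W/w that has a Cyrillic neighbour."""
    out = []
    for i, ch in enumerate(text):
        if ch in PLACEHOLDER:
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            # Guard against the empty string at either end of the text.
            if (prev_ch and is_cyrillic(prev_ch)) or (
                next_ch and is_cyrillic(next_ch)
            ):
                out.append(PLACEHOLDER[ch])
                continue
        out.append(ch)
    return "".join(out)


# A "w" between two Cyrillic letters is retagged; "Mr. W" is left alone.
print(repr(tag_cyrillic_w("\u0430w\u0430")))
print(repr(tag_cyrillic_w("Mr. W")))
```

The sketch also shows the weakness of the approach: a W with no 
Cyrillic neighbour, such as an initial standing alone, cannot be 
classified from context at all.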

I would rather not see the Kurds, who have troubles enough in this 
world, disadvantaged by such a treatment of their written language.
-- 
Michael Everson * http://www.evertype.com


More information about the Idna-update mailing list