Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Sun Dec 24 19:14:52 CET 2006

At 12:06 -0500 2006-12-24, John C Klensin wrote:

>> It is Kurdish, and the two letters are for other functional
>> reasons being proposed for addition to the standard.
>
>As part of my continued effort to understand which rules apply
>and when, adding them to the standard, and identifying them as
>"Cyrillic" would seem to violate the "unify when possible" rule.
>What am I missing?

Functional requirements, such as sorting monolingual multiscript text.

>If Unicode hadn't started
>with ISO 8859-* and a bunch of other locally developed CCSs as
>input and had started with a strong and consistent
>unification rule, we would certainly have looked at that
>character and said "it doesn't exist independently in 'Latin' or
>'Cyrillic' scripts, it is just an adaptation of a Greek
>character and should be unified with it".

No, never, because of the functional 
requirements. One could not expect <o> to sort in 
three different places in a multilingual glossary 
(Russian, English, Greek).
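
As a rough illustration of the sorting point, here is a minimal Python sketch. The six glossary words are hypothetical examples, and plain code point order stands in for a real collation algorithm, which is an assumption made only to keep the example short. It compares the three separately encoded letters with a hypothetical encoding in which all three were unified as Latin o.

```python
import unicodedata

# The three look-alike letters as they are actually encoded.
LATIN_O, GREEK_O, CYRILLIC_O = "\u006f", "\u03bf", "\u043e"

for ch in (LATIN_O, GREEK_O, CYRILLIC_O):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Hypothetical glossary: two words per language, one of each pair starting with <o>.
english = ["open", "zoo"]
greek = [GREEK_O + "δός", "ωμέγα"]
russian = [CYRILLIC_O + "кно", "яма"]
words = english + greek + russian

# Separately encoded letters: each <o> word sorts inside its own script.
# (Code point order is used here as a stand-in for real collation.)
distinct = sorted(words)

# Hypothetical unified encoding: every <o> becomes the Latin letter.
unified = sorted(
    w.replace(GREEK_O, LATIN_O).replace(CYRILLIC_O, LATIN_O) for w in words
)

print(distinct)
print(unified)
```

With separate letters, each language's words stay together. In the unified variant the Greek and Russian <o> words are pulled in among the English ones, ahead of "zoo", and away from the rest of their own alphabets.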

My point is that the requirements are no different 
for Kurdish, which has Latin, Cyrillic, and Arabic 
orthographies. Kurdish Cyrillic already uses Aa, 
Ee, Oo, Öö, and <Schwa><schwa>, which are 
identical between Latin and Cyrillic, plus it 
uses Qq and Ww. Since we have found Cyrillic Q's 
which have a different capital shape than Latin 
ones do, it's quite possible that CYRILLIC LETTER 
QA will be added, in which case there is but one 
straggler, CYRILLIC LETTER WE, and my argument is 
that there is no advantage to Kurdish in sticking 
to the unification.
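
To make the straggler problem concrete, here is a small Python sketch. It uses the first word of each character's Unicode name as a rough script label, which is a simplification and not the real Script property. The `sample` string is a made-up letter sequence, not an actual Kurdish word.

```python
import unicodedata

def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

def scripts_in(text):
    """Set of rough script labels for the alphabetic characters in text."""
    return {script_of(ch) for ch in text if ch.isalpha()}

# Cyrillic letters that look identical to Latin ones: а е о ӧ ә
cyrillic_lookalikes = "\u0430\u0435\u043e\u04e7\u04d9"

# Letters that Kurdish Cyrillic text has to borrow from Latin.
latin_borrowed = "qw"

# Made-up sequence mixing both groups (not a real word).
sample = "q" + "\u0430" + "w" + "\u0435"

for ch in cyrillic_lookalikes + latin_borrowed:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(scripts_in(sample))
```

Under this rough labelling, any text that contains q or w next to the Cyrillic letters counts as mixed-script, which is exactly the case a script-mixing restriction would catch.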

But I guess this is not the venue for this 
discussion. I understand that a script-ban will 
not be deeply embedded.
-- 
Michael Everson * http://www.evertype.com