Kurdish (Cyrillic) and Turkmen (Latin) characters

Fri Oct 7 18:14:50 CEST 2005

John Clews scripsit:
> Just a query - does anybody active in JTC1/SC2/WG2 know the answers to
> questions 1, 2 and 3?

I have no connection with WG2, but I am an invited expert of the Unicode
Consortium, so I have some knowledge of these points.

> 1. Have all previous attempts to have ISO/IEC 10646 and Unicode include
> _Cyrillic_ letters Q/q and W/w run into the ground? If so, why?

This amounts to disunifying the Latin and Cyrillic letters, and
disunification is not undertaken lightly.  So far, the proposal to disunify
these four letters has never gathered enough support in UTC or WG2, though
both Michael Everson and I favor it.

In addition, Cyrillic is a minority script for Kurdish: only about 200,000
of the nine to twelve million Kurdish speakers live in countries where it
has ever been in use (basically the former USSR).

See http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html
for Nick Nicholas's detailed account of the Kurdish and Wakhi situation
(Wakhi's new writing system mixes both Greek and Cyrillic letters into
a Latin base script).

> How do any applications deal with sorting Cyrillic Kurdish text that
> includes these characters?

Sorting is a matter of locale, not of script: one needs to tailor the
default Latin sort order just to get the Scandinavian languages right.
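
As a rough illustration of what tailoring means, here is a minimal Python
sketch.  The names SWEDISH_ORDER, RANK and swedish_key are my own, and a
flat rank table is only a stand-in for a real collation tailoring; it simply
shows that plain code point order and the order a Swedish reader expects
are different things.

```python
# Hypothetical tailoring table: the Swedish alphabet, in which
# å, ä and ö are separate letters that sort after z.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ORDER)}


def swedish_key(word):
    """Map each character to its rank; unknown characters sort last."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


words = ["östen", "zebra", "ärlig", "apa", "åka"]

# Untailored: compares raw code points, so ä (U+00E4) precedes å (U+00E5).
print(sorted(words))

# Tailored: follows the table above, so å precedes ä.
print(sorted(words, key=swedish_key))
```

The script is Latin in both cases; only the locale-specific table changes.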

> 2. Has there been an attempt to have ISO/IEC 10646 and Unicode include
> Latin _letters_ ¢ and £, so that they can have the qualities of letters,
> and be sorted appropriately?

It's been thought about, but the UTC consensus to date is that Turkmenistan
crafted their alphabet specifically to fit into a certain 8-bit code page,
and that disunifying these characters would do neither the Turkmens nor
anyone else any good, as no one would bother with a special conversion
for Turkmen only.

> 3. How do any applications actually cope with sorting multilingual text
> which includes these characters, if they are not yet in ISO/IEC 10646 and
> Unicode? How does ISO/IEC 14651 cope with this?

The characters are of course included; they are simply not letters.  There
is no reason why characters must be letters in order to be included in a
language-specific collation table.
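
To make that concrete, here is a small Python sketch.  The position given
to the two signs in ORDER_FRAGMENT is purely an assumption for illustration,
not the actual Turkmen alphabetical order, and make_key is a hypothetical
helper rather than any library's API.  The point is only that a collation
table is indexed by character, whatever the character's general category.

```python
import unicodedata

# Both signs are encoded with general category "Sc" (currency symbol),
# not with one of the letter categories ("Lu", "Ll", ...).
for ch in "¢£":
    print(ch, unicodedata.name(ch), unicodedata.category(ch))

# ASSUMPTION: an invented fragment of a language-specific order that places
# the two signs between s and t.  The real order may differ.
ORDER_FRAGMENT = "s¢£t"


def make_key(order):
    """Build a sort key from an ordered string of collation elements."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank.get(ch, len(rank)) for ch in word]


words = ["ts", "¢s", "ss", "£s"]

print(sorted(words))                                # code point order
print(sorted(words, key=make_key(ORDER_FRAGMENT)))  # table order
```

Nothing in the key function asks whether a character is a letter.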

-- 
John Cowan  jcowan at reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
I am he that buries his friends alive and drowns them and draws them
alive again from the water. I came from the end of a bag, but no bag
went over me.  I am the friend of bears and the guest of eagles. I am
Ringwinner and Luckwearer; and I am Barrel-rider.  --Bilbo to Smaug