Mixing scripts (Re: Unicode versions (Re: Criteria
for exceptional characters))
Michael Everson
everson at evertype.com
Sun Dec 24 20:18:35 CET 2006
At 14:06 -0500 2006-12-24, John C Klensin wrote:
>This would seem reasonable, except for the
>number of times we have been told that block
>structure, and the ordering of characters within
>a block, have nothing to do with collation
>sequences.
No, John, I am not talking about binary sorting.
I'm talking about one letter that would have to
be in two places at the same time.
Consider a list of personal names in Kurdish.
Some in Latin, some in Cyrillic. You want to sort
them. It's a single language. There is only one W
available. How will you sort the names beginning
with W? They will all interfile, Latin and
Cyrillic, at Latin W.
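
A minimal sketch of the interfiling problem described above, in Python. It is an illustration, not a real collation: it groups names by the script of their first letter (taken from the Unicode character name) and orders by code point within each group. The names and the helpers script_of and sort_by_script are hypothetical. U+051C CYRILLIC CAPITAL LETTER WE stands in for the missing letter; it was not yet encoded when this message was written and was added to the UCS later.

```python
import unicodedata

def script_of(ch):
    # First word of the Unicode character name, e.g. "LATIN" or "CYRILLIC".
    # A rough stand-in for a script property; an assumption of this sketch.
    return unicodedata.name(ch).split()[0]

def sort_by_script(names):
    # Group by the script of the initial letter, then order by code point
    # within each group. This is not the Unicode Collation Algorithm.
    return sorted(names, key=lambda n: (script_of(n[0]), n))

# Hypothetical Kurdish names in Latin orthography.
latin_names = ["Azad", "Welat"]

# The same names in Cyrillic orthography, with W borrowed from Latin (U+0057).
cyrillic_borrowed_w = ["\u0410\u0437\u0430\u0434", "W\u0435\u043b\u0430\u0442"]

# The same names again, using CYRILLIC CAPITAL LETTER WE (U+051C).
cyrillic_own_we = ["\u0410\u0437\u0430\u0434", "\u051c\u0435\u043b\u0430\u0442"]

# With only Latin W available, the Cyrillic W-name lands in the Latin group.
interfiled = sort_by_script(latin_names + cyrillic_borrowed_w)

# With a Cyrillic WE, each orthography keeps its own W-names.
separated = sort_by_script(latin_names + cyrillic_own_we)

print(interfiled)
print(separated)
```

In the first list the Cyrillic-orthography name beginning with W files next to Latin "Welat"; in the second it stays with the other Cyrillic names. No language or script tagging is needed for the second result, which is the point in a plain-text setting.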
The environment here is plain text: file names in
a directory. No language tagging. No ISO 15924
script tagging. No fancy XML.
There are scientific linguistic environments
where mixing of Latin, Greek, and Cyrillic is
expected, but this is a standard orthography of a
natural language -- and not just a language that
might mix letters, but a language with more than
one official orthography where the lack of
CYRILLIC WE is an actual problem.
> > No, never, because of the functional requirements. One could
> > not expect <o> to sort in three different places in a
> > multilingual glossary (Russian, English, Greek).
>
>See above about collation. And note that, even
>within the fairly basic set of decorated Latin
>characters, logical sort order is a localization
>(language at least) issue, not one that Unicode
>can possibly address properly.
Not so, really. Collation is handled very well,
and tailoring too. This is rather different, it
seems to me.
>To the extent to which I understand this, I
>agree with you. My only points are (i) that
>some views of consistency are becoming the
>victim of this particular set of requirements
>and (ii) one net effect is to introduce more
>cross-script confusables.
Well, disadvantaging Kurds who use Cyrillic Aa Ee
Oo Öö Schwa Qq by denying them Ww doesn't seem
like the right thing to do, which is why I'm
proposing to add some characters to the UCS.
After all, the UCS is for A GREAT MANY MORE
THINGS than IDN.
> > I understand that a script-ban will not be deeply embedded.
>
>And this is part of the reason why.
Glad Yule to all.
--
Michael Everson * http://www.evertype.com
More information about the Idna-update
mailing list