Archaic scripts (was: Re: New version:draft-ietf-idna-tables-01.txt)

Kenneth Whistler kenw at sybase.com
Thu May 8 21:55:47 CEST 2008


Vint,

> I interpret your position to lie mostly on the utility axis and your  
> argument to be of the form "it isn't useful to include these scripts"

That is certainly a significant part of my argument. And I
object to the sudden turning of things on their head, so
that the burden is on people to demonstrate that historic
scripts are not useful in IDNs, when the thrust of IDNAbis,
as Michel pointed out, was to recover from the over-permissiveness
of IDNA 2003 and only put in the characters that we could
reasonably surmise *would* be generally useful -- rather than
just toss everything in on the premise that if nobody uses it,
what could be the harm?

> 
> Others may say, "but what's the harm?"
> 
> I am assuming for the moment that you do not equate "useless" with  
> "harmful"?

Actually, while I don't *equate* them, in this case I think
they are related.

The harm is not technical, per se. Sure, having 27 Gothic letters
in the table as PVALID that nobody uses, that no registry allows,
and that nobody cares about having in IDNs wouldn't be any
more harmful than having some old Han character in the table
as PVALID that nobody uses (and that maybe even the Chinese
registries would disallow for some reason).

But there is a perception-related harm here that I think people
are overlooking, and which I think John has, unfortunately,
inverted.

In particular, some of the historic scripts are largely
pictographic. See Linear B, and Cypriot, for example. And
there are going to be more of these coming in with the
hieroglyphic systems: Egyptian hieroglyphics are imminent,
and Anatolian hieroglyphics won't be too far behind -- then
there are large pictographic systems in China, as well.

But what people are missing is that modern language users
also make use of pictographs -- and with phone technology
in particular, this kind of usage is *increasing*. But all
such usage got banned from IDNs simply because, based
on Unicode character property assignments, those pictographs
in modern use are *symbols* (gc=So), but pictographs used
3000 years ago are *letters* (gc=Lo).

So the U+2665 of I{heart}NY is disallowed ("becuz we say so"),
despite the fact that people *do* want to use it, have
used it, and even have registered domain names like that
under IDNA 2003, but a Linear B U+100CC {wheeled chariot} is o.k.??
Or, for that matter, in Unicode 5.2, the Egyptian hieroglyph
for {heart}??
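
The property distinction is easy to check. What follows is only a
rough sketch using Python's standard unicodedata module, not the
actual IDNAbis table derivation, which involves more rules than
General_Category alone. The helper names are made up for illustration,
and the reported values depend on the Unicode version bundled with
the Python build.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Return the Unicode General_Category (gc) of a code point."""
    return unicodedata.category(chr(cp))


def is_letter(cp: int) -> bool:
    """True if gc is one of the letter categories (Lu, Ll, Lt, Lm, Lo).

    Hypothetical helper: a crude stand-in for "letters are in, symbols
    are out", not the real PVALID/DISALLOWED calculation.
    """
    return general_category(cp).startswith("L")


for cp in (0x2665, 0x100CC):
    print(f"U+{cp:04X} gc={general_category(cp)} letter={is_letter(cp)}")
```

The modern heart symbol comes out as gc=So and the Linear B wheeled
chariot as gc=Lo, which is exactly the split described above.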

I can't think of an argument that would make any sense
as to why U+2665 isn't more useful for IDNs than *any*
Linear B character or Egyptian hieroglyph, yet we disallow
such useful symbols and propose to allow all the historic scripts --
presumably because we can do useless things automatically,
but can't figure out how to do useful things that will
still be derivable for future versions of Unicode.

The *harm* here comes from that disconnect and the likelihood
that IDNA 2008 will be attacked for being out to lunch about
usefulness of characters allowed and disallowed. And the
likelihood that the resulting perception about the disconnect
will lead to the very instability y'all fear -- which is
people demanding that DISALLOWED characters be reconsidered
and moved back into PVALID status.

The more bizarre, useless crap you make PVALID with no
good reasoning behind it, the stronger you make the case
for people who will demand that perfectly obviously useful
symbols exiled to DISALLOWED status be reconsidered in
a revision of the protocol and its tables.

Instead, we seem now to be focussing on the utterly remote
possibility that IDNA will instead be besieged by
Assyriologists, Egyptologists, Glagoliticists, Old Turkic
runologists, and script reenactors complaining that IDNA
has somehow unfairly disadvantaged them by keeping
their particular archaic script out of domain names.

--Ken





