Archaic scripts (was: Re: New version: draft-ietf-idna-tables-01.txt)

Kenneth Whistler kenw at sybase.com
Thu May 8 21:20:26 CEST 2008


Andrew Sullivan asked:

> > Sumero-Akkadian is extinct. And it does *not* belong
> > in IDNs.
> 
> This gets to the nut of the debate, I think.  What I've been trying to
> ask is _why not_? 

Because it is extinct (as a writing system, pace John).
Because it isn't *useful* for IDNs.
Because nobody is clamoring for it for use in IDNs.
Because nobody *will* be clamoring for it for use in IDNs.
Because it is "hen-scratching" that all the zones will end up
    disallowing anyway.
Because even *if* there were a functioning Iraqi registry,
    even *they* would disallow it.

> We have an algorithm for generating the rules for
> what gets in.  These code points do not automatically get picked up as
> DISALLOWED by that algorithm.

We had an algorithm for generating the rules for what gets
in idna-tables-00.txt, and using *those* rules, those
code points did get automatically picked up as DISALLOWED.

>  So what is the reason for creating this
> extra list of exceptions?

And what is the reason for suddenly removing the list of
exceptions that were (correctly) filtering out all these
useless (for IDNs) historic characters? Suddenly the onus
is on me to prove they *are* useless, instead of on somebody
else to demonstrate why IDNs *need* cuneiform?

> Further, I am sure that operators of large zones do not want to go
> through this protocol rewriting exercise again.  If there is some
> automatic way to classify future additions to Unicode as belonging to
> this category of exception (and so far, I don't think I've seen one
> proposed, but I might have misunderstood the discussion around the
> planes), then I can see a convenient way to exclude everything in that
> category.

If you need an appeal to outside specification, then please
see Table 4 in UAX #31, which was created precisely for this
purpose:

http://www.unicode.org/reports/tr31/

>  Otherwise, it seems to me we'll lose the Unicode version
> agnosticism that was supposed to be one of the benefits of this work

When Avestan and Egyptian Hieroglyphs become part of Unicode --
they are already in the all-but-finalized Amendment 5 to 10646
and are thus scheduled for the eventual Unicode 5.2 -- Avestan
and Egyptian Hieroglyphs will be added to that Table 4 in
UAX #31. There you have your "automatic way to classify future
additions to Unicode", and if you build the IDNA 2008 spec
accordingly, you have correctly maintained your Unicode version
agnosticism and don't have to keep revising the RFC to update
versions.

The UTC assumed, I think, that making such a list in a
Table in a Unicode Standard Annex (a normative part of each
version of the Unicode Standard) would suffice. But if this
kind of thing also needs to be codified as a character property,
then it seems to me the UTC could find a way to come up
with a property Historic_Obsolete_Inappropriate_For_Identifier
or whatever, just as well, that would match the contents of
Table 4 in UAX #31.
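
To make the table-driven idea concrete, here is a minimal sketch of how
a derivation rule could consult such a list. Everything named in it is
an assumption for illustration: the ranges are a hand-typed stand-in
(the two cuneiform blocks only), not a copy of Table 4, and the names
EXCLUDED_HISTORIC_RANGES, excluded_script and classify are hypothetical.
A real implementation would load the list from the data published with
each Unicode version, so that the rule itself never changes.

```python
# Hypothetical stand-in for a "historic scripts" exclusion list.
# These ranges are illustrative only; the real list would be read from
# the data that accompanies each Unicode version, not hard-coded.
EXCLUDED_HISTORIC_RANGES = [
    (0x12000, 0x123FF, "Cuneiform"),
    (0x12400, 0x1247F, "Cuneiform Numbers and Punctuation"),
]


def excluded_script(cp):
    """Return the name of the excluded range containing cp, or None."""
    for first, last, name in EXCLUDED_HISTORIC_RANGES:
        if first <= cp <= last:
            return name
    return None


def classify(cp):
    """Apply the exclusion rule before any other derivation rule.

    "UNDECIDED" means only that this rule did not fire; the remaining
    rules of the derivation would still have to be applied.
    """
    if excluded_script(cp) is not None:
        return "DISALLOWED"
    return "UNDECIDED"


if __name__ == "__main__":
    for cp in (0x12000, 0x0061):
        print("U+%04X %s" % (cp, classify(cp)))
```

The point of the design is that only the table grows when new historic
scripts are encoded; the rule that reads it stays the same.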

> I do think that many zone operators ought to be advised not to allow
> these code points anyway.  But that's a completely different matter
> from deciding at the protocol level that they're not allowed.

Agreed.

--Ken

>  Also,
> it follows from the general principle, "Don't publish what you don't
> understand."
> 
> A
> 
> -- 
> Andrew Sullivan
> ajs at commandprompt.com
> +1 503 667 4564 x104
> http://www.commandprompt.com/
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
> 


