Archaic scripts (was: Re: New version:draft-ietf-idna-tables-01.txt)

Erik van der Poel erikv at google.com
Thu May 8 21:57:38 CEST 2008


I don't think it is necessary to create yet another category called
PVALID-Historic.

I believe the main reason we are having this long discussion is
because some of us have simply accepted the position that it ought to
be difficult to move a character from DISALLOWED to PVALID (or
CONTEXT*). I don't think there is any need to make that kind of change
so difficult. Software that gets burned into ROM or has other reasons
not to be upgraded should stick to LDH- and A-labels.

If the developer of a piece of software would like to incorporate a
routine that converts U-labels to A-labels, then they should also be
willing to make their software upgradable. They need to make it
upgradable anyway, because some codepoints are currently unassigned
but may move to one of the other categories in the future. Also,
systems that perform pre-processing will not know what to do with
unassigned characters until they have been assigned (and their
lower-casing, etc., have been determined).
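
To make the unassigned-codepoint point concrete, here is a minimal
Python sketch. It assumes the standard unicodedata module, whose tables
are frozen at whatever Unicode version the interpreter shipped with;
the helper name is made up for this example.

```python
import unicodedata

def unassigned_code_points(label):
    """Return the code points in `label` that this Unicode database does not know."""
    # General category "Cn" means "not assigned" in the database version
    # bundled with this Python build (unicodedata.unidata_version).
    return [f"U+{ord(ch):04X}" for ch in label if unicodedata.category(ch) == "Cn"]

# The answer depends on the bundled Unicode version, which is exactly
# why software doing this kind of check has to be upgradable.
print("Unicode database version:", unicodedata.unidata_version)
print(unassigned_code_points("example"))
print(unassigned_code_points("ex\u0378ample"))
```

A code point reported here today may be assigned in a later Unicode
version, at which point the same software gives a stale answer unless
its tables are updated.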

So our first cut (IDNA2008) could use a relatively simple set of rules
based on Unicode properties and a list of historic scripts, as Ken
mentioned, and future RFCs can refine those rules, possibly moving
some DISALLOWED characters to other categories.
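
A rough sketch of what such a rule set might look like, in Python. The
category names follow this thread, but everything else is an
assumption for illustration: the list of ranges is a hand-picked
sample of historic script blocks (not the contents of UAX #31 Table 4
or of any IDNA table), and the general-category rule is a toy that
ignores exceptions, context rules and stability checks.

```python
import unicodedata

# Illustrative sample of historic script blocks; NOT an official list.
HISTORIC_RANGES = [
    (0x10380, 0x1039F),  # Ugaritic
    (0x103A0, 0x103DF),  # Old Persian
    (0x12000, 0x123FF),  # Cuneiform
    (0x12400, 0x1247F),  # Cuneiform Numbers and Punctuation
]

# Toy rule: lower-case/other/modifier letters, marks and decimal digits.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

def classify(cp):
    """Classify one code point with a deliberately simplified rule set."""
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "UNASSIGNED"
    if any(lo <= cp <= hi for lo, hi in HISTORIC_RANGES):
        return "DISALLOWED"
    if category in ALLOWED_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"

for cp in (0x0061, 0x0041, 0x12000, 0x0378):
    print(f"U+{cp:04X}", classify(cp))
```

Moving a script from DISALLOWED to PVALID later would then amount to
removing a range from the list, which is the kind of change I am
arguing should not be made artificially difficult.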

This also neatly solves the problem of whether or not IDNA-unaware and
IDNA-aware clients are allowed to look up labels with Punycode in
them. They should always be allowed to do so. Only software that tries
to convert from U-labels to A-labels needs to be restricted. This is
how we can achieve the most reasonable level of interoperability, in
my opinion.
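
A minimal Python sketch of that split, under stated assumptions: the
function name and the is_permitted callback are hypothetical, the
built-in "punycode" codec stands in for the conversion step, and the
normalization and mapping a real implementation would need are left
out entirely.

```python
def to_lookup_label(label, is_permitted):
    """Return the ASCII form of `label` to hand to the DNS resolver."""
    if label.isascii():
        # LDH labels and labels that already contain Punycode (xn--...)
        # are passed through untouched; no code point policy is applied.
        return label
    # Only the U-label to A-label conversion is restricted.
    rejected = [f"U+{ord(ch):04X}" for ch in label if not is_permitted(ch)]
    if rejected:
        raise ValueError("code points not permitted: " + ", ".join(rejected))
    return "xn--" + label.encode("punycode").decode("ascii")

# A label that is already an A-label is looked up regardless of policy.
print(to_lookup_label("xn--bcher-kva", lambda ch: False))
# A U-label is converted only if every code point passes the policy.
print(to_lookup_label("bücher", lambda ch: True))
```

Software that never calls the conversion branch never needs updated
tables, which is why it can safely be burned into ROM.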

Erik

On Thu, May 8, 2008 at 12:31 PM, Debbie Garside
<debbie at ictmarketing.co.uk> wrote:
> Andrew wrote:
>
>  > I do think that many zone operators ought to be advised not to allow
>  > these code points anyway.  But that's a completely different matter
>  > from deciding at the protocol level that they're not allowed.
>
>  to which Ken responded:
>
>  >Agreed
>
>  I think this is perhaps the way forward.  It seems to me that there are good
>  arguments on both sides.  Perhaps a PVALID-Historic category could be
>  created with scope to move to PVALID.
>
>  Best regards
>
>  Debbie
>
>
>
>  > -----Original Message-----
>  > From: idna-update-bounces at alvestrand.no
>  > [mailto:idna-update-bounces at alvestrand.no] On Behalf Of
>  > Kenneth Whistler
>  > Sent: 08 May 2008 20:20
>  > To: ajs at commandprompt.com
>  > Cc: idna-update at alvestrand.no; kenw at sybase.com
>  > Subject: Re: Archaic scripts (was: Re: New
>  > version:draft-ietf-idna-tables-01.txt)
>  >
>  > Andrew Sullivan asked:
>  >
>  > > > Sumero-Akkadian is extinct. And it does *not* belong in IDNs.
>  > >
>  > > This gets to the nut of the debate, I think.  What I've been
>  > > trying to ask is _why not_?
>  >
>  > Because it is extinct (as a writing system, pace John).
>  > Because it isn't *useful* for IDNs.
>  > Because nobody is clamoring for it for use in IDNs.
>  > Because nobody *will* be clamoring for it for use in IDNs.
>  > Because it is "hen-scratching" that all the zones will end up
>  >     disallowing anyway.
>  > Because even *if* there were a functioning Iraqi registry,
>  >     even *they* would disallow it.
>  >
>  > > We have an algorithm for generating the rules for what gets in.
>  > > These code points do not automatically get picked up as
>  > > DISALLOWED by that algorithm.
>  >
>  > We had an algorithm for generating the rules for what gets in
>  > idna-tables-00.txt, and using *those* rules, those code
>  > points did get automatically picked up as DISALLOWED.
>  >
>  > >  So what is the reason for creating this extra list of exceptions?
>  >
>  > And what is the reason for suddenly removing the list of
>  > exceptions that were (correctly) filtering out all these
>  > useless (for IDNs) historic characters? Suddenly the onus is
>  > on me to prove they *are* useless, instead of on somebody
>  > else to demonstrate why IDNs *need* cuneiform?
>  >
>  > > Further, I am sure that operators of large zones do not want to go
>  > > through this protocol rewriting exercise again.  If there is some
>  > > automatic way to classify future additions to Unicode as belonging
>  > > to this category of exception (and so far, I don't think I've seen
>  > > one proposed, but I might have misunderstood the discussion around
>  > > the planes), then I can see a convenient way to exclude everything
>  > > in that category.
>  >
>  > If you need an appeal to outside specification, then please
>  > see Table 4 in UAX #31, which was created precisely for this
>  > purpose:
>  >
>  > http://www.unicode.org/reports/tr31/
>  >
>  > > Otherwise, it seems to me we'll lose the Unicode version
>  > > agnosticism that was supposed to be one of the benefits of this work
>  >
>  > When Avestan and Egyptian Hieroglyphics become part of
>  > Unicode -- they are already in the all-but-finalized
>  > Amendment 5 to 10646 and are thus scheduled for the eventual
>  > Unicode 5.2 -- Avestan and Egyptian Hieroglyphics will be
>  > added to that Table 4 in UAX #31. There you have your
>  > "automatic way to classify future additions to Unicode", and
>  > if you build the IDNA 2008 spec accordingly, you have
>  > correctly maintained your Unicode version agnosticism and
>  > don't have to keep revising the RFC to update versions.
>  >
>  > The UTC assumed, I think, that making such a list in a Table
>  > in a Unicode Standard Annex (a normative part of each version
>  > of the Unicode Standard) would suffice. But if this kind of
>  > thing also needs to be codified as a character property, then
>  > it seems to me the UTC could find a way to come up with a
>  > property Historic_Obsolete_Inappropriate_For_Identifier
>  > or whatever, just as well, that would match the contents of
>  > Table 4 in UAX #31.
>  >
>  > > I do think that many zone operators ought to be advised not to
>  > > allow these code points anyway.  But that's a completely different
>  > > matter from deciding at the protocol level that they're not allowed.
>  >
>  > Agreed.
>  >
>  > --Ken
>  >
>  > > Also, it follows from the general principle, "Don't publish
>  > > what you don't understand."
>  > >
>  > > A
>  > >
>  > > --
>  > > Andrew Sullivan
>  > > ajs at commandprompt.com
>  > > +1 503 667 4564 x104
>  > > http://www.commandprompt.com/
>  > > _______________________________________________
>  > > Idna-update mailing list
>  > > Idna-update at alvestrand.no
>  > > http://www.alvestrand.no/mailman/listinfo/idna-update
>  > >
>  >

