Comments on idnabis-rationale-01

Mon Jul 21 17:31:31 CEST 2008

Frank,

I'd appreciate comments from others on this and will ultimately
do what the WG decides to do, but...

(1)  At present, LDH-label, A-label, and U-label are disjoint
categories.  That is important to both the way Rationale is
constructed and to terminology now being used in ICANN and
elsewhere.   Your proposed definitions once again make A-label a
subset of LDH-label.    I believe that, if LDH-label is to
include both A-labels and traditional ASCII labels (i.e., labels
that do not start in "xn--"), then we need a term for an LDH
label that is not an A-label.   If the WG wants to invent that
term, I'm happy to change the text, but things get a lot less
clear if we have to go back to having no term for that concept.

Those categories are also disjoint wrt a category that earlier
versions of Rationale would have called "invalid".   I guess
that, with the removal of the substr(label, 3,4) != "--"
prohibition and rethinking the implications of the SRV exception
to the 1035-preferred (more or less "host name") syntax, that
category would now be called "no interpretation under this
specification".  But, again, the general idea is to have
categories that are disjoint and that, ideally, span the label
space, not ones that overlap in some fuzzy way and therefore
require additional qualification.
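
To make the "disjoint and spanning" point concrete, here is a
rough sketch (not proposed text) of a classifier that places every
label in exactly one category.  The function name and the
category strings are made up for illustration.  Python's built-in
"idna" codec implements IDNA2003, not the tables this WG is
working on, so it is only a stand-in for the real conversion and
validity rules.  Treating an "xn--" label that fails the round
trip as "no interpretation" is likewise an assumption.

```python
import re

# Hostname-style (LDH) syntax: letters, digits, interior hyphens, 1..63 octets.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify(label: str) -> str:
    """Place a single label in exactly one of four categories (illustrative)."""
    if "." in label:
        return "no-interpretation"          # a whole name, not a single label
    if label.isascii():
        if not LDH_RE.fullmatch(label):
            return "no-interpretation"      # e.g. "_sip" or "-bad"
        lowered = label.lower()
        if not lowered.startswith("xn--"):
            return "LDH-label"              # traditional ASCII label
        try:
            # Stand-in conversion: IDNA2003 codec, an assumption of this sketch.
            unicode_form = lowered.encode("ascii").decode("idna")
            round_trip = unicode_form.encode("idna").decode("ascii")
        except UnicodeError:
            return "no-interpretation"
        return "A-label" if round_trip == lowered else "no-interpretation"
    try:
        # Non-ASCII: a U-label only if it survives conversion unchanged.
        back = label.encode("idna").decode("idna")
    except UnicodeError:
        return "no-interpretation"
    return "U-label" if back == label else "no-interpretation"


for sample in ("example", "xn--bcher-kva", "bücher", "_sip"):
    print(sample, "->", classify(sample))
```

Each branch returns exactly one category, so no label can be both
an A-label and an LDH-label, and nothing is left unclassified.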

The concept of an invalid A-label takes us back to almost
exactly the terminology situation that developed after IDNA2003
was approved.  It was not the fault of IDNA2003  --those
documents are fairly careful-- but people discovered that they
needed terminology and made it up, not always consistently.  So
people talked about "punycode" as a label type, and "punycode"
as a coding, and "invalid punycode" (only possible for the label
type, nonsense for the coding), and didn't know whether
"punycode-the-label" contained the prefix or not.   That sort of
stuff just doesn't help -- people who don't understand the
protocol and generally how IDNs are modeled just get more and
more confused.

(2) This WG's scope rather clearly does not include modifying
the DNS specifications (particularly 1034, 1035, and 2181).  I
strongly suggest that our getting entangled in debates similar
to those that recently raged on the IETF list about domain names
and host names would be unwise even if it were not out of
charter (and I believe that it is out of charter too).   So
suggestions about redefining the syntax or length of LDH-label
(while making it the superset definition that you prefer),
specifying its length, etc., or about defining a <top-label>
category that is not needed for the IDNA2008 protocol or tables,
are, I believe, out of scope and inappropriate.

(3) And, if only because RFC 3987 (the IRI spec) must inevitably
reference (normatively) IDNA, I'd really object to creating
references that define IDNA names or properties in terms of that
spec (I have other reasons too, but they aren't part of this
WG's scope either).

    john

--On Thursday, 17 July, 2008 20:49 +0200 Frank Ellermann
<hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com> wrote:

> Marcos Sanz/Denic wrote:
> 
>> a healthy mixture of much nitpicking and some more important 
>> comments on rationale-01.
> 
> A rather big collection of show-stoppers.  We need to get some
> agreement about basic terms, here's a proposal:
> 
>   LDH-label = ( L / D ) [ *61( L / D / H ) ( L / D ) ]
>   top-label =           L *61( L / D / H ) ( L / D )
> 
> Using another style, e.g., <letdig> instead of ( L / D ), or
> the
> <http://www.icann.org/tlds/agreements/coop/appendix-7-01jul07.htm>
> style, is no problem, as long as we agree on the concept
> and get it in STD 68 syntax.
> 
> With some clear ABNF it is obvious that a "bq--whatever"
> matches  <LDH-label>.  It is also irrelevant for IDNAbis,
> because it does not begin with "xn--".  
> 
> But we need a name for LDH labels starting with "xn--".  That
> can be A-label, later resulting in IDNAbis valid vs. invalid
> A-labels.
> 
> Or it can be say <xn--label>, reserving A-label for IDNAbis
> valid <xn--label>s.  Picking the latter, because it can be put
> in ABNF:
> 
>   xn--label = "xn--" ( L / D ) [ *57( L / D / H ) ( L / D ) ]
> 
> With that it's obvious that any <xn--label> is also a
> <top-label>, and any <top-label> is also a <LDH-label>.  
> 
> Any A-label is an <xn--label>, because that is what RFC 3492
> and adding the "xn--" prefix do.  But the opposite is not
> necessarily the case: some <xn--label>s are not A-labels,
> namely when an attempt to determine X' = U2A( A2U( X )) yields
> X' != X (or an error).
> 
> Only for X == U2A( A2U( X )) an <xn--label> X is also an
> A-label. That has to be defined in some pseudo-math, not
> prose, using the IDNA2003 terms where possible - not A2U and
> U2A, I made that up, but if we'd need new terms this could do.
> 
> IDNAbis applications trying to transform A-labels into U-labels
> have to leave anything that is not an <xn--label> alone.  For an
> <xn--label> they'll find that it is either an A-label, and then
> it has by definition a U-label form, ready.
> 
> Or they find it is no A-label, then it is at least an
> LDH-label, also ready.  Or maybe not ready, when we get over
> this step the BiDi magic might have to do something with
> adjacent U-labels.
> 
> The other direction is more difficult: how can applications
> know that something is a U-label?  And what about non-U-labels
> which are *63( OCTET ) labels if a domain consists of
> mixtures of labels, e.g., labels starting with an underscore?
> 
> Maybe we should state that FQDNs with non-LDH-labels are out of
> scope.  Similarly any encoding that is not UTF-8 is out of
> scope, i.e. a solved problem in RFC 3987.  And after that it
> should be possible to arrive at an unambiguous U-label
> definition, where it is clear that a U-label is no LDH-label,
> and therefore also no A-label.
> 
>  Frank
> 