Comments on idnabis-rationale-01

JFC Morfin jefsey at jefsey.com
Tue Jul 22 18:30:36 CEST 2008


John,
I appreciate this as something that helps me, as a user, to better
understand the published text. However, at this point I am quite
confused. Could you please help with the following:
1) a simple table giving the term, an example, and a definition for
each of U-label, A-label, LDH-label, traditional ASCII label, and invalid?
2) what are "a--abcdef", "xn---ghikl" in your terminology?
3) is there an objection to "xn--abcd--efgh", and what do you call it?
4) what do you call "xn--abcdef" with no punycode conversion to Unicode?
Thank you.
jfc



At 17:31 21/07/2008, John C Klensin wrote:

>Frank,
>
>I'd appreciate comments from others on this and will ultimately
>do what the WG decides to do, but...
>
>(1)  At present, LDH-label, A-label, and U-label are disjoint
>categories.  That is important to both the way Rationale is
>constructed and to terminology now being used in ICANN and
>elsewhere.   Your proposed definitions once again make A-label a
>subset of LDH-label.    I believe that, if LDH-label is to
>include both A-labels and traditional ASCII labels (i.e., labels
>that do not start in "xn--"), then we need a term for an LDH
>label that is not an A-label.   If the WG wants to invent that
>term, I'm happy to change the text, but things get a lot less
>clear if we have to go back to having no term for that concept.
>
>Those categories are also disjoint wrt a category that earlier
>versions of Rationale would have called "invalid".   I guess
>that, with the removal of the substr(label, 3,4) != "--"
>prohibition and rethinking the implications of the SRV exception
>to the 1035-preferred (more or less "host name") syntax, that
>category would now be called "no interpretation under this
>specification".  But, again, the general idea is to have
>categories that are disjoint and that, ideally, span the label
>space, not ones that overlap in some fuzzy way and therefore
>require additional qualification.
>
>The concept of an invalid A-label takes us back to almost
>exactly the terminology situation that developed after IDNA2003
>was approved.  It was not the fault of IDNA2003  --those
>documents are fairly careful-- but people discovered that they
>needed terminology and made it up, not always consistently.  So
>people talked about "punycode" as a label type, and "punycode"
>as a coding, and "invalid punycode" (only possible for the label
>type, nonsense for the coding), and didn't know whether
>"punycode-the-label" contained the prefix or not.   That sort of
>stuff just doesn't help -- people who don't understand the
>protocol and generally how IDNs are modeled just get more and
>more confused.
>
>
>(2) This WG's scope rather clearly does not include modifying
>the DNS specifications (particularly 1034, 1035, and 2181).  I
>strongly suggest that our getting entangled in debates similar
>to those that recently raged on the IETF list about domain names
>and host names would be unwise even if it were not out of
>charter (and I believe that it is out of charter too).   So
>suggestions about redefining the syntax or length of LDH-label
>(while making it the superset definition that you prefer),
>specifying its length, etc., or about defining a <top-label>
>category that is not needed for the IDNA2008 protocol or tables,
>are, I believe, out of scope and inappropriate.
>
>(3) And, if only because RFC 3987 (the IRI spec) must inevitably
>reference (normatively) IDNA, I'd really object to creating
>references that define IDNA names or properties in terms of that
>spec (I have other reasons too, but they aren't part of this
>WG's scope either).
>
>     john
>
>
>
>--On Thursday, 17 July, 2008 20:49 +0200 Frank Ellermann
><hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com> wrote:
>
> > Marcos Sanz/Denic wrote:
> >
> >> a healthy mixture of much nitpicking and some more important
> >> comments on rationale-01.
> >
> > A rather big collection of show-stoppers.  We need to get some
> > agreement about basic terms, here's a proposal:
> >
> >   LDH-label = ( L / D ) [ *61( L / D / H ) ( L / D ) ]
> >   top-label =           L *61( L / D / H ) ( L / D )
> >
> > Using another style, e.g., <letdig> instead of ( L / D ), or
> > the
> > <http://www.icann.org/tlds/agreements/coop/appendix-7-01jul07.htm>
> > style, is no problem, as long as we agree on the concept
> > and get it in STD 68 syntax.
> >
> > With some clear ABNF it is obvious that a "bq--whatever"
> > matches  <LDH-label>.  It is also irrelevant for IDNAbis,
> > because it does not begin with "xn--".
> >
> > But we need a name for LDH labels starting with "xn--".  That
> > can be A-label, later resulting in IDNAbis valid vs. invalid
> > A-labels.
> >
> > Or it can be say <xn--label>, reserving A-label for IDNAbis
> > valid <xn--label>s.  Picking the latter, because it can be put
> > in ABNF:
> >
> >   xn--label = "xn--" ( L / D ) [ *57( L / D / H ) ( L / D ) ]
> >
> > With that it's obvious that any <xn--label> is also a
> > <top-label>, and any <top-label> is also a <LDH-label>.
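
For illustration, the three productions proposed above can be transliterated into Python regular expressions. This is only a minimal sketch: the function names are made up here, and the patterns follow the ABNF exactly as posted, not any published IDNA2008 definition.

```python
import re

# L = letter, D = digit, H = hyphen, as in the ABNF quoted above.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
TOP_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")
# ABNF string literals are case-insensitive, so "XN--" is accepted too.
XN_LABEL = re.compile(
    r"[Xx][Nn]--[A-Za-z0-9](?:[A-Za-z0-9-]{0,57}[A-Za-z0-9])?"
)


def is_ldh_label(label: str) -> bool:
    """True if the whole string matches the proposed <LDH-label>."""
    return LDH_LABEL.fullmatch(label) is not None


def is_top_label(label: str) -> bool:
    """True if the whole string matches the proposed <top-label>."""
    return TOP_LABEL.fullmatch(label) is not None


def is_xn_label(label: str) -> bool:
    """True if the whole string matches the proposed <xn--label>."""
    return XN_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("bq--whatever", "xn--abcd--efgh", "xn---ghikl", "_sip"):
        print(candidate, is_ldh_label(candidate), is_top_label(candidate),
              is_xn_label(candidate))
```

Under these patterns "bq--whatever" is an <LDH-label> but not an <xn--label>, and "xn---ghikl" fails <xn--label> because the character after the prefix is a hyphen.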
> >
> > Any A-label is an <xn--label>, because that is what RFC 3492
> > and adding the "xn--" prefix do.  But the opposite is not
> > necessarily the case: some <xn--label>s are not A-labels, namely
> > when an attempt to determine X' = U2A( A2U( X )) yields X' != X
> > (or an error).
> >
> > Only for X == U2A( A2U( X )) an <xn--label> X is also an
> > A-label. That has to be defined in some pseudo-math, not
> > prose, using the IDNA2003 terms where possible - not A2U and
> > U2A, I made that up, but if we'd need new terms this could do.
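
The round-trip condition X == U2A( A2U( X )) can be sketched with Python's built-in "punycode" codec. This is an assumption-laden illustration: a2u and u2a here are hypothetical helpers that perform only the RFC 3492 coding plus the "xn--" prefix, with none of the IDNA code point, BiDi, or length checks a real A-label test would need, and they assume a lower-case prefix.

```python
PREFIX = "xn--"


def a2u(label: str) -> str:
    """Strip the prefix and Punycode-decode the remainder (RFC 3492 only)."""
    if not label.startswith(PREFIX):
        raise ValueError("not an xn--label")
    return label[len(PREFIX):].encode("ascii").decode("punycode")


def u2a(text: str) -> str:
    """Punycode-encode the text and prepend the prefix (RFC 3492 only)."""
    return PREFIX + text.encode("punycode").decode("ascii")


def round_trips(label: str) -> bool:
    """True if X == U2A(A2U(X)); a decoding error counts as failure."""
    try:
        return u2a(a2u(label)) == label
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False


if __name__ == "__main__":
    print(round_trips("xn--bcher-kva"))  # well-formed Punycode
    print(round_trips("xn--bcher-kv"))   # truncated Punycode
    print(round_trips("bq--whatever"))   # no "xn--" prefix
```

A label that passes this check is only a candidate A-label; whether the decoded string is a valid U-label is a separate question that this sketch does not address.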
> >
> > IDNAbis applications trying to transform A-labels into U-labels
> > have to leave anything that is not an <xn--label> alone.  For an
> > <xn--label> they'll either find that it is an A-label, in which
> > case it has by definition a U-label form, and they are done.
> >
> > Or they find it is not an A-label, in which case it is at least
> > an LDH-label, and they are also done.  Or maybe not done: once we
> > get past this step the BiDi magic might have to do something with
> > adjacent U-labels.
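
The per-label procedure described above could look roughly like the following when applied to a whole domain name. This is a hedged sketch, not an IDNA implementation: label_to_unicode and domain_to_unicode are hypothetical names, the A-label test is reduced to a raw Punycode round trip, lower-case input is assumed, and the BiDi step is not modeled at all.

```python
import re

# Proposed <xn--label> syntax, lower-case only for simplicity.
XN_LABEL = re.compile(r"xn--[a-z0-9](?:[a-z0-9-]{0,57}[a-z0-9])?")


def label_to_unicode(label: str) -> str:
    """Return the decoded form of a round-tripping xn--label, else the input."""
    if XN_LABEL.fullmatch(label) is None:
        return label  # not an <xn--label>: leave it alone
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
        reencoded = "xn--" + decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return label  # undecodable: keep it as a plain LDH-label
    # Only a label that survives the round trip is treated as an A-label.
    return decoded if reencoded == label else label


def domain_to_unicode(name: str) -> str:
    """Apply label_to_unicode to each dot-separated label of a domain name."""
    return ".".join(label_to_unicode(part) for part in name.split("."))


if __name__ == "__main__":
    for name in ("xn--bcher-kva.example", "bq--whatever.example",
                 "xn--bcher-kv.example"):
        print(name, "->", domain_to_unicode(name))
```

Returning the input unchanged in every failure case mirrors the "leave alone" rule in the text, so a caller never sees a partially converted label.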
> >
> > The other direction is more difficult: how can applications
> > know that something is a U-label?  And what about non-U-labels
> > that are *63( OCTET ) labels, when a domain consists of a
> > mixture of labels, e.g., labels starting with an underscore?
> >
> > Maybe we should state that FQDNs with non-LDH-labels are out of
> > scope.  Similarly any encoding that is not UTF-8 is out of
> > scope, i.e. a solved problem in RFC 3987.  And after that it
> > should be possible to arrive at an unambiguous U-label
> > definition, where it is clear that a U-label is not an LDH-label,
> > and therefore also not an A-label.
> >
> >  Frank
> >
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update
>
>
>
>


