Comments on idnabis-rationale-01

Frank Ellermann
Thu Jul 17 20:49:01 CEST 2008

Marcos Sanz/Denic wrote:

> a healthy mixture of much nitpicking and some more important 
> comments on rationale-01.

A rather big collection of show-stoppers.  We need to get some
agreement about basic terms, here's a proposal:

  LDH-label = ( L / D ) [ *61( L / D / H ) ( L / D ) ]
  top-label =           L *61( L / D / H ) ( L / D )

Using another style, e.g., <letdig> instead of ( L / D ), or the
style, is no problem, as long as we agree on the concept and get
it in STD 68 syntax.  

With some clear ABNF it is obvious that a "bq--whatever" matches 
<LDH-label>.  It is also irrelevant for IDNAbis, because it does
not begin with "xn--".  

But we need a name for LDH labels starting with "xn--".  That can
be A-label, later resulting in IDNAbis valid vs. invalid A-labels.

Or it can be, say, <xn--label>, reserving A-label for IDNAbis valid
<xn--label>s.  Picking the latter, because it can be put in ABNF:

  xn--label = "xn--" ( L / D ) [ *57( L / D / H ) ( L / D ) ]

With that it's obvious that any <xn--label> is also a <top-label>,
and any <top-label> is also a <LDH-label>.  
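The three productions translate directly into regular expressions,
which makes the subset claims easy to spot-check.  This is only a
sketch in Python; the function names are made up here, and it assumes
that the quoted ABNF string "xn--" matches case-insensitively, as
STD 68 strings do.

```python
import re

# L = letter, D = digit, H = hyphen, as in the ABNF above.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
TOP_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")
# "xn--" is 4 octets, so at most 59 more fit into a 63 octet label.
XN_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9](?:[A-Za-z0-9-]{0,57}[A-Za-z0-9])?")


def is_ldh_label(s: str) -> bool:
    """True if s matches the proposed <LDH-label> production."""
    return LDH_LABEL.fullmatch(s) is not None


def is_top_label(s: str) -> bool:
    """True if s matches the proposed <top-label> production."""
    return TOP_LABEL.fullmatch(s) is not None


def is_xn_label(s: str) -> bool:
    """True if s matches the proposed <xn--label> production."""
    return XN_LABEL.fullmatch(s) is not None


if __name__ == "__main__":
    for label in ("bq--whatever", "xn--bcher-kva", "_sip", "a"):
        print(label, is_ldh_label(label), is_top_label(label), is_xn_label(label))
```

Note that <top-label> as written needs at least two characters, so a
single letter is an <LDH-label> but not a <top-label>.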

Any A-label is an <xn--label>, because that is what RFC 3492 encoding
plus the "xn--" prefix produce.  The converse does not hold: some
<xn--label>s are not A-labels, namely those where an attempt to
determine X' = U2A( A2U( X )) yields X' != X (or an error).

An <xn--label> X is also an A-label only if X == U2A( A2U( X )).
That has to be defined in some pseudo-math, not prose, using the
IDNA2003 terms where possible.  A2U and U2A are names I made up,
but if we need new terms these could do.
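The round-trip test can be sketched with Python's built-in "punycode"
codec.  Everything here is an assumption for illustration: a2u, u2a and
is_a_label are made-up names, the sketch does bare RFC 3492 coding
only, with no Nameprep and none of the IDNAbis code point checks, and
it lower-cases X before comparing because DNS labels are compared
case-insensitively.

```python
def a2u(label: str) -> str:
    """Hypothetical A2U: strip "xn--" and Punycode-decode the rest."""
    return label[4:].encode("ascii").decode("punycode")


def u2a(label: str) -> str:
    """Hypothetical U2A: Punycode-encode and prepend "xn--"."""
    if label.isascii():
        # An all-ASCII label needs no encoding and stays as it is.
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def is_a_label(x: str) -> bool:
    """True only if X == U2A( A2U( X )); False on mismatch or error."""
    x = x.lower()
    if not x.startswith("xn--"):
        return False
    try:
        return u2a(a2u(x)) == x
    except UnicodeError:
        # Non-ASCII input or a malformed Punycode tail.
        return False


if __name__ == "__main__":
    for x in ("xn--bcher-kva", "xn--abc-d", "bq--whatever"):
        print(x, is_a_label(x))
```

A real definition would also have to reject results that IDNAbis does
not permit; the sketch only shows where the X' != X and error cases
come from.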

IDNAbis applications trying to transform A-labels into U-labels
have to leave anything that is not an <xn--label> alone.  For an
<xn--label> they will find one of two things.  Either it is an
A-label, and then it has by definition a U-label form, done.

Or it is not an A-label, and then it is at least an LDH-label,
also done.  Or maybe not quite done: once we get past this step the
BiDi magic might have to do something with adjacent U-labels.
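Applied label by label to a whole name, that decision could look like
the following Python sketch.  The function names are hypothetical, the
prefix test stands in for a full <xn--label> syntax check, the round
trip is bare RFC 3492 coding without any IDNAbis validity rules, and
the BiDi step is left out entirely.

```python
def label_to_display(label: str) -> str:
    """Return the U-label form of an A-label, else the label unchanged."""
    lowered = label.lower()
    if not lowered.startswith("xn--"):
        return label  # not an <xn--label>: leave it alone
    try:
        unicode_form = lowered[4:].encode("ascii").decode("punycode")
        back = "xn--" + unicode_form.encode("punycode").decode("ascii")
    except UnicodeError:
        return label  # decoding failed: keep it as it is
    if unicode_form.isascii() or back != lowered:
        return label  # round trip failed: not an A-label
    return unicode_form


def domain_to_display(domain: str) -> str:
    """Convert each dot-separated label independently."""
    return ".".join(label_to_display(part) for part in domain.split("."))


if __name__ == "__main__":
    print(domain_to_display("www.xn--bcher-kva.example"))
    print(domain_to_display("_sip.bq--whatever.example"))
```

Labels that are not <xn--label>s, including ones starting with an
underscore, pass through untouched, as required.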

The other direction is more difficult: how can applications know
that something is a U-label?  And what about non-U-labels that are
*63( OCTET ) labels when a domain consists of a mixture of labels,
e.g., labels starting with an underscore?

Maybe we should state that FQDNs with non-LDH-labels are out of
scope.  Similarly any encoding that is not UTF-8 is out of scope,
i.e. a solved problem in RFC 3987.  And after that it should be
possible to arrive at an unambiguous U-label definition, where
it is clear that a U-label is not an LDH-label, and therefore also
not an A-label.

