Comments on idnabis-rationale-01
hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com
Thu Jul 17 20:49:01 CEST 2008
Marcos Sanz/Denic wrote:
> a healthy mixture of much nitpicking and some more important
> comments on rationale-01.
A rather big collection of show-stoppers. We need to reach some
agreement on the basic terms; here's a proposal:
LDH-label = ( L / D ) [ *61( L / D / H ) ( L / D ) ]
top-label = L *61( L / D / H ) ( L / D )
Using another style, e.g., <letdig> instead of ( L / D ), is no
problem, as long as we agree on the concept and express it in
STD 68 (ABNF) syntax.
With clear ABNF it is obvious that "bq--whatever" matches
<LDH-label>. It is also irrelevant for IDNAbis, because it does
not begin with "xn--".
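As a sanity check of that claim, here is a minimal sketch that transcribes the two proposed ABNF rules into Python regular expressions. The regex transcription and the helper names (LDH_LABEL, TOP_LABEL, is_ldh_label, is_top_label, starts_with_xn) are illustrative assumptions, not part of any draft.

```python
import re

# Hypothetical regex transcriptions of the ABNF rules proposed above.
# L = ASCII letter, D = ASCII digit, H = hyphen.
LET_DIG = "[A-Za-z0-9]"
LDH = "[A-Za-z0-9-]"

# LDH-label = ( L / D ) [ *61( L / D / H ) ( L / D ) ]   -> 1 to 63 characters
LDH_LABEL = re.compile(f"{LET_DIG}(?:{LDH}{{0,61}}{LET_DIG})?")

# top-label = L *61( L / D / H ) ( L / D )               -> 2 to 63 characters
TOP_LABEL = re.compile(f"[A-Za-z]{LDH}{{0,61}}{LET_DIG}")


def is_ldh_label(label: str) -> bool:
    # fullmatch (not "$") so that a trailing newline cannot sneak through
    return LDH_LABEL.fullmatch(label) is not None


def is_top_label(label: str) -> bool:
    return TOP_LABEL.fullmatch(label) is not None


def starts_with_xn(label: str) -> bool:
    # ABNF string literals are case-insensitive, so "XN--" counts as well
    return label[:4].lower() == "xn--"


for label in ["bq--whatever", "-abc", "a" * 64]:
    print(label[:16], is_ldh_label(label), is_top_label(label), starts_with_xn(label))
```

"bq--whatever" passes both syntax checks but fails the prefix check, which is exactly why IDNAbis can ignore it.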
But we need a name for LDH labels starting with "xn--". That name
could be A-label, which would later force us to distinguish
IDNAbis-valid from IDNAbis-invalid A-labels. Or it could be, say,
<xn--label>, reserving A-label for IDNAbis-valid <xn--label>s. I
pick the latter, because it can be expressed in ABNF:
xn--label = "xn--" ( L / D ) [ *57( L / D / H ) ( L / D ) ]
With that it's obvious that any <xn--label> is also a <top-label>,
and any <top-label> is also an <LDH-label>.
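The inclusions <xn--label> within <top-label> within <LDH-label> can also be spot-checked mechanically. The sketch below assumes regex transcriptions of the three ABNF rules (the names and the random alphabet are hypothetical) and tests the inclusions on random strings; it is a randomized sanity check, not a proof.

```python
import random
import re

# Hypothetical regex transcriptions of the three ABNF rules (use with fullmatch).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
TOP_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")
# xn--label = "xn--" ( L / D ) [ *57( L / D / H ) ( L / D ) ]
# The ABNF literal "xn--" is case-insensitive, hence [Xx][Nn].
XN_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9](?:[A-Za-z0-9-]{0,57}[A-Za-z0-9])?")

ALPHABET = "abcxnXN019-"


def random_label(rng: random.Random) -> str:
    # Half of the samples get the "xn--" prefix so that all three sets are hit.
    body = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 70)))
    return "xn--" + body if rng.random() < 0.5 else body


def check_inclusions(samples: int = 20000, seed: int = 68) -> tuple[int, int]:
    """Assert xn--label => top-label => LDH-label on random strings.

    Returns (number of <xn--label>s seen, number of <top-label>s seen);
    raises AssertionError with the offending string on a counterexample.
    """
    rng = random.Random(seed)
    n_xn = n_top = 0
    for _ in range(samples):
        s = random_label(rng)
        is_xn = XN_LABEL.fullmatch(s) is not None
        is_top = TOP_LABEL.fullmatch(s) is not None
        is_ldh = LDH_LABEL.fullmatch(s) is not None
        assert not is_xn or is_top, s
        assert not is_top or is_ldh, s
        n_xn += is_xn
        n_top += is_top
    return n_xn, n_top


print(check_inclusions())
```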
Any A-label is an <xn--label>, because that is what RFC 3492
encoding plus the "xn--" prefix produce. But the converse is not
necessarily true: some <xn--label>s are not A-labels, namely those
for which an attempt to compute X' = U2A( A2U( X )) yields X' != X
(or an error). An <xn--label> X is an A-label only if
X == U2A( A2U( X )).
That has to be defined in some pseudo-math, not prose, using the
IDNA2003 terms where possible. A2U and U2A are not such terms, I
made them up, but if we need new terms these could do.
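One way to make the round-trip condition concrete is the sketch below. It assumes that A2U means "strip the four-character prefix and Punycode-decode" and that U2A means "Punycode-encode and prepend xn--", both built on Python's standard "punycode" codec (an RFC 3492 implementation). It adds one assumption not in the text, namely that the decoded form must contain a non-ASCII character, and checks no other IDNAbis rules (normalization, permitted code points, BiDi), so it accepts some labels IDNAbis would reject. The function names are hypothetical.

```python
import re

# Hypothetical transcription of <xn--label>; the ABNF literal is case-insensitive.
XN_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9](?:[A-Za-z0-9-]{0,57}[A-Za-z0-9])?")
PREFIX = "xn--"


def a2u(x: str) -> str:
    """Hypothetical A2U: drop the prefix and Punycode-decode the rest.

    Raises UnicodeError (or a subclass) if the rest is not valid Punycode.
    """
    return x[len(PREFIX):].encode("ascii").decode("punycode")


def u2a(u: str) -> str:
    """Hypothetical U2A: Punycode-encode and prepend the lowercase prefix."""
    return PREFIX + u.encode("punycode").decode("ascii")


def is_a_label(x: str) -> bool:
    """An <xn--label> X counts as an A-label only if X == U2A(A2U(X))."""
    if XN_LABEL.fullmatch(x) is None:
        return False              # not even an <xn--label>
    try:
        u = a2u(x)
    except UnicodeError:          # the "(or an error)" case
        return False
    if u.isascii():
        return False              # assumption: a U-label needs non-ASCII
    return u2a(u) == x            # exact comparison, as written above
```

With this strict comparison "xn--bcher-kva" qualifies (it decodes to "bücher"), while "xn--bcher-KVA" does not, because U2A always produces lowercase output; "xn--a-z" fails because it is not valid Punycode. Whether IDNAbis should compare case-insensitively is left open here.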
IDNAbis applications trying to transform A-labels into U-labels
have to leave alone anything that is not an <xn--label>. For an
<xn--label> they will find that it is either an A-label, which by
definition has a U-label form, and they are done.
Or they find it is not an A-label; then it is at least an
LDH-label, and they are also done. Or maybe not done: once we get
past this step, the BiDi magic might have to do something with
adjacent U-labels.
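The A-label to U-label step described above might be sketched per label as follows. The same assumptions as for a Punycode round trip apply: Python's "punycode" codec stands in for RFC 3492, a U-label must contain non-ASCII, and no other IDNAbis validity rules or BiDi handling are implemented. The function names are hypothetical.

```python
import re

# Hypothetical transcription of <xn--label>.
XN_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9](?:[A-Za-z0-9-]{0,57}[A-Za-z0-9])?")


def to_display_label(label: str) -> str:
    """Sketch of the per-label A-label -> U-label step.

    - not an <xn--label>: leave it alone;
    - an <xn--label> that round-trips: return its U-label form;
    - any other <xn--label>: keep it, it is still an LDH-label.
    """
    if XN_LABEL.fullmatch(label) is None:
        return label
    try:
        u = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return label
    if u.isascii() or "xn--" + u.encode("punycode").decode("ascii") != label:
        return label
    return u


def to_display_name(name: str) -> str:
    # Each label is converted independently; any BiDi processing of
    # adjacent U-labels (an open question above) is not attempted.
    return ".".join(to_display_label(label) for label in name.split("."))
```

For example, to_display_name("www.xn--bcher-kva.example") returns "www.bücher.example", while "_tcp.xn--a-z.example" comes back unchanged.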
The other direction is more difficult: how can applications know
that something is a U-label? And what about non-U-labels that are
*63( OCTET ) labels, if a domain name consists of a mixture of
labels, e.g., labels starting with an underscore?
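The difficulty can be illustrated with a rough classifier for the labels of such a mixed name. This is only a sketch: it assumes the input is already decoded text, uses a hypothetical regex transcription of <LDH-label>, and can at best call a non-ASCII label a "U-label candidate", because deciding whether it really is a U-label needs the IDNAbis rules that this sketch does not implement.

```python
import re

# Hypothetical transcription of <LDH-label> (which includes <xn--label>s).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify(label: str) -> str:
    """Rough, hypothetical classification of one label of a mixed domain name."""
    if LDH_LABEL.fullmatch(label):
        return "LDH-label"
    if label.isascii():
        # ASCII but not LDH, e.g. "_tcp": a *63( OCTET ) label, not a U-label
        return "other ASCII label" if 0 < len(label) <= 63 else "invalid"
    # Non-ASCII: possibly a U-label, but syntax alone cannot tell.
    return "U-label candidate"


print([classify(label) for label in "_sip._tcp.bücher.example".split(".")])
```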
Maybe we should state that FQDNs with non-LDH labels are out of
scope. Similarly, any encoding that is not UTF-8 is out of scope,
i.e., a problem already solved in RFC 3987. After that it should be
possible to arrive at an unambiguous U-label definition, in which
it is clear that a U-label is not an LDH-label, and therefore also