<html>
<body>
At 06:42 17/02/2009, YAO Jiankang wrote:<br>
<blockquote type=cite class=cite cite=""> <br>
<br>
<font size=2>Now, the definitions of A-labels, U-labels, NR-LDH labels,
LDH-labels and R-LDH-labels are very clear to
me.</font></blockquote><br>
yeap. However, for clarity's sake, I would advise not to mix description
and validity considerations.<br>
Better to describe the "geography" of the names' syntax, and
then discuss their usage within the current IDNA context? <br>
jfc<br><br>
<br>
<blockquote type=cite class=cite cite="">
<dl>
<dd>----- Original Message -----
<dd>From: <a href="mailto:vint@google.com">Vint Cerf</a>
<dd>To:
<a href="mailto:idna-update@alvestrand.no">idna-update@alvestrand.no</a>
<dd>Sent: Tuesday, February 17, 2009 5:00 AM
<dd>Subject: Status of IDNABIS Working Group<br>
<dd>NB: THIS TEXT MUST BE READ WITH A FIXED WIDTH
<dd>COURIER FONT FOR THE ILLUSTRATIONS TO LINE UP PROPERLY: <br>
<dd>A fair amount of work is underway to improve the clarity
<dd>of the Definitions and Rationale documents and to revise
<dd>the others as needed to take into account proposed new
<dd>terminology. The intent is to have as much of this work
<dd>as possible available for WG review in time for the March
<dd>IETF in San Francisco. Two sessions have been reserved
<dd>during the week: one on Monday, March 23 and one on
<dd>Tuesday, March 24.<br>
<dd>At that meeting we will also want to take up a comparison
<dd>of the documents that reflect the work outlined in the
<dd>charter and the recent proposal made by Paul Hoffman for
<dd>an alternative to that approach. <br>
<dd>The revision work takes up the following tersely rendered
<dd>set of definitions (it will be best to read the revised
<dd>Definitions document when released for a more complete
<dd>picture).<br>
<dd>The text below is intended to convey the flavor of the
<dd>attempt to clarify definitions but is not the entire
<dd>text that is in preparation.<br>
<dd>2.3. Terminology Specific to IDNA<br>
<dd>This section defines some terminology to reduce
<dd>dependence on terms and definitions that have been
<dd>problematic in the past.<br>
<dd>An LDH-Label is a string consisting solely of ASCII
<dd>upper and/or lower case letters, digits 0-9 and the hyphen
<dd>("-"). These labels are limited to 63 characters and do
<dd>not include a hyphen at either the beginning or end of
<dd>the string. Some people might call this a "traditional
<dd>host name" label.<br>
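<dd>The LDH-Label rule above can be expressed as a short check. This is a minimal Python sketch, not part of the draft; the name is_ldh_label is hypothetical, and it assumes an empty string does not count as a label, since the text states only the 63-character maximum:

```python
import string

# Characters permitted in an LDH-label: ASCII letters, digits 0-9, hyphen.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` fits the LDH-Label definition (sketch)."""
    # Assumption: the empty string is not a label; 63 is the stated maximum.
    if not 1 <= len(label) <= 63:
        return False
    # No hyphen at either the beginning or the end of the string.
    if label.startswith("-") or label.endswith("-"):
        return False
    return all(ch in LDH_CHARS for ch in label)
```

<dd>For example, "Host-42" passes, while "-abcd", "xyz-" and "_tcp" do not.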
<dd>A new subset of LDH-Labels is defined: those that have ASCII
<dd>hyphens in both the third and fourth character positions from
<dd>the beginning of the label. Roughly, in left-to-right form
<dd>this would read "??--", where each "?" is drawn from the
<dd>traditional LDH set of characters, except that the first
<dd>"?" cannot be a hyphen (by definition of LDH-label), nor can
<dd>the last character of the label be a hyphen. This subset of
<dd>LDH-labels is named R-LDH-labels, for "reserved LDH-labels".
<dd>Labels that are NOT members of the R-LDH-label category are
<dd>called Non-Reserved LDH-labels, or NR-LDH-labels, and they
<dd>make up the remainder of the LDH-label universe.<br>
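<dd>The R-LDH/NR-LDH split depends only on the third and fourth characters, so it reduces to a slice comparison. A minimal Python sketch with hypothetical function names, restating the LDH test so the block stands alone:

```python
import string

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    # 1..63 ASCII letters, digits or hyphens, with no hyphen at either end.
    return (
        1 <= len(label) <= 63
        and not label.startswith("-")
        and not label.endswith("-")
        and all(ch in LDH_CHARS for ch in label)
    )


def is_r_ldh_label(label: str) -> bool:
    # Reserved LDH-label: hyphens in the 3rd and 4th positions ("??--").
    return is_ldh_label(label) and label[2:4] == "--"


def is_nr_ldh_label(label: str) -> bool:
    # Non-reserved: every LDH-label that is not an R-LDH-label.
    return is_ldh_label(label) and not is_r_ldh_label(label)
```

<dd>Note that "ab--" is not an R-LDH-label at all: it ends in a hyphen, so it fails the LDH test first.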
<dd>This distinction among possible LDH labels is significant only
<dd>for software that is "IDNA-aware". Software that is not
<dd>IDNA-aware accepts as valid all LDH-labels meeting the
<dd>definition above.<br>
<dd>As it happens, only a subset of the R-LDH-labels can
<dd>potentially be used in IDNA-aware applications, specifically
<dd>the class of labels that begin with the prefix "xn--"
<dd>[what about "XN--"?].<br>
<dd>This class we call "XN-labels". Of this class, only a
<dd>subset, which we will call "A-labels", is valid for use
<dd>in IDNA-aware applications: namely, the labels whose
<dd>characters after the "xn--" prefix (at most 59) are valid
<dd>Punycode output and can be converted into valid Unicode
<dd>characters by the reverse (decoding) algorithm (cf. RFC 3492).
<dd>Valid Unicode characters are defined by conformance to the
<dd>Protocol, Tables and BiDi documents, which identify the
<dd>Unicode characters that can be used in IDNA2008-aware
<dd>applications. <br>
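<dd>A rough sketch of the Punycode part of the A-label test, using Python's built-in "punycode" codec (an implementation of RFC 3492). The function name decode_a_label is hypothetical; the sketch assumes the prefix must be lowercase "xn--" (the "XN--" question above is left open) and does not apply the Protocol, Tables or BiDi rules:

```python
import string
from typing import Optional

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    # 1..63 ASCII letters, digits or hyphens, with no hyphen at either end.
    return (
        1 <= len(label) <= 63
        and not label.startswith("-")
        and not label.endswith("-")
        and all(ch in LDH_CHARS for ch in label)
    )


def decode_a_label(label: str) -> Optional[str]:
    """Return the Unicode form of a candidate A-label, or None.

    Only the Punycode part of the definition is checked; the Protocol,
    Tables and BiDi rules are NOT applied in this sketch.
    """
    # Assumption: the prefix is matched as lowercase "xn--" only.
    if not is_ldh_label(label) or not label.startswith("xn--"):
        return None
    tail = label[4:]  # at most 59 characters, since the label is <= 63
    try:
        unicode_form = tail.encode("ascii").decode("punycode")  # RFC 3492 decode
    except ValueError:  # UnicodeError is a subclass of ValueError
        return None  # the tail is not decodable Punycode
    # Require the tail to be exactly what the encoder would produce.
    if unicode_form.encode("punycode").decode("ascii") != tail:
        return None
    return unicode_form
```

<dd>Re-encoding after decoding rejects tails such as "strae-OQA" that the decoder tolerates but the encoder would never emit.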
<dd>There is also a class of labels that are prefixed with "xn--"
<dd>but whose remaining characters cannot be converted into
<dd>valid Unicode, cannot be produced by the Punycode
<dd>encoding algorithm, or otherwise do not meet the A-label
<dd>criteria. These we will refer to as Invalid-A-labels
<dd>[or something like that]. <br>
<dd>The R-LDH-labels that are neither A-labels nor
<dd>Invalid-A-labels are reserved and not permitted to be
<dd>used in IDNA2008-aware applications.<br>
<dd>Labels that satisfy the LDH-Label criteria but that are
<dd>not R-LDH-labels are called Non-Reserved LDH-labels
<dd>or NR-LDH-labels.<br>
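<dd>Putting the categories above together, a hypothetical Python classifier for LDH-labels might look like this. It is a sketch under the same assumptions as the draft text (lowercase "xn--" prefix only; Punycode validity checked by decoding and re-encoding with Python's "punycode" codec; no Tables or BiDi checks):

```python
import string
from typing import Optional

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    # 1..63 ASCII letters, digits or hyphens, with no hyphen at either end.
    return (
        1 <= len(label) <= 63
        and not label.startswith("-")
        and not label.endswith("-")
        and all(ch in LDH_CHARS for ch in label)
    )


def is_exact_punycode(tail: str) -> bool:
    # True if `tail` decodes as Punycode and re-encodes to the same string.
    try:
        decoded = tail.encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    return decoded.encode("punycode").decode("ascii") == tail


def classify_ldh_label(label: str) -> Optional[str]:
    """Return the category of an LDH-label, or None if it is not LDH."""
    if not is_ldh_label(label):
        return None
    if label[2:4] != "--":
        return "NR-LDH-label"
    if label.startswith("xn--"):
        # XN-label: only the Punycode criterion is checked here.
        return "A-label" if is_exact_punycode(label[4:]) else "Invalid-A-label"
    return "reserved R-LDH-label"
```

<dd>With these assumptions, "XN--strae-oqa" falls into the reserved bucket, which is one concrete consequence of the open "XN--" question.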
<br>
<dd>FOR IDNA2008-AWARE SYSTEMS, VALID LABELS INCLUDE:<br>
<dd>A-LABELS, U-LABELS AND NR-LDH LABELS. <br>
<dd>IDNA-LABELS COME IN TWO FLAVORS: AN ACE-ENCODED FORM AND A UNICODE FORM.
<dd>THESE ARE REFERRED TO AS A-LABELS AND U-LABELS RESPECTIVELY.<br>
<br>
<dd><pre>
 __________________________________________________________________
|                           ASCII-LABEL                            |
|  ______________________________________________________________  |
| |                      LDH-LABEL (1) (4)                       | |
| |  ______________________________________                      | |
| | |IDN   Reserved LDH Labels             |   NONRESERVED       | |
| | |      ("??--") or R-LDH LABELS        |   LDH LABELS        | |
| | |   ________________________________   |   _______________   | |
| | |  |           XN LABELS            |  |  |               |  | |
| | |  |  ____________    ____________  |  |  | NR-LDH LABELS |  | |
| | |  | |  A-labels  |  |  Invalid   | |  |  |_______________|  | |
| | |  | | "xn--"(2)  |  |  A-labels  | |  |                     | |
| | |  | |____________|  |____(3)_____| |  |                     | |
| | |  |________________________________|  |                     | |
| | |______________________________________|                     | |
| |______________________________________________________________| |
|                                                                  |
|  ______________________________________________________________  |
| |                        NON-LDH-LABEL                         | |
| |                 ____________________________                 | |
| |                |       SRV &amp; SRV-LIKE       |                | |
| |                |         e.g. _tcp          |                | |
| |                |____________________________|                | |
| |                 ____________________________                 | |
| |                |    leading or trailing     |                | |
| |                |      hyphens: "-abcd"      |                | |
| |                |    or "xyz-" or "-uvw-"    |                | |
| |                |____________________________|                | |
| |                 ____________________________                 | |
| |                |       Other non-LDH        |                | |
| |                |        ASCII chars         |                | |
| |                |         e.g. #$%&amp;_         |                | |
| |                |____________________________|                | |
| |______________________________________________________________| |
|__________________________________________________________________|
</pre>
<br>
<dd>(1) ASCII letters (upper and lower case), digits, hyphen.
<dd>Hyphen may not appear in first or last position.
<dd>Less than 64 characters.
<dd>(2) Note that the string following "xn--" must be the valid
<dd>output of the Punycode algorithm and must be convertible
<dd>into valid U-label form.
<dd>(3) Note that an Invalid-A-label has the prefix "xn--" but
<dd>the remainder of the label is NOT valid output of the
<dd>Punycode algorithm (or otherwise fails the A-label criteria).
<dd>(4) LDH-LABEL subtypes are indistinguishable to IDNA-unaware
<dd>applications.<br>
<br>
<br>
<dd><pre>
 __________________________
|        Non-ASCII         |
|   ____________________   |
|  |    U-label (5)     |  |
|  |____________________|  |
|   ____________________   |
|  |    Binary Label    |  |
|  |     (including     |  |
|  |    high bit on)    |  |
|  |____________________|  |
|   ____________________   |
|  |     Bit String     |  |
|  |       Label        |  |
|  |____________________|  |
|__________________________|
</pre>
<dd>(5) To IDNA-unaware applications, U-labels are
<dd>indistinguishable from Binary ones.<br>
<dd>Figure 1: IDNA and Related DNS Terminology Space<br>
<dd>==================<br>
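<dd>The figure can also be read as a decision procedure. The Python sketch below is a hypothetical illustration only: it assumes a leading underscore marks an SRV-like label, treats any non-ASCII string as "U-label or binary label" (which, per note (5), IDNA-unaware software cannot tell apart), and does not model bit-string labels at all:

```python
import string

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def figure1_category(label: str) -> str:
    """Roughly place a label in the Figure 1 terminology space (sketch)."""
    if not label.isascii():
        # Note (5): U-labels and binary labels look alike to IDNA-unaware code.
        return "non-ASCII (U-label or binary label)"
    if not 1 <= len(label) <= 63:
        return "not a DNS label (bad length)"
    if all(ch in LDH_CHARS for ch in label):
        if label.startswith("-") or label.endswith("-"):
            return "NON-LDH: leading or trailing hyphens"
        return "LDH-LABEL"
    if label.startswith("_"):
        # Assumption: a leading underscore marks an SRV or SRV-like label.
        return "NON-LDH: SRV or SRV-like"
    return "NON-LDH: other non-LDH ASCII characters"
```
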
<br>
<dd>As I have understood the WG charter, the intention has been
<dd>to devise a means of avoiding dependence of the
<dd>specifications on any particular version of the Unicode
<dd>character set. The IDNA2008 document set has also attempted
<dd>to maintain a one-to-one relationship between labels
<dd>produced by the Punycode encoding algorithm and the
<dd>associated Unicode strings. In brief, the A-labels and
<dd>U-labels of IDNA2008 can be mapped back and forth without
<dd>any loss or change in the respective A-label or U-label strings.
<br><br>
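<dd>The one-to-one property can be checked mechanically for the Punycode step. In this Python sketch the helper names u_to_a, a_to_u and is_one_to_one are hypothetical; it uses the standard "punycode" codec and deliberately applies no mapping or normalization, so it says nothing about whether a string is a valid U-label under the Tables rules:

```python
def u_to_a(u_label: str) -> str:
    # Hypothetical helper: Punycode-encode (RFC 3492) and add the "xn--" prefix.
    return "xn--" + u_label.encode("punycode").decode("ascii")


def a_to_u(a_label: str) -> str:
    # Hypothetical helper: strip the "xn--" prefix and Punycode-decode.
    if not a_label.startswith("xn--"):
        raise ValueError(f"not an XN-label: {a_label!r}")
    return a_label[4:].encode("ascii").decode("punycode")


def is_one_to_one(u_label: str) -> bool:
    # Convert U -> A -> U and A -> U -> A; both must come back unchanged.
    a_label = u_to_a(u_label)
    return a_to_u(a_label) == u_label and u_to_a(a_to_u(a_label)) == a_label
```

<dd>Here u_to_a("bücher") yields "xn--bcher-kva", and decoding that A-label gives back "bücher" unchanged.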
<dd>Document editors are working to incorporate these new definitions
<dd>and the sense of exchanges on the mailing list. <br>
<dd>As of this writing, it is my understanding that the Eszett and
<dd>Final Sigma characters are to be treated as protocol-valid and
<dd>that registries (in the most general sense of the word) are
<dd>prepared to deal with the side effects of prior registrations
<dd>made under the IDNA2003 guidelines.<br>
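<dd>To make the Eszett side effect concrete: Python's built-in "idna" codec implements IDNA2003 (RFC 3490, including nameprep), which maps "ß" to "ss", while plain Punycode keeps the character. The two function names below are hypothetical; this is a sketch, not a registry procedure:

```python
def idna2003_to_ascii(label: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490), which
    # applies nameprep mapping before Punycode; nameprep maps "ß" to "ss".
    return label.encode("idna").decode("ascii")


def punycode_a_label(label: str) -> str:
    # Plain RFC 3492 Punycode plus the "xn--" prefix, with no mapping step,
    # i.e. how a protocol-valid Eszett would be encoded (sketch only).
    return "xn--" + label.encode("punycode").decode("ascii")
```

<dd>Under IDNA2003, "straße" and "strasse" therefore end up as the same ASCII label, whereas a protocol-valid Eszett gives "straße" its own A-label, "xn--strae-oqa".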
<dd>The current version of Tables rules out the use of Hangul Jamos
<dd>per the recommendation of Korean language experts. <br>
<dd>Further discussion and resolution are still needed on the use
<dd>of Indic digits, particularly in connection with the BiDi
<dd>specifications. <br>
<dd>There has also been some discussion about mapping on the list.
<dd>The "going-in" assumption has been that the IDNA2008
<dd>specifications do not consider formalizing mappings. Some
<dd>mappings may occur for local reasons prior to lookup or
<dd>registration of labels in domain names. Concern has been
<dd>raised that if mappings are not standardized and uniform,
<dd>some surprises may ensue.<br>
<dd>We may need to discuss whether some form of standardized
<dd>mapping is needed, possibly to preserve least surprise
<dd>for users accustomed to the behavior of non-IDNA
<dd>domain names (e.g., upper/lower case equivalence
<dd>for lookup purposes).
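<dd>The case-equivalence example is less simple for Unicode than for ASCII, because plausible local mappings disagree. A hypothetical Python sketch comparing two such mappings (plain lowercasing and full Unicode case folding; neither is proposed by the WG):

```python
def lower_only(label: str) -> str:
    # Hypothetical mapping A: plain lowercasing, like ASCII case equivalence.
    return label.lower()


def full_casefold(label: str) -> str:
    # Hypothetical mapping B: Unicode full case folding.
    return label.casefold()


def mappings_agree(label: str) -> bool:
    # True when both hypothetical local mappings produce the same lookup key.
    return lower_only(label) == full_casefold(label)
```

<dd>Both mappings give "bücher" for "Bücher", but for "Straße" lowercasing gives "straße" while case folding gives "strasse", which is the kind of surprise that non-uniform mappings could produce.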
<dd>
<dd>However this discussion ends up, there appears to be some
<dd>consensus that the registration process should not, in and
<dd>of itself, involve mapping. That is, only valid U-labels
<dd>or A-labels should be presented to the DNS for entry
<dd>into the DNS zone files. <br>
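<dd>One way a registry could enforce "no mapping at registration" is to refuse any input that a mapping step would change. The Python sketch below is hypothetical; it assumes lowercasing and NFC normalization stand in for "mapping", and it is not the full IDNA2008 validity test:

```python
import unicodedata


def registration_check(label: str) -> bool:
    """Hypothetical pre-registration test: accept only labels needing no mapping.

    A label passes only if lowercasing and NFC normalization leave it
    unchanged; otherwise the registrant must submit the mapped form
    explicitly. Protocol, Tables and BiDi rules are NOT checked here.
    """
    already_lowercase = label == label.lower()
    already_nfc = unicodedata.normalize("NFC", label) == label
    return already_lowercase and already_nfc
```

<dd>For instance, "Bücher" is refused because lowercasing would change it, and "bu\u0308cher" (u followed by a combining diaeresis) is refused because NFC would compose it into "bücher".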
<dd>An assumption is made in the present specifications that
<dd>any registered domain label derived from non-ASCII Unicode
<dd>characters will be one-to-one convertible to A-label form
<dd>from the Unicode form (U-label form) and vice-versa. <br>
<br>
<dd>I believe we will have on the agenda several items:<br>
<dd>1. Review of the then-current status of the WG documents
<dd> and any then-known unresolved questions or issues
<dd>2. Consideration of Paul Hoffman's alternative proposal
<dd> to extend IDNA2003
<dd>3. Discussion of the role of mapping from the IDNA2008
<dd> perspective. <br>
<br>
<dd>I will prepare a more precise agenda along with issues to
<dd>be discussed and resolved as the time approaches for our
<dd>meeting in March. <br>
<dd>Vint<br>
<dd>Vint Cerf
<dd>Google
<dd>1818 Library Street, Suite 400
<dd>Reston, VA 20190
<dd>202-370-5637
<dd><a href="mailto:vint@google.com">vint@google.com</a><br>
<br>
<br>
<br>
<br><br>
<hr>
<dd>_______________________________________________
<dd>Idna-update mailing list
<dd>Idna-update@alvestrand.no
<dd>
<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" eudora="autourl">
http://www.alvestrand.no/mailman/listinfo/idna-update</a><br><br>
</dl></blockquote>
</body>
</html>