Status of IDNABIS Working Group

YAO Jiankang yaojk at cnnic.cn
Tue Feb 17 06:42:40 CET 2009



Now the definitions of A-labels, U-labels, NR-LDH-labels, LDH-labels, and R-LDH-labels are very clear to me.

  ----- Original Message ----- 
  From: Vint Cerf 
  To: idna-update at alvestrand.no 
  Sent: Tuesday, February 17, 2009 5:00 AM
  Subject: Status of IDNABIS Working Group


  NB: THIS TEXT MUST BE READ WITH A FIXED WIDTH 
  COURIER FONT FOR THE ILLUSTRATIONS TO LINE UP PROPERLY: 


  A fair amount of work is underway to improve the clarity 
  of the Definitions and Rationale documents and to revise 
  the others  as needed to take into account proposed new 
  terminology. The intent is to have as much of this work 
  as possible available for WG review in time for the March 
  IETF in San Francisco. Two sessions have been reserved
  during the week: one on Monday, March 23 and one on
  Tuesday, March 24.


  At that meeting we will also want to take up a comparison 
  of the documents that reflect the work outlined in the 
  charter and the recent proposal made by Paul Hoffman for 
  an alternative to that approach. 


  The revision work takes up the following tersely rendered 
  set of definitions (it will be best to read the revised 
  Definitions document when released for a more complete 
  picture).


  The text below is intended to convey the flavor of the 
  attempt to clarify definitions but is not the entire 
  text that is in preparation.


  2.3.  Terminology Specific to IDNA


     This section defines some terminology to reduce 
  dependence on terms and definitions that have been 
  problematic in the past.


  An LDH-Label is a string consisting solely of ASCII 
  upper and/or lower case letters, digits 0-9 and the hyphen 
  ("-"). These labels are limited to 63 characters and do 
  not include a hyphen at either the beginning or end of 
  the string. Some people might call this a "traditional 
  host name" label.
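The LDH-Label definition above can be sketched as a small check. This is only an illustration: the names `_LDH_RE` and `is_ldh_label` are made up for this example and do not come from the drafts.

```python
import re

# ASCII letters, digits and hyphen only; 1 to 63 characters;
# no hyphen in the first or last position.
_LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` meets the LDH-Label definition quoted above."""
    return _LDH_RE.fullmatch(label) is not None
```

For instance, `is_ldh_label("example")` is true, while `is_ldh_label("-abcd")` and `is_ldh_label("_tcp")` are false.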


  A new subset of LDH-Labels is defined whose members all 
  have ASCII hyphens in the third and fourth character 
  positions from the beginning of the label. Roughly, in 
  left-to-right form this would read "??--" where "??" is 
  drawn from the traditional LDH set of characters, except 
  that the first "?" cannot be a hyphen by definition of 
  LDH-label, nor can the last character of the label be a 
  hyphen. This subset of LDH-labels is named R-LDH-labels, 
  for "reserved LDH-labels". LDH-labels that are NOT members 
  of the R-LDH-label category are called Non-Reserved LDH 
  labels, or NR-LDH-labels, and they make up the remainder 
  of the LDH-label universe.


  This distinction among possible LDH-labels only has 
  significance for software that is "IDNA-aware". Otherwise, 
  all LDH-labels meeting the definition above are accepted as 
  valid by non-IDNA-aware software.


  As it happens, only a subset of the R-LDH-labels can 
  potentially be used in IDNA-aware applications, specifically 
  the class of labels that begin with the prefix "xn--" 
  [what about "XN--"?].


  We call this class "XN-labels". Only a subset of them, 
  which we will call "A-labels", is valid for use in 
  IDNA-aware applications: namely those labels whose remainder 
  is valid Punycode output, limited to 59 characters in 
  addition to the "xn--" prefix, and which can be converted 
  into valid Unicode characters by the reverse algorithm 
  (cf. RFC 3492). Valid Unicode characters are defined by 
  conformance to the Protocol, Tables and BiDi documents 
  that identify which Unicode characters can be used in 
  IDNA2008-aware applications. 


  There is also a class of labels that are prefixed with "xn--" 
  but whose remaining characters cannot be converted into 
  valid Unicode, or cannot be produced using the Punycode 
  encoding algorithm or that otherwise do not meet the A-label 
  criteria. These we will refer to as Invalid-A-labels 
  [or something like that]. 


  The R-LDH-labels that are neither A-labels nor 
  invalid-A-labels are reserved and not permitted to be 
  used in IDNA2008-aware applications.


  Labels that satisfy the LDH-Label criteria but that are 
  not Reserved-LDH Labels are called Non-Reserved LDH labels 
  or NR-LDH-labels.
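The categories defined above can be put together in a rough classifier. This is a sketch under stated assumptions, not the drafts' algorithm: `LabelClass`, `classify` and `punycode_round_trips` are names invented here; the "xn--" prefix is matched case-insensitively, which the text leaves as an open question; and the Unicode validity rules of the Protocol, Tables and BiDi documents are NOT checked, so the best this code can report is an A-label candidate. It relies on Python's built-in "punycode" codec.

```python
import re
from enum import Enum

_LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


class LabelClass(Enum):
    NON_LDH = "non-LDH-label"
    NR_LDH = "NR-LDH-label"
    A_LABEL_CANDIDATE = "A-label (candidate, Unicode validity unchecked)"
    INVALID_A_LABEL = "Invalid-A-label"
    OTHER_RESERVED = "other R-LDH-label"


def punycode_round_trips(encoded: str) -> bool:
    """True if `encoded` decodes to a non-ASCII string that re-encodes identically."""
    try:
        decoded = encoded.encode("ascii").decode("punycode")
        reencoded = decoded.encode("punycode").decode("ascii")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    has_non_ascii = any(ord(ch) > 127 for ch in decoded)
    return has_non_ascii and reencoded == encoded


def classify(label: str) -> LabelClass:
    if _LDH_RE.fullmatch(label) is None:
        return LabelClass.NON_LDH
    if label[2:4] != "--":
        return LabelClass.NR_LDH
    lowered = label.lower()  # assumption: "XN--" is treated like "xn--"
    if not lowered.startswith("xn--"):
        return LabelClass.OTHER_RESERVED
    if punycode_round_trips(lowered[4:]):
        return LabelClass.A_LABEL_CANDIDATE
    return LabelClass.INVALID_A_LABEL
```

The 59-character limit on the Punycode part needs no separate test here, because the LDH check already caps the whole label at 63 characters.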




  FOR IDNA2008-AWARE SYSTEMS, VALID LABELS INCLUDE:


  A-LABELS, U-LABELS AND NR-LDH LABELS. 


  IDNA-LABELS COME IN TWO FLAVORS: AN ACE-ENCODED FORM AND A UNICODE FORM. 
  THESE ARE REFERRED TO AS A-LABELS AND U-LABELS RESPECTIVELY.




                                 ASCII-LABEL
  ----------------------------------------------------------------
  |                                                              |                               
  |                 LDH-LABEL (1) (4)                            |
  |          ___________________________________________________ |
  |         |                                                  | |
  |         |                                                  | |     
  |         |  __________________________________              | |
  |         |  |IDN Reserved LDH Labels          |             | |
  |         |  | ("??--")   or R-LDH LABELS      |             | |    
  |         |  |                                 | NONRESERVED | |
  |         |  | ------------------------------- |  LDH LABELS | |
  |         |  | |       XN LABELS             | |             | |
  |         |  | | _____________   ___________ | |             | |
  |         |  | | |           |   |          || |NR-LDH LABELS| |
  |         |  | | | A-labels  |   | Invalid  || |             | |
  |         |  | | | "xn--"(2) |   | A-labels || |             | |
  |         |  | | |___________|   |____(3)___|| |             | |
  |         |  | |_____________________________| |             | |
  |         |  |_________________________________|             | |
  |         |__________________________________________________| |
  |                                                              |
  |                                                              |
  |            NON-LDH-LABEL                                     |
  |         _______________________________________________      |
  |         |                                             |      |
  |         |         ________________________            |      |
  |         |         | SRV & SRV-LIKE       |            |      |
  |         |         | e.g. _tcp            |            |      |
  |         |         |______________________|            |      |
  |         |         ________________________            |      |
  |         |         | leading or trailing  |            |      |
  |         |         | hyphens "-abcd"      |            |      |
  |         |         | or "xyz-" or "-uvw-" |            |      |
  |         |         |______________________|            |      |
  |         |         ________________________            |      |
  |         |         | Other non-LDH        |            |      |
  |         |         | ASCII Chars          |            |      |
  |         |         | e.g. #$%&_           |            |      |
  |         |         |______________________|            |      |
  |         |_____________________________________________|      |
  |______________________________________________________________|




            (1) ASCII letters (upper and lower case), digits,
               hyphen.  Hyphen may not appear in first or last
               position. Less than 64 characters.
            (2) Note that the string following "xn--" must
               be the valid output of the Punycode algorithm
               and must be convertible into valid U-label form.
            (3) Note that an Invalid-A-Label has a prefix "xn--"
               but the remainder of the label is NOT the valid
               output of the Punycode algorithm.


            (4) LDH-LABEL subtypes are indistinguishable to IDNA-unaware
                  applications.






                       __________________________
                       |  Non-ASCII             |
                       |                        |
                       |    ___________________ |
                       |    | U-label (5)     | |
                       |    |_________________| |
                       |    |                 | |
                       |    |  Binary Label   | |
                       |    | (including      | |
                       |    |  high bit on)   | |
                       |    |_________________| |
                       |    |                 | |
                       |    | Bit String      | |
                       |    |   Label         | |
                       |    |_________________| |
                       |________________________|


           (5) To IDNA-unaware applications, U-labels are
                  indistinguishable from Binary ones.


               Figure 1: IDNA and Related DNS Terminology Space


  ==================




  As I have understood the WG charter, the intention has been 
  to devise a means to avoid specific dependence of the 
  specifications on any particular instance of the Unicode 
  character set. The general posture of the IDNA2008 document 
  set has also attempted to maintain a one-to-one relationship 
  between labels produced by the Punycode encoding algorithm and 
  the associated Unicode string. In brief terms, the A-Labels and 
  U-Labels of the IDNA2008 can be mapped back and forth without 
  any loss or change in the respective A-label or U-label strings. 
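The back-and-forth mapping described above can be illustrated with Python's built-in "punycode" codec. The helper names `to_a_label` and `to_u_label` are hypothetical, and the sketch assumes its input is already a valid U-label or A-label; it performs none of the IDNA2008 validity checks.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a (presumed valid) U-label into its A-label form."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode a (presumed valid) A-label back into its U-label form."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("not an xn-- label")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# Round trip in both directions, with no loss or change.
assert to_u_label(to_a_label("bücher")) == "bücher"
assert to_a_label(to_u_label("xn--bcher-kva")) == "xn--bcher-kva"
```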


  Document editors are working to incorporate these new definitions 
  and the sense of exchanges on the mailing list. 


  As of this writing, it is my understanding that the Esszet and
  Final Sigma characters are to be treated as protocol-valid and
  that registries (in the most general sense of the word) are
  prepared to deal with the side-effects of prior registrations
  following the IDNA2003 guidelines.


  The current version of Tables rules out the use of Hangul Jamos
  per the recommendation of Korean language experts. 


  There remains further discussion and resolution of the use
  of Indic digits particularly in connection with the BiDi
  specifications. 


  There has also been some discussion about mapping on the list.
  The "going-in" assumption has been that the IDNA2008 
  specifications do not consider formalizing mappings. Some
  mappings may occur for local reasons prior to look up or
  registration of labels in domain names. Concern has been
  raised that if mappings are not standardized and uniform
  some surprises may ensue.


  We may need to discuss whether some form of standardized
  mapping is needed, possibly to maintain least surprise
  for users accustomed to the behavior of non-IDNA 
  domain names (e.g. upper/lower case equivalence
  for lookup purposes). 
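A small illustration of why mapping can surprise. It uses Unicode case folding as a stand-in for a local mapping step; that choice is an assumption made for this example and is not a mapping defined by IDNA2003 or IDNA2008. The names `local_mapping` and `survives_mapping` are hypothetical.

```python
def local_mapping(label: str) -> str:
    """Hypothetical pre-lookup mapping: Unicode case folding only."""
    return label.casefold()


def survives_mapping(label: str) -> bool:
    """True if the mapping leaves the label unchanged."""
    return local_mapping(label) == label


# Upper/lower case equivalence behaves as ASCII users expect ...
assert local_mapping("Example") == "example"
# ... but the same step rewrites Eszett and Final Sigma.
assert local_mapping("straße") == "strasse"
assert local_mapping("ς") == "σ"
```

If one application applies such a step before lookup and another does not, the two may look up different labels for the same user input.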

  However this discussion ends up, there appears to be some
  consensus that the registration process should not, in and
  of itself, involve mapping. That is: only valid U-labels
  or A-labels should be presented to the DNS system for entry
  into the DNS zone files. 


  An assumption is made in the present specifications that
  any registered domain label derived from non-ASCII Unicode
  characters will be one-to-one convertible to A-label form
  from the Unicode form (U-label form) and vice-versa. 




  I believe we will have on the agenda several items:


  1. Review of the then-current status of the WG documents
     and any then-known unresolved questions or issues
  2. Consideration of Paul Hoffman's alternative proposal
     to extend IDNA2003 
  3. Discussion of the role of mapping from the IDNA2008
     perspective. 




  I will prepare a more precise agenda along with issues to
  be discussed and resolved as the time approaches for our
  meeting in March. 


  Vint


  Vint Cerf
  Google
  1818 Library Street, Suite 400
  Reston, VA 20190
  202-370-5637
  vint at google.com











