Status of IDNABIS Working Group

Vint Cerf vint at google.com
Mon Feb 16 22:00:19 CET 2009


NB: THIS TEXT MUST BE READ WITH A FIXED WIDTH
COURIER FONT FOR THE ILLUSTRATIONS TO LINE UP PROPERLY:

A fair amount of work is underway to improve the clarity
of the Definitions and Rationale documents and to revise
the others as needed to take into account proposed new
terminology. The intent is to have as much of this work
as possible available for WG review in time for the March
IETF in San Francisco. Two sessions have been reserved
during the week: one on Monday, March 23 and one on
Tuesday, March 24.

At that meeting we will also want to take up a comparison
of the documents that reflect the work outlined in the
charter and the recent proposal made by Paul Hoffman for
an alternative to that approach.

The revision work takes up the following tersely rendered
set of definitions (it will be best to read the revised
Definitions document when released for a more complete
picture).

The text below is intended to convey the flavor of the
attempt to clarify definitions but is not the entire
text that is in preparation.

2.3.  Terminology Specific to IDNA

This section defines some terminology to reduce
dependence on terms and definitions that have been
problematic in the past.

An LDH-Label is a string consisting solely of ASCII
upper and/or lower case letters, digits 0-9 and the hyphen
("-"). These labels are limited to 63 characters and do
not include a hyphen at either the beginning or end of
the string. Some people might call this a "traditional
host name" label.
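
As a rough illustration only, the LDH-label definition above
can be checked mechanically. The Python sketch below is not
taken from the Definitions document, and the name
is_ldh_label is hypothetical.

```python
import re

# Letters, digits and hyphen only; 1 to 63 characters;
# no hyphen in the first or last position.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` meets the LDH-label definition."""
    return _LDH.fullmatch(label) is not None
```

Under this check "example" passes, while "-abcd", "xyz-" and
"_tcp" do not.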

A new subset of LDH-labels is defined whose members all
have ASCII hyphens in both the third and fourth character
positions from the beginning of the label. Roughly, in
left-to-right form this would read "??--" where each "?"
is drawn from the traditional LDH set of characters, except
that the first "?" cannot be a hyphen, by definition of
LDH-label, nor can the last character of the label be a
hyphen. This subset of LDH-labels is named R-LDH-labels,
for "reserved LDH-labels". LDH-labels that are NOT members
of the R-LDH-label category are called Non-Reserved LDH
labels or NR-LDH-labels, and they make up the remainder of
the LDH-label universe.
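
The "??--" test itself is small. A minimal Python sketch,
assuming the caller has already established that the input
is an LDH-label (the function name is hypothetical):

```python
def is_r_ldh_label(ldh_label: str) -> bool:
    """True if an LDH-label has hyphens in positions 3 and 4.

    Assumes `ldh_label` is already known to be an LDH-label
    (letters, digits, hyphen; no leading or trailing hyphen;
    at most 63 characters). Anything for which this returns
    False is then an NR-LDH-label.
    """
    return ldh_label[2:4] == "--"
```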

This distinction among possible LDH-labels has
significance only for software that is "IDNA-aware". To
software that is not IDNA-aware, all LDH-labels meeting
the definition above are equally valid.

As it happens, only a subset of the R-LDH-labels can
potentially be used in IDNA-aware applications, specifically
the class of labels that begin with the prefix "xn--"
[what about "XN--"?].

This class we call "XN-labels". Of this class, only a
subset that we will call "A-labels" is valid for use in
IDNA-aware applications, namely the labels whose remainder
after the "xn--" prefix is valid Punycode output of at most
59 characters and can be converted into valid Unicode
characters by the reverse algorithm (cf. RFC 3492). Valid
Unicode characters are defined by conformance to the
Protocol, Tables and BiDi documents, which identify the
Unicode characters that can be used in IDNA2008-aware
applications.

There is also a class of labels that are prefixed with "xn--"
but whose remaining characters cannot be converted into
valid Unicode, or cannot be produced using the Punycode
encoding algorithm or that otherwise do not meet the A-label
criteria. These we will refer to as Invalid-A-labels
[or something like that].

The R-LDH-labels that are neither A-labels nor
Invalid-A-labels are reserved and not permitted to be
used in IDNA2008-aware applications.
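
To show how these categories nest, here is a hedged Python
sketch that sorts an ASCII label into one of them. It relies
on the "punycode" codec in the Python standard library. The
function and category names are hypothetical, only the
lower-case "xn--" prefix is recognized (the "XN--" question
is left open, as in the text), and the Protocol, Tables and
BiDi checks are NOT performed, so the best result it can
report is a candidate A-label.

```python
import re

# LDH-label: letters, digits, hyphen; 1 to 63 characters;
# no hyphen in the first or last position.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify(label: str) -> str:
    """Place an ASCII label in one of the categories above."""
    if _LDH.fullmatch(label) is None:
        return "NON-LDH"
    if label[2:4] != "--":
        return "NR-LDH"
    if not label.startswith("xn--"):
        # "??--" labels other than XN-labels stay reserved.
        return "reserved R-LDH"
    payload = label[4:]  # at most 59 characters, given the 63 limit
    try:
        decoded = payload.encode("ascii").decode("punycode")
        round_trip = decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        # The remainder is not decodable Punycode.
        return "Invalid-A-label"
    if decoded.isascii() or round_trip != payload:
        # Not what the Punycode encoder would have produced.
        return "Invalid-A-label"
    # Assumption: validity of the decoded characters under the
    # Protocol, Tables and BiDi documents is not checked here.
    return "A-label candidate"
```

For example, "xn--bcher-kva" is reported as a candidate
A-label, while "xn--bcher-kv" is reported as an
Invalid-A-label because its Punycode remainder is truncated.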

Labels that satisfy the LDH-Label criteria but that are
not Reserved-LDH Labels are called Non-Reserved LDH labels
or NR-LDH-labels.


FOR IDNA2008-AWARE SYSTEMS, VALID LABELS INCLUDE:

A-LABELS, U-LABELS AND NR-LDH LABELS.

IDNA-LABELS COME IN TWO FLAVORS: AN ACE-ENCODED FORM AND A UNICODE FORM.
THESE ARE REFERRED TO AS A-LABELS AND U-LABELS RESPECTIVELY.


                                ASCII-LABEL
----------------------------------------------------------------
|                                                              |
|                 LDH-LABEL (1) (4)                            |
|          ___________________________________________________ |
|         |                                                  | |
|         |                                                  | |
|         |  __________________________________              | |
|         |  |IDN Reserved LDH Labels          |             | |
|         |  | ("??--")   or R-LDH LABELS      |             | |
|         |  |                                 | NONRESERVED | |
|         |  | ------------------------------- |  LDH LABELS | |
|         |  | |       XN LABELS             | |             | |
|         |  | | _____________   ___________ | |             | |
|         |  | | |           |   |          || |NR-LDH LABELS| |
|         |  | | | A-labels  |   | Invalid  || |             | |
|         |  | | | "xn--"(2) |   | A-labels || |             | |
|         |  | | |___________|   |____(3)___|| |             | |
|         |  | |_____________________________| |             | |
|         |  |_________________________________|             | |
|         |__________________________________________________| |
|                                                              |
|                                                              |
|            NON-LDH-LABEL                                     |
|         _______________________________________________      |
|         |                                             |      |
|         |         ________________________            |      |
|         |         | SRV & SRV-LIKE       |            |      |
|         |         | e.g. _tcp            |            |      |
|         |         |______________________|            |      |
|         |         ________________________            |      |
|         |         | leading or trailing  |            |      |
|         |         | hyphens "-abcd"      |            |      |
|         |         | or "xyz-" or "-uvw-" |            |      |
|         |         |______________________|            |      |
|         |         ________________________            |      |
|         |         | Other non-LDH        |            |      |
|         |         | ASCII Chars          |            |      |
|         |         | e.g. #$%&_           |            |      |
|         |         |______________________|            |      |
|         |_____________________________________________|      |
|______________________________________________________________|


           (1) ASCII letters (upper and lower case), digits,
              hyphen.  Hyphen may not appear in first or last
              position. Fewer than 64 characters.
           (2) Note that the string following "xn--" must
              be the valid output of the Punycode algorithm
              and must be convertible into valid U-label form.
           (3) Note that an Invalid-A-Label has a prefix "xn--"
              but the remainder of the label is NOT the valid
              output of the Punycode algorithm.

           (4) LDH-LABEL subtypes are indistinguishable to IDNA-unaware
                 applications.



                      __________________________
                      |  Non-ASCII             |
                      |                        |
                      |    ___________________ |
                      |    | U-label (5)     | |
                      |    |_________________| |
                      |    |                 | |
                      |    |  Binary Label   | |
                      |    | (including      | |
                      |    |  high bit on)   | |
                      |    |_________________| |
                      |    |                 | |
                      |    | Bit String      | |
                      |    |   Label         | |
                      |    |_________________| |
                      |________________________|

          (5) To IDNA-unaware applications, U-labels are
                 indistinguishable from Binary ones.

              Figure 1: IDNA and Related DNS Terminology Space

==================


As I have understood the WG charter, the intention has been
to devise a means to avoid specific dependence of the
specifications on any particular instance of the Unicode
character set. The IDNA2008 document set has also attempted
to maintain a one-to-one relationship between labels produced
by the Punycode encoding algorithm and the associated Unicode
strings. In brief terms, the A-labels and U-labels of IDNA2008
can be mapped back and forth without any loss or change in
the respective A-label or U-label strings.
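
The round trip can be illustrated with the "punycode" codec
in the Python standard library. This is only a sketch: the
helper names are hypothetical, and no check is made that the
Unicode string is a valid U-label under the Protocol, Tables
and BiDi documents.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a U-label as an A-label using Punycode."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its Unicode form."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("not an XN-label: " + a_label)
    payload = a_label[len(ACE_PREFIX):]
    return payload.encode("ascii").decode("punycode")


u = "b\u00fccher"  # "bucher" with u-umlaut as the second letter
a = to_a_label(u)
# Both directions reproduce the original string exactly.
assert to_u_label(a) == u
assert to_a_label(to_u_label(a)) == a
```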

Document editors are working to incorporate these new definitions
and the sense of exchanges on the mailing list.

As of this writing, it is my understanding that the Eszett and
Final Sigma characters are to be treated as protocol-valid and
that registries (in the most general sense of the word) are
prepared to deal with the side-effects of prior registrations
following the IDNA2003 guidelines.

The current version of Tables rules out the use of Hangul Jamos
per the recommendation of Korean language experts.

There remains further discussion and resolution of the use
of Indic digits particularly in connection with the BiDi
specifications.

There has also been some discussion on the list about mapping.
The "going-in" assumption has been that the IDNA2008
specifications do not formalize mappings. Some
mappings may occur for local reasons prior to lookup or
registration of labels in domain names. Concern has been
raised that if mappings are not standardized and uniform,
some surprises may ensue.

We may need to discuss whether some form of standardized
mapping is needed, possibly to maintain least surprise
for users accustomed to the behavior of non-IDNA
domain names (e.g. upper/lower case equivalence
for lookup purposes).

However this discussion ends up, there appears to be some
consensus that the registration process should not, in and
of itself, involve mapping. That is: only valid U-labels
or A-labels should be presented to the DNS system for entry
into the DNS zone files.

An assumption is made in the present specifications that
any registered domain label derived from non-ASCII Unicode
characters will be one-to-one convertible to A-label form
from the Unicode form (U-label form) and vice-versa.


I believe we will have on the agenda several items:

1. Review of the then-current status of the WG documents
    and any then-known unresolved questions or issues
2. Consideration of Paul Hoffman's alternative proposal
    to extend IDNA2003
3. Discussion of the role of mapping from the IDNA2008
    perspective.


I will prepare a more precise agenda along with issues to
be discussed and resolved as the time approaches for our
meeting in March.

Vint

Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com

