draft-klensin-idnabis-protocol-04 section 4.5

Harald Alvestrand harald at alvestrand.no
Thu Mar 27 13:37:38 CET 2008

Simon Josefsson wrote:
> Harald Tveit Alvestrand <harald at alvestrand.no> writes:
>> --On Thursday, March 27, 2008 10:55:04 +0100 Simon Josefsson
>> <simon at josefsson.org> wrote:
>>> This section reads:
>>>    The resulting U-label is converted to an A-label (i.e., the encoding
>>>    of that label according to the Punycode algorithm [RFC3492] with the
>>>    prefix included, i.e., the "xn--..." form).
>>> That assumes that no U-label will be translated into a LDH-label.
>>> In IDNA2003 some U-labels are translated to LDH-labels, for example:
>>> ToASCII(josefßon) = josefsson
>>> ToASCII(dªtªkonsult) = datakonsult
>>> Note absence of xn-- prefix and punycode data.
>>> Is the intention that these strings will not map to the same LDH label
>>> in IDNA200x?
>> As long as we keep the "no mapping" principle, the intention is that
>> these strings will be rejected by IDNA200x.
> Rejected at registration?  Or rejected during lookup?
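
For reference, the IDNA2003 behaviour Simon describes can be reproduced with a minimal sketch, assuming Python's built-in "idna" codec (which follows RFC 3490 with Nameprep). The helper name to_ascii_2003 is made up for this illustration and does not come from any of the drafts.

```python
def to_ascii_2003(label: str) -> str:
    """Apply the IDNA2003 ToASCII operation via Python's built-in codec.

    The "idna" codec runs Nameprep (case folding plus NFKC mapping)
    before deciding whether Punycode is needed at all.
    """
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for label in ("josefßon", "dªtªkonsult"):
        # Both inputs are mapped to plain LDH labels, so no "xn--"
        # prefix appears in the result.
        print(label, "->", to_ascii_2003(label))
```

Nameprep maps both strings to pure ASCII, which is why no Punycode step is reached under IDNA2003.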
From draft-klensin-idnabis-protocol-04:


4.3.  Permitted Character and Label Validation

4.3.1.  Rejection of Characters that are not Permitted

   The Unicode string is examined to prohibit characters that IDNA does
   not permit in input.  Those characters are identified in the
   "DISALLOWED" and "UNASSIGNED" lists that are discussed in
   [IDNA200X-Rationale].  The normative rules for producing that list
   and the initial version of it are specified in [IDNA200X-Tables].
   Characters that are either DISALLOWED or UNASSIGNED MUST NOT be part
   of labels being processed for registration in the DNS.


5.4.  Validation and Character List Testing

   In parallel with the registration procedure, the Unicode string is
   checked to verify that all characters that appear in it are valid for
   IDNA resolution input.  As discussed in [IDNA200X-Rationale], the
   resolution check is more liberal than that of the registration one.
   Putative labels with any of the following characteristics MUST BE
   rejected prior to DNS lookup:

   o  Labels containing code points that are unassigned in the version
      of Unicode being used by the application, i.e., in the
      "Unassigned" Unicode category or the UNASSIGNED category of

   o  Labels that are not in NFC form.

   o  Labels containing prohibited code points, i.e., those that are
      assigned to the "DISALLOWED" category in the permitted character
      table [IDNA200X-Tables].

   o  Labels containing code points that are shown in the permitted
      character table as requiring a contextual rule and that are
      flagged as requiring exceptional special processing on lookup
      ("CONTEXTJ" in the Tables) MUST conform to the rule, which MUST be

   o  Labels containing other code points that are shown in the
      permitted character table as requiring a contextual rule
      ("CONTEXTO" in the tables), but for which no such rule appears in
      the table of rules.  With the exception in the rule immediately
      above, applications resolving DNS names or carrying out equivalent
      operations are not required to test contextual rules, only to
      verify that a rule exists.

   o  Labels whose first character is a combining mark. [[anchor15: Note
      in Draft: this definition may need to be further tightened.]]

.... more text follows .....
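
A rough sketch of some of the lookup-time checks listed above, in Python. It is an illustration under assumptions, not the draft's algorithm: the function name and the "disallowed" parameter are hypothetical, the real DISALLOWED list lives in [IDNA200X-Tables] and must be supplied by the caller, "unassigned" is approximated by the Unicode version bundled with the Python interpreter, and the contextual-rule checks (CONTEXTJ / CONTEXTO) are left out entirely.

```python
import unicodedata


def lookup_rejection_reason(label, disallowed=frozenset()):
    """Return a short reason if the label fails a check, else None.

    `disallowed` is a caller-supplied set of characters standing in
    for the DISALLOWED category; it is NOT the real table.
    """
    # Code points unassigned in this interpreter's Unicode data ("Cn").
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        return "unassigned code point"
    # The label must already be in NFC form; nothing is mapped.
    if unicodedata.normalize("NFC", label) != label:
        return "not in NFC form"
    # Prohibited code points, per the caller-supplied stand-in set.
    if any(ch in disallowed for ch in label):
        return "DISALLOWED code point"
    # The first character must not be a combining mark (Mn, Mc, Me).
    if label and unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    return None
```

With a stand-in set such as {"ß", "ª"}, chosen here purely for illustration, both of the example strings would be rejected rather than mapped, which is the outcome described for the "no mapping" principle.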

> What I'm trying to understand is what an IDNA200x implementation will do
> (i.e., which output string or what error) when the user types 'josefßon'
> or 'dªtªkonsult'.

Read the drafts. It helps.

