draft-klensin-idnabis-protocol-04 section 4.5
harald at alvestrand.no
Thu Mar 27 13:37:38 CET 2008
Simon Josefsson wrote:
> Harald Tveit Alvestrand <harald at alvestrand.no> writes:
>> --On Thursday, March 27, 2008 10:55:04 +0100 Simon Josefsson
>> <simon at josefsson.org> wrote:
>>> This section reads:
>>> The resulting U-label is converted to an A-label (i.e., the encoding
>>> of that label according to the Punycode algorithm [RFC3492] with the
>>> prefix included, i.e., the "xn--..." form).
>>> That assumes that no U-label will be translated into an LDH-label.
>>> In IDNA2003 some U-labels are translated to LDH-labels, for example:
>>> ToASCII(josefßon) = josefsson
>>> ToASCII(dªtªkonsult) = datakonsult
>>> Note absence of xn-- prefix and punycode data.
>>> Is the intention that these strings will not map to the same LDH label
>>> in IDNA200x?
>> As long as we keep the "no mapping" principle, the intention is that
>> these strings will be rejected by IDNA200x.
> Rejected at registration? Or rejected during lookup?
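The contrast Simon describes can be reproduced with Python's standard library. The built-in `idna` codec implements IDNA2003 ToASCII (RFC 3490, including Nameprep), while the draft's step of turning a U-label into an A-label amounts to Punycode-encoding the label and prepending "xn--". The helper `draft_to_a_label` is a hypothetical sketch of that single step only. It performs none of the validation the draft requires beforehand, and its refusal of all-ASCII input is an assumption, not draft text.

```python
# Sketch: IDNA2003 mapping vs. the draft's plain U-label -> A-label step.
# Uses only Python's standard-library "idna" (IDNA2003) and "punycode" codecs.


def idna2003_to_ascii(label: str) -> str:
    """IDNA2003 ToASCII via Python's built-in codec (applies Nameprep mapping)."""
    return label.encode("idna").decode("ascii")


def draft_to_a_label(u_label: str) -> str:
    """Hypothetical helper: Punycode-encode a U-label and add the "xn--" prefix.

    No mapping is applied, so the output always carries the prefix; an
    all-ASCII input is not a U-label and is refused here (an assumption).
    """
    if u_label.isascii():
        raise ValueError("not a U-label: contains only ASCII characters")
    return "xn--" + u_label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for name in ("josefßon", "dªtªkonsult"):
        # IDNA2003 maps these to plain LDH labels with no "xn--" prefix.
        print(name, "->", idna2003_to_ascii(name))
    # The draft's step always yields an "xn--" form for a genuine U-label.
    print("bücher", "->", draft_to_a_label("bücher"))
```

Under IDNA2003, Nameprep maps "ß" to "ss" and NFKC folds "ª" to "a", which is why both example strings collapse to LDH labels; the draft's conversion step has no such mapping stage.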
4.3. Permitted Character and Label Validation
4.3.1. Rejection of Characters that are not Permitted
The Unicode string is examined to prohibit characters that IDNA does
not permit in input. Those characters are identified in the
"DISALLOWED" and "UNASSIGNED" lists that are discussed in
[IDNA200X-Rationale]. The normative rules for producing that list
and the initial version of it are specified in [IDNA200X-Tables].
Characters that are either DISALLOWED or UNASSIGNED MUST NOT be part
of labels being processed for registration in the DNS.
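The registration-side rule in 4.3.1 reduces to a per-character membership test. The sketch below assumes a tiny, hypothetical `DISALLOWED_EXAMPLES` set in place of the normative list from [IDNA200X-Tables] (its two entries are illustrative only), and it approximates UNASSIGNED with the Unicode general category "Cn" from whatever Unicode version the running Python ships.

```python
import unicodedata

# Hypothetical stand-in for the DISALLOWED list; the normative list is
# produced by the rules in [IDNA200X-Tables]. Entries are illustrative only.
DISALLOWED_EXAMPLES = frozenset({"\u00aa", "\u00df"})  # ª, ß


def is_unassigned(ch: str) -> bool:
    # Approximates UNASSIGNED with general category "Cn" in the Unicode
    # version bundled with this Python (see unicodedata.unidata_version).
    return unicodedata.category(ch) == "Cn"


def check_registration(label: str) -> None:
    """Raise ValueError if the label has DISALLOWED or UNASSIGNED code points."""
    for ch in label:
        if ch in DISALLOWED_EXAMPLES:
            raise ValueError(f"U+{ord(ch):04X} is DISALLOWED")
        if is_unassigned(ch):
            raise ValueError(f"U+{ord(ch):04X} is UNASSIGNED")
```

With these placeholder entries, `check_registration("josefsson")` returns normally while `check_registration("josefßon")` raises, instead of the label being mapped as IDNA2003 would do.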
5.4. Validation and Character List Testing
In parallel with the registration procedure, the Unicode string is
checked to verify that all characters that appear in it are valid for
IDNA resolution input. As discussed in [IDNA200X-Rationale], the
resolution check is more liberal than that of the registration one.
Putative labels with any of the following characteristics MUST BE
rejected prior to DNS lookup:
o Labels containing code points that are unassigned in the version
of Unicode being used by the application, i.e., in the
"Unassigned" Unicode category or the UNASSIGNED category of
o Labels that are not in NFC form.
o Labels containing prohibited code points, i.e., those that are
assigned to the "DISALLOWED" category in the permitted character
o Labels containing code points that are shown in the permitted
character table as requiring a contextual rule and that are
flagged as requiring exceptional special processing on lookup
("CONTEXTJ" in the Tables) MUST conform to the rule, which MUST be
o Labels containing other code points that are shown in the
permitted character table as requiring a contextual rule
("CONTEXTO" in the tables), but for which no such rule appears in
the table of rules. With the exception in the rule immediately
above, applications resolving DNS names or carrying out equivalent
operations are not required to test contextual rules, only to
verify that a rule exists.
o Labels whose first character is a combining mark. [[anchor15: Note
in Draft: this definition may need to be further tightened.]]
.... more text follows .....
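The lookup-side checks quoted from 5.4 can be sketched as one validation pass. Everything table-related here is hypothetical: `DISALLOWED`, `CONTEXTJ`, `CONTEXTO` and `CONTEXT_RULES` are small illustrative stand-ins for [IDNA200X-Tables], and the single ZWJ rule (must follow a character whose canonical combining class is Virama, 9) is a simplified example rather than the draft's rule text. As in the quoted bullets, CONTEXTJ code points must satisfy their rule, whereas for CONTEXTO the sketch only checks that a rule exists.

```python
import unicodedata
from typing import Callable, Dict

# Hypothetical, illustrative stand-ins for the permitted character table in
# [IDNA200X-Tables]; a real implementation would load the generated tables.
DISALLOWED = frozenset({"\u00aa", "\u00df"})   # ª, ß (illustrative only)
CONTEXTJ = frozenset({"\u200c", "\u200d"})     # ZERO WIDTH NON-JOINER / JOINER
CONTEXTO = frozenset({"\u00b7", "\u0375"})     # MIDDLE DOT, GREEK LOWER NUMERAL SIGN

Rule = Callable[[str, int], bool]


def _after_virama(label: str, i: int) -> bool:
    # Simplified example rule: the joiner must follow a Virama (ccc 9).
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


# Hypothetical rule table: only some contextual code points have a rule.
CONTEXT_RULES: Dict[str, Rule] = {"\u200d": _after_virama}


def check_lookup(label: str) -> None:
    """Raise ValueError if the label must be rejected before DNS lookup."""
    if not label:
        raise ValueError("empty label")
    for ch in label:
        # Unassigned in the Unicode version this Python was built with.
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"U+{ord(ch):04X} is unassigned")
    if unicodedata.normalize("NFC", label) != label:
        raise ValueError("label is not in NFC form")
    for i, ch in enumerate(label):
        if ch in DISALLOWED:
            raise ValueError(f"U+{ord(ch):04X} is DISALLOWED")
        if ch in CONTEXTJ:
            # CONTEXTJ: the rule must exist and the label must satisfy it.
            rule = CONTEXT_RULES.get(ch)
            if rule is None or not rule(label, i):
                raise ValueError(f"U+{ord(ch):04X} fails its CONTEXTJ rule")
        elif ch in CONTEXTO and ch not in CONTEXT_RULES:
            # CONTEXTO: lookup only needs to verify that a rule exists.
            raise ValueError(f"U+{ord(ch):04X} has no CONTEXTO rule")
    if unicodedata.category(label[0]).startswith("M"):
        raise ValueError("label starts with a combining mark")
```

With these placeholder tables, `check_lookup("josefßon")` raises `ValueError` rather than producing "josefsson". Whether a given character is actually DISALLOWED depends on the Tables draft in force.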
> What I'm trying to understand is what an IDNA200x implementation will do
> (i.e., which output string or what error) when the user types 'josefßon'
> or 'dªtªkonsult'.
Read the drafts. It helps.