Making progress on the mapping question

Mark Davis mark at macchiato.com
Tue Mar 31 01:31:55 CEST 2009


I think that is overly complicated. The consensus is to have a required
mapping step for compatibility in Lookup (and forbid mapping in
Registration). Moreover, we have the strong statement from DENIC that (a)
they prefer mapping for compatibility, and (b) if there is a mapping, then
they want the mapping of eszett to be in accordance with IDNA2003. And I
believe (although it is not completely clear), that the Greeks feel the same
about sigma.

Given that, the only change to the structure of the IDNA2003 mapping is
to allow ZWJ/ZWNJ. We can fit these pieces together much more simply, and
without even lugging around an IDNA2003 implementation (which we'd like to
get rid of), or lugging around a Unicode 3.2 implementation. Moreover, two
lookups are *only* required when a domain name contains at least one ZWJ or
ZWNJ.


Here is a proposal on that basis:

http://tools.ietf.org/html/draft-ietf-idnabis-protocol-11

In 4.1, replace:

  Entities responsible for
  zone files ("registries") are expected to accept only the exact
  string for which registration is requested, free of any mappings or
  local adjustments.  They SHOULD avoid any possible ambiguity by
  accepting registrations only for A-labels, possibly paired with the
  relevant U-labels so that they can verify the correspondence.

by
  Entities responsible for
  zone files ("registries") MUST accept only U-labels or A-labels.
  They SHOULD avoid any possible ambiguity by
  accepting registrations only for A-labels, preferably paired with the
  corresponding U-labels so that registrants can verify the identity of the
  labels.

Replace Section 5.3 by

5.3. Character Transformations

The Unicode string MUST be transformed according to the specifications in
5.3.1 and 5.3.2. These transformations are designed to allow for mapping
compatibility with IDNA2003 without requiring Unicode 3.2 implementations.
Even with these transformations, however, there are many characters that are
allowed in IDNA2003 that are not allowed by IDNA2008. Implementations may
use one of the techniques described in Appendix A to handle such domain
names during a transitional period.

It is important to note that labels in application protocols, files, or
links SHOULD be in U-label or A-label form.


5.3.1 Normalization and Casefolding

The Unicode string MUST be transformed by normalizing with Unicode
normalization form KC, then case folding, then normalizing again. This
guarantees that none of the resulting characters in the string are Unstable
according to the criterion in Tables Section 2.2. Unstable (B). In
pseudocode:

  string = toNFKC(toCaseFold(toNFKC(string)));

Example: <A, U+0300 COMBINING GRAVE ACCENT> is transformed into <U+00E0 ( à
) LATIN SMALL LETTER A WITH GRAVE>.
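As a minimal sketch of the pseudocode above in Python, assuming the standard
library's unicodedata.normalize and str.casefold are acceptable stand-ins for
toNFKC and toCaseFold. The function name map_531 is hypothetical, and Python
uses whatever Unicode version it was built with, which is not necessarily the
one the draft targets:

```python
import unicodedata


def map_531(s: str) -> str:
    """NFKC, then case fold, then NFKC again (proposed Section 5.3.1)."""
    nfkc = unicodedata.normalize("NFKC", s)
    folded = nfkc.casefold()          # full Unicode case folding
    return unicodedata.normalize("NFKC", folded)


print(ascii(map_531("A\u0300")))      # '\xe0'
print(map_531("Stra\u00dfe"))         # strasse
```

Because of the case-folding step, <A, U+0300> ends up as the lowercase
precomposed letter U+00E0, and eszett folds to "ss", which matches the
IDNA2003 treatment mentioned at the top of this message.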


5.3.2 Removal of Ignorables

Certain Unicode characters are called Default_Ignorable_Code_Points. For
more information, see Tables Section 2.3. IgnorableProperties (C). The
Unicode string MUST be transformed by removing all
Default_Ignorable_Code_Points characters except for the Join Controls
specified in Tables Section 2.8. JoinControl (H). In pseudocode:

  string = removeAll(string, Default_Ignorable_Code_Point - Join_Control)

Example: <A, U+00AD (  ) SOFT HYPHEN, B> is transformed into <A, B>.
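A rough Python sketch of this removal step follows. Python's standard library
does not expose the Default_Ignorable_Code_Point property, so the set below is
a small illustrative subset chosen for this example (an assumption, not the
full property from the Unicode Character Database); the names are hypothetical:

```python
# Illustrative subset only: the full Default_Ignorable_Code_Point set is
# defined by the Unicode Character Database, not by this list.
DEFAULT_IGNORABLE_SUBSET = (
    {0x00AD, 0x034F, 0x2060, 0xFEFF}
    | set(range(0x200B, 0x2010))      # ZWSP, ZWNJ, ZWJ, LRM, RLM
    | set(range(0xFE00, 0xFE10))      # variation selectors 1-16
)
JOIN_CONTROLS = {0x200C, 0x200D}      # ZWNJ, ZWJ are kept
REMOVABLE = DEFAULT_IGNORABLE_SUBSET - JOIN_CONTROLS


def remove_ignorables(s: str) -> str:
    """Drop ignorable code points, but leave the join controls in place."""
    return "".join(ch for ch in s if ord(ch) not in REMOVABLE)


print(remove_ignorables("A\u00adB"))  # AB
```

The set difference mirrors the "Default_Ignorable_Code_Point - Join_Control"
expression in the pseudocode, so ZWJ and ZWNJ survive this step.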

Replace Appendix A by

Appendix A. Transitional Techniques

Registries should support IDNA2008 as soon as possible, and no longer
support registration of any labels that are only valid in IDNA2003. In
Lookup, on the other hand, many implementations will need to provide
backwards compatibility for IDNA2003 labels during some transitional period.
These IDNA2003 labels will typically contain a symbol or punctuation mark
that is not allowed under IDNA2008, such as "I<heart>NY".

The following describes a technique for modifying the lookup process to deal
with that situation. There are two cases to be handled, according to whether
the labels in the domain name pass the tests of Section 5. Note that two
lookups are *only* required when a domain name contains at least one ZWJ or
ZWNJ.

Case 1. The labels pass the tests of Section 5 (typically all M-Labels)

   - Perform the lookup with the corresponding XN-Labels
   - If it fails, and it contains any ZWJs or ZWNJs, remove them and perform
   the lookup with the result.
   - If that fails, stop with an error.
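The Case 1 steps can be sketched in Python as below. The resolve callable is a
hypothetical stand-in for whatever DNS query the application performs (it
returns None on failure), and to_xn_label and lookup_case1 are names invented
for this sketch; Python's built-in "punycode" codec does the encoding:

```python
from typing import Callable, List, Optional

ZWJ, ZWNJ = "\u200d", "\u200c"


def to_xn_label(label: str) -> str:
    """Encode one label as ASCII; non-ASCII labels get the xn-- prefix."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lookup_case1(labels: List[str],
                 resolve: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the name as given, then retry once without ZWJ/ZWNJ."""
    result = resolve(".".join(to_xn_label(l) for l in labels))
    if result is not None:
        return result
    if any(ZWJ in l or ZWNJ in l for l in labels):
        stripped = [l.replace(ZWJ, "").replace(ZWNJ, "") for l in labels]
        return resolve(".".join(to_xn_label(l) for l in stripped))
    return None  # lookup failed: the caller reports an error
```

A name without any join controls triggers exactly one call to resolve, which
is the property noted above: the second lookup happens only when a ZWJ or
ZWNJ is present.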

Case 2. The labels don't all pass the tests of Section 5 (typically at least
one non-M-Label)

   - Transform each label according to Section 5.3. In addition, remove any
   ZWJs or ZWNJs.
   - If the string contains any unassigned Unicode characters, stop with an
   error.
   - If the corresponding XN-Label contains any characters prohibited by
   IDNA2003 (http://www.ietf.org/rfc/rfc3454.txt Section C. Prohibition
   tables), stop with an error.
   - Perform the lookup with that XN-Label.

The conditions in Case 2 are slightly different than for IDNA2003, but avoid
having to retain a complete IDNA2003 implementation: only a small table of
prohibited characters needs to be retained. Alternatively, an IDNA2003
implementation can be used in a modified Case 2.
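One way the Case 2 preparation might look in Python is sketched here. It leans
on the standard stringprep module for the RFC 3454 tables that Nameprep
prohibits. Several things are assumptions of this sketch: the ignorable set is
a small subset, the unassigned check uses Python's own Unicode version, the
prohibition check is applied to the Unicode string before encoding, and the
name prepare_case2_label is hypothetical:

```python
import stringprep
import unicodedata
from typing import Optional

# Partial stand-in for Default_Ignorable_Code_Point minus Join_Control.
IGNORABLE_SUBSET = {"\u00ad", "\u034f", "\u200b", "\u2060", "\ufeff"}
JOINERS = {"\u200c", "\u200d"}        # ZWNJ, ZWJ: also removed in Case 2
DROPPED = IGNORABLE_SUBSET | JOINERS

# RFC 3454 section C tables prohibited by Nameprep (RFC 3491).
PROHIBITED_TABLES = (
    stringprep.in_table_c12, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prepare_case2_label(label: str) -> Optional[str]:
    """Return the label to look up, or None where Case 2 says to stop."""
    # Section 5.3.1: NFKC, case fold, NFKC.
    s = unicodedata.normalize("NFKC", label)
    s = unicodedata.normalize("NFKC", s.casefold())
    # Section 5.3.2 plus the extra Case 2 removal of ZWJ/ZWNJ.
    s = "".join(ch for ch in s if ch not in DROPPED)
    # Stop on unassigned code points.
    if any(unicodedata.category(ch) == "Cn" for ch in s):
        return None
    # Stop on characters prohibited under IDNA2003.
    if any(check(ch) for ch in s for check in PROHIBITED_TABLES):
        return None
    if s.isascii():
        return s
    return "xn--" + s.encode("punycode").decode("ascii")
```

With this sketch a symbol label such as "I<heart>NY" (using U+2665) is folded
to lowercase and encoded for lookup, while a label containing U+FFFD is
rejected by the prohibition tables.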

[Ed note: the reason I have the somewhat clumsy language
"typically...M-Labels" is that we don't guarantee that what results from
Section 5 (Lookup) is actually an M-Label, because Section 5.3 doesn't
guarantee U-Labels.]


Add to Defs:
An M-Label is a Unicode string whose transformation according to Section
5.3 of Protocol results in a U-label.

Mark



On Mon, Mar 30, 2009 at 04:41, Vint Cerf <vint at google.com> wrote:
> There has not been any significant objection to the proposals made
> during the IETF 74 meeting to apply some form of mapping during
> lookup. The two questions outstanding are:
>
> 1. what mapping function should be used?
> 2. how should it be used?
>
> As Harald and others have observed, if it is applied before an
> IDNA2008-style lookup, we will not find new characters permitted under
> IDNA2008 if they happen to be mapped under IDNA2003. This seems to
> argue for:
>
> 1. first look up under IDNA2008 rules
> 2. If a domain name is found, return the corresponding results
> 3. If a domain name is not found, apply IDNA2003 mapping
> 4. If a domain name is found, return the results
> 5. If a domain name is not found, report that no such domain name exists
>
> One final point. It seems to me that we should put the IDNA2003
> mapping function into stasis, making no future changes to it, and use
> the IDNA2008 framework to accommodate any new additions into Unicode
> versions as they are released. Assuming we have ample warning of a new
> version, we can even prepare tables suited to the new release ahead of
> time so as to have them available at the point where a new version of
> Unicode is adopted.
>
> Could the WG please analyze this proposition, point out flaws and
> suggested corrections for them?
>
> thanks
>
> vint
>
>
> Vint Cerf
> Google
> 1818 Library Street, Suite 400
> Reston, VA 20190
> 202-370-5637
> vint at google.com
>
>
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>