Q2: What mapping function should be used in a revised IDNA2008 specification?

Mark Davis mark at macchiato.com
Wed Apr 1 02:17:20 CEST 2009


I believe that the simplest approach for compatibility, and for
implementability, is to use the one I gave in a previous email, as
replacement text for 5.3. That is, use the same structure as IDNA2003:
toNFKC(toCaseFold(toNFKC(x))), then remove all default-ignorable characters
but the joiners.
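
As a rough sketch of that mapping in Python (not normative text): the
function name map_label and the DEFAULT_IGNORABLE_SAMPLE set are my own
assumptions for illustration. The set is only a small, incomplete sample
of the Default_Ignorable_Code_Point property; a real implementation would
take the full list from the Unicode data files.

```python
import unicodedata

# The two joiners that the mapping keeps: ZWNJ and ZWJ.
JOINERS = {"\u200C", "\u200D"}

# ASSUMPTION: illustrative, incomplete sample of default-ignorable code
# points (soft hyphen, CGJ, zero-width space, joiners, word joiner, BOM,
# variation selectors 1-16). Not the full Unicode property.
DEFAULT_IGNORABLE_SAMPLE = {
    "\u00AD", "\u034F", "\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF",
} | {chr(cp) for cp in range(0xFE00, 0xFE10)}


def map_label(x: str) -> str:
    """toNFKC(toCaseFold(toNFKC(x))), then drop ignorables except joiners."""
    nfkc = unicodedata.normalize("NFKC", x)
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return "".join(
        ch for ch in folded
        if ch in JOINERS or ch not in DEFAULT_IGNORABLE_SAMPLE
    )
```

With this sketch, map_label("a\u00ADb") drops the soft hyphen, while a
ZWJ or ZWNJ in the input is passed through unchanged.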

As I said, given the current consensus it appears that we don't need to
make special exceptions for eszett and sigma. Should we decide that we
really do need to preserve them, we can do it in the following way.

   1. Find all the maximal substrings that do not contain the exceptional
   characters.
   2. Convert each of those substrings with the above mapping.
   3. Apply toNFC to the result.

(This is a logical statement; the implementation can be optimized.)

For example, take the (artificial) string:
<B, full-width U, umlaut, eszett, e>

You would map <B, full-width U, umlaut> to <b, u-umlaut>, skip the eszett,
then map the <e> (no change). The result would be:
<b, u-umlaut, eszett, e>
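
The three steps and the example above could be sketched in Python along
these lines. The names EXCEPTIONS, base_map and map_preserving are
assumptions for illustration, as is the choice of U+00DF (eszett) and
U+03C2 (final sigma) as the exceptional characters. To keep it short,
base_map leaves out the removal of default-ignorable characters.

```python
import re
import unicodedata

# ASSUMPTION: the exceptional characters are eszett and final sigma.
EXCEPTIONS = "\u00DF\u03C2"


def base_map(x: str) -> str:
    """toNFKC(toCaseFold(toNFKC(x))); ignorable removal omitted here."""
    nfkc = unicodedata.normalize("NFKC", x)
    return unicodedata.normalize("NFKC", nfkc.casefold())


def map_preserving(x: str) -> str:
    # Step 1: split into maximal substrings without exceptional
    # characters; the capturing group keeps each exceptional character
    # as its own list element.
    parts = re.split("([" + EXCEPTIONS + "])", x)
    # Step 2: map the ordinary substrings, pass exceptions through.
    mapped = [
        p if (len(p) == 1 and p in EXCEPTIONS) else base_map(p)
        for p in parts
    ]
    # Step 3: apply toNFC to the reassembled string.
    return unicodedata.normalize("NFC", "".join(mapped))


# <B, full-width U, combining umlaut, eszett, e>
example = "B\uFF35\u0308\u00DFe"
result = map_preserving(example)  # <b, u-umlaut, eszett, e>
```

Calling base_map on the same string would instead turn the eszett into
"ss", which is the difference the exception handling is meant to avoid.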

Mark


On Tue, Mar 31, 2009 at 09:07, Vint Cerf <vint at google.com> wrote:

> What characters should be mapped into what other characters in a
> revised IDNA2008 specification?
>
> Can we describe succinctly and precisely what these mappings are? How?
> What should they be?
>
>
>
> Vint Cerf
> Google
> 1818 Library Street, Suite 400
> Reston, VA 20190
> 202-370-5637
> vint at google.com