idna folding (was Re: idna-bis and '?')

Mark Davis mark.davis at
Mon Dec 17 22:47:44 CET 2007

I'm a bit lost.

> Unicode say ö and ø are different, but that is definitely not what
> people in Norway or Sweden (or Denmark for that matter) think.

It feels like most people are in rough consensus as to the following points.

   1. People often consider words with different spellings to be the same
   word, or at least equivalent.
   2. There are many, many examples of this:
      1. telephone and telefon; Duerst and Dürst; Torbjørn and
      Torbjörn; Mark and Marc; Teri, Terry, Terri; 中國 "China" (traditional), 中国
      "China" (simplified), and so on.
      2. [For those without UTF-8 mailers]
      telephone and telefon; Duerst and D\u00FCrst; Torbj\u00F8rn and
      Torbj\u00F6rn; Mark and Marc; Teri, Terry, Terri; \u4E2D\u570B "China"
      (traditional), \u4E2D\u56FD "China" (simplified), and so on.
   3. These equivalences are very language-dependent: two words
   considered equivalent in one language may not be considered equivalent in
   other languages, or even in two different orthographies for the same
   language.
   4. Normalizing or matching these kinds of differences in spelling is
   outside the scope of IDNA, although country-specific registries might want
   to take them into account when considering issues such as bundling of domain
   names.

If this is not the case, could someone say where they disagree with one or
more of the above points?

On the other hand, case and width folding are very different. For
*lowercasing* (case folding) there is very little variation. The chief
standout is the Turkish i. Yet you really don't want different processes
lowercasing differently. We don't want <href a="
to be interpreted as two different strings on two different browsers:

   - çiçek = xn--iek-1lab on one system, and
   - çıçek = xn--ek-3iaa38a on another system
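
A minimal sketch of how the two results above can arise, assuming the
input label is the uppercase "ÇIÇEK" (an illustrative choice, not taken
from the original message). The helper names default_lower,
turkish_lower and to_ace are hypothetical, and turkish_lower only
approximates Turkish-locale lowercasing; Python's built-in "punycode"
codec does the encoding.

```python
# Illustrative input label: "ÇIÇEK", written with escapes so the
# precomposed code points are unambiguous.
LABEL = "\u00c7I\u00c7EK"


def default_lower(label: str) -> str:
    """Locale-independent lowercasing: 'I' becomes 'i'."""
    return label.lower()


def turkish_lower(label: str) -> str:
    """Approximate Turkish-style lowercasing (hypothetical helper).

    Dotted capital I (U+0130) maps to 'i' and plain 'I' maps to
    dotless i (U+0131) before the remaining letters are lowercased.
    """
    return label.replace("\u0130", "i").replace("I", "\u0131").lower()


def to_ace(label: str) -> str:
    """Encode a non-ASCII label with Punycode and add the 'xn--' prefix."""
    return "xn--" + label.encode("punycode").decode("ascii")


for fold in (default_lower, turkish_lower):
    folded = fold(LABEL)
    print(fold.__name__, folded, to_ace(folded))
```

The same typed label yields two different ASCII-compatible names
depending only on which lowercasing rule the client applies, which is
why a single standardized folding matters.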

I think it's perfectly reasonable for a standardized folding of IDNs to be
defined in a different RFC, but I would be concerned if it were missing.


On Dec 16, 2007 3:50 PM, Patrik Fältström <patrik at> wrote:

> On 17 dec 2007, at 00.09, Harald Tveit Alvestrand wrote:
> > --On 16. desember 2007 15:07 -0800 Erik van der Poel
> > <erikv at> wrote:
> >
> >> To me, this sounds as though one should not be mapped to the other at
> >> registration time, so I don't understand why people would be
> >> interested in treating them as the same codepoint at registration
> >> time.
> >
> > We must take care with terms here.... I believe both Torbjørn and
> > Torbjörn would be interested in a regime where registering
> > "torbjø <>" would not be allowed if "torbjö <>" was already
> > registered by someone else. But that's bundling, not mapping. And
> > the "default member" of the bundle (the one that actually goes into
> > the zonefile) would be different in Norway and Sweden.
> >
> > (btw, .no doesn't support bundling at this time. I don't believe .se
> > does either.)
> Correct, .SE does not.
> What everyone wants is that torbjø <> and
> torbjö <> and possibly
> torbjö <> and torbjø <> end up at the same resource in as few
> hoops to jump through as possible. Specifically it would be
> "interesting" if torbjö <> and torbjø <> end up having different
> domain name holders, because I have no idea what the dispute
> resolution process would say about it.
> Unicode say ö and ø are different, but that is definitely not what
> people in Norway or Sweden (or Denmark for that matter) think.
> On the other hand, I think people in Germany might think o and ö is
> the same (correct me if I am wrong here), something definitely not the
> case in Sweden. Here o and ö are different characters. ö is not o with
> diaeresis.
>    Patrik
