Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Erik van der Poel erikv at
Wed Jan 23 15:59:17 CET 2008

On Jan 22, 2008 10:44 PM, Martin Duerst <duerst at> wrote:
> I think the reason why this is excluded in IDNA2003
> (if it indeed is) is that it turns into a plain sigma
> when upper-cased and then lower-cased again. That's just
> a consequence of the rather roundabout way that the Unicode
> casing table was used in IDNA2003.

No, IDNA's sigma mappings come from Unicode's own CaseFolding.txt. Stringprep (RFC 3454), section 3.2, says:

   Appendix B.3 is derived from the CaseFolding-3.txt file associated
   with Unicode 3.2; appendix B.2 is based on appendix B.3 with the
   additional characters added from the algorithm above.

IDNA uses B.2. See Unicode 3.2's CaseFolding.txt, which folds final
small sigma to non-final small sigma.
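
To make the B.2 mapping concrete, here is a minimal Python sketch. It
assumes Python's standard stringprep module, which exposes the RFC 3454
tables; the helper name fold_b2 is made up for illustration. It also
checks the upper-case-then-lower-case route mentioned above, which
should land on the same character.

```python
import stringprep  # stdlib module exposing the RFC 3454 (Stringprep) tables

FINAL_SIGMA = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03c3"    # GREEK SMALL LETTER SIGMA
CAPITAL_SIGMA = "\u03a3"  # GREEK CAPITAL LETTER SIGMA


def fold_b2(text: str) -> str:
    """Map every character through Stringprep table B.2 (hypothetical helper)."""
    return "".join(stringprep.map_table_b2(ch) for ch in text)


# Direct case folding through table B.2.
folded = fold_b2(FINAL_SIGMA)

# The roundabout route: upper-case, then lower-case again.
via_upper_lower = FINAL_SIGMA.upper().lower()

print(folded == SMALL_SIGMA, via_upper_lower == SMALL_SIGMA)
```

Either way the final sigma does not survive, which is why IDNA2003
cannot represent it in a label.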


If we really, really wanted to, we could try to add final small sigma
to IDNA200X, but there would be a bit of an interoperability problem
during the transition period on the Web because major browsers like
MSIE 7 and Firefox 2 implement IDNA2003, which does not convert xn--
labels to Unicode if the resulting Unicode cannot be successfully
converted back to the same xn-- label. (The final small sigma would be
converted to a non-final small sigma, so the resulting Punycode would
be different.)
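
The round-trip check can be sketched in Python. This assumes the
standard library's encodings.idna module as a stand-in for an IDNA2003
client (it is not what MSIE or Firefox actually run); the helpers
ace_without_mapping and displayable are hypothetical names.

```python
import encodings.idna as idna2003  # stdlib IDNA2003 (RFC 3490) implementation


def ace_without_mapping(label: str) -> str:
    """Hypothetical new-style encoder: Punycode the label as-is, no Nameprep."""
    return "xn--" + label.encode("punycode").decode("ascii")


def displayable(ace_label: str) -> bool:
    """True if the IDNA2003 ToUnicode operation accepts this xn-- label.

    ToUnicode decodes the label, re-encodes the result with ToASCII, and
    rejects the label when the two ASCII forms differ.
    """
    try:
        idna2003.ToUnicode(ace_label)
        return True
    except UnicodeError:
        return False


with_final_sigma = ace_without_mapping("\u03b1\u03c2")  # alpha + final sigma
with_plain_sigma = ace_without_mapping("\u03b1\u03c3")  # alpha + non-final sigma

print(displayable(with_final_sigma), displayable(with_plain_sigma))
```

A client behaving like this would keep showing the raw xn-- form for
any label that contains a final sigma.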

And it would not help so much to use a different prefix (other than
xn--) because that creates a transition period issue of its own.

An even worse interoperability problem would occur in V-labels (labels
containing code points that are mapped or normalized away by
IDNA2003). Old browsers would map final small sigma to non-final small
sigma and convert that to Punycode, while new browsers would leave the
final small sigma as is, and convert that to a different Punycode, so
it would go to a different site. Maybe the registry would bundle or
block this case, but it certainly is a consideration.
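
The divergence is easy to reproduce. The sketch below assumes Python's
stdlib nameprep as the "old browser" mapping and a hypothetical "new
browser" that leaves the final sigma alone; both function names are
invented for this example, and the sample word is arbitrary.

```python
from encodings.idna import nameprep  # stdlib IDNA2003 Nameprep


def to_ace_idna2003(label: str) -> str:
    """Old behaviour: Nameprep (folds final sigma), then Punycode."""
    return "xn--" + nameprep(label).encode("punycode").decode("ascii")


def to_ace_preserving(label: str) -> str:
    """Hypothetical new behaviour: keep final sigma, Punycode directly."""
    return "xn--" + label.encode("punycode").decode("ascii")


# A Greek word ending in final sigma, as a user might type it.
typed = "\u03bb\u03cc\u03b3\u03bf\u03c2"

old_ace = to_ace_idna2003(typed)
new_ace = to_ace_preserving(typed)

# The same typed label yields two different names in the DNS.
print(old_ace)
print(new_ace)
print(old_ace == new_ace)
```

Unless the registry bundles or blocks the pair, the two lookups can
resolve to different registrants.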


