Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Mon Jan 28 16:22:59 CET 2008

Patrik,

I think this conclusion is correct.  A few comments below.

--On Monday, 28 January, 2008 08:52 +0100 Patrik Fältström
<patrik at frobbit.se> wrote:

> The tables document explain what codepoints can be in a
> U-label. After reading what all of you have written, I see
> three different suggestions:
> 
> (1) Keep final sigma as it is today, NEVER, as casefold(final
> sigma) != final sigma
> (2) Have final sigma as an exception, CONTEXT
> (3) Have final sigma as an exception, MAYBE NO
> 
> I have read the email on this list, and my proposal for
> conclusion of consensus is the following:
> 
> Given that some people (and the Unicode Standard) say final
> sigma in some context might be mapped to sigma (casefolding,
> context dependent etc) it would be pretty bad if someone
> actually register a domain name with final sigma. This because
> people that use clients that "happen" to (based on context or
> whatever else) map this to sigma will not get a match when
> looking up the domain name.

This does not change or question your conclusion, but TUS says
that casefolding is so destructive that it should be used only
for comparisons and that original strings should be retained
for all other purposes.  Of course, we couldn't use that
information in IDNs without server-side changes in matching.  I
just realized that there were ways to do it on the server that
would not require the server to have casemap capability, but it
isn't pretty.  See below.
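
To make the "comparison only" point concrete, here is a small
sketch in Python.  It is only an illustration, not anything taken
from the drafts: str.casefold() is Python's built-in for Unicode
full case folding, while labels_match and the sample strings are
names and values I have made up for this example.

```python
# Sketch only: casefold for matching, keep the original for everything else.
# labels_match is a hypothetical helper, not part of any IDNA specification.

def labels_match(a: str, b: str) -> bool:
    """Compare two strings by their casefolded forms."""
    return a.casefold() == b.casefold()

# A Greek word ending in final sigma (U+03C2), written with escapes
# so the example does not depend on the encoding of this message.
original = "\u03bf\u03b4\u03bf\u03c2"
with_plain_sigma = "\u03bf\u03b4\u03bf\u03c3"  # same word ending in U+03C3

# The folded form is used for the comparison only.
folded = original.casefold()

# Folding is lossy: the final sigma is gone and cannot be recovered
# from the folded string alone.
print(folded == original)                        # False
print(folded == with_plain_sigma)                # True
print(labels_match(original, with_plain_sigma))  # True
```

The original string is kept unchanged next to its folded form,
which is what the TUS advice amounts to: the folded value is a
comparison key, not a replacement for what the user typed.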

> Because of this, and the fact I really want to minimize the
> amount of exceptions, I find the conclusion is that final
> sigma should stay as a non-exception, i.e. alternative (1)
> above, which imply it will be in NEVER and because of that not
> allowed to be registered in DNS.
> 
> That said, any preprocessing, user interface etc, can of
> course allow final sigma and map it to something that is
> appropriate according to whatever application, context, locale
> or such. Rules that are impossible to implement in the global
> DNS.
> 
> Next version of the tables document will because of that NOT
> say anything special about final sigma.

I've added a few new sentences to issues-06g to explain the
relationships that cause this and a new subsection to 9.3
(question of prefix changes) that explains just how painful a
prefix change would be.  The latter, from which economic and
performance costs could be estimated (although I am not going to
do it), may be helpful in heading off the Cyprus crisis that
Cary is anticipating or at least pitting every existing and
potential registrant of an IDN against those who would like to
make this change.

Note that, while Ken's comments and casemap claim that final
sigma is the only Latin-script case, as far as I can tell the
situation with Eszett is, for us and in practical terms,
identical.  Hence the use of the plural to describe these cases
in the text.
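
As a rough check of why the two look alike from our side, the
following Python sketch tests whether a character survives case
folding unchanged.  The helper name is_casefold_stable is my own
invention, and Python's str.casefold() is used here only as a
convenient stand-in for Unicode case folding, not as the IDNA
mapping itself.

```python
# Sketch only: a character that casefolding changes cannot round-trip
# through a casefolding step.

def is_casefold_stable(ch: str) -> bool:
    """True if case folding leaves the character unchanged."""
    return ch.casefold() == ch

FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
ESZETT = "\u00df"       # LATIN SMALL LETTER SHARP S

for name, ch in [("final sigma", FINAL_SIGMA),
                 ("sigma", SIGMA),
                 ("eszett", ESZETT)]:
    print(name, is_casefold_stable(ch))

# Final sigma folds to plain sigma; Eszett folds to the two letters "ss".
print(FINAL_SIGMA.casefold() == SIGMA)  # True
print(ESZETT.casefold() == "ss")        # True
```

Both characters are already lowercase and both are altered by
folding, which is the sense in which the two situations are the
same for us.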

>      Patrik
> 
> P.S. I have though found some more bugs in my script(s) that
> generate the non-normative tables in the tables document. I
> have because of that now falled back from using my own code to
> use the Unicode libraries in perl. If people know about any
> problems with that, let me know.

It occurs to me that, once we get the protocol work stabilized
-- i.e., at least into Last Call and maybe past it -- it would
be useful to produce a document that describes the options that
could have been taken (other than IDNA as we know it) and that
explains the tradeoffs and why those options were not taken.
If we don't do that, we are, I fear, in for a long future of
second-guessing (e.g., the "let's junk IDNA and go for separate
classes" discussion at IGF) and local solutions that attempt to
bypass IDNA.  Not a small job but one that I fear is necessary.

    john