Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Tue Jan 22 20:05:54 CET 2008

On Tue, Jan 22, 2008 at 01:42:22PM -0500, John C Klensin wrote:
> Does that imply that you would prefer that Crankycanuck.ca not
> match crankycanuck.ca and that CrankyCanuck.ca should be banned
> entirely?  At least the first two not matching are a direct
> consequence of the statement Stephane made and the third is a
> corollary to some of the comments that have been made about the
> orthographic and typographic importance of final sigma.

I think perhaps I took Stephane's remark a little more narrowly. I
should have been less glib.  My apologies.

The thing I am keen to avoid is worrying about whether people are going
to blame the IETF for this or that shortcoming of the protocol,
whatever it is.

Just about every naive interpretation of labels, and of how they are to
be read, is frustratingly "obvious" to the people who hold it, as they
will tell you at some vexing length if you let them, even though they
are often wrong.  I know that none of that is news to anyone here, but
it's worth remembering that many such people are among the target users
of whatever protocol emerges, and that the intuitions they derive from
their own languages will therefore automatically lead to tension.

So I am very keen to avoid implicitly endorsing anything that ascribes
semantics to labels, any more than we have already done (in, e.g., the
examples you mention; and yes, I can imagine that the non-matches you
mention could have been settled upon at some point, although I can also
see how painful that might have been).  That's what I was agreeing
with.  Also,

[useful questions snipped]
> Because the answers to those questions affect the way strings
> are encoded into the DNS and/or how they are matched, I fail to
> see how they can be handled at other than a protocol level.

I agree with that.  The correct decision relates to how things can
be encoded in the DNS.  The problem is essentially one of setting
some convention, whatever it is, to do that.  Worrying about whether
users of those encodings will be perfectly happy with the result is
important, but not important enough to open the door wide enough to
haul in natural-language semantics for labels.  I don't think the work
so far has tended to such semantics, but I was attempting to avoid what
looked to me like an impending rathole.  My apologies for having done
so badly.
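
To make the matching question concrete, here is a minimal sketch in
Python of why final sigma is awkward for any "fold, then compare"
convention.  It is an illustration only: fold_label and labels_match
are hypothetical helpers, and NFC followed by Unicode case folding is
an assumption made for the example, not the mapping that IDNA or any
draft specifies.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Hypothetical helper: normalize to NFC, then apply Unicode case folding.

    This is an assumed convention for illustration, not the IDNA mapping.
    """
    return unicodedata.normalize("NFC", label).casefold()


def labels_match(a: str, b: str) -> bool:
    """Hypothetical helper: two labels match if their folded forms are equal."""
    return fold_label(a) == fold_label(b)


# ASCII case variants of the same label compare equal after folding.
print(labels_match("Crankycanuck", "crankycanuck"))
print(labels_match("CrankyCanuck", "crankycanuck"))

# Greek capital sigma has two lowercase forms.  Python's str.lower() picks
# the final form (U+03C2) at the end of a word, while str.casefold() maps
# every sigma, final or not, to the medial form (U+03C3).
upper = "ΟΔΟΣ"
lowered = upper.lower()
folded = upper.casefold()
print(hex(ord(lowered[-1])), hex(ord(folded[-1])))

# The two strings differ code point by code point, yet they match once
# both sides are folded, so the final/medial distinction is lost.
print(lowered == folded, labels_match(lowered, folded))
```

Whether losing that distinction is acceptable is exactly the sort of
convention that has to be set at the protocol level; the sketch shows
one possible choice, not a recommendation.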

A

Andrew Sullivan  | ajs at crankycanuck.ca
The plural of anecdote is not data.
		--Roger Brinner