Eszett (Sharp-S) again (was: Re: AW: Outstanding issues tracking)

John C Klensin klensin at jck.com
Wed May 28 00:02:55 CEST 2008



--On Tuesday, 27 May, 2008 16:52 +0200 Patrik Fältström
<patrik at frobbit.se> wrote:

> On 27 maj 2008, at 15.43, Georg Ochsner wrote:
> 
>> - unresolved: SHARP S
>> 
>> I would like to mention the open "sharp s" issue here.
>> 
>> Can we please keep this in mind and/or on the TODO list.
> 
> So far the conclusion I have seen is that "sharp s" is
> possible to enter according to IDNA2003 due to the mapping
> features that do not exist in IDNA2008. There are proposals
> for creating mapping to the IDNA2008 standard, but nothing
> firm yet (links to Mark's proposals have been passed around).
> 
> Sharp s has not been possible to register in DNS in IDNA2003.
> 
> And for these reasons (and more), nothing other than mapping
> "makes sense" in IDNA2008.
> 
> Because of this (and the fact that no text has been proposed
> for the tables document), I see no consensus for changing
> anything in the IDNA2008 tables document.

Patrik,

While I agree with everything you have written above, I suggest
that the problem with Sharp-S isn't a lack of proposed text for
"tables". That text would simply add it as an exception with
value PVALID: a one-line change.
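The "one-line change" can be pictured with a small Python sketch. The names EXCEPTIONS, is_unstable and property_of are hypothetical, invented for illustration, and only one derivation rule is modeled: a loose rendering of the tables document's test for code points that change under NFKC plus case folding. Because Sharp-S case folds to "ss", that rule alone would make it DISALLOWED; the exceptions entry overrides it. This is not the tables document's actual format or algorithm.

```python
import unicodedata

# Hypothetical exceptions table: code point -> property value.
# Adding Sharp-S is the one-line change under discussion.
EXCEPTIONS = {
    0x00DF: "PVALID",  # LATIN SMALL LETTER SHARP S
}

def is_unstable(ch: str) -> bool:
    """Loose sketch of an 'unstable under NFKC + case folding' test."""
    folded = unicodedata.normalize(
        "NFKC", unicodedata.normalize("NFKC", ch).casefold()
    )
    return folded != ch

def property_of(ch: str) -> str:
    """Consult exceptions first; otherwise apply only the stability rule."""
    cp = ord(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if is_unstable(ch):
        return "DISALLOWED"
    # Every other derivation rule is deliberately left out of this sketch.
    return "NOT_DECIDED_BY_THIS_SKETCH"

print(property_of("\u00df"))  # the exception wins over the stability rule
```

Without the EXCEPTIONS entry, is_unstable("\u00df") is true (full case folding turns it into "ss"), so the sketch would report DISALLOWED.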

The problem is that permitting it involves a set of very
difficult tradeoffs:

(1) No matter what mapping is done, we cannot have both Eszett
as a distinct character and Eszett matching or mapping to "ss".
If one believes in correct German orthography (as in Germany),
that is exactly correct -- they are not the same.   If one
believes in correct orthography in some other places, or in
common usage even in parts of Germany, they ought to match.
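Point (1) can be illustrated with two hypothetical label-comparison policies in Python; neither comes from any IDNA document. For a given pair such as "straße" and "strasse", "distinct" and "matching" are logical negations of each other, so whichever policy is chosen satisfies exactly one of the two goals.

```python
# Two hypothetical label-comparison policies (illustration only).
def exact_match(a: str, b: str) -> bool:
    """Eszett is a distinct character: compare code points as-is."""
    return a == b

def folded_match(a: str, b: str) -> bool:
    """Eszett matches "ss": compare after Unicode full case folding."""
    return a.casefold() == b.casefold()

def requirements_met(policy, eszett_form="stra\u00dfe", ss_form="strasse"):
    """Report which of the two goals a policy meets for one label pair."""
    same = policy(eszett_form, ss_form)
    return {"eszett_distinct": not same, "eszett_matches_ss": same}

for name, policy in [("exact", exact_match), ("folded", folded_match)]:
    print(name, requirements_met(policy))
```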

(2) If we treat Eszett as a separate character, we create a
fairly nasty incompatibility between IDNA2003, where it maps to
"ss" and disappears, and IDNA2008, where it is separate.
Preventing that incompatibility from being a nasty problem would
require some very careful action by the relevant registries (as
usual, in the "zone administrator" sense, not just TLD
registries).  Possibly a well-designed variant strategy would be
sufficient, possibly not.   We haven't heard from the most
obvious registries yet as to whether they would be willing to
deal with this.
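To make the incompatibility in (2) concrete, here is a minimal Python sketch. Python's built-in "idna" codec implements IDNA2003 (including nameprep), so it stands in for the 2003 behaviour. The standard library has no IDNA2008 codec, so idna2008_alabel_sketch is a hypothetical helper that assumes Sharp-S is PVALID, that the label is already lowercase and in NFC, and that no other IDNA2008 checks are needed. variant_set is likewise a hypothetical illustration of what a registry's variant strategy would have to cover, not any registry's actual policy.

```python
def idna2003_alabel(label: str) -> str:
    # The stdlib "idna" codec implements IDNA2003: nameprep maps U+00DF to "ss".
    return label.encode("idna").decode("ascii")

def idna2008_alabel_sketch(label: str) -> str:
    # Hypothetical: assumes U+00DF is PVALID and the label is otherwise valid,
    # so the label is Punycode-encoded without any mapping.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def variant_set(label: str) -> set:
    # Hypothetical: the labels a registry would need to bundle or block together.
    return {idna2003_alabel(label), idna2008_alabel_sketch(label)}

print(idna2003_alabel("stra\u00dfe"))         # strasse
print(idna2008_alabel_sketch("stra\u00dfe"))  # xn--strae-oqa
```

The same typed string "straße" thus resolves to two different DNS names depending on which protocol version the client implements, which is why the registries' cooperation matters.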

(3) Treating it as a separate character creates an
incompatibility with Unicode's case folding rules/algorithm.
With mapping out of the protocol, I don't think that is a
significant issue, but I could be wrong.   If one used a
standardized mapping outside the protocol, one would have to be
careful that the mapping process didn't cause problems by
eliminating the character before it got to IDNA.
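The ordering concern in (3) can be sketched in Python with two hypothetical pre-IDNA mapping steps applied outside the protocol. Python's str.casefold() applies Unicode full case folding, which maps U+00DF to "ss", while str.lower() leaves it alone. The function names are invented for illustration; neither step is a proposed standard mapping.

```python
import unicodedata

# Hypothetical pre-IDNA mapping steps (illustration only).
def premap_lowercase(label: str) -> str:
    # str.lower() leaves U+00DF unchanged.
    return unicodedata.normalize("NFC", label.lower())

def premap_casefold(label: str) -> str:
    # str.casefold() uses full case folding, which maps U+00DF to "ss".
    return unicodedata.normalize("NFC", label.casefold())

def eszett_survives(label: str, premap) -> bool:
    """True if the label still contains Sharp-S when it is handed to IDNA."""
    return "\u00df" in premap(label)

print(premap_lowercase("Stra\u00dfe"), eszett_survives("Stra\u00dfe", premap_lowercase))
print(premap_casefold("Stra\u00dfe"), eszett_survives("Stra\u00dfe", premap_casefold))
```

With a case-folding pre-mapping, an IDNA2008 implementation that treats Sharp-S as PVALID would never see the character at all.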

So the problem, IMO, isn't "how to implement" (which I could
infer from your note) but "how to decide".

     john


