AW: AW: sharp s (Eszett)

John C Klensin klensin at jck.com
Tue Mar 11 21:46:54 CET 2008



--On Tuesday, 11 March, 2008 12:12 -0700 Mark Davis
<mark.davis at icu-project.org> wrote:

> It is not just a matter of "typographic convenience": the
> recognized standard German uppercase of "ß" *is* "SS".
> Unicode did not invent this relationship -- it is just
> following recognized German standards. In German orthography
> ß is not just an ordinary letter like any other. If in normal
> use ß were caseless, or if ß had a unique uppercase, we
> wouldn't be having this discussion. But it is not normal. And
> the previous behavior in IDNA2003 can't be simply discarded.
> There are two main issues:
> 
> *1. IDNA compatibility.* Right now, all of the following point
> to the same website. If we make this exception for ß, then
> they won't.
> 
> http://FASS.de
> http://Faß.de
> http://fass.de
> 
> This is not just a UI issue, since the URLs above can be in
> all sorts of data (email, webpages, etc). And even if IDNA200x
> comes out soon, data and programs exist, that will only slowly
> be updated. So for an extended, perhaps indefinite, amount of
> time browsers and search engines (like ours at Google) will
> need to handle both IDNA2003 and IDNA200x URLs. When the
> results under each system point to different places, that is a
> significant problem and possible security issue.
> 
> *2. Case insensitivity.* If we make this exception, then
> uppercasing a domain name causes it to go to a different
> place. Even if there were no compatibility issue, there is
> still the issue of whether it is more important to have ß or
> to have case-insensitivity.
> 
> 
> While it would be possible to have an exception for ß, both
> of these issues need to be considered very carefully, and we
> should not make any decision lightly. Any proposal for an
> exception for ß really should get consensus from a broad set
> of stakeholders, including DENIC, NIC.AT, and SWITCH, as well
> as the standards bodies DIN, ÖN, and SNV.

Mark,

I think I understand all of these issues. I tried to write my
note very carefully, but obviously it was not careful enough.
The bottom line is that, regardless of what terminology and
classifications we use, there is a belief, backed up by various
authorities, language (orthography) reform documents, etc., that
Eszett is a real character that should not be mapped, converted,
or folded into something else because doing so leads to loss of
information.
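
To make the information-loss point concrete, here is a minimal
sketch in Python. It assumes that Python's str.casefold() and
str.upper() follow the Unicode case mappings and that the
standard-library "idna" codec applies the IDNA2003 (RFC 3490)
mapping; the labels are simply the ones from the quoted
examples, and the variable names are mine.

```python
# Labels taken from the quoted examples.
labels = ["FASS", "Faß", "fass"]

# Unicode full case folding maps ß to "ss", so all three labels
# collapse to the single string "fass".
folded = {label.casefold() for label in labels}

# Uppercasing ß yields "SS"; lowercasing afterwards gives "fass",
# so the original ß cannot be recovered.
round_trip = "Faß".upper().lower()

# The stdlib "idna" codec (IDNA2003) maps the name to b"fass.de"
# before lookup, so the ß never reaches the DNS.
ascii_form = "Faß.de".encode("idna")

print(folded, round_trip, ascii_form)
```

Once the mapping has been applied, nothing in the result says
whether the user typed "ß" or "ss", which is the loss I am
concerned about.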

While it is clear that it would be easy to say "we have to stick
with Unicode casefolding rules" or "compatibility with IDNA2003
is more important than anything else", I'm not comfortable
telling the users that we've decided that they don't get to use
this character because it is inconvenient. They --together
with their registries-- should get to make that choice.

Clearly, if the registries don't like it, they can (and should)
refuse to register the character. And, if they ask my advice,
it will be to permit registrations with it only through variant
techniques that avoid conflicts with existing registrations
that use "ss".
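
As an illustration only, one possible reading of "variant
techniques" is sketched below in Python: a label containing ß
is bundled with its "ss" spelling, and a registration is
accepted only if that bundle is free or already held by the
same registrant. The function names, the bundling rule, and the
sample data are all hypothetical and do not describe any
registry's actual policy.

```python
def variant_key(label: str) -> str:
    """Hypothetical bundle key: treat ß and "ss" as variants."""
    return label.lower().replace("ß", "ss")


def may_register(label: str, registrant: str, bundles: dict) -> bool:
    """Accept a label only if its variant bundle is unclaimed or
    already belongs to the same registrant (assumed policy)."""
    holder = bundles.get(variant_key(label))
    return holder is None or holder == registrant


# Made-up existing registration that uses "ss".
bundles = {"fass": "alice"}

print(may_register("faß", "alice", bundles))     # same holder
print(may_register("faß", "bob", bundles))       # conflicting holder
print(may_register("straße", "bob", bundles))    # bundle is free
```

Under a rule like this, the holder of an existing "ss" name
keeps control of the ß spelling, so nobody else can register it
out from under them.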

And, yes, like any other change, it would be easier, from a
technology standpoint, to not make it.  But that position seems
a little short-sighted to me.

     john



More information about the Idna-update mailing list