AW: sharp s (Eszett)

Jelte Jansen jelte at NLnetLabs.nl
Mon Mar 10 18:53:02 CET 2008


John C Klensin wrote:
> 
>> As I said before the opposite is the case,
>> changing sharp s into "ss" and vice versa can even lead to
>> words with different meanings. At least it will lead to wrong
>> spelling, and I hope nobody working on standards finds this a
>> trifle.
> 
> Understood.  And I certainly do not.   We should all remember,
> however, that there has never been a firm commitment that the
> DNS could represent all relevant "words" (in either the ASCII or
> IDN forms).   Some things are just not going to be possible to
> distinguish.  Eszett is not one of them -- as I've said before,
> we know how to support it although, for backward-compatibility
> reasons, doing that now is likely to be somewhat painful.
> 

Preventing bad spelling is not a goal of the protocol in itself (Google
is a misspelling too). But that's beside the point; I understand that
the problem is turning a correct spelling into a wrong one. Even if the
sharp s were represented on the wire by 'ss', it would still be shown
back to the user as a sharp s, if that is what they entered. But indeed,
if someone types 'ss', then the bad spelling would be reflected back. The
same goes for data obtained through other means, where the symbol was
never entered at all.
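
To make the wire versus display point concrete, here is a small sketch
using Python's built-in "idna" codec, which follows IDNA2003. The domain
name and the helper function are made up for illustration; this is not
how any particular resolver or browser does it.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA2003 (RFC 3490).
# The name "straße.example" and the helper to_wire() are hypothetical.

def to_wire(typed_name: str) -> bytes:
    """Return the form that would go on the wire under IDNA2003."""
    return typed_name.encode("idna")

typed = "straße.example"
wire = to_wire(typed)            # the sharp s is mapped to "ss" here
from_wire = wire.decode("idna")  # decoding cannot restore the sharp s

# An application that wants to show the sharp s has to keep what was typed.
display = typed
print(wire, from_wire, display)
```

The sharp s only survives because the application kept the typed string;
anything reconstructed from the wire form alone comes back with 'ss'.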

But basically, if I understand this right, the problem here is the old
rule 'when the sharp s symbol is not available, use ss'. Which, even if
it is wrong, is what we were taught in school. You may blame the Dutch
school system in my case, but whether or not the rule is right, it is
apparently ingrained in many minds. And I'm guessing it is the source of
the folding to 'ss' in CaseFold.txt. Which would be weird, since there
the sharp s is by definition available. Please correct me if I'm wrong
here.
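
The folding itself is easy to observe. A minimal sketch, assuming
Python's str.casefold() as a stand-in for Unicode full case folding (the
helper name and the sample words are mine):

```python
# Sketch only: str.casefold() applies Unicode full case folding,
# which turns the sharp s into "ss". same_after_folding() is a
# hypothetical helper for illustration.

def same_after_folding(a: str, b: str) -> bool:
    """Compare two strings the way a case-folding protocol would."""
    return a.casefold() == b.casefold()

sharp_s = "\u00df"                 # LATIN SMALL LETTER SHARP S
folded = sharp_s.casefold()        # folding replaces it with two letters
lowered = sharp_s.lower()          # plain lowercasing leaves it alone

# "Maße" (measures) and "Masse" (mass) are different German words,
# but they become indistinguishable once folded.
collide = same_after_folding("Maße", "Masse")
print(folded, lowered, collide)
```

Note that plain lowercasing keeps the sharp s; it is the folding step
that substitutes 'ss'.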

So if IDNAbis made an exception for the sharp s and allowed it as
a separate symbol, would there be people running to their lawyers
because they consider it equivalent to 'ss' (quite apart from the
backwards compatibility problem)? And wouldn't every other protocol that
needs normalization need the same exception? In that case it would
probably be better to try to get the case folding entry removed from
Unicode.

Jelte
