AW: sharp s (Eszett)

Kenneth Whistler kenw at sybase.com
Tue Mar 11 01:01:49 CET 2008


John Klensin wrote:

> > The complication sets in when you have *non*-symmetric case
> > mappings, as for German sharp s:
> > 
> >    s --uc--> S,  S --lc--> s
> >    ß --uc--> SS, SS --lc--> ss
> >    ss --uc--> SS
> 
> But Ken, if I correctly understand what has been said on the
> list, and what Duden and other authorities say about German,
> were it not for fallback issues, there would be no relationship
> between Eszett and the "ss" sequence.    If that relationship
> did not exist, then the above would be

Too many hypotheticals. The relationship has existed and
still *does* exist.

http://de.wikipedia.org/wiki/Neue_deutsche_Rechtschreibung#Schreibung_von_ss_und_.C3.9F

  Wie auch in traditioneller Rechtschreibung wird ß durch ss
  ersetzt, wenn es im Zeichensatz nicht vorhanden ist, ...
  
*That's* your fallback case... you write "ss" when you don't
have ß available.

  ... oder das ganze Wort in Großbuchstaben geschrieben ist.
  
*That's* your casing rule... you write "SS" when using all
capital letters for a word. Not a missing glyph in the font
(or typecase at the publisher), but just the way it is done.

  In der Schweiz wird ß nach wie vor nicht verwendet, stattdessen
  immer ss geschrieben.
  
And *that's* the Swiss local preference in spelling.

  Die schon länger nicht mehr übliche Umschreibung mit SZ, zur
  Unterscheidung in Zweifelsfällen durchaus nützlich (vgl.
  etwa MASSE - MASZE), sieht die reformierte Schreibung nicht vor.
  
And *that* points out that the occasional practice of using
"SZ" as the all-caps form of ß, to distinguish it from
the all-caps form of ss, is *not* provided for in the
reformed German orthography. You just write "SS", regardless.

> From that point of view, the problem here isn't with Eszett.  It
> is with the (quite natural, but possibly wrong in a few odd
> cases) assumption that, if a script has case distinctions, all of its
> characters have case distinctions _and_ the imposition of a
> fallback on (or confusion of a fallback with) Eszett.

Well, sure... and people will continue to argue this. But
the facts of current German orthography are that the
uppercase of ß is "SS", despite minority holdouts for
maintaining "ß" in all-caps contexts or still using "SZ"
in all-caps contexts to try to preserve the distinction.
And no doubt there will be a third camp now agitating to
switch over to the uppercase-ß, instead.
 
> Change either of those things and ignore the introduction of an
> upper-case Eszett in 5.1, and one ends up with
>     
>    ß --uc--> ß   and
>    ß --lc--> ß

Those are, in fact, the *simple* case mappings for ß,
which are used when a case mapping is constrained to no
length changing of the strings. If anything, *those*
are the fallback forms for case mapping of ß, because
that is what you do if you cannot accommodate the case
mapping actually defined in standard German orthography.
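
A rough illustration of the difference, as a sketch only. It assumes
Python, whose built-in str.upper() applies the full Unicode case
mappings. Python does not expose the simple mappings directly, so the
helper name length_preserving_upper below is made up for this example;
it gives the right answer for ß but is not a general implementation of
simple case mapping.

```python
def length_preserving_upper(text: str) -> str:
    """Uppercase each character, but leave it unchanged whenever the
    full mapping would change the length of the string.

    Hypothetical helper: it agrees with the Unicode simple uppercase
    mapping for ß, but it is only an approximation in general.
    """
    out = []
    for ch in text:
        up = ch.upper()
        # Keep the original character if uppercasing would expand it.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


full = "Fuß".upper()                     # full mapping: the string grows
simple = length_preserving_upper("Fuß")  # length-constrained mapping
print(full, len(full))
print(simple, len(simple))
```

The full mapping changes the string length, which is exactly what a
length-constrained implementation cannot accommodate.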

> which looks a little strange but is perfectly natural and, I
> think, consistent with what happens when one applies uc or lc
> operations to characters that don't have case.
> 
> > For full casefolding, that creates an equivalence class
> > {ss, ß, SS}, and the "ss" is taken as the "folding" for
> > all elements of that class.
> 
> Only because of the introduction of the fallback into that
> model, unless I'm missing something.

See above.

> 
> > So this determination has nothing to do with fallback, per se,
> > but results from asymmetric case mapping.
> 
> Again, that is because someone decided to make ß --uc--> SS,
                         ^^^^^^^
                         
That "someone" consists of the official rules of the
neue deutsche Rechtschreibung, as far as I know. It isn't
just something made up by an engineer creating Unicode data
files.

> but that is basically a fallback for the absence of a character.

It is the result of applying an isomorphism (a case mapping)
to two sets which don't actually have completely isomorphic
repertoires. You end up with a 2 to 1 mapping somewhere that
results in a neutralization of a distinction.
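
The neutralization is easy to see with the MASSE / MASZE pair from the
Wikipedia passage quoted earlier. A small sketch, assuming Python's
built-in str.upper() and str.lower(), which follow the full Unicode
case mappings:

```python
# "Maße" (measures) and "Masse" (mass) are distinct words in lowercase.
words = ["Maße", "Masse"]

# Full uppercasing sends both ß and ss to SS, so the distinction is lost.
upper_forms = {w: w.upper() for w in words}
for word, upper in upper_forms.items():
    print(f"{word} --uc--> {upper}")

# Lowercasing the result cannot recover ß: SS --lc--> ss.
round_trip = "MASSE".lower()
print(round_trip)
```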

You can claim that what you end up doing in the procrustean
circumstance of attempting an isomorphic transform on
non-isomorphic sets is a "fallback", but that is quite
different from the ordinary meaning of the term fallback
in a Unicode context of displaying text when you are missing
the correct font or glyphs to display it.

This discussion about ß tends to end up garbled because
using "ss" to display or represent ß *would* be a fallback
(see above), whereas using "SS" as the uppercase of ß is
*not* a fallback, but rather the prescribed, expected behavior.

> I am not suggesting that what is happening is wrong --that is a
> separate issue-- only that we are in this state of affairs
> because of a fallback situation, a case mapping (to upper case)
> that is historically reasonable according to some authorities
> and dubious according to others, and a casefolding operation
> that is defined in a specific way that almost certainly works
> for the vast majority of cases but that does not work perfectly
> for this one.

How does it not work for this [IDNA] context?

The net net is that ß is DISALLOWED. That means that
a registry will not (by protocol) allow registration of
a domain name with an ß in it.

That is no different than the current situation, right?

And as others have pointed out, nothing prevents the
user agent from mapping together {ß, ss, SS} before
handing a string off to a resolver, right? Which means
Herr Faßbinder can get his domain, and need not know
that what actually gets registered is the equivalent
fassbinder.
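
A sketch of that kind of user-agent mapping, assuming Python's
str.casefold() (which implements Unicode full case folding) as the
mapping step. The function name fold_label is invented for
illustration, and this is not the IDNA protocol's own processing, only
a mapping applied before the string reaches a resolver:

```python
def fold_label(label: str) -> str:
    """Map a user-typed label to its case-folded form (hypothetical helper).

    Full case folding sends ß, ss and SS to the same sequence "ss".
    """
    return label.casefold()


typed = ["Faßbinder", "Fassbinder", "FASSBINDER"]
folded = {fold_label(t) for t in typed}
print(folded)
```

All three spellings collapse to a single label, so only one form ever
needs to be registered.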

In fact, because of the history of German orthography, I
would argue that that is precisely what should be done.
In traditional orthography, many syllable-ending s's
were written with ß (Fluß, muß, Riß, Faß), but the new
orthography writes those consistently Fluss, muss, Riss, Fass.
That means you are going to get numerous instances where
someone might use one or the other, depending -- and
it makes no sense to then try to distinguish "ss" and
"ß" in domain name labels. That just raises the burden
at the other end, creating more need for bundling.

--Ken



