sharp s (Eszett)

John C Klensin klensin at jck.com
Fri Mar 7 18:41:23 CET 2008


Georg,

I am pleased to see this position stated clearly by a native
German speaker.  Thanks for (again) pointing out the difference
between normal Swiss and German orthography and keyboards.  Your
terminology is, incidentally, just fine and your note is very
clear.

Let me see if I can quickly summarize the technical IDNA issues
with Eszett without taking any position on them.

	(1) IDNA2003 mapped Eszett into "ss".  That is no more,
	and no less, than a historical fact.   But it does imply
	that giving Eszett any other treatment going forward
	would create an incompatible change.  There are
	certainly users today, including some in Germany, who
	are taking advantage of the mapping, using Eszett in
	IRIs and other references but having registered domain
	names whose labels contain the encoding of the "ss" form.
	To paraphrase the discussion Gerv and I are having about
	mappings, use of Eszett and the mapping obviously
	impressed those users/registrants as the "least bad"
	alternative given what IDNA2003 does with the character.
	
	(2) It is worth noting, as part of the ongoing
	discussion about mapping (or not), that, had Eszett
	simply been rejected by IDNA2003 (rather than mapped),
	adding it now as a valid (and unmapped) character would
	be a simple matter.   With the behavior in IDNA2003, any
	change is an incompatible one.
	
	(3) In addition to the "no upper case form" point, the
	argument for making the mapping --and at least part of
	the argument that led to the mapping in IDNA2003-- is
	that, even though everyone understands that some words
	containing "ss" cannot be mapped back into Eszett,
	"everyone" would expect the two to match. Again, that is
	a report about how we got here historically. I am not
	qualified to make a judgment about whether the statement
	is actually correct.  Arguably, neither is the IETF (see
	(5), below).
 	
	(4) There is no _technical_ problem with treating Eszett
	as a normal letter in IDNA200X as long as everyone
	understands that "no mapping" means "no matching with
	the 'ss' form" and we can live with the incompatible
	change.  You (and clearly some others) believe that is
	the right answer for German as written in Germany (and
	elsewhere).  Some others believe that it is the wrong
	answer for German as written in Switzerland (and
	elsewhere).   But there is no middle ground in which it
	can be a character in some places and a notation for
	"ss" in others.
	
	(5) The incompatibility problem is a significant one,
	since the change would violate the implicit rule that a
	given label string that is valid under both IDNA2003 and
	the new proposals (known collectively as IDNA200X) must
	produce the same ACE (Punycode-encoded) string.
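
To make points (1), (4) and (5) concrete, here is a minimal
Python sketch.  It assumes that Python's built-in "idna" codec,
which implements the IDNA2003 (RFC 3490) ToASCII operation, is a
fair stand-in for IDNA2003 behavior.  The helper name
ace_without_mapping is made up for illustration: it simply
Punycode-encodes the label unchanged, which only approximates
what an unmapped treatment of Eszett might produce and is not an
implementation of any IDNA200X draft.

```python
# Sketch only: compare the IDNA2003 result for a label containing
# Eszett with what a hypothetical "no mapping" treatment would give.


def ace_idna2003(label: str) -> str:
    """ToASCII per IDNA2003, via Python's built-in 'idna' codec."""
    return label.encode("idna").decode("ascii")


def ace_without_mapping(label: str) -> str:
    """Hypothetical helper: Punycode-encode the lowercased label as-is.

    Assumption: Eszett is an ordinary letter, with no mapping and no
    other validity checks applied.
    """
    label = label.lower()  # lower() leaves "ß" unchanged
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


mapped = ace_idna2003("straße")
unmapped = ace_without_mapping("straße")

print(mapped)    # strasse
print(unmapped)  # xn--strae-oqa
print(mapped == ace_idna2003("strasse"))  # True: the two forms match today
```

The same input label yields two different ACE strings, which is
the incompatibility described in (5), and under the unmapped
treatment the Eszett label no longer matches the "ss" label, as
noted in (4).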

The hard problem here is how the IETF can possibly decide on
this.  The default decision should almost certainly be "avoid
incompatibility", but that would leave you stuck with a decision
that was made early in the decade, possibly without adequate
information or consideration.  While it certainly isn't a matter
for "voting" or "collecting endorsements", I would think that
the IETF would find it very helpful to have statements from the
ccTLD registries for German-speaking countries (and, ideally,
for countries with large enough German-speaking populations to
have a lot of German-based registrations) about what they want
to do and, were the change made, how they would deal with the
incompatibility problem (e.g., by using "variant" techniques to
be sure that a new registration that included Eszett did not end
up in different hands from an existing registration that
properly used the "ss" alternate spelling).
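
As a rough illustration of what such a "variant" technique could
look like, the Python sketch below refuses a new registration
when a variant spelling is already held by someone else.  The
Registry class, the variant_key function and the registrant
names are all hypothetical, and using str.casefold() as the
variant table is an assumption made for brevity; an actual
registry would define its own variant rules and policies.

```python
# Sketch only: a toy registry that bundles Eszett and "ss" spellings.
# Registry, variant_key and the registrant names are hypothetical.


def variant_key(label: str) -> str:
    """Collapse variant spellings onto one key.

    str.casefold() maps "ß" to "ss", so "straße" and "strasse"
    share a key. A real registry would apply its own variant table.
    """
    return label.casefold()


class Registry:
    def __init__(self) -> None:
        # Maps each variant key to the registrant holding that bundle.
        self._owner_by_key = {}

    def register(self, label: str, registrant: str) -> bool:
        """Accept the label unless a variant is held by someone else."""
        key = variant_key(label)
        owner = self._owner_by_key.get(key)
        if owner is not None and owner != registrant:
            return False
        self._owner_by_key[key] = registrant
        return True


registry = Registry()
first = registry.register("strasse", "alice")
blocked = registry.register("straße", "bob")
same_owner = registry.register("straße", "alice")
print(first, blocked, same_owner)  # True False True
```

The point of the design is only that both spellings stay with
one registrant; whether the second spelling is blocked, reserved
or delegated is a policy choice left to the registry.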

I believe that we can make some incompatible changes like this
(and like the addition of ZWJ and ZWNJ with contextual controls)
now if there is fairly strong consensus in the
materially-affected communities that the change is important
enough and that they are prepared to deal with it.   I also
think it is our last chance, so we had better get it right this
time.   Others may disagree with one or both of those beliefs.

thanks again,
       john




--On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
<g.ochsner at revolistic.com> wrote:

> Hello,
> 
> I am a native German speaker (born in Austria, living in
> Germany). I noticed that there have already been postings
> about the German sharp s (Eszett) but actually very few (if
> any) from German people (AFAIK Martin is from Switzerland,
> where people normally do not use the sharp s).
> 
> I want to stress how important the sharp s actually is for
> most German-speaking users. Besides the three umlauts, which
> can already be used in IDNs, the sharp s is the fourth character
> that would really matter for users. Over 90 million German
> speakers do use the sharp s. In German texts it is used more
> often than the letters "j", "q" and "y" for instance. The
> sharp s has (of course) a direct key on German keyboards.
> 
> Concerning IDNA, I have to say that the sharp s is NOT equal
> to double s. Mapping the sharp s to "ss" is not natural from a
> user's point of view. If you substitute the sharp s by "ss"
> you will get wrong spelling in most cases and sometimes even
> other words with totally different meanings, which can be
> confusing. There are strict grammatical rules whether to use
> the one or the other.
> 
> I am not versed enough to know the deep technical impacts, but
> I am enthusiastic about the German language though... How
> could the sharp s be implemented into IDNA so that it can be
> used in IDNs? I read that the Latin capital sharp S has been
> added to Unicode 5.1 now
> (http://www.unicode.org/versions/Unicode5.1.0/). The document
> also proposes a tailored casing operation from small to
> capital sharp s where desired. What implications does that
> have on "rule B" in the current table document and the other
> documents?
> 
> As a user I would really like to see the sharp s in IDNs;
> maybe you can discuss the technical impacts, even if it takes
> some kind of workaround or "special" mappings...? If I can
> contribute by collecting orthographic data or contacting
> German language specialists here in Germany to join the
> discussion, please let me know and I will try.
> 
> Best regards
> Georg 
> 
> 
> PS.: Please forgive and correct me if I mixed up technical
> terms...
> 
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update





