sharp s (Eszett)
mark.davis at icu-project.org
Sat Mar 8 00:01:06 CET 2008
The main reason for mapping ß to "ss" in IDNA2003 is for case insensitivity.
That is, so that "Ruß.com" matches "RUSS.COM", which in turn matches
"russ.com". A small side benefit is that some words that are spelled
differently by Swiss, or differently pre/post Spelling Reform also have the
same internal form, but that side benefit was really given no weight in the
original decision. Remember, also, that a great many words of every language
cannot be simply used as is with IDNA. One can't use many extremely common
English words; for example, "can't.com" won't work.
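As a rough illustration of the matching described above: Unicode full case folding, which Python exposes as str.casefold(), maps ß to "ss". This is only a sketch of the case-folding step that motivates the mapping, not the IDNA2003 Nameprep algorithm itself, and the helper name names_match is made up for the example.

```python
def names_match(a: str, b: str) -> bool:
    """Compare two names after Unicode full case folding (assumed stand-in
    for the case-insensitive matching discussed above)."""
    return a.casefold() == b.casefold()


print(names_match("Ruß.com", "RUSS.COM"))   # True: ß folds to "ss"
print(names_match("RUSS.COM", "russ.com"))  # True: ordinary ASCII case folding
print("ß".upper())                          # SS: the default uppercase of ß is two letters
```

The last line shows why case insensitivity drags "ss" in: once "Ruß" is uppercased to "RUSS", nothing in the string says which "ss" was originally an ß.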
At this point, I really doubt that the advantage of having ß outweighs
the cost of either incompatibility with IDNA2003, or the morass that would
be caused by a prefix change.
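To make the incompatibility concrete: with the IDNA2003 mapping, a label containing ß ends up as a plain ASCII label, whereas treating ß as a letter in its own right would give the same input a different, Punycode-encoded form. The sketch below uses Python's built-in "punycode" codec, with casefold() as a simplified stand-in for the IDNA2003 mapping. Both helper names are hypothetical, and neither is a full implementation of either protocol.

```python
ACE_PREFIX = "xn--"


def encode_with_mapping(label: str) -> str:
    """Simplified stand-in for IDNA2003: fold first (ß -> ss), then encode."""
    folded = label.casefold()
    if folded.isascii():
        return folded
    return ACE_PREFIX + folded.encode("punycode").decode("ascii")


def encode_without_mapping(label: str) -> str:
    """Hypothetical behavior if ß were kept as an ordinary, unmapped letter."""
    lowered = label.lower()  # lower() leaves ß unchanged
    if lowered.isascii():
        return lowered
    return ACE_PREFIX + lowered.encode("punycode").decode("ascii")


for label in ("strasse", "straße"):
    print(label, encode_with_mapping(label), encode_without_mapping(label))
```

For "strasse" both functions agree. For "straße" the first returns the ASCII label "strasse" while the second returns an "xn--" label, so the same user input would resolve to a different name than it does today.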
On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:
> I am pleased to see this position stated clearly by a native
> German speaker. Thanks for (again) pointing out the difference
> between normal Swiss and German orthography and keyboards. Your
> terminology is, incidentally, just fine and your note is very
> Let me see if I can quickly summarize the technical IDNA issues
> with Eszett without taking any position on them.
> (1) IDNA2003 mapped Eszett into "ss". That is no more,
> and no less, than a historical fact. But it does imply
> that giving Eszett any other treatment going forward
> would create an incompatible change. There are
> certainly users today, including some in Germany, who
> are taking advantage of the mapping, using Eszett in
> IRIs and other references but having registered domain
> names whose labels contain encoding of the "ss" form.
> To paraphrase the discussion Gerv and I are having about
> mappings, use of Eszett and the mapping obviously
> impressed those users/registrants as the "least bad"
> alternative given what IDNA2003 does with the character.
> (2) It is worth noting, as part of the ongoing
> discussion about mapping (or not), that, had Eszett
> simply been rejected by IDNA2003 (rather than mapped),
> adding it now as a valid (and unmapped) character would
> be a simple matter. With the behavior in IDNA2003, any
> change is an incompatible one.
> (3) In addition to the "no upper case form", the
> argument for making the mapping --and at least part of
> the argument that led to the mapping in IDNA2003-- is
> that, even though everyone understands that some words
> containing "ss" cannot be mapped back into Eszett,
> "everyone" would expect the two to match. Again, that is
> a report about how we got here historically. I am not
> qualified to make a judgment about whether the statement
> is actually correct. Arguably, neither is the IETF (see
> (5), below).
> (4) There is no _technical_ problem with treating Eszett
> as a normal letter in IDNA200X as long as everyone
> understands that "no mapping" means "no matching with
> the 'ss' form" and we can live with the incompatible
> change. You (and clearly some others) believe that is
> the right answer for German as written in Germany (and
> elsewhere). Some others believe that it is the wrong
> answer for German as written in Switzerland (and
> elsewhere). But there is no middle ground in which it
> can be a character in some places and a notation for
> "ss" in others.
> (5) The incompatibility problem is a significant one,
> since it would violate the implicit rule that a given
> label string that is valid under both IDNA2003 and the
> new proposals (known collectively as IDNA200X) must
> produce the same ACE (punycode-encoded) string.
> The hard problem here is how the IETF can possibly decide on
> this. The default decision should almost certainly be "avoid
> incompatibility", but that would leave you stuck with a decision
> that was made early in the decade, possibly without adequate
> information or consideration. While it certainly isn't a matter
> for "voting" or "collecting endorsements", I would think that
> the IETF would find statements very helpful from the ccTLD
> registries from German-speaking countries (and, ideally,
> countries with large enough German-speaking populations to have
> a lot of German-based registrations) about what they wanted to
> do and how they would deal with the incompatibility problem
> (e.g., by using "variant" techniques to be sure that a new
> registration that included Eszett did not end up in different
> hands from an existing registration that properly used the "ss"
> alternate spelling) were the change made.
> I believe that we can make some incompatible changes like this
> (and like the addition of ZWJ and ZWNJ with contextual controls)
> now if there is fairly strong consensus in the
> materially-affected communities that the change is important
> enough and that they are prepared to deal with it. I also
> think it is our last chance, so we had better get it right this
> time. Others may disagree with one or both of those beliefs.
> thanks again,
> --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
> <g.ochsner at revolistic.com> wrote:
> > Hello,
> > I am a native German speaker (born in Austria, living in
> > Germany). I noticed that there have already been postings
> > about the German sharp s (Eszett) but actually very few (if
> > any) from German people (Afaik Martin is from Switzerland,
> > where people normally do not use the sharp s).
> > I want to stress how important the sharp s actually is for
> > most of the German speaking users. Beside the 3 umlauts which
> > can already be used in IDNs the sharp s is the 4th character
> > which would really matter for users. Over 90 million German
> > speakers do use the sharp s. In German texts it is used more
> > often than the letters "j", "q" and "y" for instance. The
> > sharp s has (of course) a direct key on German keyboards.
> > Concerning IDNA I have to say, that the sharp s is NOT equal
> > to double s. Mapping the sharp s to "ss" is not natural from a
> > user's point of view. If you substitute the sharp s by "ss"
> > you will get wrong spelling in most cases and sometimes even
> > other words with totally different meanings, which can be
> > confusing. There are strict grammatical rules whether to use
> > the one or the other.
> > I am not versed enough to know the deep technical impacts, but
> > I am enthusiastic about the German language though... How
> > could the sharp s be implemented into IDNA so that it can be
> > used in IDNs? I read that the Latin capital sharp S has been
> > added to Unicode 5.1 now
> > (http://www.unicode.org/versions/Unicode5.1.0/). The document
> > also proposes a tailored casing operation from small to
> > capital sharp s where desired. What implications does that
> > have on "rule B" in the current table document and the other
> > documents?
> > As a user I would really like to see the sharp s in IDNs,
> > maybe you can discuss the technical impacts, even if it takes
> > kind of workarounds or "special" mappings...? As far as I can
> > contribute by collecting orthographic data or contacting
> > German language specialists here in Germany to join the
> > discussion, please let me know and I will try.
> > Best regards
> > Georg
> > PS.: Please forgive and correct me if I mixed up technical
> > terms...
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update