sharp s (Eszett)

Mark Davis mark.davis at icu-project.org
Sat Mar 8 00:01:06 CET 2008


The main reason for mapping ß to "ss" in IDNA2003 is for case insensitivity.
That is, so that "Ruß.com" matches "RUSS.COM", which in turn matches
"russ.com". A small side benefit is that some words that are spelled
differently by the Swiss, or differently pre/post Spelling Reform, also have the
same internal form, but that side benefit was really given no weight in the
original decision. Remember, also, that a great many words of every language
cannot be simply used as is with IDNA. One can't use many extremely common
English words; for example, "can't.com" won't work.
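
To make the matching argument concrete, here is a minimal Python sketch. It
assumes that full Unicode case folding (str.casefold) is an adequate stand-in
for the case mapping in IDNA2003, and it relies on Python's built-in "idna"
codec, which is documented as following the IDNA2003 RFCs. The helper names
(fold, is_ldh, idna2003_ascii) are made up for illustration, and is_ldh is
only a rough letters-digits-hyphen check, not a complete validity test.

```python
import string

# Characters allowed in a traditional hostname label (letters, digits, hyphen).
LDH = set(string.ascii_lowercase + string.digits + "-")


def fold(label: str) -> str:
    """Full Unicode case folding; an assumed stand-in for the IDNA2003 case mapping."""
    return label.casefold()


def is_ldh(label: str) -> bool:
    """Hypothetical helper: True if the folded label uses only LDH characters."""
    folded = fold(label)
    return bool(folded) and all(ch in LDH for ch in folded)


def idna2003_ascii(name: str) -> str:
    """ASCII form of a domain name via Python's built-in IDNA2003 codec."""
    return name.encode("idna").decode("ascii")


# All three spellings fold to the same string, so they match.
print(fold("Ruß") == fold("RUSS") == fold("russ"))

# The IDNA2003 codec maps the sharp s away before producing the ASCII form.
print(idna2003_ascii("Ruß.com"))

# The apostrophe is outside LDH, so "can't" is not usable as a label.
print(is_ldh("can't"))
```

Note that the folding is one-way: nothing in the folded form "russ" records
whether the user originally typed "ß" or "ss".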

At this point, I really doubt that the advantage of having ß outweighs
the cost of either incompatibility with IDNA2003, or the morass that would
be caused by a prefix change.

Mark

On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:

> Georg,
>
> I am pleased to see this position stated clearly by a native
> German speaker.  Thanks for (again) pointing out the difference
> between normal Swiss and German orthography and keyboards.  Your
> terminology is, incidentally, just fine and your note is very
> clear.
>
> Let me see if I can quickly summarize the technical IDNA issues
> with Eszett without taking any position on them.
>
>        (1) IDNA2003 mapped Eszett into "ss".  That is no more,
>        and no less, than a historical fact.   But it does imply
>        that giving Eszett any other treatment going forward
>        would create an incompatible change.  There are
>        certainly users today, including some in Germany, who
>        are taking advantage of the mapping, using Eszett in
>        IRIs and other references but having registered domain
>        names whose labels contain encoding of the "ss" form.
>        To paraphrase the discussion Gerv and I are having about
>        mappings, use of Eszett and the mapping obviously
>        impressed those users/registrants as the "least bad"
>        alternative given what IDNA2003 does with the character.
>
>        (2) It is worth noting, as part of the ongoing
>        discussion about mapping (or not), that, had Eszett
>        simply been rejected by IDNA2003 (rather than mapped),
>        adding it now as a valid (and unmapped) character would
>        be a simple matter.   With the behavior in IDNA2003, any
>        change is an incompatible one.
>
>        (3) In addition to the "no upper case form", the
>        argument for making the mapping --and at least part of
>        the argument that led to the mapping in IDNA2003-- is
>        that, even though  everyone understands that some words
>        containing "ss" cannot be mapped back into Eszett,
>        "everyone" would expect the two to match. Again, that is
>        a report about how we got here historically. I am not
>        qualified to make a judgment about whether the statement
>        is actually correct.  Arguably, neither is the IETF (see
>        (5), below).
>
>        (4) There is no _technical_ problem with treating Eszett
>        as a normal letter in IDNA200X as long as everyone
>        understands that "no mapping" means "no matching with
>        the 'ss' form" and we can live with the incompatible
>        change.  You (and clearly some others) believe that is
>        the right answer for German as written in Germany (and
>        elsewhere).  Some others believe that it is the wrong
>        answer for German as written in Switzerland (and
>        elsewhere).   But there is no middle ground in which it
>        can be a character in some places and a notation for
>        "ss" in others.
>
>        (5) The incompatibility problem is a significant one,
>        since it would violate the implicit rule that a given
>        label string that is valid under both IDNA2003 and the
>        new proposals (known collectively as IDNA200X) must
>        produce the same ACE (punycode-encoded) string.
>
> The hard problem here is how the IETF can possibly decide on
> this.  The default decision should almost certainly be "avoid
> incompatibility", but that would leave you stuck with a decision
> that was made early in the decade, possibly without adequate
> information or consideration.  While it certainly isn't a matter
> for "voting" or "collecting endorsements", I would think that
> the IETF would find statements very helpful from the ccTLD
> registries from German-speaking countries (and, ideally,
> countries with large enough German-speaking populations to have
> a lot of German-based registrations) about what they wanted to
> do and how they would deal with the incompatibility problem
> (e.g., by using "variant" techniques to be sure that a new
> registration that included Eszett did not end up in different
> hands from an existing registration that properly used the "ss"
> alternate spelling) were the change made.
>
> I believe that we can make some incompatible changes like this
> (and like the addition of ZWJ and ZWNJ with contextual controls)
> now if there is fairly strong consensus in the
> materially-affected communities that the change is important
> enough and that they are prepared to deal with it.   I also
> think it is our last chance, so we had better get it right this
> time.   Others may disagree with one or both of those beliefs.
>
> thanks again,
>       john
>
>
>
>
> --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
> <g.ochsner at revolistic.com> wrote:
>
> > Hello,
> >
> > I am a native German speaker (born in Austria, living in
> > Germany). I noticed that there have already been postings
> > about the German sharp s (Eszett) but actually very few (if
> > any) from German people (AFAIK Martin is from Switzerland,
> > where people normally do not use the sharp s).
> >
> > I want to stress how important the sharp s actually is for
> > most of the German-speaking users. Besides the 3 umlauts, which
> > can already be used in IDNs, the sharp s is the 4th character
> > which would really matter for users. Over 90 million German
> > speakers do use the sharp s. In German texts it is used more
> > often than the letters "j", "q" and "y" for instance. The
> > sharp s has (of course) a direct key on German keyboards.
> >
> > Concerning IDNA I have to say, that the sharp s is NOT equal
> > to double s. Mapping the sharp s to "ss" is not natural from a
> > user's point of view. If you substitute the sharp s by "ss"
> > you will get wrong spelling in most cases and sometimes even
> > other words with totally different meanings, which can be
> > confusing. There are strict grammatical rules whether to use
> > the one or the other.
> >
> > I am not versed enough to know the deep technical impacts, but
> > I am enthusiastic about the German language though... How
> > could the sharp s be implemented into IDNA so that it can be
> > used in IDNs? I read that the Latin capital sharp S has been
> > added to Unicode 5.1 now
> > (http://www.unicode.org/versions/Unicode5.1.0/). The document
> > also proposes a tailored casing operation from small to
> > capital sharp s where desired. What implications does that
> > have on "rule B" in the current table document and the other
> > documents?
> >
> > As a user I would really like to see the sharp s in IDNs,
> > maybe you can discuss the technical impacts, even if it takes
> > kind of workarounds or "special" mappings...? As far as I can
> > contribute by collecting orthographic data or contacting
> > German language specialists here in Germany to join the
> > discussion, please let me know and I will try.
> >
> > Best regards
> > Georg
> >
> >
> > PS.: Please forgive and correct me if I mixed up technical
> > terms...
> >
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update


