sharp s (Eszett)

Erik van der Poel erikv at google.com
Fri Mar 7 19:56:06 CET 2008


John,

Previously, I reported that MSIE7 refused to look up domain names with
U+03F7 or U+03F8 in them, and I stated my opinion that MSIE7 was doing
the right thing, because those 2 characters were unassigned in Unicode
3.2. Implementors cannot predict the case-folding relationships
between code points before they are assigned, so they should refrain
from looking up domain names with characters that they don't know
about, otherwise they might send out the wrong A-label (different from
a future implementation). Here is an example of a piece of HTML that I
tried at that time:

<a href="http://&#x3F7;.com/">
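
The "unassigned in Unicode 3.2" test can be sketched with Python's standard stringprep module, whose in_table_a1() implements RFC 3454 table A.1 (code points unassigned in Unicode 3.2). The helper name below is made up for illustration, and this only shows the kind of check an IDNA2003 implementation could apply before lookup; it says nothing about what MSIE7 does internally.

```python
import stringprep

def unassigned_in_unicode_3_2(label):
    """List the code points in `label` that RFC 3454 table A.1 treats as unassigned."""
    # Hypothetical helper: in_table_a1() is the stdlib check, the wrapper is ours.
    return [f"U+{ord(ch):04X}" for ch in label if stringprep.in_table_a1(ch)]

# U+03F7 and U+03F8 were not yet assigned in Unicode 3.2; Eszett (U+00DF) was.
print(unassigned_in_unicode_3_2("\u03f7"))
print(unassigned_in_unicode_3_2("\u03f8"))
print(unassigned_in_unicode_3_2("stra\u00dfe"))
```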

However, now I have discovered that MSIE7 even refuses to look up
domain names containing those characters when they are in A-label
form! Needless to say, my opinion of MSIE7 has now changed
drastically. This time, I tried:

(1) <a href="http://xn--nza.com/">
(2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
(3) <a href="http://xn--strae-oqa.com/">
(4) <a href="http://xm--strae-oqa.com/">

(1) has U+03F8 in it (a lower-case letter); (2) has U+200C (ZWNJ) in
it (I found it in the lower left corner of
http://www.nic.ir/List_of_Resellers); and (3) has U+00DF (Eszett) in it.
None of these worked in MSIE7. You can click on them, but no DNS
packet is emitted.

On the other hand, (4) did work. This also has Eszett in it, but the
prefix has been changed to "xm--".
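
To see what the labels in (1), (3) and (4) carry, the Punycode part can be decoded with Python's built-in "punycode" codec. This is a minimal sketch: the helper name and the choice to skip any four-character prefix (so that the experimental "xm--" form decodes too) are assumptions made for illustration.

```python
def decode_ace_label(label, prefix_len=4):
    """Decode the Punycode part of an ACE-style label, ignoring its prefix."""
    # Hypothetical helper: drops "xn--" (or "xm--") and decodes the remainder.
    return label[prefix_len:].encode("ascii").decode("punycode")

for host in ("xn--nza.com", "xn--strae-oqa.com", "xm--strae-oqa.com"):
    first_label = host.split(".")[0]
    decoded = decode_ace_label(first_label)
    # Show the decoded label together with its code points.
    print(host, "->", decoded, [f"U+{ord(c):04X}" for c in decoded])
```

Labels (3) and (4) decode to the same string; only the prefix differs.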

Since MSIE7 is so widely installed, to me, this means that we probably
have to reopen the discussion of whether we will switch to a new
prefix.

Erik

On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:
> Georg,
>
>  I am pleased to see this position stated clearly by a native
>  German speaker.  Thanks for (again) pointing out the difference
>  between normal Swiss and German orthography and keyboards.  Your
>  terminology is, incidentally, just fine and your note is very
>  clear.
>
>  Let me see if I can quickly summarize the technical IDNA issues
>  with Eszett without taking any position on them.
>
>         (1) IDNA2003 mapped Eszett into "ss".  That is no more,
>         and no less, than a historical fact.   But it does imply
>         that giving Eszett any other treatment going forward
>         would create an incompatible change.  There are
>         certainly users today, including some in Germany, who
>         are taking advantage of the mapping, using Eszett in
>         IRIs and other references but having registered domain
>         names whose labels contain an encoding of the "ss" form.
>         To paraphrase the discussion Gerv and I are having about
>         mappings, use of Eszett and the mapping obviously
>         impressed those users/registrants as the "least bad"
>         alternative given what IDNA2003 does with the character.
>
>         (2) It is worth noting, as part of the ongoing
>         discussion about mapping (or not), that, had Eszett
>         simply been rejected by IDNA2003 (rather than mapped),
>         adding it now as a valid (and unmapped) character would
>         be a simple matter.   With the behavior in IDNA2003, any
>         change is an incompatible one.
>
>         (3) In addition to the "no upper case form", the
>         argument for making the mapping --and at least part of
>         the argument that led to the mapping in IDNA2003-- is
>         that, even though  everyone understands that some words
>         containing "ss" cannot be mapped back into Eszett,
>         "everyone" would expect the two to match. Again, that is
>         a report about how we got here historically. I am not
>         qualified to make a judgment about whether the statement
>         is actually correct.  Arguably, neither is the IETF (see
>         (5), below).
>
>         (4) There is no _technical_ problem with treating Eszett
>         as a normal letter in IDNA200X as long as everyone
>         understands that "no mapping" means "no matching with
>         the 'ss' form" and we can live with the incompatible
>         change.  You (and clearly some others) believe that is
>         the right answer for German as written in Germany (and
>         elsewhere).  Some others believe that it is the wrong
>         answer for German as written in Switzerland (and
>         elsewhere).   But there is no middle ground in which it
>         can be a character in some places and a notation for
>         "ss" in others.
>
>         (5) The incompatibility problem is a significant one,
>         since it would violate the implicit rule that a given
>         label string that is valid under both IDNA2003 and the
>         new proposals (known collectively as IDNA200X) must
>         produce the same ACE (punycode-encoded) string.
>
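
The incompatibility described in (1) and (5) can be reproduced with Python's standard library, whose "idna" codec follows IDNA2003 (RFC 3490). This is a sketch under one assumption: that a "no mapping" treatment of Eszett would simply Punycode-encode the label as written. The second function is a stand-in for that behaviour, not an implementation of any IDNA200X draft.

```python
def to_ascii_idna2003(label):
    """ToASCII via Python's built-in IDNA2003 codec, which maps Eszett to "ss"."""
    return label.encode("idna").decode("ascii")

def to_ascii_unmapped(label):
    """Hypothetical 'no mapping' treatment: Punycode-encode the label as written."""
    return "xn--" + label.encode("punycode").decode("ascii")

label = "stra\u00dfe"
print(to_ascii_idna2003(label))   # strasse
print(to_ascii_unmapped(label))   # xn--strae-oqa
```

The same input yields two different strings in the DNS, which is the incompatible change under discussion.
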
>  The hard problem here is how the IETF can possibly decide on
>  this.  The default decision should almost certainly be "avoid
>  incompatibility", but that would leave you stuck with a decision
>  that was made early in the decade, possibly without adequate
>  information or consideration.  While it certainly isn't a matter
>  for "voting" or "collecting endorsements", I would think that
>  the IETF would find statements very helpful from the ccTLD
>  registries from German-speaking countries (and, ideally,
>  countries with large enough German-speaking populations to have
>  a lot of German-based registrations) about what they wanted to
>  do and how they would deal with the incompatibility problem
>  (e.g., by using "variant" techniques to be sure that a new
>  registration that included Eszett did not end up in different
>  hands from an existing registration that properly used the "ss"
>  alternate spelling) were the change made.
>
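
One way to picture the "variant" technique mentioned above is a registry that derives a comparison key from each label and refuses a registration whose key already belongs to a different registrant. The class, the method names and the folding rule (Eszett folded to "ss" for comparison only) are all hypothetical; real registry policies are more involved.

```python
class VariantRegistry:
    """Toy registry that keeps all Eszett/"ss" variants of a label in one hand."""

    def __init__(self):
        self._holder_by_key = {}   # variant key -> registrant
        self._labels_by_key = {}   # variant key -> labels registered under it

    @staticmethod
    def variant_key(label):
        # Assumed folding rule: lower-case, then treat Eszett as "ss".
        return label.lower().replace("\u00df", "ss")

    def register(self, label, registrant):
        """Return True if registered, False if a variant is held by someone else."""
        key = self.variant_key(label)
        holder = self._holder_by_key.setdefault(key, registrant)
        if holder != registrant:
            return False
        self._labels_by_key.setdefault(key, set()).add(label)
        return True

registry = VariantRegistry()
print(registry.register("strasse", "alice"))      # True
print(registry.register("stra\u00dfe", "bob"))    # False
print(registry.register("stra\u00dfe", "alice"))  # True
```
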
>  I believe that we can make some incompatible changes like this
>  (and like the addition of ZWJ and ZWNJ with contextual controls)
>  now if there is fairly strong consensus in the
>  materially-affected communities that the change is important
>  enough and that they are prepared to deal with it.   I also
>  think it is our last chance, so we had better get it right this
>  time.   Others may disagree with one or both of those beliefs.
>
>  thanks again,
>        john
>
>  --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
>  <g.ochsner at revolistic.com> wrote:
>
>  > Hello,
>  >
>  > I am a native German speaker (born in Austria, living in
>  > Germany). I noticed that there have already been postings
>  > about the German sharp s (Eszett) but actually very few (if
>  > any) from German people (Afaik Martin is from Switzerland,
>  > where people normally do not use the sharp s).
>  >
>  > I want to stress how important the sharp s actually is for
>  > most of the German speaking users. Besides the 3 umlauts which
>  > can already be used in IDNs the sharp s is the 4th character
>  > which would really matter for users. Over 90 million German
>  > speakers do use the sharp s. In German texts it is used more
>  > often than the letters "j", "q" and "y" for instance. The
>  > sharp s has (of course) a direct key on German keyboards.
>  >
>  > Concerning IDNA I have to say, that the sharp s is NOT equal
>  > to double s. Mapping the sharp s to "ss" is not natural from a
>  > user's point of view. If you substitute the sharp s by "ss"
>  > you will get wrong spelling in most cases and sometimes even
>  > other words with totally different meanings, which can be
>  > confusing. There are strict grammatical rules whether to use
>  > the one or the other.
>  >
>  > I am not versed enough to know the deep technical impacts, but
>  > I am enthusiastic about the German language though... How
>  > could the sharp s be implemented into IDNA so that it can be
>  > used in IDNs? I read that the Latin capital sharp S has been
>  > added to Unicode 5.1 now
>  > (http://www.unicode.org/versions/Unicode5.1.0/). The document
>  > also proposes a tailored casing operation from small to
>  > capital sharp s where desired. What implications does that
>  > have on "rule B" in the current table document and the other
>  > documents?
>  >
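
For reference, the default (untailored) Unicode case operations on the two sharp s characters can be inspected in Python 3, whose string methods apply the default Unicode case mappings. This shows only the defaults; the tailored small-to-capital mapping mentioned above is not applied by these methods.

```python
small = "\u00df"     # LATIN SMALL LETTER SHARP S
capital = "\u1e9e"   # LATIN CAPITAL LETTER SHARP S (added in Unicode 5.1)

# Default uppercasing of the small letter expands to two letters.
print(small.upper())                        # SS
# The capital letter lowercases to the small one by default.
print(capital.lower() == small)             # True
# Case folding, used for caseless matching, turns both into "ss".
print(small.casefold(), capital.casefold())
```
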
>  > As a user I would really like to see the sharp s in IDNs,
>  > maybe you can discuss the technical impacts, even if it takes
>  > kind of workarounds or "special" mappings...? As far as I can
>  > contribute by collecting orthographic data or contacting
>  > German language specialists here in Germany to join the
>  > discussion, please let me know and I will try.
>  >
>  > Best regards
>  > Georg
>  >
>  >
>  > PS.: Please forgive and correct me if I mixed up technical
>  > terms...
>  >
>  > _______________________________________________
>  > Idna-update mailing list
>  > Idna-update at alvestrand.no
>  > http://www.alvestrand.no/mailman/listinfo/idna-update
>

