sharp s (Eszett)

Michel Suignard michelsu at windows.microsoft.com
Fri Mar 7 20:51:29 CET 2008


Again, when I read http://tools.ietf.org/html/draft-klensin-idnabis-protocol-04#section-5.4 I see a set of rules that will require applications to update when the repertoire grows. I am not exactly excited about it either, but it just means that browsers need some sort of self-update mechanism, which most of them have anyway (including IE7). It is fairly clear that every browser (afaik) will require a serious patch to move from IDNA2003 to IDNA200x anyway (the bidi behavior change being another breaking change), so this does not create a specific issue on its own.

Michel

-----Original Message-----
From: Erik van der Poel [mailto:erikv at google.com]
Sent: Friday, March 07, 2008 11:24 AM
To: Michel Suignard
Cc: idna-update at alvestrand.no; Georg Ochsner
Subject: Re: sharp s (Eszett)

Should we all wait until MSIE7 market share has dwindled close to zero
before we start using U+200C and code points assigned after Unicode 3.2
on the Web?

Or will Microsoft patch MSIE7, perhaps in an automatic update?

Erik

On Fri, Mar 7, 2008 at 11:14 AM, Michel Suignard
<michelsu at windows.microsoft.com> wrote:
> Erik, I understand that not looking up unassigned code points isn't exactly IDNA2003 compliant, but isn't that the direction IDNA200x is taking anyway? If so, why would this require a prefix change? It is my understanding (I could be wrong; I am not part of the IE team, I was only in charge of providing the IDN library) that A-labels are validated anyway, so there is no difference between U-labels and A-labels from that perspective.
>
>  I don't see anything in the behavior described below that would justify such a drastic move as a prefix change.
>
>  Michel
>
>
>
>  -----Original Message-----
>  From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Erik van der Poel
>  Sent: Friday, March 07, 2008 10:56 AM
>  To: John C Klensin
>  Cc: idna-update at alvestrand.no; Georg Ochsner
>  Subject: Re: sharp s (Eszett)
>
>  John,
>
>  Previously, I reported that MSIE7 refused to look up domain names with
>  U+03F7 or U+03F8 in them, and I stated my opinion that MSIE7 was doing
>  the right thing, because those 2 characters were unassigned in Unicode
>  3.2. Implementors cannot predict the case-folding relationships
>  between code points before they are assigned, so they should refrain
>  from looking up domain names with characters that they don't know
>  about, otherwise they might send out the wrong A-label (different from
>  a future implementation). Here is an example of a piece of HTML that I
>  tried at that time:
>
>  <a href="http://&#x3F7;.com/">
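A minimal sketch of the check described above, assuming Python's standard stringprep module, whose tables follow RFC 3454 and are pinned to Unicode 3.2. The helper name is hypothetical, and nothing here is meant to describe how MSIE7 implements the check.

```python
import stringprep


def unassigned_in_unicode_3_2(label):
    """List the code points in `label` that RFC 3454 table A.1
    treats as unassigned in Unicode 3.2."""
    return [f"U+{ord(ch):04X}" for ch in label if stringprep.in_table_a1(ch)]


# U+03F7 and U+03F8 were assigned only after Unicode 3.2.
print(unassigned_in_unicode_3_2("\u03f7\u03f8"))  # ['U+03F7', 'U+03F8']
# Eszett (U+00DF) was already assigned, so nothing is reported.
print(unassigned_in_unicode_3_2("stra\u00dfe"))   # []
```

An implementation that follows the "do not look up what you do not know" policy would decline to resolve any label for which this list is non-empty.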
>
>  However, now I have discovered that MSIE7 even refuses to look up
>  domain names containing those characters when they are in A-label
>  form! Needless to say, my opinion of MSIE7 has now changed
>  drastically. This time, I tried:
>
>  (1) <a href="http://xn--nza.com/">
>  (2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
>  (3) <a href="http://xn--strae-oqa.com/">
>  (4) <a href="http://xm--strae-oqa.com/">
>
>  (1) has U+03F8 in it (a lower-case letter), (2) has U+200C (ZWNJ) in
>  it and I found it in the lower left corner of
>  http://www.nic.ir/List_of_Resellers and (3) has U+00DF (Eszett) in it.
>  None of these worked in MSIE7. You can click on them, but no DNS
>  packet is emitted.
>
>  On the other hand, (4) did work. This also has Eszett in it, but the
>  prefix has been changed to "xm--".
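To see what test labels (1), (3) and (4) contain, here is a small sketch using Python's built-in "punycode" codec. The helper name decode_label is hypothetical. It only strips the ACE prefix and Punycode-decodes the rest, with no validation, so it is not a full ToUnicode implementation.

```python
ACE_PREFIX = "xn--"


def decode_label(label):
    """Punycode-decode a label that carries the ACE prefix;
    return any other label unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for host in ("xn--nza.com", "xn--strae-oqa.com", "xm--strae-oqa.com"):
    decoded = ".".join(decode_label(part) for part in host.split("."))
    # ascii() keeps non-ASCII code points visible as escapes.
    print(host, "->", ascii(decoded))
```

The first label decodes to U+03F8 and the second to "stra" + U+00DF + "e". The "xm--" label is not an A-label at all, so it passes through as an ordinary ASCII label, which is consistent with (4) being looked up.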
>
>  Since MSIE7 is so widely installed, to me, this means that we probably
>  have to reopen the discussion of whether we will switch to a new
>  prefix.
>
>  Erik
>
>  On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:
>  > Georg,
>  >
>  >  I am pleased to see this position stated clearly by a native
>  >  German speaker.  Thanks for (again) pointing out the difference
>  >  between normal Swiss and German orthography and keyboards.  Your
>  >  terminology is, incidentally, just fine and your note is very
>  >  clear.
>  >
>  >  Let me see if I can quickly summarize the technical IDNA issues
>  >  with Eszett without taking any position on them.
>  >
>  >         (1) IDNA2003 mapped Eszett into "ss".  That is no more,
>  >         and no less, than a historical fact.   But it does imply
>  >         that giving Eszett any other treatment going forward
>  >         would create an incompatible change.  There are
>  >         certainly users today, including some in Germany, who
>  >         are taking advantage of the mapping, using Eszett in
>  >         IRIs and other references but having registered domain
>  >         names whose labels contain the encoding of the "ss" form.
>  >         To paraphrase the discussion Gerv and I are having about
>  >         mappings, use of Eszett and the mapping obviously
>  >         impressed those users/ registrants as the "least bad"
>  >         alternative given what IDNA2003 does with the character.
>  >
>  >         (2) It is worth noting, as part of the ongoing
>  >         discussion about mapping (or not), that, had Eszett
>  >         simply been rejected by IDNA2003 (rather than mapped),
>  >         adding it now as a valid (and unmapped) character would
>  >         be a simple matter.   With the behavior in IDNA2003, any
>  >         change is an incompatible one.
>  >
>  >         (3) In addition to the "no upper case form", the
>  >         argument for making the mapping --and at least part of
>  >         the argument that led to the mapping in IDNA2003-- is
>  >         that, even though  everyone understands that some words
>  >         containing "ss" cannot be mapped back into Eszett,
>  >         "everyone" would expect the two to match. Again, that is
>  >         a report about how we got here historically. I am not
>  >         qualified to make a judgment about whether the statement
>  >         is actually correct.  Arguably, neither is the IETF (see
>  >         (5), below).
>  >
>  >         (4) There is no _technical_ problem with treating Eszett
>  >         as a normal letter in IDNA200X as long as everyone
>  >         understands that "no mapping" means "no matching with
>  >         the 'ss' form" and we can live with the incompatible
>  >         change.  You (and clearly some others) believe that is
>  >         the right answer for German as written in Germany (and
>  >         elsewhere).  Some others believe that it is the wrong
>  >         answer for German as written in Switzerland (and
>  >         elsewhere).   But there is no middle ground in which it
>  >         can be a character in some places and a notation for
>  >         "ss" in others.
>  >
>  >         (5) The incompatibility problem is a significant one,
>  >         since it would violate the implicit rule that a given
>  >         label string that is valid under both IDNA2003 and the
>  >         new proposals (known collectively as IDNA200X) must
>  >         produce the same ACE (punycode-encoded) string.
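The incompatibility in (1) and (5) can be illustrated with a short sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with Nameprep), as a stand-in for an IDNA2003 implementation. The function unmapped_to_ascii is hypothetical: it only shows what the ACE string would be if Eszett were encoded without mapping, and it does not implement any IDNA200X draft.

```python
def idna2003_to_ascii(label):
    """ToASCII under IDNA2003, via Python's built-in 'idna' codec."""
    return label.encode("idna").decode("ascii")


def unmapped_to_ascii(label):
    """Hypothetical 'no mapping' treatment: Punycode-encode the
    label as-is and prepend the ACE prefix."""
    return "xn--" + label.encode("punycode").decode("ascii")


label = "stra\u00dfe"  # "strasse" spelled with Eszett (U+00DF)
print(idna2003_to_ascii(label))   # strasse
print(unmapped_to_ascii(label))   # xn--strae-oqa
```

The same input label yields two different strings in the DNS, so a registration made under the "ss" form would no longer be reached by a reference written with Eszett.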
>  >
>  >  The hard problem here is how the IETF can possibly decide on
>  >  this.  The default decision should almost certainly be "avoid
>  >  incompatibility", but that would leave you stuck with a decision
>  >  that was made early in the decade, possibly without adequate
>  >  information or consideration.  While it certainly isn't a matter
>  >  for "voting" or "collecting endorsements", I would think that
>  >  the IETF would find statements very helpful from the ccTLD
>  >  registries from German-speaking countries (and, ideally,
>  >  countries with large enough German-speaking populations to have
>  >  a lot of German-based registrations) about what they wanted to
>  >  do and how they would deal with the incompatibility problem
>  >  (e.g., by using "variant" techniques to be sure that a new
>  >  registration that included Eszett did not end up in different
>  >  hands from an existing registration that properly used the "ss"
>  >  alternate spelling) were the change made.
>  >
>  >  I believe that we can make some incompatible changes like this
>  >  (and like the addition of ZWJ and ZWNJ with contextual controls)
>  >  now if there is fairly strong consensus in the
>  >  materially-affected communities that the change is important
>  >  enough and that they are prepared to deal with it.   I also
>  >  think it is our last chance, so we had better get it right this
>  >  time.   Others may disagree with one or both of those beliefs.
>  >
>  >  thanks again,
>  >        john
>  >
>  >
>  >
>  >
>  >  --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
>  >  <g.ochsner at revolistic.com> wrote:
>  >
>  >  > Hello,
>  >  >
>  >  > I am a native German speaker (born in Austria, living in
>  >  > Germany). I noticed that there have already been postings
>  >  > about the German sharp s (Eszett) but actually very few (if
>  >  > any) from German people (Afaik Martin is from Switzerland,
>  >  > where people normally do not use the sharp s).
>  >  >
>  >  > I want to stress how important the sharp s actually is for
>  >  > most of the German-speaking users. Besides the 3 umlauts, which
>  >  > can already be used in IDNs, the sharp s is the 4th character
>  >  > which would really matter for users. Over 90 million German
>  >  > speakers do use the sharp s. In German texts it is used more
>  >  > often than the letters "j", "q" and "y" for instance. The
>  >  > sharp s has (of course) a direct key on German keyboards.
>  >  >
>  >  > Concerning IDNA, I have to say that the sharp s is NOT equal
>  >  > to double s. Mapping the sharp s to "ss" is not natural from a
>  >  > user's point of view. If you substitute the sharp s by "ss"
>  >  > you will get wrong spelling in most cases and sometimes even
>  >  > other words with totally different meanings, which can be
>  >  > confusing. There are strict grammatical rules whether to use
>  >  > the one or the other.
>  >  >
>  >  > I am not versed enough to know the deep technical impacts, but
>  >  > I am enthusiastic about the German language though... How
>  >  > could the sharp s be implemented into IDNA so that it can be
>  >  > used in IDNs? I read that the Latin capital sharp S has been
>  >  > added to Unicode 5.1 now
>  >  > (http://www.unicode.org/versions/Unicode5.1.0/). The document
>  >  > also proposes a tailored casing operation from small to
>  >  > capital sharp s where desired. What implications does that
>  >  > have on "rule B" in the current table document and the other
>  >  > documents?
>  >  >
>  >  > As a user I would really like to see the sharp s in IDNs.
>  >  > Maybe you can discuss the technical impacts, even if it takes
>  >  > some kind of workaround or "special" mappings...? If I can
>  >  > contribute by collecting orthographic data or contacting
>  >  > German language specialists here in Germany to join the
>  >  > discussion, please let me know and I will try.
>  >  >
>  >  > Best regards
>  >  > Georg
>  >  >
>  >  >
>  >  > PS.: Please forgive and correct me if I mixed up technical
>  >  > terms...
>  >  >
>  >  > _______________________________________________
>  >  > Idna-update mailing list
>  >  > Idna-update at alvestrand.no
>  >  > http://www.alvestrand.no/mailman/listinfo/idna-update
>  >
>  >
>  >
>  >
>


