sharp s (Eszett)
michelsu at windows.microsoft.com
Fri Mar 7 20:51:29 CET 2008
Again, when I read http://tools.ietf.org/html/draft-klensin-idnabis-protocol-04#section-5.4 I see a set of rules that will require applications to update when the repertoire grows. I am not exactly excited about it either, but it just means that browsers need some sort of self-update mechanism, which most of them have anyway (including IE7). It is fairly clear that every browser (afaik) will require a serious patch to move from IDNA2003 to IDNA200x anyway (the bidi behavior change being another breaking change), so this does not create a specific issue on its own.
From: Erik van der Poel [mailto:erikv at google.com]
Sent: Friday, March 07, 2008 11:24 AM
To: Michel Suignard
Cc: idna-update at alvestrand.no; Georg Ochsner
Subject: Re: sharp s (Eszett)
Should we all wait until MSIE7 market share has dwindled close to zero
before we start using U+200C and Unicodes after 3.2 on the Web?
Or will Microsoft patch MSIE7, perhaps in an automatic update?
On Fri, Mar 7, 2008 at 11:14 AM, Michel Suignard
<michelsu at windows.microsoft.com> wrote:
> Erik, I understand that not looking up unassigned code points isn't exactly IDNA2003 compliant, but isn't it the direction that IDNA200x is taking anyway, so why would this require a prefix change? It is my understanding (I could be wrong, I am not part of the IE team, I was only in charge of providing the IDN library) that A-labels are validated anyway, so there is no difference between U and A label from that perspective.
> I don't see anything in the behavior described below that would justify such a drastic move as a prefix change.
> -----Original Message-----
> From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Erik van der Poel
> Sent: Friday, March 07, 2008 10:56 AM
> To: John C Klensin
> Cc: idna-update at alvestrand.no; Georg Ochsner
> Subject: Re: sharp s (Eszett)
> Previously, I reported that MSIE7 refused to look up domain names with
> U+03F7 or U+03F8 in them, and I stated my opinion that MSIE7 was doing
> the right thing, because those 2 characters were unassigned in Unicode
> 3.2. Implementors cannot predict the case-folding relationships
> between code points before they are assigned, so they should refrain
> from looking up domain names with characters that they don't know
> about, otherwise they might send out the wrong A-label (different from
> a future implementation). Here is an example of a piece of HTML that I
> tried at that time:
> <a href="http://Ϸ.com/">
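The "refrain from looking up what you don't know" policy described above can be sketched in Python. This is only an illustration, not what MSIE7 actually does: it assumes the standard-library `stringprep` module, whose table A.1 lists the code points unassigned in Unicode 3.2, and the function names are hypothetical.

```python
import stringprep


def unassigned_in_unicode_3_2(label):
    """Return the characters of `label` that were unassigned in Unicode 3.2.

    stringprep.in_table_a1() implements RFC 3454 table A.1
    (unassigned code points in Unicode 3.2).
    """
    return [ch for ch in label if stringprep.in_table_a1(ch)]


def should_look_up(label):
    """Hypothetical policy: only look up labels made of known characters."""
    return not unassigned_in_unicode_3_2(label)


# U+03F7 and U+03F8 were not yet assigned in Unicode 3.2; U+00DF (Eszett) was.
print(should_look_up("\u03f7"))       # False
print(should_look_up("stra\u00dfe"))  # True
```

A client following this policy never has to guess how an unknown code point will case-fold once it is assigned.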
> However, now I have discovered that MSIE7 even refuses to look up
> domain names containing those characters when they are in A-label
> form! Needless to say, my opinion of MSIE7 has now changed
> drastically. This time, I tried:
> (1) <a href="http://xn--nza.com/">
> (2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
> (3) <a href="http://xn--strae-oqa.com/">
> (4) <a href="http://xm--strae-oqa.com/">
> (1) has U+03F8 in it (a lower-case letter), (2) has U+200C (ZWNJ) in
> it and I found it in the lower left corner of
> http://www.nic.ir/List_of_Resellers and (3) has U+00DF (Eszett) in it.
> None of these worked in MSIE7. You can click on them, but no DNS
> packet is emitted.
> On the other hand, (4) did work. This also has Eszett in it, but the
> prefix has been changed to "xm--".
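The contents of test labels (1), (3) and (4) can be checked with Python's built-in `punycode` codec. This is a minimal sketch; the helper name `decode_a_label` is made up for illustration, and it performs no IDNA validation beyond checking the ACE prefix.

```python
ACE_PREFIX = "xn--"


def decode_a_label(label):
    """Decode one A-label to Unicode; return None if the ACE prefix is absent."""
    if not label.lower().startswith(ACE_PREFIX):
        return None  # an ordinary ASCII label as far as IDNA is concerned
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


for label in ("xn--nza", "xn--strae-oqa", "xm--strae-oqa"):
    decoded = decode_a_label(label)
    if decoded is None:
        print(label, "-> no ACE prefix, plain ASCII label")
    else:
        print(label, "->", " ".join("U+%04X" % ord(ch) for ch in decoded))
```

The first label decodes to the single code point U+03F8 and the second to "straße" (with U+00DF in fifth position). The "xm--" label is not an A-label at all, which is consistent with a client passing it through to DNS without any IDNA checks.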
> Since MSIE7 is so widely installed, to me, this means that we probably
> have to reopen the discussion of whether we will switch to a new
> On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:
> > Georg,
> > I am pleased to see this position stated clearly by a native
> > German speaker. Thanks for (again) pointing out the difference
> > between normal Swiss and German orthography and keyboards. Your
> > terminology is, incidentally, just fine and your note is very
> > clear.
> > Let me see if I can quickly summarize the technical IDNA issues
> > with Eszett without taking any position on them.
> > (1) IDNA2003 mapped Eszett into "ss". That is no more,
> > and no less, than a historical fact. But it does imply
> > that giving Eszett any other treatment going forward
> > would create an incompatible change. There are
> > certainly users today, including some in Germany, who
> > are taking advantage of the mapping, using Eszett in
> > IRIs and other references but having registered domain
> > names whose labels contain encoding of the "ss" form.
> > To paraphrase the discussion Gerv and I are having about
> > mappings, use of Eszett and the mapping obviously
> > impressed those users/ registrants as the "least bad"
> > alternative given what IDNA2003 does with the character.
> > (2) It is worth noting, as part of the ongoing
> > discussion about mapping (or not), that, had Eszett
> > simply been rejected by IDNA2003 (rather than mapped),
> > adding it now as a valid (and unmapped) character would
> > be a simple matter. With the behavior in IDNA2003, any
> > change is an incompatible one.
> > (3) In addition to the "no upper case form", the
> > argument for making the mapping --and at least part of
> > the argument that led to the mapping in IDNA2003-- is
> > that, even though everyone understands that some words
> > containing "ss" cannot be mapped back into Eszett,
> > "everyone" would expect the two to match. Again, that is
> > a report about how we got here historically. I am not
> > qualified to make a judgment about whether the statement
> > is actually correct. Arguably, neither is the IETF (see
> > (5), below).
> > (4) There is no _technical_ problem with treating Eszett
> > as a normal letter in IDNA200X as long as everyone
> > understands that "no mapping" means "no matching with
> > the 'ss' form" and we can live with the incompatible
> > change. You (and clearly some others) believe that is
> > the right answer for German as written in Germany (and
> > elsewhere). Some others believe that it is the wrong
> > answer for German as written in Switzerland (and
> > elsewhere). But there is no middle ground in which it
> > can be a character in some places and a notation for
> > "ss" in others.
> > (5) The incompatibility problem is a significant one,
> > since it would violate the implicit rule that a given
> > label string that is valid under both IDNA2003 and the
> > new proposals (known collectively as IDNA200X) must
> > produce the same ACE (punycode-encoded) string.
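The incompatibility in point (5) can be made concrete with a small Python sketch. Python's built-in `idna` codec implements IDNA2003 (RFC 3490 with Nameprep), so it applies the Eszett-to-"ss" mapping. The function `unmapped_to_ascii` is a hypothetical stand-in for "treat Eszett as a normal letter": it only Punycode-encodes the label and is not an implementation of any IDNA200X draft.

```python
def idna2003_to_ascii(label):
    """IDNA2003 ToASCII via the stdlib codec (Nameprep maps Eszett to 'ss')."""
    return label.encode("idna").decode("ascii")


def unmapped_to_ascii(label):
    """Hypothetical 'no mapping' encoding: Punycode the label as-is."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


label = "stra\u00dfe"
print(idna2003_to_ascii(label))  # strasse
print(unmapped_to_ascii(label))  # xn--strae-oqa
```

The same input label yields two different strings in the DNS, which is exactly the situation the implicit rule is meant to prevent.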
> > The hard problem here is how the IETF can possibly decide on
> > this. The default decision should almost certainly be "avoid
> > incompatibility", but that would leave you stuck with a decision
> > that was made early in the decade, possibly without adequate
> > information or consideration. While it certainly isn't a matter
> > for "voting" or "collecting endorsements", I would think that
> > the IETF would find statements very helpful from the ccTLD
> > registries from German-speaking countries (and, ideally,
> > countries with large enough German-speaking populations to have
> > a lot of German-based registrations) about what they wanted to
> > do and how they would deal with the incompatibility problem
> > (e.g., by using "variant" techniques to be sure that a new
> > registration that included Eszett did not end up in different
> > hands from an existing registration that properly used the "ss"
> > alternate spelling) were the change made.
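One way to picture the "variant" technique mentioned above is a registry that groups the Eszett and "ss" spellings under one key and refuses to hand them to different registrants. The sketch below is purely illustrative: the class, its methods and the owner names are assumptions, and real registry variant policies are considerably more involved.

```python
class VariantRegistry:
    """Toy registry that keeps Eszett/'ss' variants with a single owner."""

    def __init__(self):
        self._owner_by_key = {}  # variant key -> owner of the whole bundle
        self._labels = set()     # labels actually registered

    @staticmethod
    def variant_key(label):
        # Assumption: the only variant rule is Eszett (U+00DF) == "ss".
        return label.lower().replace("\u00df", "ss")

    def register(self, label, owner):
        """Register `label` unless a variant already belongs to someone else."""
        key = self.variant_key(label)
        holder = self._owner_by_key.setdefault(key, owner)
        if holder != owner:
            return False
        self._labels.add(label)
        return True


registry = VariantRegistry()
print(registry.register("strasse", "alice"))      # True
print(registry.register("stra\u00dfe", "bob"))    # False: variant held by alice
print(registry.register("stra\u00dfe", "alice"))  # True: same registrant
```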
> > I believe that we can make some incompatible changes like this
> > (and like the addition of ZWJ and ZWNJ with contextual controls)
> > now if there is fairly strong consensus in the
> > materially-affected communities that the change is important
> > enough and that they are prepared to deal with it. I also
> > think it is our last chance, so we had better get it right this
> > time. Others may disagree with one or both of those beliefs.
> > thanks again,
> > john
> > --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
> > <g.ochsner at revolistic.com> wrote:
> > > Hello,
> > >
> > > I am a native German speaker (born in Austria, living in
> > > Germany). I noticed that there have already been postings
> > > about the German sharp s (Eszett) but actually very few (if
> > > any) from German people (Afaik Martin is from Switzerland,
> > > where people normally do not use the sharp s).
> > >
> > > I want to stress how important the sharp s actually is for
> > > most of the German speaking users. Besides the 3 umlauts which
> > > can already be used in IDNs the sharp s is the 4th character
> > > which would really matter for users. Over 90 million German
> > > speakers do use the sharp s. In German texts it is used more
> > > often than the letters "j", "q" and "y" for instance. The
> > > sharp s has (of course) a direct key on German keyboards.
> > >
> > > Concerning IDNA I have to say that the sharp s is NOT equal
> > > to double s. Mapping the sharp s to "ss" is not natural from a
> > > user's point of view. If you substitute the sharp s by "ss"
> > > you will get wrong spelling in most cases and sometimes even
> > > other words with totally different meanings, which can be
> > > confusing. There are strict grammatical rules whether to use
> > > the one or the other.
> > >
> > > I am not versed enough to know the deep technical impacts, but
> > > I am enthusiastic about the German language though... How
> > > could the sharp s be implemented into IDNA so that it can be
> > > used in IDNs? I read that the Latin capital sharp S has been
> > > added to Unicode 5.1 now
> > > (http://www.unicode.org/versions/Unicode5.1.0/). The document
> > > also proposes a tailored casing operation from small to
> > > capital sharp s where desired. What implications does that
> > > have on "rule B" in the current table document and the other
> > > documents?
> > >
> > > As a user I would really like to see the sharp s in IDNs,
> > > maybe you can discuss the technical impacts, even if it takes
> > > kind of workarounds or "special" mappings...? As far as I can
> > > contribute by collecting orthographic data or contacting
> > > German language specialists here in Germany to join the
> > > discussion, please let me know and I will try.
> > >
> > > Best regards
> > > Georg
> > >
> > >
> > > PS.: Please forgive and correct me if I mixed up technical
> > > terms...
> > >
> > > _______________________________________________
> > > Idna-update mailing list
> > > Idna-update at alvestrand.no
> > > http://www.alvestrand.no/mailman/listinfo/idna-update