sharp s (Eszett)

Vint Cerf vint at google.com
Fri Mar 7 20:07:32 CET 2008


switching to a new prefix is a very serious thing and I hope we are  
able to avoid that...


v

On Mar 7, 2008, at 1:56 PM, Erik van der Poel wrote:

> John,
>
> Previously, I reported that MSIE7 refused to look up domain names with
> U+03F7 or U+03F8 in them, and I stated my opinion that MSIE7 was doing
> the right thing, because those 2 characters were unassigned in Unicode
> 3.2. Implementors cannot predict the case-folding relationships
> between code points before they are assigned, so they should refrain
> from looking up domain names with characters that they don't know
> about, otherwise they might send out the wrong A-label (different from
> a future implementation). Here is an example of a piece of HTML that I
> tried at that time:
>
> <a href="http://&#x3F7;.com/">
>
> However, now I have discovered that MSIE7 even refuses to look up
> domain names containing those characters when they are in A-label
> form! Needless to say, my opinion of MSIE7 has now changed
> drastically. This time, I tried:
>
> (1) <a href="http://xn--nza.com/">
> (2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
> (3) <a href="http://xn--strae-oqa.com/">
> (4) <a href="http://xm--strae-oqa.com/">
>
> (1) has U+03F8 in it (a lower-case letter), (2) has U+200C (ZWNJ) in
> it and I found it in the lower left corner of
> http://www.nic.ir/List_of_Resellers and (3) has U+00DF (Eszett) in it.
> None of these worked in MSIE7. You can click on them, but no DNS
> packet is emitted.
>
> On the other hand, (4) did work. This also has Eszett in it, but the
> prefix has been changed to "xm--".
>
> Since MSIE7 is so widely installed, to me, this means that we probably
> have to reopen the discussion of whether we will switch to a new
> prefix.
>
> Erik
>
> On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com> wrote:
>> Georg,
>>
>>  I am pleased to see this position stated clearly by a native
>>  German speaker.  Thanks for (again) pointing out the difference
>>  between normal Swiss and German orthography and keyboards.  Your
>>  terminology is, incidentally, just fine and your note is very
>>  clear.
>>
>>  Let me see if I can quickly summarize the technical IDNA issues
>>  with Eszett without taking any position on them.
>>
>>         (1) IDNA2003 mapped Eszett into "ss".  That is no more,
>>         and no less, than a historical fact.   But it does imply
>>         that giving Eszett any other treatment going forward
>>         would create an incompatible change.  There are
>>         certainly users today, including some in Germany, who
>>         are taking advantage of the mapping, using Eszett in
>>         IRIs and other references but having registered domain
>>         names whose labels contain an encoding of the "ss" form.
>>         To paraphrase the discussion Gerv and I are having about
>>         mappings, use of Eszett and the mapping obviously
>>         impressed those users/registrants as the "least bad"
>>         alternative given what IDNA2003 does with the character.
>>
>>         (2) It is worth noting, as part of the ongoing
>>         discussion about mapping (or not), that, had Eszett
>>         simply been rejected by IDNA2003 (rather than mapped),
>>         adding it now as a valid (and unmapped) character would
>>         be a simple matter.   With the behavior in IDNA2003, any
>>         change is an incompatible one.
>>
>>         (3) In addition to the "no upper case form" point, the
>>         argument for making the mapping (and at least part of
>>         the argument that led to the mapping in IDNA2003) is
>>         that, even though everyone understands that some words
>>         containing "ss" cannot be mapped back into Eszett,
>>         "everyone" would expect the two to match. Again, that is
>>         a report about how we got here historically. I am not
>>         qualified to make a judgment about whether the statement
>>         is actually correct.  Arguably, neither is the IETF (see
>>         (5), below).
>>
>>         (4) There is no _technical_ problem with treating Eszett
>>         as a normal letter in IDNA200X as long as everyone
>>         understands that "no mapping" means "no matching with
>>         the 'ss' form" and we can live with the incompatible
>>         change.  You (and clearly some others) believe that is
>>         the right answer for German as written in Germany (and
>>         elsewhere).  Some others believe that it is the wrong
>>         answer for German as written in Switzerland (and
>>         elsewhere).   But there is no middle ground in which it
>>         can be a character in some places and a notation for
>>         "ss" in others.
>>
>>         (5) The incompatibility problem is a significant one,
>>         since it would violate the implicit rule that a given
>>         label string that is valid under both IDNA2003 and the
>>         new proposals (known collectively as IDNA200X) must
>>         produce the same ACE (punycode-encoded) string.
>>
>>  The hard problem here is how the IETF can possibly decide on
>>  this.  The default decision should almost certainly be "avoid
>>  incompatibility", but that would leave you stuck with a decision
>>  that was made early in the decade, possibly without adequate
>>  information or consideration.  While it certainly isn't a matter
>>  for "voting" or "collecting endorsements", I would think that
>>  the IETF would find statements very helpful from the ccTLD
>>  registries from German-speaking countries (and, ideally,
>>  countries with large enough German-speaking populations to have
>>  a lot of German-based registrations) about what they wanted to
>>  do and how they would deal with the incompatibility problem
>>  (e.g., by using "variant" techniques to be sure that a new
>>  registration that included Eszett did not end up in different
>>  hands from an existing registration that properly used the "ss"
>>  alternate spelling) were the change made.
>>
>>  I believe that we can make some incompatible changes like this
>>  (and like the addition of ZWJ and ZWNJ with contextual controls)
>>  now if there is fairly strong consensus in the
>>  materially-affected communities that the change is important
>>  enough and that they are prepared to deal with it.   I also
>>  think it is our last chance, so we had better get it right this
>>  time.   Others may disagree with one or both of those beliefs.
>>
>>  thanks again,
>>        john
>>
>>
>>
>>
>>  --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
>>  <g.ochsner at revolistic.com> wrote:
>>
>>> Hello,
>>>
>>> I am a native German speaker (born in Austria, living in
>>> Germany). I noticed that there have already been postings
>>> about the German sharp s (Eszett) but actually very few (if
>>> any) from German people (Afaik Martin is from Switzerland,
>>> where people normally do not use the sharp s).
>>>
>>> I want to stress how important the sharp s actually is for
>>> most German-speaking users. Besides the 3 umlauts, which
>>> can already be used in IDNs, the sharp s is the 4th character
>>> that would really matter for users. Over 90 million German
>>> speakers do use the sharp s. In German texts it is used more
>>> often than the letters "j", "q" and "y" for instance. The
>>> sharp s has (of course) a direct key on German keyboards.
>>>
>>> Concerning IDNA, I have to say that the sharp s is NOT equal
>>> to double s. Mapping the sharp s to "ss" is not natural from a
>>> user's point of view. If you substitute the sharp s by "ss"
>>> you will get wrong spelling in most cases and sometimes even
>>> other words with totally different meanings, which can be
>>> confusing. There are strict grammatical rules whether to use
>>> the one or the other.
>>>
>>> I am not versed enough to know the deep technical impacts, but
>>> I am enthusiastic about the German language though... How
>>> could the sharp s be implemented into IDNA so that it can be
>>> used in IDNs? I read that the Latin capital sharp S has been
>>> added to Unicode 5.1 now
>>> (http://www.unicode.org/versions/Unicode5.1.0/). The document
>>> also proposes a tailored casing operation from small to
>>> capital sharp s where desired. What implications does that
>>> have on "rule B" in the current table document and the other
>>> documents?
>>>
>>> As a user I would really like to see the sharp s in IDNs.
>>> Maybe you can discuss the technical impacts, even if it takes
>>> some kind of workaround or "special" mappings...? If I can
>>> contribute by collecting orthographic data or contacting
>>> German language specialists here in Germany to join the
>>> discussion, please let me know and I will try.
>>>
>>> Best regards
>>> Georg
>>>
>>>
>>> PS.: Please forgive and correct me if I mixed up technical
>>> terms...
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>>
>>
>>
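
The A-label and mapping claims in the thread above can be checked with a
short Python sketch. It is an illustration only, not part of the original
discussion. It assumes Python's built-in "punycode" and "idna" codecs (the
latter implements IDNA2003) and the standard stringprep module; the helper
names decode_a_label, idna2003_to_ascii and unassigned_in_unicode_3_2 are
hypothetical. It decodes the payloads of links (1) and (3), shows that
IDNA2003 turns Eszett into "ss" instead of producing xn--strae-oqa, and asks
whether U+03F8 was unassigned in Unicode 3.2.

```python
import stringprep

ACE_PREFIX = "xn--"  # the IDNA prefix whose replacement is being debated


def decode_a_label(label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the rest.

    No IDNA validation is performed; this only reverses the encoding.
    """
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError(f"not an A-label: {label!r}")
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def idna2003_to_ascii(label: str) -> str:
    """Convert a label with Python's built-in IDNA2003 codec."""
    return label.encode("idna").decode("ascii")


def unassigned_in_unicode_3_2(ch: str) -> bool:
    """True if ch is in stringprep table A.1 (unassigned in Unicode 3.2)."""
    return stringprep.in_table_a1(ch)


if __name__ == "__main__":
    # Links (1) and (3) from the message: show the code points they carry.
    for label in ("xn--nza", "xn--strae-oqa"):
        decoded = decode_a_label(label)
        print(label, "->", " ".join(f"U+{ord(c):04X}" for c in decoded))

    # IDNA2003 maps Eszett to "ss", so no "xn--" label is produced at all.
    print(idna2003_to_ascii("stra\u00dfe"))

    # U+03F8 was not yet assigned in Unicode 3.2; U+00DF was.
    print(unassigned_in_unicode_3_2("\u03f8"), unassigned_in_unicode_3_2("\u00df"))
```

Under these assumptions, the string "straße" yields "strasse" under IDNA2003
but would yield xn--strae-oqa if Eszett were treated as an ordinary letter,
which is the incompatibility described in point (5).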


