sharp s (Eszett)
Vint Cerf
vint at google.com
Fri Mar 7 20:07:32 CET 2008
switching to a new prefix is a very serious thing and I hope we are
able to avoid that...
v
On Mar 7, 2008, at 1:56 PM, Erik van der Poel wrote:
> John,
>
> Previously, I reported that MSIE7 refused to look up domain names with
> U+03F7 or U+03F8 in them, and I stated my opinion that MSIE7 was doing
> the right thing, because those 2 characters were unassigned in Unicode
> 3.2. Implementors cannot predict the case-folding relationships
> between code points before they are assigned, so they should refrain
> from looking up domain names with characters that they don't know
> about, otherwise they might send out the wrong A-label (different from
> a future implementation). Here is an example of a piece of HTML that I
> tried at that time:
>
> <a href="http://Ϸ.com/">
>
> However, now I have discovered that MSIE7 even refuses to look up
> domain names containing those characters when they are in A-label
> form! Needless to say, my opinion of MSIE7 has now changed
> drastically. This time, I tried:
>
> (1) <a href="http://xn--nza.com/">
> (2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
> (3) <a href="http://xn--strae-oqa.com/">
> (4) <a href="http://xm--strae-oqa.com/">
>
> (1) has U+03F8 in it (a lower-case letter), (2) has U+200C (ZWNJ) in
> it and I found it in the lower left corner of
> http://www.nic.ir/List_of_Resellers and (3) has U+00DF (Eszett) in it.
> None of these worked in MSIE7. You can click on them, but no DNS
> packet is emitted.
>
> On the other hand, (4) did work. This also has Eszett in it, but the
> prefix has been changed to "xm--".
>
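The A-labels in examples (1), (3) and (4) above can be checked by decoding them. The following is a minimal sketch, assuming Python 3 and its standard-library "punycode" codec; the helper name decode_label and the simplified prefix handling are illustrative assumptions, not part of any IDNA implementation discussed in this thread.

```python
# Sketch: decode the labels from examples (1), (3) and (4).
# Only labels carrying the real ACE prefix "xn--" are decoded; anything
# else (such as the altered "xm--" prefix) is left untouched, which is
# why a resolver treats (4) as an ordinary ASCII label.
ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Return the Unicode form of an A-label, or the label unchanged
    if it does not start with the ACE prefix (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for label in ("xn--nza", "xn--strae-oqa", "xm--strae-oqa"):
    decoded = decode_label(label)
    # List the non-ASCII code points found in the decoded label.
    non_ascii = [f"U+{ord(c):04X}" for c in decoded if ord(c) > 0x7F]
    print(label, "->", decoded, non_ascii)
```

With this sketch, "xn--nza" decodes to the single character U+03F8 and "xn--strae-oqa" to a label containing U+00DF, while "xm--strae-oqa" is passed through as plain ASCII.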
> Since MSIE7 is so widely installed, to me, this means that we probably
> have to reopen the discussion of whether we will switch to a new
> prefix.
>
> Erik
>
> On Fri, Mar 7, 2008 at 9:41 AM, John C Klensin <klensin at jck.com>
> wrote:
>> Georg,
>>
>> I am pleased to see this position stated clearly by a native
>> German speaker. Thanks for (again) pointing out the difference
>> between normal Swiss and German orthography and keyboards. Your
>> terminology is, incidentally, just fine and your note is very
>> clear.
>>
>> Let me see if I can quickly summarize the technical IDNA issues
>> with Eszett without taking any position on them.
>>
>> (1) IDNA2003 mapped Eszett into "ss". That is no more,
>> and no less, than a historical fact. But it does imply
>> that giving Eszett any other treatment going forward
>> would create an incompatible change. There are
>> certainly users today, including some in Germany, who
>> are taking advantage of the mapping, using Eszett in
>> IRIs and other references but having registered domain
>> names whose labels contain the encoding of the "ss" form.
>> To paraphrase the discussion Gerv and I are having about
>> mappings, use of Eszett and the mapping obviously
>> impressed those users/registrants as the "least bad"
>> alternative given what IDNA2003 does with the character.
>>
>> (2) It is worth noting, as part of the ongoing
>> discussion about mapping (or not), that, had Eszett
>> simply been rejected by IDNA2003 (rather than mapped),
>> adding it now as a valid (and unmapped) character would
>> be a simple matter. With the behavior in IDNA2003, any
>> change is an incompatible one.
>>
>> (3) In addition to the "no upper case form", the
>> argument for making the mapping --and at least part of
>> the argument that led to the mapping in IDNA2003-- is
>> that, even though everyone understands that some words
>> containing "ss" cannot be mapped back into Eszett,
>> "everyone" would expect the two to match. Again, that is
>> a report about how we got here historically. I am not
>> qualified to make a judgment about whether the statement
>> is actually correct. Arguably, neither is the IETF (see
>> (5), below).
>>
>> (4) There is no _technical_ problem with treating Eszett
>> as a normal letter in IDNA200X as long as everyone
>> understands that "no mapping" means "no matching with
>> the 'ss' form" and we can live with the incompatible
>> change. You (and clearly some others) believe that is
>> the right answer for German as written in Germany (and
>> elsewhere). Some others believe that it is the wrong
>> answer for German as written in Switzerland (and
>> elsewhere). But there is no middle ground in which it
>> can be a character in some places and a notation for
>> "ss" in others.
>>
>> (5) The incompatibility problem is a significant one,
>> since it would violate the implicit rule that a given
>> label string that is valid under both IDNA2003 and the
>> new proposals (known collectively as IDNA200X) must
>> produce the same ACE (punycode-encoded) string.
>>
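The incompatibility described in points (1), (4) and (5) above can be made concrete. This is a rough sketch, assuming Python 3.7 or later, whose built-in "idna" codec follows the IDNA2003 rules (RFC 3490 with Nameprep). The function to_ascii_unmapped is a hypothetical stand-in for a "no mapping" treatment of Eszett; it only Punycode-encodes the label and performs none of the validity checks a real protocol would need.

```python
# Sketch: the same label under IDNA2003 mapping vs. a "no mapping" treatment.

def to_ascii_idna2003(label: str) -> str:
    """ToASCII as implemented by Python's built-in IDNA2003 codec."""
    return label.encode("idna").decode("ascii")


def to_ascii_unmapped(label: str) -> str:
    """Hypothetical 'no mapping' conversion: keep every character as is
    and Punycode-encode the label if it contains non-ASCII characters."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


eszett_label = "stra\u00dfe"   # label written with U+00DF (Eszett)
ss_label = "strasse"           # the "ss" spelling

# IDNA2003: Eszett is mapped, so both spellings yield the same ASCII label.
print(to_ascii_idna2003(eszett_label), to_ascii_idna2003(ss_label))

# No mapping: the two spellings become two distinct labels.
print(to_ascii_unmapped(eszett_label), to_ascii_unmapped(ss_label))
```

Under the mapping both spellings resolve to "strasse"; without it, the Eszett spelling becomes "xn--strae-oqa", a different label that could end up registered to a different party unless registries bundle the two.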
>> The hard problem here is how the IETF can possibly decide on
>> this. The default decision should almost certainly be "avoid
>> incompatibility", but that would leave you stuck with a decision
>> that was made early in the decade, possibly without adequate
>> information or consideration. While it certainly isn't a matter
>> for "voting" or "collecting endorsements", I would think that
>> the IETF would find statements very helpful from the ccTLD
>> registries from German-speaking countries (and, ideally,
>> countries with large enough German-speaking populations to have
>> a lot of German-based registrations) about what they wanted to
>> do and how they would deal with the incompatibility problem
>> (e.g., by using "variant" techniques to be sure that a new
>> registration that included Eszett did not end up in different
>> hands from an existing registration that properly used the "ss"
>> alternate spelling) were the change made.
>>
>> I believe that we can make some incompatible changes like this
>> (and like the addition of ZWJ and ZWNJ with contextual controls)
>> now if there is fairly strong consensus in the
>> materially-affected communities that the change is important
>> enough and that they are prepared to deal with it. I also
>> think it is our last chance, so we had better get it right this
>> time. Others may disagree with one or both of those beliefs.
>>
>> thanks again,
>> john
>>
>>
>>
>>
>> --On Friday, 07 March, 2008 17:52 +0100 Georg Ochsner
>> <g.ochsner at revolistic.com> wrote:
>>
>>> Hello,
>>>
>>> I am a native German speaker (born in Austria, living in
>>> Germany). I noticed that there have already been postings
>>> about the German sharp s (Eszett) but actually very few (if
>>> any) from German people (Afaik Martin is from Switzerland,
>>> where people normally do not use the sharp s).
>>>
>>> I want to stress how important the sharp s actually is for
>>> most German-speaking users. Besides the 3 umlauts, which
>>> can already be used in IDNs, the sharp s is the 4th character
>>> that would really matter for users. Over 90 million German
>>> speakers use the sharp s. In German texts it is used more
>>> often than the letters "j", "q" and "y", for instance. The
>>> sharp s has (of course) its own key on German keyboards.
>>>
>>> Concerning IDNA, I have to say that the sharp s is NOT equal
>>> to double s. Mapping the sharp s to "ss" is not natural from a
>>> user's point of view. If you substitute "ss" for the sharp s,
>>> you will get wrong spelling in most cases and sometimes even
>>> other words with totally different meanings, which can be
>>> confusing. There are strict grammatical rules for when to use
>>> one or the other.
>>>
>>> I am not versed enough to know the deep technical impacts, but
>>> I am enthusiastic about the German language... How
>>> could the sharp s be implemented into IDNA so that it can be
>>> used in IDNs? I read that the Latin capital sharp S has been
>>> added to Unicode 5.1 now
>>> (http://www.unicode.org/versions/Unicode5.1.0/). The document
>>> also proposes a tailored casing operation from small to
>>> capital sharp s where desired. What implications does that
>>> have on "rule B" in the current table document and the other
>>> documents?
>>>
>>> As a user I would really like to see the sharp s in IDNs.
>>> Maybe you can discuss the technical impacts, even if it takes
>>> some kind of workaround or "special" mappings...? If I can
>>> contribute by collecting orthographic data or contacting
>>> German language specialists here in Germany to join the
>>> discussion, please let me know and I will try.
>>>
>>> Best regards
>>> Georg
>>>
>>>
>>> PS.: Please forgive and correct me if I mixed up technical
>>> terms...
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>>
>>