Eszett and IDNAv2 vs IDNA2008
mark at macchiato.com
Thu Mar 12 23:54:03 CET 2009
On Thu, Mar 12, 2009 at 14:34, Georg Ochsner <g.ochsner at revolistic.com> wrote:
> I think, maybe you are making things about Eszett too complicated...
> My perception is:
> - "ß" and "ss" are linguistically two different things.
True, and so are "Polish" and "polish", or "therapist" and "the rapist".
Neither of these differences can be directly represented in domain names.
> - Many people do now think that the mapping in IDNA2003 was a (big)
> mistake, which can be corrected now.
It may or may not have been a mistake. Having the uppercase of a string map
to a different place than the original is a bad thing, in many people's
minds. I think the real problem is not that "ß" and "ss" have the same
canonical form, it is that the preferred display form for a given string is
not maintained by the Punycode encoding.
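As a rough illustration of the point above, the following sketch uses Python's built-in "idna" codec, which follows IDNA2003 (RFC 3490). It is only meant to show that ß and ss end up as the same lookup name and that the ACE form does not remember which spelling was typed. The variable names are illustrative assumptions, and the codec's behavior stands in for whatever a given client actually does.

```python
# Illustrative sketch: Python's built-in "idna" codec applies IDNA2003 mapping.
original = "buße.de"

# Nameprep folds ß to ss, so both spellings produce one lookup name.
mapped = original.encode("idna")
same_name = mapped == "busse.de".encode("idna")

# Uppercasing gives "BUSSE.DE" because "ß".upper() is "SS". The codec leaves
# pure-ASCII input as typed, so compare case-insensitively, as DNS does.
ace_from_upper = original.upper().encode("idna")

# With a non-ASCII label the name is Punycode-encoded after mapping,
# so decoding the ACE form cannot bring the ß back.
ace = "süßes.de".encode("idna")
round_trip = ace.decode("idna")

print(mapped, same_name, ace_from_upper, ace, round_trip)
```

Nothing in the encoded name records the registrant's preferred spelling, so under this mapping a display preference would have to be carried somewhere outside the label.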
And the "can" is at issue. It certainly can be done, but the cost is not
insignificant. At least some people are very worried about the compatibility
and security issues.
> - There is consensus to make ß PVALID in IDNA2008.
I don't think the situation is that clear-cut: see Marcos's mail.
> - In the future domains with ß and ss should be autarchic domains in the
Not sure what you mean by "autarchic". Do you mean "separate"?
> - As a registrant it can, but not necessarily must be interesting to have
> two domains, that just vary in ß and ss. (e.g. buße.de <http://busse.de> means
> penance.de where busse.de means busses.de in English - two completely
> different meanings)
Yet somehow the Swiss manage to understand busse with both meanings, and all
Germans manage with BUSSE having both meanings. When I've asked for
examples, the number of cases where there are two distinct meanings appears
to be extremely small; any ambiguity introduced is orders of magnitude
smaller than ambiguities introduced by omitting spaces between words, for
example.
If you want to give some data as to the percentage of German words that are
distinguished in meaning by ß and ss -- and of course omitting those
affected by the latest spelling reform, which caused the preferred display
form to shift from one to the other.
> - Therefore the registries can make up their minds if they offer a sunrise
> period or bundling or something else or nothing at all when introducing ß -
> or just not to introduce ß.
> - If a registrant owns both domains (with ß and ss) he can easily decide
> which one he redirects to the other, hence which one will be shown and stay
> in the browsers address bar. (browser as an example application)
It is really the cost of making this very incompatible change that is a
problem. And clearly not all registries will always map ß and ss together,
if they are separated in IDNA2008. Remember also that we have registries at
every level, e.g. xxx.blogspot.com
> - In applications during a transition phase in order to prevent fraud or
> serious confusion there could be some mechanism which tells a user that he
> entered a domain with ß which can be treated in two ways: mapped to ss (like
> in IDNA2003) or looked up as ß (like possible since IDNA2008). The user
> decides what to do and can even store his answer for all future ß lookups.
I think the magnitude of converting all client software to support this is
seriously underestimated, as is the timeframe for any transition.
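To make the proposed transition check concrete, here is a minimal sketch of what a client would have to compute before it could even ask the user. The function names `lookup_candidates` and `needs_user_choice` are hypothetical. The "kept" form is produced with the raw Punycode codec only, and it skips the validation and normalization that IDNA2008 would require, so treat it as an assumption-laden illustration rather than a conforming implementation.

```python
def to_ace(label: str) -> str:
    """Encode one label with raw Punycode, leaving ASCII labels untouched."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lookup_candidates(domain: str) -> dict[str, str]:
    """Return the two names a domain could be looked up as.

    "mapped_to_ss" uses Python's IDNA2003 codec, which folds ß to ss.
    "kept_as_eszett" only lowercases and Punycode-encodes each label
    (an assumption standing in for real IDNA2008 processing).
    """
    mapped = domain.encode("idna").decode("ascii").lower()
    kept = ".".join(to_ace(label) for label in domain.lower().split("."))
    return {"mapped_to_ss": mapped, "kept_as_eszett": kept}


def needs_user_choice(domain: str) -> bool:
    """True when the two interpretations lead to different lookup names."""
    candidates = lookup_candidates(domain)
    return candidates["mapped_to_ss"] != candidates["kept_as_eszett"]


print(lookup_candidates("buße.de"))
print(needs_user_choice("buße.de"), needs_user_choice("busse.de"))
```

Even this small check has to run in every place a client parses, stores, or forwards a domain name, which is where the conversion cost comes from.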
> What also comes to my memory:
> - Many applications including browsers do not even support IDNA2003 (and ß)
> - When German Umlauts (ä, ö, ü) were introduced together with IDNA2003
> there was a quite similar challenge for the registries regarding existing
> domains with "ae", "oe" and "ue". They dealt with it of course.
> - If you try to look up a WHOIS record at Denic for a domain spelled with ß
> (e.g. süßes.de <http://xn--ssses-kva.de>) you will get an error message
> that the domain is not valid. The registrant who registered süsses.de
> <http://xn--ssses-kva.de> never had two domains; people just had the option
> that in some applications (featuring IDNA2003) they were redirected from
> süßes.de <http://xn--ssses-kva.de> to his domain instead of getting an error.
The number of lookups with WHOIS is dwarfed by the number of DNS lookups
that start with ß.
> Best regards
> -----Original Message-----
> From: idna-update-bounces at alvestrand.no [mailto:
> idna-update-bounces at alvestrand.no] On behalf of Erik van der Poel
> Sent: Thursday, 12 March 2009 18:42
> To: IDNA update work
> Cc: Shawn Steele (???)
> Subject: Re: Eszett and IDNAv2 vs IDNA2008
> If the registrant would prefer the name to be displayed with Eszett no
> matter which way the user typed the name, they would want some
> mechanism to indicate the preferred display.
> On Wed, Mar 11, 2009 at 10:59 PM, Adam M. Costello
> <idna-update.amc+0+ at nicemice.net.removethisword> wrote:
> > "Shawn Steele (???)" <Shawn.Steele at microsoft.com> wrote:
> >> In my view the real problem comes when I don't know what the preferred
> >> display form is supposed to be.
> > You mean you have received a mapped label (either ACE or mapped &
> > normalized Unicode) and you want to display it. Can you give example
> > scenarios? How did you receive the label, why do you want to display
> > it, and why did the sender give you a mapped label instead of a more
> > display-friendly label? I'm sure this happens, but I want to understand
> > how common or uncommon it will be.
> > The example I'm already familiar with is email applications displaying
> > headers, and the problem is being addressed by recent specs for UTF-8 mail
> > headers, which will (I think) enable the sender to leave the domain
> > names unmapped in the header, just as the user typed them.
> > Thanks,
> > AMC
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update