mark at macchiato.com
Thu Mar 19 21:31:28 CET 2009
First off, I'm hopeful that at least one of the directions that Erik is
exploring will work out. If we can give browsers, etc., a way to show the
preferred representation, then we can get out of the security problem of
IDNA2008 domains going to different IP addresses than IDNA2003 names for the
four characters. Maybe something like the favicon.ico approach would work
(e.g. http://mail.google.com/favicon.ico), at a level above the DNS.
As far as eszett goes, a key issue is that IDNA is not guaranteed to make
all distinctions that written language has. There has been no uproar about
the fact that IDNA2008 disables many, many names in English (especially
Irish), French, Italian, and others. For example, all of the following are
currently allowed in IDNA, but would be disallowed under IDNA2008.
- Elizabeth’s crown
These are all represented with
U+2019 <http://unicode.org/cldr/utility/character.jsp?a=2019> ( ’ )
RIGHT SINGLE QUOTATION MARK = curly apostrophe in IDNA2003. Here
is an example that currently works: http://Mark’s-Grill.blogspot.com.
(Firefox isn't able to show this correctly, but Safari, Chrome, and IE all
do.)
The apostrophes are disabled in IDNA2008 by:
200E..2064 ; DISALLOWED # LEFT-TO-RIGHT MARK..INVISIBLE PLUS
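As a quick check of the table line above, here is a minimal Python sketch confirming that the curly apostrophe falls inside the quoted DISALLOWED range. The function name is made up for illustration, and only the single range quoted above is tested, not the full IDNA2008 derived-property table.

```python
import unicodedata

# The single range quoted above from the IDNA2008 table (not the full table).
DISALLOWED_START, DISALLOWED_END = 0x200E, 0x2064


def in_quoted_disallowed_range(ch: str) -> bool:
    """Return True if the code point lies in the 200E..2064 range quoted above."""
    return DISALLOWED_START <= ord(ch) <= DISALLOWED_END


curly_apostrophe = "\u2019"
print(unicodedata.name(curly_apostrophe),
      in_quoted_disallowed_range(curly_apostrophe))
```

The ASCII apostrophe (U+0027) is outside this range; it is excluded from hostnames by other rules.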
BTW, a few comments on Georg's message:
On Fri, Mar 13, 2009 at 01:40, Georg Ochsner <g.ochsner at revolistic.com>
wrote:
> > 1.)
> > - "ß" and "ss" are linguistically two different things.
> > true: and so are "Polish" and "polish", or "therapist" and "the rapist"
> > - neither of these differences can be directly represented in domain
> > names.
> That kind of comparison is not getting truer by repeating it. ß is not
> simply the lowercase of SS or vice versa. ß used to have no uppercase (in
> Unicode), now IT HAS.
I well understand this; I'm president and co-founder of the Unicode
Consortium.
Notwithstanding the introduction of the capital, which was done because
there are some (but currently few) attested instances, the German national
body's position remained that the normal uppercase of ß is SS.
> Regarding your second example, do you mean that therapist.com should be
> bundled with the-rapist.com?? Or as another idea should wwwapple.com be
> bundled with apple.com, because it is a very common typing error?
You misunderstand. The point is that IDNA does not, and cannot, guarantee
that all distinctions possible in human languages are allowed in IDNAs.
> > - Many people do now think that the mapping in IDNA2003 was a (big)
> > mistake, which can be corrected now.
> > It may or may not have been a mistake. Having the uppercase of a string
> > map to a different place than the original is a bad thing, in many
> > people's minds. I think the real problem is not that "ß" and "ss" have
> > the same canonical form, it is that the preferred display form for a
> > given string is not maintained by the Punycode encoding.
> > And the "can" is at issue. It certainly can be done, but the cost is not
> > insignificant. At least some people are very worried about the
> > compatibility and security issues.
> Sure, the WG must address the compatibility and security issues, but making
> ß PVALID is a big gain. People in the future will be able to freely choose
> which domains they use, just like they can decide to use ß or ss when they
> are writing. I think people are intelligent enough to deal with ß in domains
> if they deal with it in everyday life. That's nothing the protocol must
> dictate. People nowadays can also deal with similar issues, e.g.
> www.whitehouse.org leads to Mr. Bush while www.white-house.org leads to
> advertisements. Choose for yourself which one you like better.
The problem is compatibility with IDNA2003.
> > - There is consensus to make ß PVALID in IDNA2008.
> > I don't think the situation is that clear-cut: see Marcos's mail.
> There has been a consensus call with a clear outcome.
There was a consensus call with a clear outcome, but the WG has also
explored a lot of new ground since that consensus was arrived at. In
particular, the security and compatibility impact of having a domain name
lookup of D by client X go to a different location was not fully explored.
If we can solve the "preferred display" aspect of IDNA, then it appears that
we can solve the technical aspect. The buße.de <http://busse.de/> domain
could be displayed as buße.de, and yet people could reach
it by typing BUSSE.DE as well.
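A minimal sketch of that idea, using Python's built-in "idna" codec, which implements IDNA2003 (nameprep maps ß to ss). The PREFERRED_DISPLAY table and both function names are hypothetical: they stand in for whatever mechanism above the DNS would carry the preferred form, which is still an open question.

```python
def lookup_key(name: str) -> str:
    """IDNA2003 lookup form of a name, lowercased because DNS ignores ASCII case."""
    return name.encode("idna").decode("ascii").lower()


# Hypothetical side table: lookup form -> preferred display form.
# How this data would actually be published is exactly what is undecided.
PREFERRED_DISPLAY = {"busse.de": "buße.de"}


def display_form(name: str) -> str:
    """Show the preferred form if one is known, else the lookup form."""
    key = lookup_key(name)
    return PREFERRED_DISPLAY.get(key, key)


for typed in ("buße.de", "BUSSE.DE", "busse.de"):
    print(typed, "->", lookup_key(typed), "shown as", display_form(typed))
```

All three inputs resolve to the same lookup key, so they reach the same host, while the display string is chosen separately from the lookup.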
> > - In the future domains with ß and ss should be autarchic domains in the
> > DNS.
> > Not sure what you mean by "autarchic". Do you mean "separate"?
> Yes, I mean separate by protocol. The registries can solve the rest
> (sunrise periods, cloning registrant and NS data, etc.). And they have many
> native speakers and know best about their local situation.
I just don't think that the magnitude of the issue is being faced. It is a
problem for the entire ecosystem: not only the top-level registries but
also the many sublevel registries like blogspot.com; not only intermediaries
but also the huge number of installed instances of client software. Just
look at how many people are still on IE6.
> > - As a registrant it can, but not necessarily must be interesting to
> > have two domains, that just vary in ß and ss. (e.g. buße.de <http://busse.de> means
> > penance.de where busse.de means busses.de in English - two completely
> > different meanings)
> > Yet somehow the Swiss manage to understand busse with both meanings, and
> > all Germans manage with BUSSE having both meanings. When I've asked for
> Yes, but "somehow" doesn't mean things can't be made better.
We have to consider the cost as well as the benefits. Suppose that we could
show that traffic circulation could be improved by 3% in countries that
drive on the left; it doesn't mean that the US or Germany should switch the
way all their road systems work -- there would be a huge cost involved. The
fact that the Swiss somehow manage (perhaps it's that clean Alpine air) to do
without the distinction means that there is a workaround.
We can't distinguish between "Van-Der-Poel" and "van-der-poel" either in
IDNA, but people manage to work around that.
> > examples, the number of cases where there are two distinct meanings
> > appears to be extremely small; any ambiguity introduced is orders of
> > magnitude smaller than ambiguities introduced by omitting spaces between
> > words, for example.
> > If you want to give some data as to the percentage of German words that
> > are distinguished in meaning by ß and ss -- and of course omitting those
> > affected by the latest spelling reform, which caused the preferred
> > display form to shift from one to the other.
> Let me give you thousands of relevant examples at once. I am sure you
> agree that surnames are often used as (parts of) domain names, e.g.
> smith-books.com. Now I queried the German telephone book and this is the
> result. 1.5 million Germans have a surname with ß. There are over 3,100
> different pairs of surnames (2 x 3,100 names) which differ only in ß versus
> ss. Over 2.5 million people (!) have one of these surnames either with ß
> or ss. (e.g. Abeßer/Abesser, Ablaß/Ablass, Abstoß/Abstoss ...) I think it
> would be the right thing, if Mr. Weiß could register
> weiß.de <http://weiss.de> while Mr. Weiss has
> weiss.de. For a user looking for the website it is just like looking up his
> number in a phonebook, he has to know if he is looking for Mr. Weiß or Mr.
> Weiss.
Your numbers are a bit unclear to me. You are saying that 0.4% of German
names in the phonebook are distinguished only by ß vs ss. You then say that
2.5M people have those names, which would be 3% of the German population. So
you are saying that those people are proportionately overrepresented in the
population by 700%? Or do you mean that 2.5M people have names containing
either ß or ss? (Could you point to your data sources also?)
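The arithmetic behind that question can be sketched as follows. The population figure of roughly 82 million is my assumption (it is not in Georg's message), and the 0.4% is simply taken as stated above.

```python
# Figures from the thread, plus one labeled assumption.
people_with_paired_names = 2_500_000   # "Over 2.5 million people" (Georg)
share_of_names = 0.004                 # 0.4% of names, as stated above
german_population = 82_000_000         # ASSUMPTION: approximate 2009 population

# Share of the population carrying one of the paired surnames.
share_of_people = people_with_paired_names / german_population

# How many times larger the share of people is than the share of names.
ratio = share_of_people / share_of_names

print(f"share of population: {share_of_people:.1%}")
print(f"people share / name share: {ratio:.1f}x")
```

If both input figures were right, the paired names would have to be several times more common per name than average, which is why the numbers need clarifying.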
If we didn't have the compatibility problems, separating eszett from ss
would not be an issue. There would still be the issue of casing, but that
would be trumped by the reasonable desire to distinguish them. It is the
compatibility issue that most concerns me.
> Best regards