Shawn.Steele at microsoft.com
Mon Dec 7 19:45:45 CET 2009
For ß/ss, the ability to present the correct form is important, especially in names. I think the ability to make a distinction (having the two forms go to different places) is marginal. In practice it seems clear that both would be "bundled" by the name owner. In fact, the major registries where this matters (.de and .at) have indicated that they WILL bundle, at which point it's moot whether this WG expects both forms to go to different places, since they won't. It doesn't seem helpful to me to spend a lot of effort building a feature that the target market won't use.
That doesn't mean that I think IDNA2003 already "solves" eszett. Clearly, spelling a word with "ss" just because it's convenient for mapping isn't correct.
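To make the IDNA2003 behaviour concrete: Python's standard-library `idna` codec implements IDNA2003 ToASCII (RFC 3490 with Nameprep), and Nameprep's case folding turns ß into "ss". This is a minimal sketch; the helper name `idna2003_ascii` is hypothetical, not part of any API.

```python
def idna2003_ascii(label: str) -> str:
    """IDNA2003 ToASCII via Python's built-in "idna" codec (hypothetical helper)."""
    # Nameprep folds ß to "ss" before any Punycode step, so the ASCII form
    # cannot record which spelling the user actually typed.
    return label.encode("idna").decode("ascii")


for word in ("straße", "strasse"):
    print(word, "->", idna2003_ascii(word))
```

Both spellings come out as `strasse`, which is exactly the "spell it ss because it's convenient for mapping" outcome objected to here.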
So maybe Greek and German are similar in that respect. In Greek the distinction may not be as common, though I would argue there is some distinction of meaning: a CamelCased label in which the first word ends in a sigma would be clearer with the final form. I don't know Greek, but I'd guess there's at least one "Therapist" case (therapist vs. "the rapist") where a string would read as a single word if the final sigma were missing.
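A small illustration of where that word-boundary cue disappears, as a sketch using only the standard library: Python's `str.lower()` applies the Unicode Final_Sigma rule, while `str.casefold()` and IDNA2003 Nameprep both fold ς to σ. The labels are illustrative assumptions (ΟΔΟΣ, "street"; the CamelCased pair is made up), written as escapes so the Greek code points are unambiguous.

```python
upper = "\u039f\u0394\u039f\u03a3"  # ΟΔΟΣ, all caps
lower = upper.lower()               # Final_Sigma rule: the trailing Σ becomes ς (U+03C2)
print(lower)
print(lower.casefold())             # case folding maps ς to σ (U+03C3)

# Hypothetical CamelCased two-word label: the medial ς marks the word boundary.
camel = "\u039f\u03b4\u03bf\u03c2\u0391\u03b8\u03b7\u03bd\u03b1\u03c2"  # ΟδοςΑθηνας
print(camel.lower())                 # lowercasing keeps the medial ς
print(camel.lower().encode("idna"))  # IDNA2003 folds it to σ, so the cue is lost
```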
FWIW, eszett is also only interesting for registries/zone operators using German (much as final sigma is only interesting for Greek). The slight (IMO) difference is that the non-eszett form, "ss", is common in other languages.
I think a key point is that the TLD registries in all three of these cases have made clear that the ability to register the "correct" linguistic form is very important, but that they will also bundle. I think that's the most functionality we need to allow; anything more detailed gets lost in the behavior of those registries. (Sure, lower-level zones could do something else, but 90% of the users who care about these characters will encounter the TLD behavior and expect it, even at lower levels.)
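A minimal sketch of TLD-level bundling as described here, assuming (this is an illustration, not any registry's published policy) that the registry keys every name by a folded form in which ß becomes "ss" and ς becomes σ. Whoever registers a name holds the whole bundle, and the spelling they registered is kept as the preferred display form. `bundle_key`, `bundle_variants` and `Registry` are hypothetical names.

```python
from itertools import product

# Assumed variant rules: each character may appear as-is or in its folded form.
FOLD = {"ß": "ss", "ς": "σ"}


def bundle_key(label):
    """Collapse variant characters so every member of a bundle shares one key."""
    return "".join(FOLD.get(ch, ch) for ch in label.lower())


def bundle_variants(label):
    """Every spelling obtained by keeping or folding each variant character."""
    choices = [(ch, FOLD[ch]) if ch in FOLD else (ch,) for ch in label]
    return {"".join(parts) for parts in product(*choices)}


class Registry:
    """Toy registry: one holder per bundle key, plus the holder's display form."""

    def __init__(self):
        self._bundles = {}  # bundle key -> (holder, preferred display form)

    def register(self, label, holder):
        key = bundle_key(label)
        current = self._bundles.get(key)
        if current is not None and current[0] != holder:
            raise ValueError(f"{label!r} is bundled with a name held by {current[0]}")
        self._bundles[key] = (holder, label)

    def lookup(self, label):
        return self._bundles.get(bundle_key(label))


registry = Registry()
registry.register("straße", "holder-a")
print(sorted(bundle_variants("straße")))  # ['strasse', 'straße']
print(registry.lookup("strasse"))         # ('holder-a', 'straße')
```

Lookups for any member of a bundle reach the same holder, which behaves much like a mapping, while the registrant still controls which spelling is displayed.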
From: "Martin J. Dürst" [mailto:duerst at it.aoyama.ac.jp]
Sent: Monday, December 07, 2009 4:03
To: Shawn Steele
Cc: Mark Davis ☕; Andrew Sullivan; Eric Brunner-Williams; idna-update at alvestrand.no; Lisa Dusseault
Subject: Re: Bundling
I think with respect to bundling, ß and ς are quite different, as follows:
For ς:
1) The ς/σ distinction is (virtually?) never a distinction of meaning, only contextual.
2) The need for bundling is limited to registries/zone operators allowing Greek.
[3) Potentially needed soon for the Cypriot IDN TLD]
For ß:
1) The ß/ss distinction is actually significant for distinguishing between certain words (and especially names).
2) The "ss" substring is essentially used/usable in every registry/zone around the world.
[I hope somebody else can provide details on ZWJ/ZWNJ for point 1); it's clear that for point 2), they are more like ς than like ß.]
This suggests to me that for ς, we can go with IDNA 2008 and bundling immediately, without the need for TR46. (Even in the long term, we may not get rid of bundling because Greeks seem to care a lot about
It suggests that ß is much tougher, because we essentially have a choice between giving up and staying with the half-baked situation that we have now, and doing the right thing in the long run. Both of these choices are clearly suboptimal.
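For reference, UTS #46 (the TR46 discussed in this thread) calls ß, ς, ZWNJ and ZWJ "deviation" characters: transitional processing maps them the way IDNA2003 did, while nontransitional processing keeps them, as IDNA2008 does. This is a minimal sketch of just that step, using a hypothetical `map_deviations` helper; the rest of UTS #46 mapping and validation is not modelled.

```python
# UTS #46 deviation characters and their transitional mappings.
DEVIATIONS = {
    "\u00df": "ss",      # LATIN SMALL LETTER SHARP S (ß) -> "ss"
    "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA (ς) -> σ
    "\u200c": "",        # ZERO WIDTH NON-JOINER -> deleted
    "\u200d": "",        # ZERO WIDTH JOINER -> deleted
}


def map_deviations(label, transitional):
    """Apply only the deviation step of UTS #46 (hypothetical helper)."""
    if not transitional:
        return label  # nontransitional: deviation characters are kept
    return "".join(DEVIATIONS.get(ch, ch) for ch in label)


for mode in (True, False):
    print("transitional" if mode else "nontransitional", map_deviations("straße", mode))
```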
On 2009/12/03 10:38, Shawn Steele wrote:
> I’m using “bundling” loosely, assuming that it’d maybe be a BCP and that domain holders would try to do bundling even if the registrar didn’t. As you point out that becomes problematic with places like blogspot.
> I also mean that “bundling” is interesting even beyond a transition period. The problem is highlighted by the IDNA2003 mappings, making these “special”, but problems exist anyway. For Greek, assuming casing, people would expect an upper case string to resolve to a lower case domain. Depending on the mapping chosen for upper case sigma, that may not be the expected form. Even with special logic there are exceptions. So “bundling”, even if done merely by registering both names, becomes an interesting workaround.
> If bundling (whether registrar or otherwise) is a common solution to the problem of similarities, then TRANSITIONAL may be less interesting because the bundling would end up “looking like” TR46, even after the transition period, in which case your option a) seems more interesting than b).
> To be clear, I’m brainstorming, trying to get at the problem from a different direction. The brainstorming hypothesis is that “bundling” is likely interesting, at least in these cases, regardless of whether the characters ever were allowed in IDNA2003. If true, is there a way to codify that idea so that it also addresses the position we find ourselves in? On the other hand, if everyone thinks bundling of these cases is silly, then that’s a different problem.
> If bundling is interesting, then TR46’s approach could be an interesting solution. Both variations are legal per IDNA2008 (addressing the what-it-looks-like-in-a-Certificate problem), but TR46 forces lookup to an IDNA2003-compatible variant. That variant could then advertise itself by an IDNA2008-compliant name, but the variant(s) get you there too. For layers that don’t need mapping (Certificates, I’m guessing), TR46 need not be applied.
> Is it worth thinking about whether that solves both the need for uniqueness required by some applications of IDNA and the need for mapping/lookup required by browser-type applications?
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis ☕
> Sent: Wednesday, December 02, 2009 13:14
> To: Shawn Steele
> Cc: Lisa Dusseault; Andrew Sullivan; Eric Brunner-Williams; idna-update at alvestrand.no
> Subject: Re: Bundling
> The problem with bundling is that everyone has to do it, otherwise links don't work as expected. The problem isn't limited to .DE, .AT, and .GR.
> Take these links:
> Schloß Schönbrunn - http://xn--schloss-schnbrunn-9zb.blogspot.com/ - mapping ß to ss
> Schloß Schönbrunn - http://xn--schlo-schnbrunn-uib90b.blogspot.com/ - not mapping ß
> Unless the registry maintained by blogspot also bundles, these will have the transition problem. And there are gazillions of registries: for example, Schönbrunn.de (= xn--schnbrunn-27a.de) is the registry for:
> Schloß.Schönbrunn.de and Schloss.Schönbrunn.de
> That's why I see the choice as either:
> a) the display processing that Microsoft suggested (what is in UTS46).
> b) a workable transition plan (perhaps something like the TRANSITIONAL
> On Wed, Dec 2, 2009 at 11:17, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
> I guess one thing that bothers me about "those 4 characters" is that most of the "problems" with making them PVALID can be fixed by bundling. In fact we've heard .de and .at say they want to bundle Eszett.
> Bundling is obviously interesting for any back-compat/transition. We also know why bundling is interesting for Eszett.
> It's maybe also interesting for Final Sigma in case something's lowercased (I'm thinking of a CamelCased word). Someone also mentioned that there are other shortcuts people take when typing Greek, which could call for additional bundling.
> For ZWJ/ZWNJ, bundling might be less interesting, except for compatibility? I don't know enough about the languages, except that these are required for display.
> The one thing that's consistent with a bundling approach is that the "bundling" effectively causes an effect like mapping. The difference is that the bundler has some control over the priority of the names in the bundle (e.g., they can prefer a display form, although user entry or something else might not let them have complete control of display).
> So if that's how the problem will be solved, is there a better way to state it? Or should bundling in these cases just be a BCP?
> Idna-update mailing list
> Idna-update at alvestrand.no
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp
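The two blogspot hostnames quoted in this thread differ only in whether ß was mapped before Punycode encoding. Python's built-in codecs reproduce both: `"idna"` is IDNA2003 ToASCII (Nameprep maps ß to "ss"), and `"punycode"` is the bare RFC 3492 encoding, used here as a stand-in for a lookup that keeps ß. This is a sketch only; it skips the IDNA2008 validity checks that a library such as the third-party `idna` package performs.

```python
label = "schloß-schönbrunn"

# IDNA2003: Nameprep folds ß to "ss", then Punycode encodes the ö.
mapped = label.encode("idna").decode("ascii")

# Keeping ß: Punycode-encode the label as-is and add the ACE prefix by hand.
unmapped = "xn--" + label.encode("punycode").decode("ascii")

print(mapped)    # xn--schloss-schnbrunn-9zb
print(unmapped)  # xn--schlo-schnbrunn-uib90b
print("schönbrunn".encode("idna").decode("ascii"))  # xn--schnbrunn-27a
```

Because the two ASCII labels differ, a zone such as blogspot.com sees two unrelated names unless it bundles them itself.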