Bundling

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Mon Dec 7 13:02:33 CET 2009


Hello Shawn,

I think with respect to bundling, ß and ς are quite different, as follows:

ς:
1) ς/σ distinction (virtually?) never a distinction of meaning, only 
contextual.
2) Need for bundling limited to registries/zone operators allowing Greek.
[3) Potentially needed soon for Cypriot IDN TLD]

ß:
1) ß/ss distinction actually significant to distinguish between certain 
words (and especially names)
2) "ss" substring essentially used/usable in every registry/zone around 
the world.

[I hope somebody else can provide details on ZWJ/ZWNJ for point 1); it's 
clear that for point 2), they are more like ς than like ß.]
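
The difference in scope under point 2) can be sketched in a few lines of Python. This is only an illustration: bundle_key and has_variant_risk are hypothetical helpers, and using Unicode case folding as the bundling rule is an assumption made for the example (it folds ß to "ss" and ς to σ, much like the IDNA2003 mapping), not a policy any registry is known to use.

```python
def bundle_key(label: str) -> str:
    """Hypothetical bundling rule: labels with the same key share a bundle."""
    # Assumption: Unicode case folding stands in for a registry's policy.
    # It folds ß to "ss", and ς, σ and Σ all to σ.
    return label.casefold()


def has_variant_risk(label: str) -> dict:
    """Report which of the two bundling questions a label raises."""
    folded = bundle_key(label)
    return {
        "eszett": "ss" in folded,  # true even for plain ASCII labels
        "sigma": "σ" in folded,    # only true for labels containing a sigma
    }


# "schloß" and "schloss" end up in the same bundle.
print(bundle_key("schloß") == bundle_key("schloss"))

# An all-ASCII label already raises the ß question, but never the ς one.
print(has_variant_risk("schloss"))

# A Greek label ending in final sigma (U+03C2) raises only the ς question.
print(has_variant_risk("οδο\u03c2"))
```

Any zone that accepts ASCII labels containing "ss" is affected by the first check, whereas the second only matters where Greek letters can be registered at all.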

This suggests to me that for ς, we can go with IDNA 2008 and bundling 
immediately, without the need for TR46. (Even in the long term, we may 
not get rid of bundling because Greeks seem to care a lot about 
all-uppercase.)
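
The all-uppercase concern can be seen with ordinary case conversion. The sketch below uses only Python's built-in str.upper() and str.lower(); the word ΟΔΟΣ is just an example picked for illustration, and Python's behaviour is of course not what a registry or resolver necessarily does.

```python
final = "οδο\u03c2"   # ends in final sigma ς (U+03C2)
medial = "οδο\u03c3"  # same letters, but ending in σ (U+03C3)

# Both lowercase spellings collapse to the same all-uppercase string.
assert final.upper() == medial.upper() == "ΟΔΟΣ"

# Going back down, Python applies the final-sigma rule, so only one of
# the two lowercase spellings is recovered from the uppercase form.
lowered = "ΟΔΟΣ".lower()
print(lowered == final)
print(lowered == medial)
```

Since the uppercase form cannot tell the two spellings apart, a zone that wants all-uppercase input to keep working has to treat them as one bundle in some way.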

It suggests that ß is much tougher, because we essentially have a choice 
between giving up and staying with the half-baked situation that we have 
now, and doing the right thing in the long run. Both of these choices 
are clearly suboptimal.
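
To make the ß problem concrete: the same displayed name yields two different A-labels depending on whether ß is mapped. The sketch below uses Python's built-in "idna" codec, which implements the IDNA2003 rules, and the raw "punycode" codec to imitate a lookup that keeps ß. Prefixing "xn--" by hand is a simplification for illustration, not a full IDNA2008 implementation. The label is the one from Mark's blogspot example quoted below.

```python
label = "schloß-schönbrunn"

# IDNA2003 (Python's built-in "idna" codec): ß is mapped to "ss" first.
mapped = label.encode("idna").decode("ascii")

# ß kept as-is: Punycode-encode the unmapped label and add the ACE prefix.
# (Simplified; a real IDNA2008 implementation performs further checks.)
unmapped = "xn--" + label.encode("punycode").decode("ascii")

print(mapped)    # xn--schloss-schnbrunn-9zb
print(unmapped)  # xn--schlo-schnbrunn-uib90b
```

The two results are unrelated names as far as the DNS is concerned, so unless the zone operator bundles them, a link that worked under the old mapping and one typed with ß can lead to different places.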

Regards,   Martin.


On 2009/12/03 10:38, Shawn Steele wrote:
> I’m using “bundling” loosely, assuming that it’d maybe be a BCP and that domain holders would try to do bundling even if the registrar didn’t.  As you point out that becomes problematic with places like blogspot.
>
> I also mean that “bundling” is interesting even beyond a transition period.  The problem is highlighted by the IDNA2003 mappings, making these “special”, but problems exist anyway.  For Greek, assuming casing, people would expect an upper case string to resolve to a lower case domain.  Depending on the mapping chosen for upper case sigma, that may not be the expected form.  Even with special logic there are exceptions.  So “bundling”, even if done merely by registering both names, becomes an interesting workaround.
>
> If bundling (whether registrar or otherwise) is a common solution to the problem of similarities, then TRANSITIONAL may be less interesting because the bundling would end up “looking like” TR46, even after the transition period, in which case your option a) seems more interesting than b).
>
> To be clear, I’m brainstorming, trying to get at the problem from a different direction.  The brainstorming hypothesis is that “bundling” is likely interesting, at least in these cases, regardless of whether the characters ever were allowed in IDNA2003.  If true, is there a way to codify that idea so that it also addresses the position we find ourselves in?  On the other hand, if everyone thinks bundling of these cases is silly, then that’s a different problem.
>
> If bundling is interesting, then TR46’s approach may be an interesting solution.  Both variations are legal per IDNA2008 (addressing the what-it-looks-like-in-a-Certificate problem), but TR46 forces lookup to an IDNA2003 compatible variant.  That variant could then advertise itself by an IDNA2008 compliant name, but the variant(s) get you there too.  For layers that don’t need mapping (Certificates I’m guessing), TR46 need not be applied.
>
> It seems worth thinking about whether that solves both the need for uniqueness required by some applications of IDNA and the need for mapping/lookup required by browser-type applications.
>
> -Shawn
>
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
> Sent: ,  02,  2009 13:14
> To: Shawn Steele
> Cc: Lisa Dusseault; Andrew Sullivan; Eric Brunner-Williams; idna-update at alvestrand.no
> Subject: Re: Bundling
>
> The problem with bundling is that everyone has to do it, otherwise links don't work as expected. The problem isn't limited to .DE, .AT, and .GR.
>
> Take these links:
>
> Schloß Schönbrunn - http://xn--schloss-schnbrunn-9zb.blogspot.com/
> Schloß Schönbrunn - http://xn--schlo-schnbrunn-uib90b.blogspot.com/ - not mapping ß
>
> Unless the registry maintained by blogspot also bundles, these will have the transition problem. And there are gazillions of registries: for example, Schönbrunn.de (= xn--schnbrunn-27a.de) is the registry for:
>
> Schloß.Schönbrunn.de<http://schloss.xn--schnbrunn-27a.de>  (= xn--Schlo-pqa.xn--chnbrunn-9zb.de<http://xn--Schlo-pqa.xn--chnbrunn-9zb.de>) and
> Schloss.Schönbrunn.de<http://Schloss.xn--schnbrunn-27a.de>  (= xn--Schloss.xn--chnbrunn-9zb.de<http://xn--Schloss.xn--chnbrunn-9zb.de>)
>
> That's why I see the choice as either:
>
> a) the display processing that Microsoft suggested (what is in UTS46).
> b) a workable transition plan (perhaps something like the TRANSITIONAL policy)
>
> Mark
>
> On Wed, Dec 2, 2009 at 11:17, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
> I guess one thing that bothers me about "those 4 characters" is that most of the "problems" with making them PVALID can be fixed by bundling.  In fact we've heard .de and .at say they want to bundle Eszett.
>
> Bundling is obviously interesting for any back-compat/transition.  We also know why bundling is interesting for Eszett.
>
> It's maybe also interesting for Final Sigma in case something's lower cased.  Thinking of a CamelCased word.  Someone also mentioned that there are other shortcuts people make typing Greek, which could cause additional bundling.
>
> For ZWJ/ZWNJ bundling might be less interesting, except for compatibility?  I don't know enough about the languages except that these are required for display.
>
> The one thing that's consistent with a bundling approach is that the "bundling" effectively causes an effect like mapping.  The difference is that the bundler has some control over the priority of the names in the bundle (eg: they can prefer a display form, although user entry or something else might not let them have complete control of display.)
>
> So if that's how the problem will be solved, is there a better way to state it?  Or should bundling in these cases just be a BCP?
>
> -Shawn
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp

