Fixing problems with names going the same place
Shawn.Steele at microsoft.com
Sat Dec 5 00:45:28 CET 2009
I had this thought, so I may as well share it. Go ahead and shoot me down. Someone will, it may as well be you :) I'm hoping it's good enough someone can improve on it.
I think the main problem with Eszett isn't that it is bundled with ss, but rather that it changes to ss. I've tried to ask that specifically before and the only objection was "It may be annoying to me if I own fuss.com and don't realize fuß.com goes there as well under IDNA2003." I believe we're in agreement that if you want Fußball, the system shouldn't make it fussball for you. OTOH if you want Fussball, then the system shouldn't turn it into fußball.
I also think there isn't really a strong need for these two forms to go to different servers. I've also tried to ask that and haven't heard any objection. (Yes, there are somewhat rare cases where they could collide, but the consensus seems to be that they wouldn't be registered independently.)
So it would seem that bundling would work except for about 4 things:
* There aren't any cops that can make registrars bundle;
* There are issues with DNAME. (But it seems some are edge cases or could be worked around?)
* Bundling à la IDNA2003 still perverts your ß into an ss, which everyone agrees is bad.
* Doing it à la UTR46's current behavior could still turn ß into ss.
So the idea I'm toying with isn't perfect, but maybe it'd be worth thinking about. The thought is to bundle, preserve the form, and be a cop for the registries.
* Allow these characters to be PVALID since that's necessary.
* Enable a "BUNDLED" type through UTR46. These characters (both forms) would be "BUNDLED" and PVALID.
* For the "BUNDLED" characters, provide a "PLACEHOLDER" value as a token representing the bundling. For compatibility, the placeholder would be the IDNA2003 mapping.
Then for UTR46 Processing, step 1a)
* For canonical and presentation processing do nothing.
* For lookup, for each code point listed in the BUNDLED table, replace the BUNDLED character with its PLACEHOLDER form.
Table of BUNDLED sequences and their PLACEHOLDER values:
BUNDLED sequence    PLACEHOLDER
U+0073 U+0073       U+0073 U+0073
U+00DF              U+0073 U+0073
Registrants would need to register both the BUNDLED form that they intended to use, and the PLACEHOLDER form (if applicable). This requires "bundling" of 1 or 2 labels. Any others won't work. Preferably, the registrar would bundle them, but now we have some recourse if they don't. Browsers would apply the bundling replacement step in 1a) above.
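To make step 1a) concrete, here is a rough Python sketch of the replacement, assuming only the two table entries above. The names BUNDLED_TABLE, lookup_form, presentation_form and labels_to_register are made up for illustration; they are not part of UTR46 or of any IDNA library, and the sketch ignores every other processing step (case folding, normalization, Punycode).

```python
# Hypothetical BUNDLED table: BUNDLED sequence -> PLACEHOLDER.
# Per the proposal, the PLACEHOLDER is the IDNA2003 mapping.
BUNDLED_TABLE = {
    "\u00df": "ss",  # U+00DF -> U+0073 U+0073
    "ss": "ss",      # U+0073 U+0073 is its own PLACEHOLDER
}


def lookup_form(label: str) -> str:
    """Step 1a) for lookup: replace each BUNDLED sequence with its PLACEHOLDER."""
    for bundled, placeholder in BUNDLED_TABLE.items():
        label = label.replace(bundled, placeholder)
    return label


def presentation_form(label: str) -> str:
    """Step 1a) for canonical and presentation processing: do nothing."""
    return label


def labels_to_register(label: str) -> list[str]:
    """The 1 or 2 labels a registrant would need: the intended form and its PLACEHOLDER."""
    return sorted({label, lookup_form(label)})


if __name__ == "__main__":
    for name in ("fu\u00dfball", "fussball"):
        print(name, "->", lookup_form(name), labels_to_register(name))
```

With this sketch, fußball is looked up as fussball but still displayed as fußball, and a registrant who wants fußball needs both labels, while one who wants fussball needs only one.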
What the resulting impact would be:
* When used in a canonical and presentation form, there is little impact, both are PVALID and work as expected.
* When correctly registered and viewed by an IDNA2003 compliant system, the browser will discover the PLACEHOLDER form and the server should be found.
* When correctly registered and viewed by an IDNA2008 browser, it should also find the PLACEHOLDER form.
The interesting part is what happens with the registry:
* If properly registered and bundled, it works as designed.
* If a zone administrator doesn't make the pairing, there are a few possibilities:
  * If the BUNDLED label is also the PLACEHOLDER form, it still works.
  * If the BUNDLED form is not the PLACEHOLDER and the PLACEHOLDER isn't provided as well, then the domain won't be resolved by the clients and I ask for my money back. Either the zone administrator decides to support the scenario and fixes the situation, or they decide that this isn't their target market and allow it to fail.
  * If the PLACEHOLDER was already assigned to another party, then again, it won't work (but in a different way), and I ask for my money back. Either the zone administrator fixes the problem or decides not to support it.
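The registry cases above can be spelled out as a small decision function. This is only a sketch under assumptions: the zone is modelled as a plain dict from registered label to owner, only ß is handled, and the names placeholder_of and registration_outcome are hypothetical.

```python
def placeholder_of(label: str) -> str:
    """PLACEHOLDER form of a label (only the Eszett case is modelled here)."""
    return label.replace("\u00df", "ss")


def registration_outcome(label: str, owner: str, zone: dict[str, str]) -> str:
    """Classify what a client using the lookup step would experience.

    zone is a hypothetical model of a registry: registered label -> owner.
    """
    if zone.get(label) != owner:
        return "not registered to this owner"
    placeholder = placeholder_of(label)
    if placeholder == label:
        # The BUNDLED label is also the PLACEHOLDER form.
        return "works"
    placeholder_owner = zone.get(placeholder)
    if placeholder_owner is None:
        # Clients look up the PLACEHOLDER, which does not exist.
        return "fails: PLACEHOLDER not registered"
    if placeholder_owner != owner:
        # Clients reach the other party's site instead.
        return "fails: PLACEHOLDER assigned to another party"
    return "works"


if __name__ == "__main__":
    print(registration_outcome("fu\u00dfball", "alice",
                               {"fu\u00dfball": "alice", "fussball": "alice"}))
    print(registration_outcome("fu\u00dfball", "alice", {"fu\u00dfball": "alice"}))
    print(registration_outcome("fu\u00dfball", "alice",
                               {"fu\u00dfball": "alice", "fussball": "bob"}))
```

The two "fails" results are the cases where I ask for my money back.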
The upside is:
1) Still allows the "correct" form of the string.
2) Works really soon, not 5 years from now!
3) IDNA2003 compatible.
4) Incidental: Swiss users can still find German sites (more coincidence than design; there are still other issues like colour and color, so this can't be counted as a deciding factor).
5) Also coincidentally helps with the Greek case mapping problem since at least the site would resolve if UTS46 was applied to an uppercase string. Not a major point, but likely "nice to have."
6) Feedback loop to help encourage bundling.
7) Sites can never be "poached" (like someone registering fußball.de and stealing fussball.de's traffic).
8) We never get to the "wrong" site. The only odd cases are when the zone operator isn't cooperative:
   a) If we register a site and can't get the PLACEHOLDER, then it won't work and we never use it. Hopefully we ask for our money back!
   b) If we own the PLACEHOLDER, then someone else may try to register a BUNDLED form that ends up sending people to our site, but they'll never poach our traffic.
The downside is:
1) The two PVALID labels are grouped together (by design) and cause a special case.
2) Some zone operators will still not adopt the bundling, and the proper behavior won't work in those zones. (Mitigated by the idea that they must not care).
3) Zone operators that do care may have to do something to enable bundling. (Eg: blogspot.com).
4) Browsers have to be smart enough to figure out the correct form for presentation.
5) These can't ever point to distinct sites. My understanding is that the need for that functionality is rare.
6) Technical problems with bundling? (DNAME isn't perfect.)