Stop me if I've misunderstood...
Shawn Steele
Shawn.Steele at microsoft.com
Sat Jul 11 01:11:27 CEST 2009
Paul replied...
>> I consider the entire system to be wherever domain names are used. That would be the A-labels on the wire in a query, the APIs helping the client to resolve it, the DNS server providing answers, the server providing the services which the URL named, the browser trying to visit a web site, the URL (a misnomer), and the protocol that contains the URL (http hrefs or whatever, mailto). It includes a yellow sticky note and a bus if that's where the name appears.
> OK, that's a definition we can work with.
Whew... :)
>> If I restrict the system to merely DNS resolution, then it's much simpler. I get labels, canonicalize them, convert them to Punycode and make the query. If I don't restrict it to that, then it's "everything" where a name may appear.
> Got it. And I think there is general agreement in the WG that we are not restricting IDNA2008 to simply the DNS resolution system; if we were, we could say "Punycode" and nothing else.
Agreed, I just wanted to make my definition clear by showing kinda the opposite.
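To make the "restricted" view concrete, here is a rough Python sketch of that pipeline: split into labels, canonicalize, Punycode, hand the result to the resolver. The function name to_query_name is made up for illustration, and NFKC plus lowercasing is only a stand-in for real canonicalization (it is neither Nameprep nor the IDNA2008 rules).

```python
import unicodedata

def to_query_name(name: str) -> str:
    """Hypothetical helper: turn a typed name into what goes on the wire."""
    wire_labels = []
    for label in name.split("."):
        # Stand-in canonicalization (an assumption, not Nameprep/IDNA2008).
        canon = unicodedata.normalize("NFKC", label).lower()
        if canon.isascii():
            # Plain ASCII labels go out unchanged.
            wire_labels.append(canon)
        else:
            # Non-ASCII labels become A-labels: "xn--" + Punycode.
            wire_labels.append("xn--" + canon.encode("punycode").decode("ascii"))
    return ".".join(wire_labels)

print(to_query_name("Bücher.example"))  # xn--bcher-kva.example
```

Everything outside that function (the sticky note, the href, the bus) is what the wider definition has to worry about.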
>> I said "minimum disruption", not "no disruption." Because the sets of expectations conflict, it is impossible to make a consistent system without some disruption.
> You're going to hate me for this, but you now need to define "minimum".
> One person's definition of "minimum" is quite different from another's.
Breaking the fewest user expectations, with the caveat that current practice shapes user expectations even if, in hindsight, a different choice may have been better.
* ASCII behavior for ASCII letters (longstanding practice even if not best for Turkish i)
* IDNA2003 mappings unless there is a clear case for change. Again people have different bars. In my view the Eszett change is certainly not a clear case, and the German position seems to support that it is not desirable.
* Linguistic expectations. Casing is generally "ignored" by people with scripts using casing. Diacritics are not (unless you're American, in this case the US expectation loses ;-) ).
* IME or other conventions. The full-width/half-width mappings are mostly computer artifacts, but they are convenient for mapping and users have grown familiar with them.
* Similarly, the difference between Form C & Form D is mostly artificial and should be normalized away. Users aren't even aware of a difference.
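
As a rough illustration of the casing, width and Form C/Form D points above, here is a small Python sketch. The helper same_label is hypothetical, and NFKC plus lower() is only an approximation of the mappings under discussion, not the IDNA2003 tables themselves.

```python
import unicodedata

def same_label(a: str, b: str) -> bool:
    """Hypothetical check: do two labels match after the mappings users expect?"""
    def fold(s: str) -> str:
        # NFKC removes Form C/Form D and full-width/half-width differences;
        # lower() ignores case. Diacritics are deliberately kept.
        return unicodedata.normalize("NFKC", s).lower()
    return fold(a) == fold(b)

# Form C vs Form D: one code point vs base letter + combining accent.
print(same_label("\u00e9", "e\u0301"))
# Full-width letters as produced by an IME vs plain ASCII.
print(same_label("\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45", "example"))
# Case is ignored, diacritics are not.
print(same_label("Example", "example"), same_label("resume", "r\u00e9sum\u00e9"))
```

The first three comparisons match and the last one does not, which is the behavior I'd expect most users to find unsurprising.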
I think it also depends on whether people expect round-tripping or not. For example, if I map all 4 I's into the same position, I restrict the possible names only slightly, yet it accommodates both US & Turkish users' expectations. One problem is that it won't round trip to the input form. Another is that I have no clue if that model would work across all scripts.
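
A sketch of what I mean by mapping the 4 I's together, in Python. The name fold_i and the choice of plain "i" as the common target are just assumptions for illustration; nothing here is proposed spec text.

```python
# The four I's: I (U+0049), i (U+0069), dotted capital I (U+0130),
# dotless small i (U+0131). All are sent to plain "i" (an assumed target).
FOUR_I = str.maketrans({"I": "i", "\u0130": "i", "\u0131": "i"})

def fold_i(label: str) -> str:
    """Hypothetical mapping: collapse the four I's, then lowercase the rest."""
    # Translate first so U+0130 never reaches lower(), which would
    # otherwise produce "i" followed by a combining dot above.
    return label.translate(FOUR_I).lower()

turkish_upper = "D\u0130YARBAKIR"   # dotted capital I, then plain capital I
turkish_lower = "diyarbak\u0131r"   # plain i, then dotless i
ascii_typed = "Diyarbakir"

# All three spellings land on the same name...
print(fold_i(turkish_upper) == fold_i(turkish_lower) == fold_i(ascii_typed))
# ...but the folded form cannot tell you which one the user typed.
print(fold_i(turkish_lower) == turkish_lower)
```

The second comparison is the round-trip problem: once the I's are merged, the original spelling is gone unless it is stored somewhere else.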
- Shawn