Stop me if I've misunderstood...

Shawn Steele Shawn.Steele at microsoft.com
Sat Jul 11 01:11:27 CEST 2009


Paul replied...

>> I consider the entire system to be wherever domain names are used: the A-labels on the wire in a query, the APIs helping the client resolve them, the DNS server providing answers, the server providing the services the URL names, the browser trying to visit a web site, the URL itself (a misnomer), and the protocol that contains the URL (http hrefs, mailto, or whatever).  It includes a yellow sticky note and a bus if that's where the name appears.

> OK, that's a definition we can work with.

Whew... :)

>> If I restrict the system to merely DNS resolution, then it's much simpler.  I get labels, canonicalize them, convert them to Punycode and make the query.  If I don't restrict it to that, then it's "everything" where a name may appear.

> Got it. And I think there is general agreement in the WG that we are not restricting IDNA2008 to simply the DNS resolution system; if we were, we could say "Punycode" and nothing else.

Agreed, I just wanted to make my definition clear by showing kinda the opposite.
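
To make the restricted view concrete, here is a minimal sketch of "get labels, canonicalize them, convert them to Punycode".  The function name to_query_name is hypothetical, and the canonicalization step (NFKC plus lowercasing) is only a stand-in assumption, not the actual IDNA mapping tables.  Python's built-in "punycode" codec does the encoding.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix IDNA puts in front of Punycode-encoded labels


def to_query_name(name: str) -> str:
    """Split a name into labels, canonicalize each, and Punycode-encode
    any label that still contains non-ASCII characters."""
    out = []
    for label in name.split("."):
        # Stand-in canonicalization (an assumption, not the IDNA tables).
        label = unicodedata.normalize("NFKC", label).lower()
        if label.isascii():
            out.append(label)
        else:
            out.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(out)


print(to_query_name("Bücher.Example"))  # xn--bcher-kva.example
```

Everything outside this function (the href, the mailto, the sticky note) is what the broader definition adds.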

>> I said "minimum disruption", not "no disruption."  Because the sets of expectations conflict, it is impossible to make a consistent system without some disruption.

> You're going to hate me for this, but you now need to define "minimum".
> One person's definition of "minimum" is quite different from another's.

Breaking the fewest user expectations, with the caveat that current practice shapes user expectations even where, in hindsight, a different choice might have been better.

* ASCII behavior for ASCII letters (longstanding practice even if not best for Turkish i)
* IDNA2003 mappings unless there is a clear case for change.  Again, people have different bars.  In my view the Eszett change is certainly not a clear case, and the German position seems to support that it is not desirable.
* Linguistic expectations.  Casing is generally "ignored" by people whose scripts use casing.  Diacritics are not (unless you're American; in this case the US expectation loses ;-)).
* IME or other conventions.  The full-width/half-width mappings are mostly computer artifacts, but they are convenient for mapping and users have grown familiar with them.
* Similarly, the difference between Form C and Form D is mostly artificial and should be normalized away.  Users aren't even aware of a difference.
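
The expectations listed above can be illustrated with a rough sketch.  The helper fold_label is hypothetical, and NFKC plus str.casefold() is only an approximation of the IDNA2003 mappings (an assumption for illustration, not the Nameprep tables).

```python
import unicodedata


def fold_label(label: str) -> str:
    """Approximate IDNA2003-style mapping: compatibility-normalize,
    case-fold, then normalize again in case folding changed the form."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


# Case is ignored, and Eszett folds to "ss" as it did under IDNA2003.
assert fold_label("STRASSE") == fold_label("straße") == "strasse"
# Full-width forms map to their ordinary counterparts.
assert fold_label("ＥＸＡＭＰＬＥ") == "example"
# Form C and Form D spellings of the same text compare equal.
assert fold_label("cafe\u0301") == fold_label("caf\u00e9")
# Diacritics are not ignored.
assert fold_label("café") != fold_label("cafe")
```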

I think it also depends on whether people expect round-tripping.  For example, if I map all 4 I's (I, i, dotted capital İ, dotless ı) to the same position, I restrict the possible names only slightly, yet it accommodates both US and Turkish users' expectations.  One problem is that it won't round-trip to the input form.  Another is that I have no clue whether that model would work across all scripts.
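
A minimal sketch of the "all 4 I's in one position" idea, assuming a hypothetical folding table I_FOLD and helper fold_i (neither comes from any IDNA specification):

```python
# Hypothetical table: capital I, dotted capital I (U+0130) and dotless
# small i (U+0131) all collapse to plain ASCII "i".
I_FOLD = str.maketrans({"I": "i", "\u0130": "i", "\u0131": "i"})


def fold_i(label: str) -> str:
    """Fold the four I's together first, then lowercase everything else."""
    return label.translate(I_FOLD).lower()


# US-style and Turkish-style spellings land on the same name...
assert fold_i("DIYARBAKIR") == fold_i("DİYARBAKIR") == "diyarbakir"
assert fold_i("diyarbakır") == "diyarbakir"
# ...but the folded form no longer says which "i" the user typed,
# so it cannot round-trip back to "diyarbakır".
```

The translation runs before lower() on purpose: Python lowercases U+0130 to "i" plus a combining dot above, which would otherwise leave a stray mark in the result.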

- Shawn



