mappings-01

Shawn Steele Shawn.Steele at microsoft.com
Wed Jul 8 20:32:16 CEST 2009


My crystal ball tends to be unreliable at unexpected times, however:

> Lisa said:
> My crystal ball is probably not as good as yours, but for whatever
> it's worth, here's my muddle-ranking:

> > A. mapping required by IDNA protocol

> This would be a standards-phase muddle because we'd have to come to
> strong consensus on exactly what every mapping would be, now, before
> finishing.  That's a source of pain.  I believe some of the resistance
> to this is also that there are cases (like spidering, verifying site
> access, updating caches, verifying access logs, many are automated/bot
> use cases) where mapping is strongly not desired, where only valid
> input should be accepted and invalid input should be rejected as an
> error rather than fudged.

I agree that this would take forever.  I disagree about spidering.  I think that humans will use whatever works in their hrefs and that bots will try their best to follow them.

> > B. no mapping as part of IDNA docs

I agree that this is a muddle.

> > C. optional, UI-only mapping in IDNA docs

> This would also be a deployment-phase muddle because with an optional
> mapping, some implementations would choose the option not to do it.

I think that it is as bad as A for time frame.  I also think that it disallows the flexibility developers may need to do some sort of mapping somewhere between the user-typing-on-the-keyboard layer and the lookup layer.

> > I think that C is far worse than B. So rather than going down the C route,
> > I'd rather go back to John's original formulation (B).

> So in sum my crystal ball shows B as worse than C though they're
> admittedly all muddles.   I gave my reasoning why; is there something
> I'm missing?

My crystal ball has C as the worst case.  Any application that needs mappings beyond the UI layer would be breaking the RFC.  Search, for example, displays in its results a label (perhaps an IDNA2003 one) that it has crawled.  Presumably the results should be in a canonical form even if they were originally a mappable href?

>  I'm pretty confident in the prediction that some
> end-user applications will helpfully do mappings no matter what we
> say, but we could debate that point if that's where we differ.

I'd like applications to be able to behave, and also be compliant.  B is perhaps a muddle, but it allows for IDNA2003 compatible behavior for applications that need it.

Another huge problem with the UI layer is that it requires ALL applications to understand ALL of the intricacies of IDN (intricacies that the experts on this list, myself included, struggle with).

Prior to IDN, when an application encountered a URL, it generally asked a system API to parse the URL and figure out what parts were which.  Then another function could take this URL and get an IP address from it (or open a socket or whatever; there are several models/layers where this happens).  Asking every UI layer to parse the name and search for xn-- or Unicode or whatnot leads to inconsistent behaviors and confounds user expectations.  Those applications also break if someone decides 5 years from now that we need xo--.  Or, more realistically, if we tweak the bidi rules.  Applications need to be able to be blind to IDN.  They get a domain name, they ask for it to be resolved.

Sure, some apps are more careful; however, unexpected-script detection, homograph detection, blacklisting and other techniques are IDN-neutral operations.  It doesn’t matter so much what the protocol layer says; those steps are security steps that an app can choose to take regardless of the details of the guts of the DNS system.

Some apps have no need at all to figure out whether a URL is an IDN name or not.  Take notepad, or any plain-text editor.  Surely we can’t expect those apps to validate for IDN, or for whatever random protocol is asked of them?  Even a rich text editor cannot.  For example, Word detects IDN URLs; however, I also use Word to write documents with examples of INVALID IDN names.  I can’t have Word “fix” my names for me.

Some apps have no clue whether a URL came from an API layer or not.  In an address book, maybe the user entered it in my app’s UI directly, maybe I imported it from a mailing list from a text file from the above editor.  You could, perhaps, argue that importing a data file should be considered “UI”, but then why use the term “UI”?  It doesn’t have to be an address book, it could be a log file or any number of things.

The data file or transmission of that file could be dereferenced numerous times so that it is clearly machine-to-machine and not “UI”, yet it may also clearly need mapping depending on the root source.

When I actually send a query to a DNS server, IDN MUST NOT map.  It MUST be Punycode only, very strict, when I do the actual query to the DNS server.  That’s very clear.  Other applications or protocols MAY also need that kind of only-one-form-of-IDN-is-legal rigor, and they should certainly be allowed to have it.  Those needs shouldn’t be allowed to determine the needs of unrelated applications that require mapping.  The line between when mapping should happen and when it can’t isn’t black and white.
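
To make the "very strict, no mapping" end of that line concrete, here is a minimal sketch in Python.  The name check_wire_name is hypothetical and not taken from any IDNA draft or library.  It assumes that "strict" means every label is already an ASCII letters-digits-hyphen label of at most 63 characters, that upper-case ASCII is tolerated, and that a trailing root dot is not handled.  It deliberately rejects anything else instead of fixing it.

```python
import re

# One ASCII letters-digits-hyphen label: 1 to 63 characters,
# no leading or trailing hyphen.  (Assumed definition of "strict".)
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_wire_name(name: str) -> str:
    """Return name unchanged if every label is a strict ASCII label.

    No mapping is attempted: anything that is not already in the
    ASCII (xn--) form raises ValueError instead of being converted.
    """
    for label in name.split("."):
        if not _LDH_LABEL.fullmatch(label):
            raise ValueError(f"not a strict ASCII label: {label!r}")
    return name


# An already-encoded name passes through untouched.
print(check_wire_name("xn--bcher-kva.example"))
```

A Unicode name such as "bücher.example" raises ValueError here; whether and where it gets mapped first is exactly the question for the layers above the query.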

Mapping MUST NOT be expected of a DNS server, e.g., on the wire.  That’s completely inappropriate.  (On the wire being restricted to DNS lookup.  The body of this email shouldn’t count even though numerous wires, and perhaps some radio waves, were involved.)

Mapping MAY happen somewhere between on the wire and the UI layer.

Mapping SHOULD happen at the UI layer.  Otherwise users get confused.  Despite what draft-protocol says, we can’t reasonably expect to train users to enter URLs in lower case and Unicode Normalization Form C.

Given that the “MAY” area is a big gray area where we have a hard time getting agreement, and given that the actual mappings themselves will likely cause many more months of discussion even if we solve this issue, I think that dropping the mappings document unblocks the protocol document.
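
As an illustration of the kind of UI-layer mapping mentioned above (lower case plus Unicode Normalization Form C), here is a rough Python sketch.  The function name map_user_input is hypothetical, and plain str.lower() is an assumption standing in for whatever case mapping a mappings document would eventually specify; this is not the IDNA2003 or IDNA2008 mapping table.

```python
import unicodedata


def map_user_input(name: str) -> str:
    """Hypothetical UI-layer mapping: lower-case, then normalize to NFC.

    str.lower() is a stand-in for a real case mapping; the actual
    set of mappings is what the mappings discussion is about.
    """
    return unicodedata.normalize("NFC", name.lower())


# What a user might type: capitals, and "U" followed by a
# combining diaeresis (U+0308) instead of the precomposed letter.
typed = "BU\u0308CHER.Example"
mapped = map_user_input(typed)

# After mapping, the name is lower case and uses precomposed U+00FC.
print(mapped == "b\u00fccher.example")
```

The point is only that the user is not asked to type the canonical form; some layer does it for them.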

I think part of our problem is a fuzzy definition of the UI layer.  I think some people read that as "pressing a key" and others read "something that may have come from a user/human once upon a time, perhaps last year."

My crystal ball seems to be stored in a soapbox; I had intended a much shorter reply :)

-Shawn


More information about the Idna-update mailing list