Interoperability Requirements - was RE: Stop me if I've misunderstood...

Shawn Steele Shawn.Steele at microsoft.com
Fri Jul 10 01:05:21 CEST 2009


> There was rough consensus that not requiring mapping was a big enough
> improvement for the new protocol to make it the rule.

However, the "consensus" was based on the idea that mapping would be permitted with some flexibility.  Currently mapping-01 restricts it very narrowly to the "UI layer".  With that language we no longer have consensus.  Specifically, I need to be able to map an href in an HTML file.
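
To make the href case concrete, here is a rough sketch in Python.  The name map_href is made up for illustration, and "lowercase plus NFC" is only a stand-in assumption for whatever mapping ends up being standardized.  It maps only the host part of the link and leaves the path, query, and fragment alone.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit


def map_href(href: str) -> str:
    """Hypothetical helper: map only the host of an href.

    The mapping used here (lowercase + NFC) is an illustrative
    assumption, not the mapping defined by any IDNA document.
    """
    parts = urlsplit(href)
    if not parts.hostname:
        # Relative link or no host at all: nothing to map.
        return href
    # urlsplit already lowercases .hostname; lower() is kept to make
    # the intended mapping explicit.
    host = unicodedata.normalize("NFC", parts.hostname.lower())
    # Rebuild the authority part. Userinfo, if any, is dropped in this sketch.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Host typed in upper case with a decomposed U + combining diaeresis.
    print(map_href("http://BU\u0308CHER.Example/Path?q=A"))
```

The point is that this happens when the HTML is processed, well away from anything one could call a "UI layer".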

>> The browser manufacturers would, I can fairly confidently state, be very
>> keen to make this interoperable.

> The WG would be keen for the browser vendors to define what "interoperable"
> means here.

The browser vendors have been speaking here, as individuals, since "vendor" participation doesn't happen in IETF forums.

Since I was nicely asked, I will now speak for Microsoft.  Internally our IDN discussion alias contains 107 members representing Internet Explorer, Windows, Office, Windows Live, Microsoft .Net, DNS, IIS, etc.  There has been no disagreement about the best practices I have been suggesting here.

If you want more formal requirements you'll have to give me a few days to run it by the others; this is sort of off the cuff.  (Requesting requirements from the vendors and others might not be a bad idea. http://www.ietf.org/html.charters/eai-charter.html could even point to requirements/scenarios docs, thus avoiding the "didn't you read the archives?" mails.)

What we (Microsoft/Internet Explorer) mean by interoperable is loosely this:

1) IE shouldn't need to know the details of IDN.  If it does, then so does IIS and Outlook and Word and, and, and...  Additionally if something changes, then IE breaks.  What IE needs is effectively to call GetAddrInfoW() and get an address for a Unicode name.  Since names may be in unmapped Unicode, or Punycode, IE may also need something like GetHostDisplayName().  Again, getting a name into a canonical form should be System API magic that client apps like IE shouldn't have to worry about.

2) When any user sees a URL on the side of the bus, and types it into a computer, they will go to a unique web site.  Even if they saw the bus in a Japanese ad in a Turkish webcast on a Mac before flying back to the US and typed it on their Windows Vista PC.

3) IME differences and user expectations mean that mapping is required.  Users will not enter lower case U-labels in Unicode Normalization Form C, nor should they be expected to even know there's a difference.  Users in this case include posters of blogs, editors of http, and authors of email.  Requirement: Mapping is required.

4) Every user should end up at the SAME web site given the same input string.  Requirement: Mapping must be standardized.

5) Applications must have the flexibility to map imperfect labels to U-labels whenever the form is suspect, such as in an href.  Those could be from a blog or from an email or other source that didn't validate human input very thoroughly.  I.e.: Please don't use the term "UI Layer".

6) We expect IDNAbis and IDNA2003 to coexist.  IDNAbis MUST support IDNA 2003 behavior or our customers will whine at us (not the WG) when things break.  To be clear I don't expect an IDNA2003 browser to find an IDNAbis named web site, but I do expect an IDNAbis browser to find an IDNA2003 web site.  (FWIW: Expect that we'll do whatever bis/2003 mapping fallback needs to happen in a single step even if we have to merge tables ourselves, though we'll respect the "don't map a valid u-label to a different valid u-label" rule.  That's an optimization, not trying to avoid preferring IDNA2008.  I'd much prefer any mapping tables in the standards already merge IDNA2003 for me, 'cause it'd be easier and I'd be less likely to mess it up.)

7) We'd prefer no IDNA2003 breaking changes because SOME user is going to hit them and complain to us.

8) What's supposed to happen when someone uses a browser on a UTF-8 aware Intranet/private network?  Most modern Intranets are 8-bit clean, and our products have been interoperating with other vendors' products in that type of environment for some time.  We expect corporate environments to continue having UTF-8 Intranet DNS systems that coexist with IDN Internet for the foreseeable future.
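
For what it's worth, requirements 1, 3 and 4 could look roughly like the Python sketch below from an application's point of view.  All three function names are made up for illustration, and the mapping itself (unify the dot variants, lowercase, NFC) is an assumption standing in for a standardized one.  The A-label conversion leans on Python's built-in "idna" codec, which implements IDNA2003, so this shows the shape of the API rather than IDNAbis behavior.

```python
import unicodedata

# Ideographic, fullwidth and halfwidth full stops that IDNA2003
# treats as label separators.
_DOT_VARIANTS = "\u3002\uff0e\uff61"


def to_u_label_form(typed: str) -> str:
    """Hypothetical mapping step: whatever the user typed -> lowercase NFC."""
    for dot in _DOT_VARIANTS:
        typed = typed.replace(dot, ".")
    return unicodedata.normalize("NFC", typed.lower())


def to_a_label_form(typed: str) -> str:
    """Mapped name in its ASCII (Punycode) form, via the stdlib IDNA2003 codec."""
    return to_u_label_form(typed).encode("idna").decode("ascii")


def host_display_name(name: str) -> str:
    """Hypothetical display helper: show Unicode whether given Unicode or Punycode."""
    if not name.isascii():
        return to_u_label_form(name)
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Two different ways of typing the same name end up identical.
    print(to_u_label_form("BU\u0308CHER\u3002Example"))
    print(to_a_label_form("B\u00dcCHER.example"))
    print(host_display_name("xn--bcher-kva.example"))
```

The application only ever calls these helpers; if the mapping tables change, the helpers change and the callers don't.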

> Which two parties are interoperating?

Hopefully any web page or email link on any OS, browser, or email provider, with IDNA2003 or IDNAbis.  In many scenarios there may be more than two parties.  I'm less concerned about which specific bus company or regional transportation district was carrying the advertisement; however, I'd like to hope that those would have little impact ;-)

I would expect that our competitors, some of whom also participate in this forum, would agree with most, if not all, of these requirements.  (Though I would hope that most professionals would take more time on the list than I have ;-)  I am a bit confused though about one particular competitor that seems to be taking conflicting positions :)  Certainly if Firefox or Chrome or Safari or Opera or Apache or OpenOffice or whatever has conflicting requirements, it'd be nice to know before it shows up as an interoperability bug :)

-Shawn

Shawn Steele
Senior SDE
Windows International
Microsoft



More information about the Idna-update mailing list