Additional thoughts on TRANSITIONAL

Shawn Steele Shawn.Steele at microsoft.com
Fri Dec 4 20:25:39 CET 2009


Again, I'm very concerned about things that currently "work" and start breaking.

ALL of these domains currently resolve to a server in IDNA2003.  The source link may even look fine (ZWJ/ZWNJ are required for correct display, but the IDNA2003 mapping drops them, so you could actually type an http://whateverZWNJthatZWNJneededZWNJthem, and it'd get to a server).  IE wouldn't display it right in the address bar, but at least you'd get there.  Ditto with sigma, and, of course, eszett.  FF and all the others behave similarly.  I suppose we could even add logic to IE to not muck with the input form of these.
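
To make the "it works today" point concrete, here is a rough sketch using Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with Nameprep).  The domain names and the helper name to_ascii_2003 are made up for illustration; this shows the mapping behavior only, not what any particular browser does.

```python
# Sketch: how IDNA2003 mapping treats the four characters under discussion.
# Python's standard "idna" codec implements IDNA2003 (ToASCII with Nameprep).
# The ".example" names below are hypothetical.

ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"          # ZERO WIDTH JOINER
ESZETT = "\u00df"       # LATIN SMALL LETTER SHARP S
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def to_ascii_2003(domain: str) -> str:
    """Return the name that would be looked up in DNS under IDNA2003."""
    return domain.encode("idna").decode("ascii")


# Eszett is case-folded to "ss", so the lookup still reaches a server.
print(to_ascii_2003("fa" + ESZETT + ".example"))  # fass.example

# ZWNJ and ZWJ are mapped to nothing, so they vanish before lookup.
print(to_ascii_2003("a" + ZWNJ + "b" + ZWJ + "c.example"))  # abc.example

# Final sigma is folded to ordinary sigma, so both spellings hit the same name.
print(to_ascii_2003("\u03b1" + FINAL_SIGMA + ".example")
      == to_ascii_2003("\u03b1" + SIGMA + ".example"))  # True
```

In every case the lookup succeeds, but the distinction the user typed is gone by the time the name reaches DNS, which is the presentation problem described further down.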

Agreed that it's not "perfect, or even close", but it actually works in IDNA2003.

Our servicing doesn't allow machine-readable updates for all data right now.  So our software won't be able to magically turn on at some future date.  Instead, we'd have a painful transition from "sort of works and might look ugly" to "doesn't work at all", then again to "now it finally works".  That's probably a decade at least.  Any sooner for each step and you'd end up with different clients behaving differently.

The core problem here isn't that I can't type my domain and have it go where I want.  The problem is that we lose the information about the difference between the characters so when I get there it ends up looking very bad in some cases.  If ALL domains looked like askdjfoaiwuerh after lookup (or maybe 207.46.197.32), then I don't think we'd be having this discussion.  The problem is that we're using a human readable string as this hash value, and then we're unhappy when it misbehaves.

So is there any mechanism that could help us make that distinction and figure out the correct version of these values to use?  It doesn't even have to be "quick", just quicker than 10 years.

Additionally, IF we chose to address the presentation aspect of the problem, then we could also address numerous other presentation problems at the same time.  English users are currently happy: DNS is "case preserving", so they can get Microsoft.com or AAA.com or CamelCase.com if they want.  Other languages, even Latin non-ASCII, don't get that benefit.  It seems likely that being able to specify a "correct" form would be interesting for Turkish i and other cases as well.  However, we could limit it to these 4 characters.

I can't get very excited about transitional strategies that make almost-working stuff break for several years.

-Shawn

PS: More data about the timeframe.  I'll let you fill in numbers from public information about different vendors' release cycles.

* Current operating systems and browsers have already shipped.
* Even if the WG decided the perfect behavior today, it might not happen until those OSes or browsers are upgraded.  Even in an SP that is a significant delay for some properties.  IDNA2003 took a year? but it was lucky in some products' release cycles.
* Then we have to allow for adoption to reach critical mass.  Patches aren't always taken by users, and if you look at OS/Browser usage on the web there's still lots of old stuff out there.  That is probably many years.

Transition then requires we do the whole thing all over again.  If we had some warning about the process it might be a little easier to trigger, but each change is a long, painful thing.

________________________________________
From: idna-update-bounces at alvestrand.no [idna-update-bounces at alvestrand.no] on behalf of Erik van der Poel [erikv at google.com]
Sent: Friday, December 04, 2009 4:11 AM
To: idna-update at alvestrand.no
Subject: Re: Additional thoughts on TRANSITIONAL

Here is another proposal that is dead simple, yet allows
implementations to take advantage of a machine-readable file, and does
not involve "flag days" (dates at which we change something).

Instead of having a machine-readable file at each host, we have two
global files at iana.org. One file is similar to Patrik's table with
entries like:

00DF       ; DISALLOWED  # LATIN SMALL LETTER SHARP S
03C2       ; DISALLOWED  # GREEK SMALL LETTER FINAL SIGMA
200C       ; DISALLOWED  # ZERO WIDTH NON-JOINER
200D       ; DISALLOWED  # ZERO WIDTH JOINER

There is no new value called TRANSITIONAL. The infamous 4 characters
(above) start with the value DISALLOWED. Later, we change them to
PVALID (or CONTEXTJ for 200C/200D). We encourage ICANN to redelegate
TLDs the registries of which flout our rules.

The other file is for global mappings. Not language-specific mappings.
The format might be similar to RFC 3454's:

0041; 0061; Case map
00AD; ; Map to nothing

The absence of a character from this file means that there is no
mapping for that character. It maps to itself. The infamous 4
characters would not be in this file. In other words, their mappings
are removed, and new clients must stop mapping them. We encourage IANA
to set up a Hall of Shame, with a list of clients that flout our
rules.
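
A rough sketch of how a client might consume the two files follows. The sample data is copied from the entries above; the function names (parse_properties, parse_mappings, prepare_label) and the choice to treat any code point missing from the sample property table as acceptable are assumptions made for this illustration, not part of the proposal or of any draft.

```python
# Sketch: parse the two proposed machine-readable files and apply them.
# Only the sample entries quoted in this message are included; a real
# property table would cover every code point.

PROPERTY_FILE = """\
00DF       ; DISALLOWED  # LATIN SMALL LETTER SHARP S
03C2       ; DISALLOWED  # GREEK SMALL LETTER FINAL SIGMA
200C       ; DISALLOWED  # ZERO WIDTH NON-JOINER
200D       ; DISALLOWED  # ZERO WIDTH JOINER
"""

MAPPING_FILE = """\
0041; 0061; Case map
00AD; ; Map to nothing
"""


def parse_properties(text):
    """Return {code point: property value} from 'XXXX ; VALUE # comment' lines."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code_point, value = (field.strip() for field in line.split(";"))
        props[int(code_point, 16)] = value
    return props


def parse_mappings(text):
    """Return {code point: replacement string} from 'XXXX; YYYY ...; note' lines."""
    mappings = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(";")]
        source, target = fields[0], fields[1]
        # An empty target field means "map to nothing".
        mappings[int(source, 16)] = "".join(chr(int(cp, 16)) for cp in target.split())
    return mappings


def prepare_label(label, props, mappings):
    """Apply the global mappings, then reject any DISALLOWED character."""
    # A character absent from the mapping file maps to itself.
    mapped = "".join(mappings.get(ord(ch), ch) for ch in label)
    for ch in mapped:
        if props.get(ord(ch)) == "DISALLOWED":
            raise ValueError("U+%04X is DISALLOWED" % ord(ch))
    return mapped


props = parse_properties(PROPERTY_FILE)
mappings = parse_mappings(MAPPING_FILE)

print(prepare_label("Abc\u00addef", props, mappings))  # abcdef

# Later, when the table entry changes, the same client accepts the label
# without any code change or flag day:
props[0x00DF] = "PVALID"
print(prepare_label("fa\u00df", props, mappings) == "fa\u00df")  # True
```

The point of the sketch is that the change from DISALLOWED to PVALID is a data update only, which is what lets clients pick it up whenever they next read the file.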

Clients are permitted to check these machine-readable files once a
week, not at a fixed time on a fixed day of the week. Implementers are
not required to have their clients automatically check the files. They
may check them manually, and adjust their implementations as soon as
they can.
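
One way to read "once a week, not at a fixed time" is sketched below: each client waits at least a week after its last check and then adds a random offset, so clients do not all fetch at the same moment. The one-day jitter window and the name next_check are assumptions chosen for the illustration; the proposal itself does not specify them.

```python
# Sketch: schedule the next fetch of the files no sooner than a week after
# the last one, at a time that differs from client to client.
import random
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(days=7)  # "once a week"
MAX_JITTER = timedelta(days=1)      # assumed spread; not from the proposal


def next_check(last_check, rng):
    """Return the earliest time this client should fetch the files again."""
    jitter_seconds = rng.uniform(0, MAX_JITTER.total_seconds())
    return last_check + CHECK_INTERVAL + timedelta(seconds=jitter_seconds)


last = datetime(2009, 12, 4, 4, 11)
when = next_check(last, random.Random())
print(CHECK_INTERVAL <= when - last <= CHECK_INTERVAL + MAX_JITTER)  # True
```

Implementers who prefer to check by hand can ignore the scheduling entirely; the files are the same either way.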

Voila. Dead simple. Machine-readable files. No flag day.

However, I would encourage .de and .at registry folks to take a closer
look at the .gr registry's claims that DNAME is not good enough for
email, etc. If DNAME is not changed to include the root of the subtree
or if no new xNAME is defined for that purpose, we may decide to keep
Eszett DISALLOWED and add a mapping to ss.

Most of the IDNAbis drafts can be published unchanged. We'd have to
change Patrik's draft for the infamous 4 characters. We might want to
drop the mappings draft for now.

Erik