MAYBE-TRANSITIONAL, a historical tale
Mark Davis ☕
mark at macchiato.com
Tue Dec 8 16:57:24 CET 2009
Here's a modified proposal, a bit rough yet.
Live page: http://www.macchiato.com/unicode/idna/transition-proposal
We would like to have the 4 deviation characters be valid, at some point.
The key problem is that we don't want current URLs in web pages, etc. to go
to two different locations depending on the browser, nor do we want
joe at fußball.com to go *sometimes* to joe at fußball.com and
*sometimes* to joe at fussball.com. Even once IDNA2008 is approved, for a long
time a majority of the implementations will still be IDNA2003, so this also
goes for new label registrations during the transition period.
IDNA2008 changes as follows:
The 4 deviation characters get the property PVALID_AFTER_2015
The requirements are:
1. On registration, PVALID_AFTER_2015 is equivalent to PVALID
2. On lookup, PVALID_AFTER_2015 is treated as DISALLOWED up until 2016
   Jan 1, 00:00:00 GMT, and treated as PVALID thereafter.
   1. Implementations must not map the characters after the switchover
3. Implementations that map the characters before that date must map
   as in IDNA2003.
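As a rough illustration of the requirements above, here is a minimal Python sketch. The function names, the `maps_before_switchover` flag, and the choice to treat the exact switchover instant as already PVALID are assumptions made for the example, not part of the proposal. The mapping table is the IDNA2003 behaviour for the 4 characters: ß to ss, final sigma to sigma, ZWNJ and ZWJ to nothing.

```python
from datetime import datetime, timezone

# The 4 deviation characters: sharp s, final sigma, ZWNJ, ZWJ.
DEVIATIONS = {"\u00DF", "\u03C2", "\u200C", "\u200D"}

# Switchover instant from requirement 2 (2016 Jan 1, 00:00:00 GMT).
SWITCHOVER = datetime(2016, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# IDNA2003 mapping of the 4 characters (None means "map to nothing").
IDNA2003_MAP = {0x00DF: "ss", 0x03C2: "\u03C3", 0x200C: None, 0x200D: None}


def registration_status(ch):
    """Requirement 1: on registration, PVALID_AFTER_2015 acts as PVALID."""
    return "PVALID" if ch in DEVIATIONS else None  # None: not covered here


def lookup_status(ch, now):
    """Requirement 2: DISALLOWED before the switchover, PVALID afterwards.

    `now` must be a timezone-aware datetime. Treating the exact
    switchover instant as PVALID is an assumption of this sketch.
    """
    if ch not in DEVIATIONS:
        return None
    return "PVALID" if now >= SWITCHOVER else "DISALLOWED"


def prepare_for_lookup(label, now, maps_before_switchover=True):
    """Return the label an implementation would look up, or raise."""
    if now >= SWITCHOVER:
        return label  # requirement 2.1: no mapping after the switchover
    if maps_before_switchover:
        return label.translate(IDNA2003_MAP)  # requirement 3
    if any(lookup_status(ch, now) == "DISALLOWED" for ch in label):
        raise ValueError("label contains a character DISALLOWED until 2016")
    return label
```

With these assumptions, `prepare_for_lookup("fußball", ...)` yields `fussball` for a mapping implementation before the switchover and `fußball` afterwards; a non-mapping implementation simply fails before the switchover.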
The goal is to
1. allow the 4 characters to become valid, as soon as possible;
2. avoid the 'nightmare' scenario of the same URL going to two
different locations, as much as possible.
Let's see what happens with fußball.xxx over time, where xxx is some
registry (e.g. .de, .blogspot.com, or others). Background: essentially all
browsers and other major implementations are planning to map for
compatibility. We'll look at browsers, but this also applies to email, etc.
*Early 2010 (just as IDNA2008 is approved)*
*At this time the world browsers are 100% IDNA2003*
1. browsers map fußball.xxx to fussball.xxx.
2. registries can start accepting eszett, and should bundle with ss.
3. fußball shows up as fussball in the address bar
1. note: it is only by convention that fussball is seen in the address
bar in this case; a browser could also display fußball, as in UTS46.
1. if the registry bundles, both fußball.xxx and fussball.xxx go to
the same owner.
2. if the registry doesn't bundle, both fußball.xxx and fussball.xxx
go to the same owner.
5. The odd IDNA2008 browser that doesn't map just fails, because ß is not
PVALID; it doesn't take fußball.xxx to a different location than the vast
majority of browsers.
*In 2013*
*At this time the world browsers are 50% IDNA2003, 50% IDNA2008*
1. same as above. No ambiguity in results.
*In 2016 Feb*
*At this time the world browsers are 1% IDNA2003, 99% IDNA2008*
1. 99% of browsers switch to not mapping fußball.xxx.
2. Registries no longer need to bundle; they can have different owners
for fußball.xxx and fussball.xxx.
3. fußball shows up as fußball in the address bar
1. if the registry bundles, both fußball.xxx and fussball.xxx go to
the same owner.
2. if the registry doesn't bundle, fußball.xxx and fussball.xxx go to
different owners.
5. The odd IDNA2003 browser that is left goes to the wrong location for
the affected languages; people that use them need to upgrade.
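The fußball.xxx timeline above can be sketched as a small simulation. This is only an illustration under stated assumptions: the three browser labels, the owner names, and the two toy registries are made up for the example, and only ß is modelled.

```python
from datetime import date

SWITCHOVER = date(2016, 1, 1)

# Hypothetical browser kinds used only in this sketch.
BROWSERS = ("IDNA2003", "IDNA2008-mapping", "IDNA2008-strict")


def looked_up_label(label, browser, today):
    """Label the browser actually queries, or None if lookup fails."""
    if browser == "IDNA2003":
        return label.replace("ß", "ss")  # always maps
    if browser == "IDNA2008-mapping":
        # Maps as in IDNA2003 before the switchover, never afterwards.
        return label if today >= SWITCHOVER else label.replace("ß", "ss")
    if browser == "IDNA2008-strict":
        # Does not map; ß is DISALLOWED on lookup before the switchover.
        if "ß" in label and today < SWITCHOVER:
            return None
        return label
    raise ValueError(f"unknown browser kind: {browser}")


def owner_reached(label, browser, today, registry):
    """Owner the browser ends up at, or None if the lookup fails."""
    queried = looked_up_label(label, browser, today)
    return None if queried is None else registry.get(queried)


# Toy registries for the label under .xxx (owner names are invented).
bundled = {"fussball": "owner A", "fußball": "owner A"}
unbundled = {"fussball": "owner A", "fußball": "owner B"}

for day in (date(2010, 3, 1), date(2013, 6, 1), date(2016, 2, 1)):
    for browser in BROWSERS:
        print(day, browser, owner_reached("fußball", browser, day, unbundled))
```

Before the switchover every browser that resolves fußball.xxx at all reaches the owner of fussball.xxx, even in the unbundled registry; only after it do the IDNA2008 browsers and the leftover IDNA2003 browser diverge.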
On Tue, Dec 8, 2009 at 01:44, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:
> I agree with Mark that while there are similarities between MAYBE and
> TRANSITIONAL, there are also huge differences.
> One difference, which Mark has mentioned, is the number of characters
> A second difference is that there would only be one transition from
> TRANSITIONAL to PVALID, not a series of transitions from MAYBE to PVALID.
> A third difference is that MAYBE was essentially saying "we don't have a
> clue now, we may have later". In my understanding (I didn't participate in
> any meeting), one of the main reasons brought by the Unicode side against
> MAYBE was that if it's MAYBE, we can as well look at the thing and decide
> now. For TRANSITIONAL, we may know exactly what we want to do, it just
> doesn't fit into PVALID and DISALLOWED.
> BTW, I don't think that any of the dynamic lookup schemes proposed by
> Andrew or Eric are feasible; they are quite simply overengineered. We need
> something much simpler, even if this temporarily goes against user
> Regards, Martin.
> On 2009/12/05 6:45, Mark Davis ☕ wrote:
>> I agree with you that there are many similarities between the MAYBE and
>> TRANSITIONAL. MAYBE at the time wasn't suitable because it was applied to a
>> huge number of characters. However, applying the concept (with a few
>> changes) to these 4 characters for a transitional period is, I think,
>> On Fri, Dec 4, 2009 at 12:40, John C Klensin <klensin at jck.com> wrote:
>>> Once upon a time, not really that long ago, there was a proposal
>>> to differentiate what is now PVALID by including MAYBE YES and
>>> MAYBE NO categories. Anyone interested should try to find a
>>> copy of draft-klensin-idnabis-issues-06.txt and earlier. The
>>> general model, in today's vocabulary, was to put characters (and
>>> groups of characters) that we weren't sure about into categories
>>> that would encourage different handling on registration and
>>> lookup from characters about which we were more certain, to
>>> permit later reclassification, and to arrange for controlled
>>> transitions. There was consensus for removing those categories
>>> because they made things too fragile, because they would require
>>> that all registries and applications check for updates and
>>> changes frequently (which would be too fragile), and so on.
>>> In practice, the only real difference between MAYBE and the sort
>>> of implied TRANSITIONAL you imply (or the explicit versions
>>> others have suggested) is that MAYBE would have laid out the
>>> "this is likely to change" aspect of the situation more clearly,
>>> while the idea you outline above raises all of the issues that
>>> the WG has discussed about transitions from DISALLOWED to PVALID
>>> (and decided that reclassification should require a catastrophic
>>> If I remember correctly, both you and Mark were at the meeting
>>> at which the decision to drop MAYBE was made and were among
>>> those pushing for that decision, pretty much on the basis
>>> outlined above.
>>> While I don't object to revisiting that general idea -- under
>>> the identification of TRANSITIONAL or otherwise -- if the WG
>>> really feels that it wants to go there and that the old model
>>> might be worth the aggravation that caused it to be dropped the
>>> last time around, I hope that everyone does understand that
>>> TRANSITIONAL, as you and others have described it, is very close
>>> to that old and discarded idea... close enough that we might
>>> even be able to borrow text from documents that are now more
>>> than 18 months old.
>>> p.s. I'm not going to comment at any length on the "global
>>> mappings" part of your proposal because I think everything has
>>> been said already. Having required global mappings is
>>> equivalent to _almost_ having U-label<-> A-label symmetry.
>>> And, of all mappings, "map to nothing" is the worst: while part
>>> of the problem with a mapping between "ß" and "ss" is that one
>>> cannot tell by looking at "ss" afterward whether the registrant
>>> intended "ss" or "ß", one at least knows that "x" or "ab" was
>>> not intended. With "map to nothing", the character that was
>>> eliminated could, in principle, have appeared in any position in
>>> any domain name label.
>>> --On Friday, December 04, 2009 04:11 -0800 Erik van der Poel
>>> <erikv at google.com> wrote:
>>>> Here is another proposal that is dead simple, yet allows
>>>> implementations to take advantage of a machine-readable file,
>>>> and does not involve "flag days" (dates at which we change
>>>> Instead of having a machine-readable file at each host, we
>>>> have two global files at iana.org. One file is similar to
>>>> Patrik's table with entries like:
>>>> 00DF ; DISALLOWED # LATIN SMALL LETTER SHARP S
>>>> 03C2 ; DISALLOWED # GREEK SMALL LETTER FINAL SIGMA
>>>> 200C ; DISALLOWED # ZERO WIDTH NON-JOINER
>>>> 200D ; DISALLOWED # ZERO WIDTH JOINER
>>>> There is no new value called TRANSITIONAL. The infamous 4
>>>> characters (above) start with the value DISALLOWED. Later, we
>>>> change them to PVALID (or CONTEXTJ for 200C/200D). We
>>>> encourage ICANN to redelegate TLDs the registries of which
>>>> flout our rules.
>>>> The other file is for global mappings. Not language-specific
>>>> mappings. The format might be similar to RFC 3454's:
>>>> 0041; 0061; Case map
>>>> 00AD; ; Map to nothing
>>>> The absence of a character from this file means that there is
>>>> no mapping for that character. It maps to itself. The infamous
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp