Changing DISALLOWED (was Re: Reserved general punctuation)

John C Klensin klensin at jck.com
Thu May 1 14:03:49 CEST 2008



--On Thursday, 01 May, 2008 05:25 -0400 Vint Cerf
<vint at google.com> wrote:

> At the risk of prolonging this thread, I am assuming that
> DISALLOWED is a condition that makes sense only for an
> ASSIGNED character and that UNASSIGNED means the code point
> has not been assigned any meaning or character. This suggests
> that anything UNASSIGNED should be rejected at the protocol
> level (no registration... no lookup either?). Wouldn't this
> imply that a new revision of UNICODE that ASSIGNS a previously
> UNASSIGNED character may require reimplementation at protocol
> level of filtering since the previously UNASSIGNED code point
> now has properties that might allow it to be used in IDNs.

More likely just recompilation with new IDNA tables (or
application of the rules to new Unicode tables), rather than
what I would normally think of as "reimplementation", but yes.
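
A minimal sketch of why this is closer to a table refresh than a rewrite, assuming the classification is derived from the Unicode general category exposed by Python's `unicodedata` module. The `classify` function and its two labels are hypothetical and are not the IDNA tables themselves; the only real rule used is that general category Cn marks a code point with no assignment in the library's Unicode version.

```python
import unicodedata

def classify(cp: int) -> str:
    """Hypothetical two-way split: UNASSIGNED vs. ASSIGNED.

    A real IDNA table would further divide ASSIGNED code points into
    Protocol-Valid, CONTEXT and DISALLOWED; that part is omitted here.
    """
    # General category "Cn" means no character is assigned at this code
    # point in the Unicode version bundled with this Python build.
    if unicodedata.category(chr(cp)) == "Cn":
        return "UNASSIGNED"
    return "ASSIGNED"

# The Unicode version the answers below are derived from.
print(unicodedata.unidata_version)
print(classify(0x0061))  # LATIN SMALL LETTER A
print(classify(0x0378))  # reserved in the Unicode versions I am aware of
```

The code does not change when a later Unicode version assigns a code point; only the underlying table does, so the answer for that code point changes once the table is regenerated.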

> Before I dig myself deeper into a hole, is this a correct
> statement and do I have the vocabulary right?

Yes and yes.
 
> I am persuaded that (3) below seems the most reasonable path
> but as always I am open to persuasion in other directions.

I believe that (3) is correct, with two qualifications:

(1)  I am not sure that I see quite as much difference between a
DISALLOWED -> CONTEXT change and a DISALLOWED -> Protocol-Valid
change as I think (from the note below) that Mark does.  That
distinction may, in practice, be philosophical rather than
practical and, again in practice, may not mean anything other
than a belief that the bar for changes should be set very high.

(2) As Patrik has pointed out several times in the context of
exception and backward compatibility lists, until we are ready
to define the exact process that permits (3) (or (4)), they are
equivalent to (2).  So, until and unless someone is ready to
make a proposal to precisely define that process, the
distinction between the de facto response of (2) and the
proposed (3) position exists only in theory.

     john


> On Apr 30, 2008, at 10:17 PM, Mark Davis wrote:
> 
>> I think the following misrepresents my position:
>>  "It is that area of flexibility with CONTEXT, especially
>> CONTEXT-OTHER, where my view that "Disallowed" is permanent,
>> with no path (or a very difficult one) out of that category,
>> converges with what I understand of Mark's desire to make
>> migration out of DISALLOWED relatively easy."
>> 
>> I'm not looking to make it easy. I think there are a few
>> possible positions we could take in IDNAbis.
>> 
>> 1. We say that once DISALLOWED, always DISALLOWED.
>> 
>> This is not a firm promise, because an obsoleting RFC could
>> change it, but would certainly set a very high bar.
>> 
>> 2. We say that characters can only be removed from DISALLOWED
>> by an obsoleting RFC.
>> 
>> A slightly lower bar. While it could be changed, it would
>> certainly be difficult.
>> 
>> 3. We say that characters can only be removed from DISALLOWED
>> by the committee/mechanism that controls
>> CONTEXT/exceptions, and only in extremis.
>> 
>> This should, in my view, also be quite difficult; not quite
>> to the same level as an RFC, but carefully, with sufficient
>> time for deliberation, with solid consensus by a broad set
>> of experts.
>> 
>> 4. We say that characters can only be removed from DISALLOWED
>> by the committee/mechanism that controls
>> CONTEXT/exceptions, but that committee is not designed
>> to be conservative.
>> 
>> This, I think, would be a very bad choice. My presumption has
>> always been that the committee/mechanism that controls
>> CONTEXT/exceptions should be extremely conservative in its
>> changes; that changes are only made very carefully.
>> 
>> I think #3 would be the best, and #2 acceptable, while #1 and
>> #4 are extremes that could cause problems.
>> 
>> Mark
>> 
>> On Wed, Apr 30, 2008 at 6:02 AM, John C Klensin
>> <klensin at jck.com> wrote:
>> 
>> 
>> --On Wednesday, 30 April, 2008 04:16 -0700 Vint Cerf
>> <vint at google.com> wrote:
>> 
>> > My naïve assumption is that anything unassigned has the
>> > potential to become assigned so we need to have a state in
>> > which the code point is not allowed for current use but
>> > could be permitted at a later time. Do we have the
>> > semantics to accommodate that? V
>> 
>> Short answer: No.  I presume that is why we are having this
>> discussion.
>> 
>> Longer answer:
>> 
>> While we have concluded that the problems it would cause
>> outweigh the advantages, these areas of uncertainty are a
>> large part of what motivated having MAYBE categories.
>> 
>> I think that putting anything into UNASSIGNED that isn't
>> actually unassigned (i.e., given no code point assignment in
>> the then-current version of Unicode) is looking for trouble.
>> As you point out, such code points have the potential to
>> become assigned.  While one might make some educated guesses
>> from the block context in which the code point is located, we
>> can't predict, with 100% certainty, the properties that a
>> code point will have if and when it is assigned in the future.
>> 
>> So, for a code point that is actually assigned, I think we
>> have only three choices:
>> 
>>        * Allow it, as Protocol-Valid.  For general punctuation
>>        this is, I hope obviously, not a good idea.
>> 
>>        * Disallow it and assume that, if we discover we need
>>        it enough later, we will do whatever drastic revisions
>>        or disaster corrections are required.  Of course, that
>>        sets a very high bar to ever allowing those
>>        characters, but that may not be unreasonable.
>> 
>>        * Assign it to "context required" but do not assign a
>>        rule.  Under the current proposed model, that means
>>        that it can neither be registered nor looked up.  On
>>        the other hand, we could, in the future, allow it in
>>        the cases where it is actually required by assigning an
>>        appropriate rule and then waiting for software to be
>>        upgraded (something that would presumably happen more
>>        quickly in places where the character is important than
>>        in places where it isn't).
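
The third choice can be sketched as follows. Everything here is an assumption made for illustration: the `CATEGORY` and `RULES` tables, the `label_ok` function, and the sample rule are hypothetical and are not taken from any IDNA draft. The sketch only shows the mechanism described above: a "context required" code point with no rule is rejected, and attaching a rule later admits it without touching the checking code.

```python
from typing import Callable, Dict

# Hypothetical category table; these assignments are examples for the
# sketch, not the actual IDNA tables.
CATEGORY: Dict[int, str] = {
    0x0061: "PVALID",      # LATIN SMALL LETTER A
    0x2060: "CONTEXTO",    # WORD JOINER
    0x2010: "DISALLOWED",  # HYPHEN (General Punctuation block)
}

# Contextual rules keyed by code point: rule(label, index) -> bool.
# Empty at first: no CONTEXTO code point has a rule yet.
RULES: Dict[int, Callable[[str, int], bool]] = {}

def label_ok(label: str) -> bool:
    """Return True if every character is acceptable in this sketch."""
    for i, ch in enumerate(label):
        cp = ord(ch)
        cat = CATEGORY.get(cp, "UNASSIGNED")
        if cat == "PVALID":
            continue
        if cat == "CONTEXTO":
            rule = RULES.get(cp)
            # No rule, or a rule that fails, means rejection.
            if rule is not None and rule(label, i):
                continue
        return False  # DISALLOWED, UNASSIGNED, or CONTEXTO not satisfied
    return True

print(label_ok("aa"))            # True
print(label_ok("a\u2060a"))      # False: CONTEXTO with no rule
# Later, a made-up rule is assigned: only between two other characters.
RULES[0x2060] = lambda label, i: 0 < i < len(label) - 1
print(label_ok("a\u2060a"))      # True
print(label_ok("a\u2010a"))      # False: still DISALLOWED
```

The DISALLOWED entry has no comparable hook in this sketch, which is the practical difference between the second and third choices.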
>> 
>> It is that area of flexibility with CONTEXT, especially
>> CONTEXT-OTHER, where my view that "Disallowed" is permanent,
>> with no path (or a very difficult one) out of that category,
>> converges with what I understand of Mark's desire to make
>> migration out of DISALLOWED relatively easy.  In the middle
>> ground, we try to identify the characters about which we may
>> be uncertain and identify them as CONTEXTO with no
>> expectation of assigning rules unless it turns out that they
>> are really needed. That approach assumes that we can
>> anticipate characters that _might_ need to be moved, i.e.,
>> characters about which we are not certain that DISALLOWED is
>> globally correct.  I think that is probably correct.  Indeed,
>> I believe that, if it is not correct, this entire approach is
>> built on a house of cards and we may need to drop it.
>> 
>> And, FWIW, the argument for putting Cf into CONTEXTO precisely
>> follows the reasoning above -- these odd and
>> sometimes-invisible cases (see U+2060, 2062..2064; WORD
>> JOINER, INVISIBLE TIMES/ SEPARATOR/ PLUS) are precisely the
>> sorts of thing that someone might, conceivably, argue
>> passionately are required in some IDN contexts.  If I
>> correctly understand the use of these characters, my own view
>> is that I would argue strongly against permitting them.  But I
>> think it would be better to have that argument on the basis
>> of substantive requirement to have the characters in IDNs
>> versus risks and complexity and not on the basis of an
>> artifact of how we had defined things.
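
For reference, the general category of the four code points named above can be checked with Python's `unicodedata` module. This is only a lookup against whatever Unicode version the Python build bundles; the `CF_EXAMPLES` tuple and `describe` helper are names made up for this sketch.

```python
import unicodedata

# The code points cited in the message: U+2060 and U+2062..U+2064.
CF_EXAMPLES = (0x2060, 0x2062, 0x2063, 0x2064)

def describe(cp: int) -> tuple:
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

for cp in CF_EXAMPLES:
    name, category = describe(cp)
    print(f"U+{cp:04X} {name}: {category}")
```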
>> 
>>     john
>> 
>> 
>> -- 
>> Mark
> 





