Changing DISALLOWED (was Re: Reserved general punctuation)

Vint Cerf vint at google.com
Thu May 1 11:25:48 CEST 2008


At the risk of prolonging this thread, I am assuming that DISALLOWED  
is a condition that makes sense only for an ASSIGNED character and  
that UNASSIGNED means the code point has not been assigned any  
meaning or character. This suggests that anything UNASSIGNED should  
be rejected at the protocol level (no registration... no lookup  
either?). Wouldn't this imply that a new revision of UNICODE that  
ASSIGNS a previously UNASSIGNED character may require  
reimplementation at protocol level of filtering since the previously  
UNASSIGNED code point now has properties that might allow it to be  
used in IDNs?

Before I dig myself deeper into a hole, is this a correct statement  
and do I have the vocabulary right?
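
To make the vocabulary concrete, here is a minimal sketch of the
three-way distinction described above. It is not taken from any
IDNAbis draft. It assumes that "unassigned" can be approximated by
Unicode general category Cn as reported by Python's unicodedata
module, and TOY_ALLOWED_CATEGORIES is a hypothetical stand-in for
the real derivation rules. The names Status, classify, and
acceptable_in_label are likewise invented for illustration.

```python
import unicodedata
from enum import Enum


class Status(Enum):
    UNASSIGNED = "UNASSIGNED"          # no character at this code point yet
    DISALLOWED = "DISALLOWED"          # assigned, but never valid in a label
    PROTOCOL_VALID = "PROTOCOL-VALID"  # assigned and permitted


# Hypothetical rule set, for illustration only: letters, marks, and
# decimal digits are permitted; every other assigned character is not.
TOY_ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(cp: int) -> Status:
    """Classify one code point against the Unicode data bundled with Python."""
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        # Simplification: Cn also covers noncharacters, which a real
        # table would have to treat separately.
        return Status.UNASSIGNED
    if category in TOY_ALLOWED_CATEGORIES:
        return Status.PROTOCOL_VALID
    return Status.DISALLOWED


def acceptable_in_label(cp: int) -> bool:
    """Reject both UNASSIGNED and DISALLOWED at registration and lookup."""
    return classify(cp) is Status.PROTOCOL_VALID


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (ord("a"), ord("!"), 0x2060, 0x0378):
        print(f"U+{cp:04X}", classify(cp).value, acceptable_in_label(cp))
```

The answer for a code point such as U+0378 depends on the Unicode
version the table was built from, which is the concern raised above:
when a later version assigns that code point, software carrying the
older table keeps rejecting it until the table is regenerated.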

I am persuaded that (3) below seems the most reasonable path but as  
always I am open to persuasion in other directions.

vint



On Apr 30, 2008, at 10:17 PM, Mark Davis wrote:

> I think the following misrepresents my position:
>  "It is that area of flexibility with CONTEXT, especially
> CONTEXT-OTHER, where my view that "Disallowed" is permanent,
> with no path (or a very difficult one) out of that category,
> converges with what I understand of Mark's desire to make
> migration out of DISALLOWED relatively easy."
>
> I'm not looking to make it easy. I think there are a few possible  
> positions we could take in IDNAbis.
>
> 1. We say that once DISALLOWED, always DISALLOWED.
>
> This is not a firm promise, because an obsoleting RFC could change  
> it, but would certainly set a very high bar.
>
> 2. We say that characters can only be removed from DISALLOWED by an  
> obsoleting RFC.
>
> A slightly lower bar. While it could be changed, it would certainly  
> be difficult.
>
> 3. We say that characters can only be removed from DISALLOWED by  
> the committee/mechanism that controls CONTEXT/exceptions, and only  
> in extremis.
>
> This should, in my view, also be quite difficult; not quite to the  
> same level as an RFC, but carefully, with sufficient time for  
> deliberation, with solid consensus by a broad set of experts.
>
> 4. We say that characters can only be removed from DISALLOWED by  
> the committee/mechanism that controls CONTEXT/exceptions, but  
> that committee is not designed to be conservative.
>
> This, I think, would be a very bad choice. My presumption has  
> always been that the committee/mechanism that controls CONTEXT/ 
> exceptions should be extremely conservative in its changes; that  
> changes are only made very carefully.
>
> I think #3 would be the best, and #2 acceptable, while #1 and #4  
> are extremes that could cause problems.
>
> Mark
>
> On Wed, Apr 30, 2008 at 6:02 AM, John C Klensin <klensin at jck.com>  
> wrote:
>
>
> --On Wednesday, 30 April, 2008 04:16 -0700 Vint Cerf
> <vint at google.com> wrote:
>
> > My naïve assumption is that anything unassigned has the
> > potential to become assigned so we need to have a state in
> > which the code point is not allowed for current use but could
> > be permitted at a later time. Do we have the semantics to
> > accommodate that? V
>
> Short answer: No.  I presume that is why we are having this
> discussion.
>
> Longer answer:
>
> While we have concluded that the problems it would cause
> outweigh the advantages, these areas of uncertainty are a large
> part of what motivated having MAYBE categories.
>
> I think that putting anything into UNASSIGNED that isn't
> actually unassigned (i.e., given no code point assignment in the
> then-current version of Unicode) is looking for trouble.  As you
> point out, such code points have the potential to become
> assigned.  While one might make some educated guesses from the
> block context in which the code point is located, we can't
> predict, with 100% certainty, the properties that a code point
> will have if and when it is assigned in the future.
>
> So, for a code point that is actually assigned, I think we have
> only three choices:
>
>        * Allow it, as Protocol-Valid.  For general punctuation
>        this is, I hope obviously, not a good idea.
>
>        * Disallow it and assume that, if we discover we need it
>        enough later, we will do whatever drastic revisions or
>        disaster corrections are required.  Of course, that sets
>        a very high bar to ever allowing those characters, but
>        that may not be unreasonable.
>
>        * Assign it to "context required" but do not assign a
>        rule.   Under the current proposed model, that means
>        that it can neither be registered nor looked up.  On the
>        other hand, we could, in the future, allow it in the
>        cases where it is actually required by assigning an
>        appropriate rule and then waiting for software to be
>        upgraded (something that would presumably happen more
>        quickly in places where the character is important than
>        in places where it isn't).
>
> It is that area of flexibility with CONTEXT, especially
> CONTEXT-OTHER, where my view that "Disallowed" is permanent,
> with no path (or a very difficult one) out of that category,
> converges with what I understand of Mark's desire to make
> migration out of DISALLOWED relatively easy.  In the middle
> ground, we try to identify the characters about which we may be
> uncertain and identify them as CONTEXTO with no expectation of
> assigning rules unless it turns out that they are really needed.
> That approach assumes that we can anticipate characters that
> _might_ need to be moved, i.e., characters about which we are
> not certain that DISALLOWED is globally correct.  I think that
> is probably correct.  Indeed, I believe that, if it is not
> correct, this entire approach is built on a house of cards and
> we may need to drop it.
>
> And, FWIW, the argument for putting Cf into CONTEXTO precisely
> follows the reasoning above -- these odd and sometimes-invisible
> cases (see U+2060, 2062..2064; WORD JOINER, INVISIBLE TIMES/
> SEPARATOR/ PLUS) are precisely the sorts of thing that someone
> might, conceivably, argue passionately are required in some IDN
> contexts.   If I correctly understand the use of these
> characters, my own view is that I would argue strongly against
> permitting them.  But I think it would be better to have that
> argument on the basis of substantive requirement to have the
> characters in IDNs versus risks and complexity and not on the
> basis of an artifact of how we had defined things.
>
>     john
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
>
>
> -- 
> Mark
