Changing DISALLOWED (was Re: Reserved general punctuation)

Mark Davis mark.davis at icu-project.org
Thu May 1 04:17:56 CEST 2008


I think the following misrepresents my position:
 "It is that area of flexibility with CONTEXT, especially
CONTEXT-OTHER, where my view that "Disallowed" is permanent,
with no path (or a very difficult one) out of that category,
converges with what I understand of Mark's desire to make
migration out of DISALLOWED relatively easy."

I'm not looking to make it easy. I think there are a few possible positions
we could take in IDNAbis.

1. We say that once DISALLOWED, always DISALLOWED.

This is not a firm promise, because an obsoleting RFC could change it, but
would certainly set a very high bar.

2. We say that characters can only be removed from DISALLOWED by an
obsoleting RFC.

A slightly lower bar. While it could be changed, it would certainly be
difficult.

3. We say that characters can only be removed from DISALLOWED by the
committee/mechanism that controls CONTEXT/exceptions, and only in extremis.

This should, in my view, also be quite difficult; not quite to the same
level as an RFC, but carefully, with sufficient time for deliberation, with
solid consensus by a broad set of experts.

4. We say that characters can only be removed from DISALLOWED by the
committee/mechanism that controls CONTEXT/exceptions, but that committee
is not designed to be conservative.

This, I think, would be a very bad choice. My presumption has always been
that the committee/mechanism that controls CONTEXT/exceptions should be
extremely conservative in its changes; that changes are only made very
carefully.

I think #3 would be the best, and #2 acceptable, while #1 and #4 are
extremes that could cause problems.

Mark

On Wed, Apr 30, 2008 at 6:02 AM, John C Klensin <klensin at jck.com> wrote:

>
>
> --On Wednesday, 30 April, 2008 04:16 -0700 Vint Cerf
> <vint at google.com> wrote:
>
> > My naïve assumption is that anything unassigned has the
> > potential to become assigned so we need to have a state in
> > which the code point is not allowed for current use but could
> > be permitted at a later time. Do we have the semantics to
> > accommodate that? V
>
> Short answer: No.  I presume that is why we are having this
> discussion.
>
> Longer answer:
>
> While we have concluded that the problems it would cause
> outweigh the advantages, these areas of uncertainty are a large
> part of what motivated having MAYBE categories.
>
> I think that putting anything into UNASSIGNED that isn't
> actually unassigned (i.e., given no code point assignment in the
> then-current version of Unicode) is looking for trouble.  As you
> point out, such code points have the potential to become
> assigned.  While one might make some educated guesses from the
> block context in which the code point is located, we can't
> predict, with 100% certainty, the properties that a code point
> will have if and when it is assigned in the future.
>
> So, for a code point that is actually assigned, I think we have
> only three choices:
>
>        * Allow it, as Protocol-Valid.  For general punctuation
>        this is, I hope obviously, not a good idea.
>
>        * Disallow it and assume that, if we discover we need it
>        enough later, we will do whatever drastic revisions or
>        disaster corrections are required.  Of course, that sets
>        a very high bar to ever allowing those characters, but
>        that may not be unreasonable.
>
>        * Assign it to "context required" but do not assign a
>        rule.   Under the current proposed model, that means
>        that it can neither be registered nor looked up.  On the
>        other hand, we could, in the future, allow it in the
>        cases where it is actually required by assigning an
>        appropriate rule and then waiting for software to be
>        upgraded (something that would presumably happen more
>        quickly in places where the character is important than
>        in places where it isn't).
>
> It is that area of flexibility with CONTEXT, especially
> CONTEXT-OTHER, where my view that "Disallowed" is permanent,
> with no path (or a very difficult one) out of that category,
> converges with what I understand of Mark's desire to make
> migration out of DISALLOWED relatively easy.  In the middle
> ground, we try to identify the characters about which we may be
> uncertain and identify them as CONTEXTO with no expectation of
> assigning rules unless it turns out that they are really needed.
> That approach assumes that we can anticipate characters that
> _might_ need to be moved, i.e., characters about which we are
> not certain that DISALLOWED is globally correct.  I think that
> is probably correct.  Indeed, I believe that, if it is not
> correct, this entire approach is built on a house of cards and
> we may need to drop it.
>
> And, FWIW, the argument for putting Cf into CONTEXTO precisely
> follows the reasoning above -- these odd and sometimes-invisible
> cases (see U+2060, 2062..2064; WORD JOINER, INVISIBLE TIMES/
> SEPARATOR/ PLUS) are precisely the sorts of thing that someone
> might, conceivably, argue passionately are required in some IDN
> contexts.   If I correctly understand the use of these
> characters, my own view is that I would argue strongly about
> permitting them.  But I think it would be better to have that
> argument on the basis of substantive requirement to have the
> characters in IDNs versus risks and complexity and not on the
> basis of an artifact of how we had defined things.
>
>     john
>
>
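
As a rough illustration of the "context required, but no rule assigned"
option in John's note above, here is a minimal Python sketch. The table
entries, the names STATUS, CONTEXT_RULES and label_allowed, and the sample
rule are all assumptions made up for this example; they are not the actual
IDNAbis tables or any rule proposed on this list.

```python
from enum import Enum


class Status(Enum):
    PVALID = "PROTOCOL-VALID"
    DISALLOWED = "DISALLOWED"
    CONTEXTO = "CONTEXT-OTHER"
    UNASSIGNED = "UNASSIGNED"


# Hypothetical toy table; anything not listed is treated as UNASSIGNED.
STATUS = {
    0x0061: Status.PVALID,      # LATIN SMALL LETTER A
    0x2014: Status.DISALLOWED,  # EM DASH (general punctuation)
    0x2060: Status.CONTEXTO,    # WORD JOINER (Cf), as suggested in the thread
}

# Contextual rules, keyed by code point. Empty at first: a CONTEXTO code
# point with no rule can be neither registered nor looked up.
CONTEXT_RULES = {}


def code_point_allowed(label, index):
    """Return True if the code point at label[index] is usable there."""
    cp = ord(label[index])
    status = STATUS.get(cp, Status.UNASSIGNED)
    if status is Status.PVALID:
        return True
    if status is Status.CONTEXTO:
        rule = CONTEXT_RULES.get(cp)
        # No rule assigned means "not allowed", exactly like DISALLOWED,
        # but a rule can be added later without touching STATUS.
        return rule is not None and rule(label, index)
    # DISALLOWED and UNASSIGNED are both rejected.
    return False


def label_allowed(label):
    """Return True if every code point in the label is usable in place."""
    return all(code_point_allowed(label, i) for i in range(len(label)))
```

With the tables as written, label_allowed("a\u2060a") is False. If a rule
were assigned later, for example CONTEXT_RULES[0x2060] = a function that
accepts the character only between two other characters, the same label
would become acceptable to upgraded software, while a DISALLOWED code point
such as U+2014 would still need a change to the status table itself.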



-- 
Mark