Rationale problems

Sat Dec 6 15:57:52 CET 2008

--On Saturday, 06 December, 2008 08:10 +0100 Harald Tveit
Alvestrand <harald at alvestrand.no> wrote:

> > I may be just banging my head against a brick wall here, but
> > nobody has been willing to step up to the plate to say that
> > "this causes me problems because of situation X". No concrete
> > examples have been cited. And if you can't give even one
> > single example of this being a problem, then you *at least*
> > should qualify it to indicate that it is an opinion.
> 
> OK, I'll bang my head in the other side of the brick wall one
> more time.
> 
> IF a character is DISALLOWED, and IF clients check against
> DISALLOWED as the protocol now requires them to do before they
> allow lookup of a domain name...
> 
> THEN anyone who wants to use a previously DISALLOWED name has
> to:
> 1) Change the specification to change DISALLOWED to PVALID
> 2) Wait until all software that he wishes to have access his
> domain name is upgraded before he can fully utilize his domain
> name.
> 
> In the period of 2), there will be some people able to use his
> new domain name, and some people who can't use it. If one of
> the first sends the name to one of the second, they will see
> inconsistent behaviour: What works for one person won't work
> for the other.
> 
> If this isn't a concrete example of a problem, I don't know
> what is.
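To make the scenario above concrete, here is a minimal sketch in Python. The two tables are tiny hypothetical stand-ins for the derived-property tables of two revisions of the specification, and U+2603 is only a placeholder for whatever code point gets reclassified from DISALLOWED to PVALID; nothing here is taken from the real tables. Each client refuses lookup of a label containing any code point that its own table does not list as PVALID (CONTEXTJ/CONTEXTO rules are ignored for brevity).

```python
PVALID, DISALLOWED, UNASSIGNED = "PVALID", "DISALLOWED", "UNASSIGNED"

# Placeholder code point whose status (hypothetically) changes between revisions.
CHANGED_CP = 0x2603

# Hypothetical derived-property tables; revision 2 flips CHANGED_CP to PVALID.
TABLE_REV1 = {ord("a"): PVALID, ord("b"): PVALID, CHANGED_CP: DISALLOWED}
TABLE_REV2 = {**TABLE_REV1, CHANGED_CP: PVALID}


def client_will_look_up(label: str, table: dict) -> bool:
    """Lookup-time check: refuse if any code point is not PVALID in this client's table.

    Code points absent from the table are treated as UNASSIGNED.
    """
    return all(table.get(ord(ch), UNASSIGNED) == PVALID for ch in label)


label = "ab" + chr(CHANGED_CP)
old_client = client_will_look_up(label, TABLE_REV1)  # software not yet upgraded
new_client = client_will_look_up(label, TABLE_REV2)  # software already upgraded
print(f"not-yet-upgraded client: {old_client}, upgraded client: {new_client}")
# prints "not-yet-upgraded client: False, upgraded client: True"
```

The same name, sent from an upgraded user to one who is not, resolves for the first and is refused for the second until every client carries the newer table.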

Harald,

While I completely agree with your analysis and conclusion,
reading through it has led me to what might be an insight about
why this keeps coming up (e.g., why it seems unclear to Mark).

One could make much the same argument about not looking up
UNASSIGNED characters.  When a new character is added to Unicode
whose properties would cause it to be PVALID, one has to wait
until all lookup software is updated before that character is
reliably available.

There is, however, a difference.  If something is DISALLOWED, an
explicit decision has been made, based on properties and maybe
other considerations, to disallow it.  There is, of course, a
possibility of getting that decision wrong, but it is of the
same order of likelihood as other things we have disregarded, or
been encouraged to disregard, on the basis that it is so
unlikely, especially given the costs of changing our minds, that
it will "never happen".

The difference is that, with unassigned characters, we have no
guarantees about the properties the code point would have if
assigned to a character, except what can be deduced from block
location.  We cannot know for sure that it won't require
contextual rules, that it will not decompose into some existing
character or set of characters under NFC, and so on.  In
principle, we can't even know whether it will have some
prohibited general property (such as being a symbol), although
block locations may provide a reliable hint about that.  So we
have almost no choice other than to ban them at lookup time to
be sure we do not need to make future incompatible changes.
That implies considerable delays between when Unicode adopts a
new character and when it becomes fully useful, but I think we
have to live with it.  In the case of DISALLOWED characters, we
don't have to, and shouldn't.
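A minimal sketch of that asymmetry, again in Python and only illustrative. The UNASSIGNED check can be made from the client's own Unicode data (here the standard unicodedata module, whose Unicode version is whatever the interpreter ships with), while DISALLOWED_CPS is a hypothetical stand-in for the DISALLOWED entries of the real derived-property table.

```python
import unicodedata

# Hypothetical stand-in for the DISALLOWED set; a real client would consult
# the full IDNA2008 derived-property table instead.
DISALLOWED_CPS = frozenset({ord("A")})


def lookup_precheck(label: str) -> str:
    """Return 'ok', or the reason this client refuses the label at lookup time."""
    for ch in label:
        # General category Cn covers code points unassigned (and noncharacters)
        # in the Unicode version bundled with this interpreter; nothing else
        # about such a code point is known, so it is refused outright.
        if unicodedata.category(ch) == "Cn":
            return f"U+{ord(ch):04X} unassigned in Unicode {unicodedata.unidata_version}"
        # DISALLOWED is an explicit, deliberate classification.
        if ord(ch) in DISALLOWED_CPS:
            return f"U+{ord(ch):04X} DISALLOWED"
    return "ok"


print(lookup_precheck("abc"))
print(lookup_precheck("A"))
print(lookup_precheck("a\u0378"))
```

Because unicodedata.unidata_version differs between interpreter releases, two clients running this same code can disagree about a newly assigned code point until both are upgraded, which is the delay described above for unassigned characters.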

     john