Treatment of UNASSIGNED Characters in Unicode

Vint Cerf vint at google.com
Sun Dec 21 12:43:03 CET 2008




Mark,

the simplest reason I see for NOT permitting UNASSIGNED
characters to be included in lookup has to do with our
inability to assure compliance especially at lower levels of
the domain name space. Unscrupulous or merely incompetent or
inattentive registrars (by this I do not mean the ICANN
definition of "registrar" but rather any entity that places
domain names into zone files at any level in the system) might
use (i.e., register domain names with) unassigned characters in
an attempt to cause confusion or to use misleading
registrations for abusive purposes. By prohibiting the lookup
of UNASSIGNED characters, such abuses are blocked.
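
As a rough illustration of the lookup-side check, here is a minimal
Python sketch. The function names are hypothetical, and it assumes
that Unicode general category Cn is an acceptable stand-in for
UNASSIGNED. That is only an approximation: Cn also covers
noncharacters, which the IDNA tables treat separately, and the
answer depends on the Unicode version built into the Python in use.

```python
import unicodedata


def has_unassigned(label: str) -> bool:
    """Return True if any code point in the label has general category Cn.

    Assumption: Cn approximates UNASSIGNED. It also includes
    noncharacters, so this is not the exact IDNA derived property.
    """
    return any(unicodedata.category(ch) == "Cn" for ch in label)


def lookup_allowed(label: str) -> bool:
    """Hypothetical resolver-side gate: refuse labels with unassigned code points."""
    return not has_unassigned(label)


if __name__ == "__main__":
    print(lookup_allowed("example"))         # ordinary ASCII label
    print(lookup_allowed("exa\u0378mple"))   # U+0378 is a reserved code point
```

Because the check runs in the client, it does not depend on the good
behavior of whoever populated the zone.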

Since the complete property list for an unassigned code point is
unknown, and remains unknown until the code point is assigned, we
can't know whether that code point will

	-- turn out to be DISALLOWED (presumably because it is
	assigned to a symbol, punctuation, or a letter that
	decomposes under NFC to some other character)
	
	-- turn out to be something that requires contextual
	treatment (i.e., CONTEXTO or CONTEXTJ and which one),
	much less what the relevant rules would be.
	
	-- turn out to be PVALID.
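
The uncertainty can be seen by comparing two Unicode versions. The
sketch below uses the Unicode 3.2.0 snapshot that Python ships as
unicodedata.ucd_3_2_0 alongside its current database. The helper
name is hypothetical and the two code points are chosen only as
examples. General category by itself does not determine the IDNA
status, so this shows only that the eventual properties could not
have been known while the code point was unassigned.

```python
import unicodedata

# Unicode 3.2.0 snapshot bundled with Python's unicodedata module.
old = unicodedata.ucd_3_2_0


def category_then_and_now(cp: int) -> tuple:
    """Return (category under Unicode 3.2.0, category under the current database)."""
    ch = chr(cp)
    return old.category(ch), unicodedata.category(ch)


if __name__ == "__main__":
    # U+1E9E was assigned in Unicode 5.1, U+20B9 in Unicode 6.0.
    for cp in (0x1E9E, 0x20B9):
        before, after = category_then_and_now(cp)
        print(f"U+{cp:04X}: {before} -> {after}")
    # U+1E9E: Cn -> Lu   (became a letter)
    # U+20B9: Cn -> Sc   (became a currency symbol)
```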

In essence, permitting it to be looked up establishes a "PVALID
until proven otherwise" status, which therefore raises most (but
not quite all) of the issues associated with changing the status
of a character from PVALID to DISALLOWED, something I think
most of us would not want to facilitate.


Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com





