Q3: What characters should be allowed in a revised IDNA2008 specification?

John C Klensin klensin at jck.com
Wed Apr 1 07:59:58 CEST 2009



--On Tuesday, March 31, 2009 12:14 -0400 Vint Cerf
<vint at google.com> wrote:

> IDNA2008 currently allows a more restricted set of characters
> to be used in domain name labels than IDNA2003 does.
> 
> Does the working group agree that the more restricted set of
> the current IDNA2008 Tables document should apply once
> IDNA2008 is adopted?

I believe that this question has been answered in the
affirmative several times and that, in the absence of strong new
evidence or arguments, we should not need to revisit it again.

> What should be done about registrations
> that use characters that would not be allowed under
> IDNA2008?

These registrations have always violated established advice, from
the IESG as well as from ICANN and others, against registering
labels containing characters that are not used to write the words
of at least some language.  Long-term support for them simply
encourages more such registrations, some of which can be
problematic (punctuation or symbol characters that look like
slashes, or like dots that do not actually separate labels, being
the most obvious examples).

> Should there be a transitional period of finite
> duration after which these registrations will become
> invalid?

The period in which IDNA2003 lookup implementations are
gradually replaced by IDNA2008 ones should provide a more than
adequate transition period without taking any special measures.
Registries and zones that have created and installed such labels
should certainly work out transition strategies, but the exact
nature of those strategies is beyond the scope of this WG.

> Should they be grandfathered somehow? If we believe
> all future registrations should be restricted, how would such
> grandfathered registrations be found if the IDNA2008 rules
> would reject lookups of the disallowed characters?

That is exactly the problem.  If these strings are grandfathered
and guaranteed to be looked up, then we would effectively have
to abandon substantially all lookup-time checking.  One could
argue that even code points that were UNASSIGNED at the time of
IDNA2003 (i.e., in Unicode 3.2) would have to be looked up
because it is not clear that a registry installing a label that
uses a code point that first appeared in, e.g., Unicode 4.0 is a
more severe violation of the IDNA2003 standard and associated
registration recommendations than simply installing a label that
contains symbol or punctuation characters.
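To make the comparison concrete, here is a minimal Python sketch (an illustration only, not part of any specification) that flags both kinds of problem in a single label: code points that were unassigned in Unicode 3.2, detected with the standard library's `stringprep.in_table_a1` (RFC 3454 table A.1), and assigned symbol or punctuation characters, detected by general category in `unicodedata.ucd_3_2_0`. The helper name `idna2003_violations` is hypothetical, and the category test is only a rough proxy for the registration recommendations, not a rule of the IDNA2003 protocol itself.

```python
import stringprep
import unicodedata

# Unicode 3.2 is the version that IDNA2003 / Stringprep is pinned to.
UCD_32 = unicodedata.ucd_3_2_0


def idna2003_violations(label):
    """Hypothetical helper: list (character, reason) pairs for one label."""
    problems = []
    for ch in label:
        if stringprep.in_table_a1(ch):
            # RFC 3454 table A.1: code point not assigned in Unicode 3.2.
            problems.append((ch, "unassigned in Unicode 3.2"))
        elif ch != "-" and UCD_32.category(ch)[0] in ("P", "S"):
            # Assigned, but punctuation or symbol: a rough proxy for the
            # registration recommendations (HYPHEN-MINUS is exempt).
            problems.append((ch, "symbol or punctuation"))
    return problems


print(idna2003_violations("a\u2044b"))     # U+2044 FRACTION SLASH, a slash look-alike
print(idna2003_violations("a\U0001F600"))  # a code point added long after Unicode 3.2
```

A lookup that guaranteed resolution of grandfathered labels would have to accept labels flagged for either reason, so neither check could be enforced at lookup time.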

> A two-lookup scheme might solve this problem:
> 
> 1. lookup according to IDNA2008 rules (if disallowed
> characters are present, go to step 2); if domain name record
> is found, return the information. If not, go to step 2.
> 2. lookup according to IDNA2003 rules (permitting a broader
> range of characters in the lookup process). If domain record
> is found, return it, if not return "no such domain name"
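The proposed two-lookup scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not a specified algorithm: a plain dictionary of A-labels stands in for a zone, `approx_idna2008_ok` is a hypothetical and deliberately crude stand-in for the IDNA2008 rules (letters, marks, digits, and hyphen only), and step 2 uses Python's built-in `idna` codec, which implements IDNA2003 ToASCII (Nameprep plus Punycode).

```python
import unicodedata
from typing import Dict, Optional

# Crude stand-in (assumption, not the IDNA2008 Tables): caseless/lowercase
# letters, modifier letters, combining marks, decimal digits, HYPHEN-MINUS.
_OK_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def approx_idna2008_ok(label: str) -> bool:
    return all(ch == "-" or unicodedata.category(ch) in _OK_CATEGORIES for ch in label)


def to_alabel(ulabel: str) -> str:
    """ACE-encode a label that already passed the IDNA2008 check (no mapping)."""
    if ulabel.isascii():
        return ulabel
    return "xn--" + ulabel.encode("punycode").decode("ascii")


def idna2003_alabel(label: str) -> str:
    """IDNA2003 ToASCII via Python's built-in "idna" codec (Nameprep + Punycode)."""
    return label.encode("idna").decode("ascii")


def two_step_lookup(label: str, zone: Dict[str, str]) -> Optional[str]:
    # Step 1: IDNA2008 rules; labels with disallowed characters skip to step 2.
    if approx_idna2008_ok(label):
        record = zone.get(to_alabel(label))
        if record is not None:
            return record
    # Step 2: IDNA2003 rules, which accept a broader range of characters.
    try:
        alabel = idna2003_alabel(label)
    except UnicodeError:
        return None  # "no such domain name"
    return zone.get(alabel)


# Hypothetical zone: one ordinary label and one legacy label containing
# U+2044 FRACTION SLASH, which the IDNA2008 check above rejects.
zone = {
    "xn--bcher-kva": "record for b\u00fccher",
    idna2003_alabel("a\u2044b"): "legacy record",
}
print(two_step_lookup("b\u00fccher", zone))
print(two_step_lookup("a\u2044b", zone))
```

The second lookup succeeds even though its label fails the IDNA2008 check: any label that survives Nameprep is reachable through step 2, so step 1 never actually prevents resolution.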

If continued for any length of time, this approach (which
appears to be equivalent to the one I suggested in the Appendix
to Protocol-11 without fully understanding its implications)
would effectively redefine all characters that are present in
Nameprep/Stringprep as PVALID, even if IDNA2008 had intended to
DISALLOW them.

It seems to me that, if we are going to perform any sort of
compatibility mapping, we will need to create a new
Stringprep-like table by filtering out any mappings whose target
is a DISALLOWED or CONTEXT character under the IDNA2008 rules
and then use a table no larger than that one for the mapping
function.
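A minimal Python sketch of that filtering step follows. It computes the Nameprep mapping for individual code points from the standard library's `stringprep` tables (B.1 and B.2) plus Unicode 3.2 NFKC, then keeps a mapping only if every character of its target passes `approx_is_pvalid`. That classifier is a hypothetical, crude approximation of the IDNA2008 derived property, not the actual Tables algorithm; a real table would use the computed PVALID, CONTEXT, and DISALLOWED values.

```python
import stringprep
import unicodedata

# Unicode 3.2 database, the version Nameprep is pinned to.
UCD_32 = unicodedata.ucd_3_2_0

# Hypothetical, crude approximation of IDNA2008 PVALID (not the Tables
# algorithm): caseless or lowercase letters, modifier letters, combining
# marks, decimal digits, and HYPHEN-MINUS. Everything else is treated
# as DISALLOWED or CONTEXT.
_PVALID_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def approx_is_pvalid(ch: str) -> bool:
    return ch == "-" or unicodedata.category(ch) in _PVALID_CATEGORIES


def nameprep_map(ch: str) -> str:
    """Nameprep mapping for one code point: table B.1, table B.2, then NFKC."""
    if stringprep.in_table_b1(ch):
        return ""  # "commonly mapped to nothing"
    return UCD_32.normalize("NFKC", stringprep.map_table_b2(ch))


def filtered_mapping_table(code_points):
    """Return {source: target} for mappings whose targets are all (approximately) PVALID."""
    table = {}
    for cp in code_points:
        ch = chr(cp)
        target = nameprep_map(ch)
        if target == ch:
            continue  # identity: not a mapping
        if all(approx_is_pvalid(t) for t in target):
            table[ch] = target
    return table


sample = [0x0041, 0xFF21, 0x2474, 0x00BD]
print(filtered_mapping_table(sample))
```

In this sample, "A" and FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) keep their mappings to "a", while PARENTHESIZED DIGIT ONE (U+2474, mapped to "(1)") and VULGAR FRACTION ONE HALF (U+00BD, mapped to "1", U+2044 FRACTION SLASH, "2") are dropped because their targets contain punctuation or symbols.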





More information about the Idna-update mailing list