Disallowing code points

Fri Jul 17 01:48:55 CEST 2009

Vint,

I fully support this approach. What I want to point out, though, is that apart from the joiner context rules, no other context rules are applied at lookup (and I am not saying they should be). Any 'registry' at any level of the DNS hierarchy that fails to apply a context rule, whether deliberately or through lack of diligence, will still exhibit the problem the rule was designed to address, because clients will still look up the names!

So I still don't fully understand the point of context rules, unless they are just going to act as a guide?
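
The gap between registration and lookup can be sketched as below. This is a minimal illustration under stated assumptions, not the protocol's actual tables or algorithm. The three-entry status table, the simplified joiner rule (virama only), the MIDDLE DOT rule, and all function names are hypothetical and chosen only to show the shape of the problem.

```python
import unicodedata

# Toy status table: an assumption for illustration, not the real protocol tables.
CONTEXTJ = {"\u200d"}    # ZERO WIDTH JOINER (joiner context rule)
CONTEXTO = {"\u00b7"}    # MIDDLE DOT (non-joiner context rule)
DISALLOWED = {"\u0640"}  # ARABIC TATWEEL, used here as an example exclusion


def joiner_rule_ok(label, i):
    # Simplified rule: ZWJ is acceptable only directly after a virama
    # (canonical combining class 9).
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


def middle_dot_rule_ok(label, i):
    # Simplified rule: MIDDLE DOT is acceptable only between two 'l' characters.
    return 0 < i < len(label) - 1 and label[i - 1] == "l" and label[i + 1] == "l"


def passes_lookup(label):
    # Lookup side: protocol exclusions and joiner rules only.
    for i, ch in enumerate(label):
        if ch in DISALLOWED:
            return False
        if ch in CONTEXTJ and not joiner_rule_ok(label, i):
            return False
    return True


def passes_registration(label):
    # Registration side: everything lookup checks, plus the other context rules.
    if not passes_lookup(label):
        return False
    return all(
        middle_dot_rule_ok(label, i)
        for i, ch in enumerate(label)
        if ch in CONTEXTO
    )
```

In this sketch a label such as "a\u00b7b" fails `passes_registration` but still passes `passes_lookup`, so a registry that skips the rule produces a name that clients will resolve. A code point placed in `DISALLOWED` is rejected on both sides regardless of registry diligence.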

Chris

-----Original Message-----
From: Vint Cerf [mailto:vint at google.com] 
Sent: Thursday, 16 July 2009 9:42 PM
To: Chris Wright
Cc: idna-update at alvestrand.no
Subject: Re: Disallowing code points

Chris,

Generally, the registries have the last word on which characters they  
permit (from the PVALID set defined by the protocol).

The working group has been trying hard to work by inclusion, building  
from a basic set of characters deemed needed for scripts but excluding  
punctuation and symbols and so on.

There are a few cases that don't easily fall into categories already  
identified in the Unicode property system and we have had to deal with  
them individually either by specific exclusion or by special context  
handling. We hope to minimize this but I think we are moving well  
towards a consensus on special cases. The problem we foresee is that  
not all registry operators (remember this is not just at top or second  
level but all through the hierarchy) will show the same diligence. So  
the most troubling cases can be excluded by protocol.

Vint

On Jul 16, 2009, at 2:58 AM, Chris Wright wrote:

> There have been some discussions lately about explicitly disallowing  
> specific code points.
>
> I believe that the decisions to disallow specific code points should  
> be kept to an absolute minimum, with strong technically justifiable  
> reasons being required for specific singling out of code points.  
> Again these are in the end 'policy' based decisions that have the  
> potential to impact languages not yet even considered. Registries  
> are required by ICANN/IANA to identify the list of 'characters' that  
> they will allow to be used for registration of domain names in each  
> particular language they would like to support. As per my previous  
> post, Registries are not in the business of doing things that  
> jeopardise the security and stability of their own namespaces. We  
> have the rule system, which is Unicode independent, for determining  
> the protocol status of each code point and, barring the exceptions,  
> this should be sufficient. If we must comment on other code  
> points, as per context rules, we should discuss these in the context  
> of Best Current Practices and set forth recommendations for  
> registries to follow about which code points they should  
> consider 'dangerous' and not allow in the definition of their  
> languages.
>
> If we take a single code point like Tatweel, for example, and argue  
> that it's not required to be used anymore and thus should be  
> disallowed, then why not take the whole class of 'dead' languages  
> and disallow those? I have to ask what is the harm in keeping those  
> code points PVALID? At least this is the most flexible approach  
> going forward, and it doesn't force us to make decisions now for  
> which we may not have all the information. We can publish  
> a BCP discussing the issues with specific code points and educate  
> the registries as to the right thing to do.
>
> Thanks
>
> Chris Wright
> Chief Technology Officer
> AusRegistry Pty Ltd