SASLprep200x

Wed Jan 10 07:56:20 CET 2007

On 5 jan 2007, at 02.49, Paul Hoffman wrote:

> Strongly disagree. What Simon is asking for (as few character  
> prohibitions as possible to aid SASLprep) is inherently against the  
> basis of the -bis effort, which is to start with a more limited  
> defensible set. If StringprepBis has a significantly larger set of  
> characters than NamePrepBis, NamePrepBis will become a convoluted  
> set of subsetting rules. That does not serve the DNS community in  
> the least.

The discussion on this list is about an update to IDNA. So far I have
not seen any conclusion that this implies a stringprep update, which
in turn would have the implications you talk about.

My view, which more or less took shape during a talk with Cary the
other week, is that we can see it like this:

  1 We have a set of theoretical codepoints, 0x0000 and up.
  2 The Unicode Character Set includes, for each version, a subset of
those codepoints.
  3 Stringprep allows a subset of the Unicode Character Set.
  4 A profile of stringprep (like Nameprep) is a subset of the "stuff"
stringprep allows.
  5 Registry policy talks about a subset of Nameprep.
  6 Registrar policy talks about a subset of the registry policy.
  7 User interface issues are a subset of the registrar policy (or
might at least create a subset).

I really wanted it to be "7 layers" ;-)
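
The layering above can be sketched as a chain of nested sets. This is
only a toy model: the ranges below are invented placeholders, not real
Unicode, stringprep, Nameprep, or registry data, and the names
first_violation and deepest_layer are made up for the illustration.

```python
# Toy model of the seven layers. Every range is an invented placeholder,
# NOT real Unicode, stringprep, or registry data.
layers = [
    ("1 theoretical codepoints", set(range(0x0000, 0x0100))),
    ("2 Unicode Character Set", set(range(0x0020, 0x0100))),
    ("3 stringprep", set(range(0x0020, 0x0080))),
    ("4 profile (like Nameprep)", set(range(0x0030, 0x007B))),
    ("5 registry policy", set(range(0x0030, 0x003A)) | set(range(0x0061, 0x007B))),
    ("6 registrar policy", set(range(0x0061, 0x007B))),
    ("7 user interface", set(range(0x0061, 0x0070))),
]


def first_violation(layers):
    """Return the name of the first layer that is not a subset of the
    layer directly above it, or None if the chain is properly nested."""
    for (_, outer), (name, inner) in zip(layers, layers[1:]):
        if not inner <= outer:
            return name
    return None


def deepest_layer(codepoint, layers):
    """Return how many consecutive layers, counted from layer 1,
    still allow the given codepoint."""
    depth = 0
    for _, allowed in layers:
        if codepoint not in allowed:
            break
        depth += 1
    return depth
```

With these placeholder sets, first_violation(layers) returns None, and
deepest_layer tells at which layer a codepoint drops out of the chain.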

What Cary and I talked about further was which policies are global
and which are local. My view is that 1-4 are global policies, without
connection to "languages", while 5-7 can be "local policy" and include
connections to languages and whatever else. Further, 1-4 do not
reference individual codepoints, but instead "classes of characters"
or similar.

Because of this, a rule like "combining character ring above can
only exist after latin letter a" can only exist in rule sets 5-7,
while "combining character of type foo can only come after base
character bar" can at least theoretically also be in rules 1-4 (in
reality 3-4).
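
To make the difference between the two kinds of rules concrete, here
is a minimal sketch using Python's standard unicodedata module. Both
rules are hypothetical examples, not rules from stringprep, Nameprep,
or any registry: "type foo" is assumed to be general category Mn
(nonspacing mark) and "base character bar" is assumed to be any
letter.

```python
import unicodedata

RING_ABOVE = "\u030a"  # COMBINING RING ABOVE


def class_rule_ok(text):
    """Class-based rule (could live in layers 3-4): every nonspacing
    mark (category Mn) must directly follow a letter (category L*).
    Hypothetical rule, for illustration only."""
    prev = None
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            if prev is None or not unicodedata.category(prev).startswith("L"):
                return False
        prev = ch
    return True


def codepoint_rule_ok(text):
    """Codepoint-specific rule (layers 5-7 only): ring above may only
    directly follow Latin letter a or A. Hypothetical rule."""
    prev = None
    for ch in text:
        if ch == RING_ABOVE and prev not in ("a", "A"):
            return False
        prev = ch
    return True
```

A string such as "u" followed by ring above passes the class-based
rule but fails the codepoint-specific one, since only the latter names
individual codepoints.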

Yes, we have seen codepoints that have exactly the same properties
and all, but still need different rules somewhere in the
architecture I outlined above. This is a case where I think UTC
should take a more careful look at whether their definitions are
correct and/or whether they have to redo some classification and/or
add some more metadata to the codepoints.

If we in the IETF process discover that codepoints are to be treated
differently while UTC gives them exactly the same classification and
all, then something is broken.
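
One rough way to look for such cases is to compare what the character
database says about two codepoints. The sketch below uses Python's
standard unicodedata module; the handful of properties compared is my
own selection for illustration and is far from the full set UTC
publishes, and the function names are made up.

```python
import unicodedata


def classification(ch):
    """Collect a few properties of one character. This selection is an
    assumption for illustration, not the complete set of properties."""
    return (
        unicodedata.category(ch),
        unicodedata.combining(ch),
        unicodedata.bidirectional(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.decomposition(ch) != "",
    )


def indistinguishable(a, b):
    """True if the two characters have identical values for the
    properties compared above, so a class-based rule built on those
    properties cannot treat them differently."""
    return classification(a) == classification(b)
```

If two codepoints are indistinguishable in this sense but still need
different treatment, the difference can only be expressed by naming
the codepoints themselves, or by getting more metadata added.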

    Patrik