SASLprep200x
Patrik Fältström
patrik at frobbit.se
Wed Jan 10 07:56:20 CET 2007
On 5 jan 2007, at 02.49, Paul Hoffman wrote:
> Strongly disagree. What Simon is asking for (as few character
> prohibitions as possible to aid SASLprep) is inherently against the
> basis of the -bis effort, which is to start with a more limited
> defensible set. If StringprepBis has a significantly larger set of
> characters than NamePrepBis, NamePrepBis will become a convoluted
> set of subsetting rules. That does not serve the DNS community in
> the least.
The discussion on this list is about an update to IDNA. So far I have
not seen any conclusion that this implies a stringprep update, which
in turn would have the implications you talk about.
My view, which sort of emerged during a talk with Cary the other
week, is that we can see it like this:
1 We have a set of theoretical codepoints: 0x0000 and up.
2 Each version of the Unicode Character Set includes a subset of
  those codepoints.
3 Stringprep allows a subset of the Unicode Character Set.
4 A profile of stringprep (like Nameprep) is a subset of the "stuff"
  stringprep allows.
5 Registry policy talks about a subset of Nameprep.
6 Registrar policy talks about a subset of the registry policy.
7 User interface issues are a subset of the registrar policy (or
  might at least create a subset).
I really wanted it to be "7 layers" ;-)
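The layering above can be sketched as nested sets in Python. Every
filtering rule below is invented for illustration (none of the
codepoint choices come from the real stringprep/Nameprep tables); the
only point is that each layer is a subset of the one above it:

```python
# Toy sketch of the seven nested layers.  All filter rules here are
# made up for illustration; only the subset relationships matter.

layers = {}
layers["1 theoretical"] = set(range(0x110000))                  # all codepoints, 0x0000 and up
layers["2 unicode"]     = {cp for cp in layers["1 theoretical"]
                           if not 0xD800 <= cp <= 0xDFFF}       # e.g. drop surrogates
layers["3 stringprep"]  = {cp for cp in layers["2 unicode"]
                           if cp >= 0x0020}                     # e.g. drop control characters
layers["4 nameprep"]    = {cp for cp in layers["3 stringprep"]
                           if cp != 0x0020}                     # e.g. the profile drops space
layers["5 registry"]    = {cp for cp in layers["4 nameprep"]  if cp < 0x0250}
layers["6 registrar"]   = {cp for cp in layers["5 registry"]  if cp < 0x0180}
layers["7 ui"]          = {cp for cp in layers["6 registrar"] if cp < 0x0100}

# The defining property: every layer is contained in the one above it.
names = list(layers)
for outer, inner in zip(names, names[1:]):
    assert layers[inner] <= layers[outer]
```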
What Cary and I further talked about was which of these are global
policies and which are local ones. My view is that 1-4 are global
policies, without connection to "languages", while 5-7 can be "local
policy" and include connections to languages and whatever else.
Further, 1-4 do not reference individual codepoints, but instead
"classes of characters" or similar.
Because of this, a rule like "combining character ring above can
only exist after latin letter a" can only exist in rule sets 5-7,
while "a combining character of type foo can only appear after a base
character of type bar" can, at least theoretically, also be in rules
1-4 (in reality 3-4).
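The distinction can be sketched in Python with the standard
unicodedata module: class_rule_ok below is a class-based rule of the
3-4 kind, and local_rule_ok is a codepoint-specific rule of the 5-7
kind. Both rules are invented here for illustration; neither comes
from any real stringprep or registry policy:

```python
import unicodedata

def class_rule_ok(s: str) -> bool:
    # Class-based rule (layers 3-4 style): a combining mark (general
    # category Mn) must follow some base character, i.e. may not start
    # the string.  A toy rule, not a real stringprep rule.
    return not s or unicodedata.category(s[0]) != "Mn"

def local_rule_ok(s: str) -> bool:
    # Codepoint-specific rule (layers 5-7 style): U+030A COMBINING RING
    # ABOVE may only appear directly after the letter "a".  A made-up
    # local-policy rule of the kind described above.
    return all(i > 0 and s[i - 1] == "a"
               for i, ch in enumerate(s) if ch == "\u030A")
```

Note that "o\u030A" passes the class-based rule (a mark following a
base character) but fails the hypothetical local rule, which is
exactly why the latter can only live in layers 5-7.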
Yes, we have seen codepoints that have exactly the same properties
and all, but still have to have different rules somewhere in the
architecture I outlined above. This is a case where I think the UTC
should take a more careful look at whether their definitions are
correct, and/or whether they have to redo some classification and/or
add some more metadata to the codepoints.
If we in the IETF process discover that codepoints are to be treated
differently while the UTC gives exactly the same classification and
all to them, then something is broken.
Patrik
More information about the Idna-update mailing list