SASLprep200x

Mark Davis mark.davis at icu-project.org
Thu Jan 11 02:07:27 CET 2007


I think Patrik's message has engendered a very useful discussion. I have no
objection to discussing the notion of levels, since looking at the issue
from different perspectives helps to pinpoint areas where our views differ
and need to be reconciled. For example, this helped to emphasize that a
profile of stringprep doesn't need to be a subset, and that we are
discussing not just sets of code points, but also restrictions on strings.

Here's a quick recast of Patrik's levels.

On 1/10/07, Kenneth Whistler <kenw at sybase.com> wrote:
>
> Patrik said:
>
> > My view, that was sort of invented during a talk with Cary the other
> > week, is that we can see it like this:
> >
> >   1 We have a set of theoretical codepoints. 0x0000 and up.


The Unicode architecture provides a set of abstract code points from U+0000
to U+10FFFF, a range that is fixed across all Unicode versions.

> >   2 Unicode Character Set include for each version a subset of those
> > codepoints.


In each version of Unicode, some subset of those code points is assigned to
characters (each version only adds assigned characters; it never removes or
moves them). These characters have a number of properties; the Unicode
Consortium can add new properties and can stabilize existing properties
where necessary.
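To make the distinction between the fixed code space and the per-version set
of assigned characters concrete, here is a minimal Python sketch. It assumes
Python's unicodedata module, whose main database follows whatever Unicode
version the interpreter ships with, and which also exposes the frozen
Unicode 3.2 tables (unicodedata.ucd_3_2_0) that Stringprep (RFC 3454) was
built on. The names CODE_SPACE, is_assigned and newly_assigned are
illustrative only.

```python
import unicodedata

# The code space itself is fixed: U+0000..U+10FFFF in every Unicode version.
CODE_SPACE = range(0x0000, 0x10FFFF + 1)


def is_assigned(ch, db=unicodedata):
    """True if `ch` has a General_Category other than Cn in `db`.

    Cn means "unassigned" (noncharacters such as U+FFFF are also Cn).
    `db` may be the unicodedata module itself (the interpreter's current
    Unicode version) or unicodedata.ucd_3_2_0 (the frozen Unicode 3.2 data).
    """
    return db.category(ch) != "Cn"


def newly_assigned(ch):
    """True if `ch` is assigned now but was unassigned in Unicode 3.2."""
    return is_assigned(ch) and not is_assigned(ch, unicodedata.ucd_3_2_0)


if __name__ == "__main__":
    print("current UCD version:", unicodedata.unidata_version)
    grinning = "\U0001F600"  # GRINNING FACE, added in Unicode 6.1
    print(ord(grinning) in CODE_SPACE)                   # True
    print(is_assigned(grinning, unicodedata.ucd_3_2_0))  # False
    print(is_assigned(grinning))                         # True
```

The same code point is always inside the code space; only its assignment
status changes as the repertoire grows.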

> >   3 Stringprep allow a subset of the Unicode Character Set.


Stringprep allows a subset of those assigned characters in output strings.
It may also impose string-based limitations, which restrict certain
combinations of characters (such as the BIDI restrictions or the proposed
ZWJ/ZWNJ conditions).
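The difference between a character subset and a string-based limitation
shows up clearly in the bidi rule of RFC 3454 section 6: every character of
a Hebrew word followed by "abc" may be individually permitted, yet the
string as a whole is rejected. A minimal Python sketch, assuming the
standard library's stringprep module (which exposes the RFC 3454 tables);
passes_bidi_rule is an illustrative name, and this is not a complete
Stringprep implementation.

```python
import stringprep


def passes_bidi_rule(s):
    """Check requirements 2 and 3 of RFC 3454 section 6 on a whole string.

    Table D.1 = RandALCat (bidi class R or AL), table D.2 = LCat (class L),
    both per Unicode 3.2. Requirement 1 (prohibiting characters that change
    display properties) is a per-character check and is not repeated here.
    """
    if not any(stringprep.in_table_d1(c) for c in s):
        return True  # no right-to-left characters: nothing more to check
    if any(stringprep.in_table_d2(c) for c in s):
        return False  # RandALCat and LCat characters must not be mixed
    # A RandALCat character must be both the first and the last character.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"
    print(passes_bidi_rule(hebrew))          # True
    print(passes_bidi_rule(hebrew + "abc"))  # False: mixed with Latin
    print(passes_bidi_rule("1" + hebrew))    # False: does not start with RandALCat
```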

> >   4 A profile of stringprep (like Nameprep) is a subset of stringprep
> > allowed "stuff".


Nameprep allows a subset of the output strings from Stringprep. (But as Erik
says, this is not true of every profile, since a profile can both add and
remove. So while Nameprep may use a proper subset of the output strings
allowed in Stringprep, that is not a requirement on all profiles.)

> >   5 Registry policy talk about a subset of Nameprep.


A Registry policy may further restrict the output strings allowed by
Nameprep, either as a subset of characters or as string-based limitations.

> >   6 Registrar policy talk about a subset of the registry policy.


A Registrar policy may further restrict the output strings allowed by the
Registry policy, either as a subset of characters or as string-based
limitations.
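Since the registry and registrar levels only narrow what the level before
them accepts, each can be modeled as one more predicate in a conjunction:
adding a level can only shrink the set of accepted strings. (As noted above,
a profile need not be a subset of Stringprep, so only the narrowing levels
are modeled.) A minimal Python sketch in which everything is hypothetical:
protocol_check is a stand-in that applies just two RFC 3454 tables (A.1,
code points unassigned in Unicode 3.2; C.2.2, non-ASCII control characters)
and is not Nameprep, while REGISTRY_REPERTOIRE and registrar_check are
invented policies, one a character subset and one a string-based limitation.

```python
import string
import stringprep
from typing import Callable, Sequence

Check = Callable[[str], bool]


def protocol_check(label):
    """Stand-in for the protocol level (NOT full Nameprep): reject code points
    unassigned in Unicode 3.2 (RFC 3454 table A.1) and non-ASCII control
    characters (table C.2.2)."""
    return not any(stringprep.in_table_a1(c) or stringprep.in_table_c22(c)
                   for c in label)


# Hypothetical registry policy: a subset of characters.
REGISTRY_REPERTOIRE = frozenset(string.ascii_lowercase + string.digits + "-äöüß")


def registry_check(label):
    return set(label) <= REGISTRY_REPERTOIRE


# Hypothetical registrar policy: a string-based limitation.
def registrar_check(label):
    return not (label.startswith("-") or label.endswith("-"))


def accepted(label, levels: Sequence[Check]):
    """A label is accepted only if every level accepts it."""
    return all(check(label) for check in levels)


LEVELS = [protocol_check, registry_check, registrar_check]

if __name__ == "__main__":
    for label in ["bücher", "mañana", "-bücher", "a\u0080b"]:
        # Acceptance after applying the first 1, 2, then 3 levels.
        passed = [accepted(label, LEVELS[:n]) for n in range(1, len(LEVELS) + 1)]
        print(repr(label), passed)
```

Once a label is rejected at some level, no later level can bring it back.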

> >   7 User interface issues is a subset of the registrar policy (might
> > at least create a subset).


A user agent may signal information about URLs to users.
[I don't think we want to actually recommend that the UI restrict output
strings; what we would recommend is that the UI signal information about the
URLs, such as displaying URLs in their Punycode forms where it would be
useful to caution the user, e.g., when they use scripts outside of what the
user's settings allow. A UI may also perform certain processing on input,
such as transforming input of isolated Jamo characters into Hangul
syllables; that would be outside the scope of IDNAbis.]
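A minimal Python sketch of the kind of UI signal described above: show the
host name as typed when every letter is in a script the user's settings
allow, and otherwise show its xn-- (Punycode) form. It assumes Python's
built-in "idna" codec, which implements IDNA 2003 (Nameprep plus Punycode).
The "allowed scripts" setting, script_of and display_form are hypothetical,
and script_of is a crude heuristic standing in for the Unicode Script
property, which the standard library does not expose.

```python
import unicodedata
from typing import AbstractSet


def script_of(ch):
    """Crude script guess: the first word of the character's name,
    e.g. 'LATIN' or 'CYRILLIC'. An assumption, not the Script property."""
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else ""


def display_form(host, allowed_scripts: AbstractSet[str]):
    """Return `host` unchanged if every letter is in an allowed script;
    otherwise return its ASCII-compatible (xn--) form via the "idna" codec."""
    if all(script_of(c) in allowed_scripts for c in host if c.isalpha()):
        return host
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    latin_only = {"LATIN"}
    print(display_form("bücher.example", latin_only))  # shown as typed
    # Cyrillic letters mixed with a Latin "l": shown in its xn-- form.
    print(display_form("\u0440\u0430\u0443\u0440\u0430l.com", latin_only))
```

The sketch only changes what is displayed; it does not reject anything,
which matches the suggestion that the UI signal rather than restrict.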

I do think that it is valuable to keep this entire picture in mind. While we
are really focused on #3 and #4, the context of #1-2 (a growing repertoire
of Unicode characters) and of #5-7 (the ability of registries to further
refine limitations, and of user agents to provide more information about
URLs) will certainly matter as we go forward.

Mark


More information about the Idna-update mailing list