The Two Lookups Approach (was Re: Parsing the issues and finding a middle ground -- another attempt)

Sun Mar 8 05:33:32 CET 2009

--On Saturday, March 07, 2009 19:44 -0800 Erik van der Poel
<erikv at google.com> wrote:

> Hi John,
> 
> In general, I think I agree with you that it would be
> confusing to have names for each of the nine subcategories.
> However, in our discussions, some of us have come up with
> names for certain commonly discussed things (Martin's L-label
> for locally mapped labels, my V-label for variant (globally
> mapped) labels and Mark's C-label, as you say). I think we
> would need more consensus before any of this is adopted.
> G-label might be better than V-label, if it is to mean
> globally mapped label.

Of course, your term and Martin's are additive with the list I
gave, bringing the total so far to 11 (Mark's "C-label" is a
string starting with "xn--" that has been fully validated for
lookup purposes and hence is on my list of nine... I should have
identified it that way).

We could, of course, differentiate between
registration-validated U-label-ish things (U-labels today) and
lookup-validated U-label-ish things versus registration-validated
A-label-ish things (A-labels) and lookup-validated A-label-ish
things (Mark's C-labels).  But those distinctions would be more
useful outside the Protocol spec than in it, since the protocol
spec deals mostly with strings in various stages of being
validated.

> Another aspect of this, which I am not sure you've captured in
> your drafts, is the Unicode version in use in a particular
> U-label. For example, in a protocol involving U-labels, it
> might be good to specify what to do when a sender sends a
> U-label containing characters from a newer version of Unicode
> than the receiver has implemented.

Almost by definition, the receiver can't tell whether what the
sender has sent is valid in a newer version of Unicode or just
an attempt to cause whatever problems using an unassigned code
point causes.  If the receiver has enough information to know
that the character is actually supported in a particular later
version of Unicode, it would require a strange situation indeed
for it to not support that character (and that version of
Unicode).  So I'm not quite sure how one would identify the
situation you describe, much less why it should need special
terminology.
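
To make the point concrete, here is a rough sketch (not anything
from the drafts) of what a receiver can actually observe.  It
assumes Python's standard unicodedata module stands in for the
receiver's Unicode tables; the function names are made up for
illustration only.

```python
import unicodedata

def unassigned_code_points(label):
    """Return the characters in label that the receiver's Unicode
    tables classify as unassigned (general category Cn)."""
    return [ch for ch in label if unicodedata.category(ch) == "Cn"]

def receiver_view(label):
    """Describe what the receiver can conclude about a label.

    The receiver sees only "unassigned in my version".  It has no
    way to tell a character assigned in a later Unicode version
    from one that is unassigned in every version.
    """
    unknown = unassigned_code_points(label)
    if unknown:
        points = ", ".join("U+%04X" % ord(ch) for ch in unknown)
        return "unassigned here (Unicode %s): %s" % (
            unicodedata.unidata_version, points)
    return "all code points assigned in Unicode %s" % (
        unicodedata.unidata_version,)
```

Calling receiver_view() on a label containing a code point the
local tables do not know gives the same answer whether or not
some later Unicode version has assigned that code point.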

    john