Conformance for intermediary modules
mark.davis at icu-project.org
Tue Mar 11 22:01:30 CET 2008
One other issue connected (vaguely) with UNASSIGNED (but I changed the subject):
Modern programs are often a set of interoperating modules. That is,
suppose that I have a module M that simply takes a domain name in
A-Label form, and does a DNS lookup. Is it required to backconvert to
a U-Label and do the checks, such as for UNASSIGNED? Or can it trust
that it is being handed a valid label? How would the conformance
requirements of protocol/tables/bidi apply to such a case? And would
it be different from the case of a browser getting the value of an
href? Or the browser getting a value from the address bar?
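
To make the two options for module M concrete, here is a minimal sketch, assuming Python and its built-in "punycode" codec. The function names (backconvert, has_unassigned, prepare_trusting, prepare_checked) are hypothetical, the check uses whatever Unicode version the local library ships, and the actual DNS query is left out.

```python
import unicodedata

ACE_PREFIX = "xn--"

def backconvert(a_label: str) -> str:
    """Decode an A-label to its U-label form (hypothetical helper)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # plain ASCII label, nothing to decode
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

def has_unassigned(u_label: str) -> bool:
    """True if any code point is category Cn in the local Unicode data."""
    return any(unicodedata.category(ch) == "Cn" for ch in u_label)

def prepare_trusting(a_label: str) -> str:
    """Option 1: trust the caller and pass the A-label straight through."""
    return a_label

def prepare_checked(a_label: str) -> str:
    """Option 2: back-convert and reject labels with unassigned code points."""
    if has_unassigned(backconvert(a_label)):
        raise ValueError("label contains an unassigned code point")
    return a_label
```

Option 2 gives a different answer depending on the Unicode version of the module doing the check, which is exactly why the conformance question matters for intermediaries.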
BTW, I haven't seen a reply to the following message. John, do you
have a "hip-pocket" counterexample?
On Fri, Mar 7, 2008 at 4:39 PM, Mark Davis <mark.davis at icu-project.org> wrote:
> > My conclusions are:
> > (1) Looking up unregistered code points is untenable because it
> > makes moving to future versions of Unicode impossible. That
> > conclusion is already reflected in IDNA200X, but IDNA2003
> > requires such lookups.
> I disagree. While I'm willing to live with John, Harald, and Patrik's
> decision to disallow the resolution with unassigned characters -- just so we
> can get this thing out the door -- we should not be basing any other
> decisions on thinking that it is "untenable".
> Consider a character X that was unassigned in Unicode 5.1, but assigned in
> Unicode 6.0, and see what happens. Let's suppose that a U5.1 client sends
> out "aXc.com" ("a" and "c" are some particular strings, not the literal
> U+0061 and U+0063). Before the registry upgrades to U6.0, it will fail, as
> expected -- it wasn't (and couldn't have been) registered.
> So let's look at the case where the registry has upgraded to U6.0. There are
> a small number of cases, and I don't see that *any* of them cause a problem.
> 1. X is illegal according to IDNA200X rules under U6.0. The registry can't
> register it, so it won't work. Not a problem.
> 2. X is legal and unaffected by normalization. This is true of the vast
> majority of characters. Then if the registry adds "aXc.com", then the old
> client will work, as expected. Not a problem -- in fact, a positive benefit.
> 3. X is legal but affected by normalization -- but not in the context of
> "a...c". This is true of the vast majority of those few characters remaining
> from case #2. Then if the registry adds "aXc.com", then the old resolver
> will work. Not a problem -- in fact, a positive benefit.
> 4. X is legal, and affected by normalization, in the context of "a...c". For
> example, suppose that string a ends with a non-spacing mark that reorders
> with X in NFC. In that case, "aXc.com" would not be legal, and could not be
> registered. So even in this rare case, not a problem.
> John, if you think this situation is untenable, which of the above cases
> causes a problem, and exactly what would that problem be?
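
The rare last case in the quoted message (string a ends with a non-spacing mark that reorders with X in NFC) can be illustrated with a small Python sketch. The characters are stand-ins chosen for illustration only: U+0301 plays the mark at the end of "a", and the already-assigned U+0323 plays the role of the newly assigned X. The helper name is_nfc_stable is hypothetical.

```python
import unicodedata

def is_nfc_stable(label: str) -> bool:
    """True if NFC normalization leaves the label unchanged."""
    return unicodedata.normalize("NFC", label) == label

# Stand-ins (assumptions for illustration, not real IDNA test data):
a = "q\u0301"   # ends with COMBINING ACUTE ACCENT, combining class 230
X = "\u0323"    # COMBINING DOT BELOW, combining class 220
c = "c"

label = a + X + c
reordered = unicodedata.normalize("NFC", label)

# NFC moves the class-220 mark in front of the class-230 mark, so the
# original sequence is not in NFC and could not be registered as written.
print(is_nfc_stable(label))      # the label as typed
print(is_nfc_stable(reordered))  # the reordered form
```

A registry running the newer Unicode version would accept only the reordered form, so the sequence as typed would never have been registered in the first place.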