The Two Lookups Approach (was Re: Parsing the issues and finding a middle ground -- another attempt)

John C Klensin klensin at jck.com
Sat Mar 7 21:30:00 CET 2009


One caveat about this, independent of most of the current
thread.  I tried to reflect the model for discussion in Appendix
1 of the recently-posted Protocol-10.   I wrote too hastily and
while too tired and botched the definition.   I suggest either
reading the text for general principles rather than the specific
mechanism or waiting for the revised version (which will appear
before the cutoff).

An almost-similar comment applies to the workaround for using
"A-label" and "U-label" on lookup.  The lookup checks don't
ensure that all of the criteria for (registration) A-labels and
U-labels are met, so the unqualified use of those terms on
lookup is not strictly correct (as Mark has pointed out several
times).  The original approach to that problem (many drafts ago
and before the problem existed, much less was pointed out) was
to describe labels that had not satisfied all tests as "putative
A-labels" and/or "putative U-labels" -- strings that looked more
or less like those label forms and that were claimed, by context
and form, to be them, but that had not yet passed all of the
relevant tests.   Some participants in the WG objected strongly
to "putative", so I eliminated it in several contexts, resulting
in (or reinforcing) the definitional problem in Lookup.  In
Protocol-10, I tried using "apparent U-label".  That doesn't
quite work either, so -11 will use a different approach,
eliminating that terminology entirely from the lookup side (but
resulting in slightly more convoluted text).

Mark suggested a different model, which was to introduce
"C-label" as a term for the superset of A-labels that met the
lookup criteria and restrictions but not necessarily all of the
A-label criteria. I liked that idea (due to an editing error,
Protocol-10 contains a vestige of my trying to fit it in).  But,
in the process of working on the text, I realized that we
actually have many categories here and that a terminology
solution would require introducing terms for more than just one
more of them.  In particular, there appear to be:

	* strings in IDNA-aware slots that no one has looked at
	yet.
	* strings in such slots that contain non-ASCII
	characters but that have not yet been subjected to any
	of the validation tests for U-labels.
	* strings of that variety that have passed some of
	the validation tests for U-labels, but not even enough
	to be valid for Punycode conversion and lookup.
	* strings of that variety that have passed all of the
	validation tests needed for Punycode conversion and
	lookup, but not the additional tests (CONTEXTO, Bidi)
	required for U-labels.
	* U-labels

and, similarly, 

	* strings in IDNA-aware slots that start in "xn--" (case
	independent) but have not otherwise been subjected to
	any of the validation tests for A-labels.
	* strings of that variety that have passed some of the
	validation tests for A-labels, but not enough of
	them to determine that they are valid for looking up in
	the DNS.
	* strings that have been determined to be valid for
	looking up in the DNS but that have not been checked for
	the additional criteria needed to qualify them as
	A-labels.
	* A-labels
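
For illustration only, the two lists above can be read as two
ordered progressions of validation stages.  The following is a
minimal sketch under that assumption; every name in it (the stage
names, has_ace_prefix, ok_for_lookup) is hypothetical and appears
in no draft.  It only makes the counting and the "enough for
lookup, but not yet a U-label or A-label" distinction concrete.

```python
from enum import IntEnum


class ULabelStage(IntEnum):
    """Hypothetical names for the five U-label-side categories."""
    UNEXAMINED = 0          # in an IDNA-aware slot, not yet looked at
    NON_ASCII_UNTESTED = 1  # has non-ASCII characters, no tests applied
    PARTIALLY_TESTED = 2    # some tests passed, not enough for lookup
    LOOKUP_VALID = 3        # enough for Punycode conversion and lookup
    U_LABEL = 4             # additional tests (CONTEXTO, Bidi) passed too


class ALabelStage(IntEnum):
    """Hypothetical names for the four A-label-side categories."""
    PREFIX_ONLY = 0         # starts with "xn--", otherwise untested
    PARTIALLY_TESTED = 1    # some tests passed, not enough for lookup
    LOOKUP_VALID = 2        # valid for looking up in the DNS
    A_LABEL = 3             # additional A-label criteria checked too


def has_ace_prefix(label: str) -> bool:
    """Case-independent check for the "xn--" prefix."""
    return label[:4].lower() == "xn--"


def ok_for_lookup(stage: IntEnum) -> bool:
    """True once a string has reached the lookup-valid stage of its side."""
    return stage >= type(stage).LOOKUP_VALID


# Five categories on one side and four on the other: nine in total.
assert len(ULabelStage) + len(ALabelStage) == 9
```

In this reading, a string at LOOKUP_VALID may be looked up even
though calling it a "U-label" or "A-label" would not be strictly
correct, which is the gap the terminology discussion is about.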

While I'm happy to change things if the WG prefers (and comes up
with appropriate terms and definitions), my editorial judgment
is that trying to solve the problem of partially-checked strings
by adding terminology to identify each of the nine
subcategories above is more likely to confuse the reader than to
help with understanding or implementations (even though it would
arguably improve precision).

As suggested above, Protocol-10 tried to address the issue by
talking about "apparent" U-labels.  Only after I was about ready
to post it did I realize that this returned us to "putative" in
slightly different clothing (and that is noted in the draft).
Protocol-11 eliminates the problematic text by going back to the
convoluted description form that Patrik criticized earlier in
the context of the U-label terminology.  I don't know if that is
the best long-term solution, but, if it is not, I need help from
the WG in figuring out a better one.

     john



More information about the Idna-update mailing list