WG Review: Internationalized Domain Name (idn)

John C Klensin klensin at jck.com
Wed Mar 5 14:32:59 CET 2008



--On Tuesday, 04 March, 2008 17:05 -0800 Lisa Dusseault
<lisa at osafoundation.org> wrote:

>> I think you need to define what "resolution time" means here.
>> For better or worse, IDNs now appear in authority sections of
>> URIs and not all of those are resolved at all.  If what you
>> mean is "Separate requirements for valid IDNs in registration
>> contexts, in identifiers, and in relation to the wire format
>> of DNS", then I think  you need three categories.
> 
> That's quite possible.  Is that level of detail required in
> the   charter?  I don't think there's consensus pre-WG about
> how to make   requirements for IDNs in identifiers, but this
> is something a WG   could reasonably tackle within the context
> of this charter -- in fact   it's something that would be hard
> to decide how to approach before   having a WG.

My personal view is that this is a level of detail not needed in
the charter.  But it is also not a detail that I'd be upset
about seeing there if it would make others more comfortable.

Interestingly, a major piece of the non-registration,
non-resolution, use issue, including authority sections of URIs
and various security-related identifiers, has been causing a
great deal of discussion lately.  That discussion certainly will
carry forward into the WG-based effort.  Basically, RFC 3490
provides for a set of characters that are to be treated as
label-separators -- i.e., equivalent to ASCII dots -- in
IDN-aware contexts and actually converted to ASCII dot in
contexts that are not IDN-aware.  In part because they aren't
resolved and perhaps in part because of what motivated Ted's
"For better or worse...", those cases fall into a gray area.
They are neither IDN-aware nor not IDN-aware but perhaps
UTF8-aware or at least UTF8-tolerant.  For those uses
("identifiers" may or may not correctly capture the set, but I
will use it below), there are two issues with the current text
and mechanism in RFC 3490:

	(i) If one retains the current approach, the list of
	characters to be treated as dot-equivalents is not
	correct.  While much of the recent discussion has
	focused on the omission of ARABIC FULL STOP, I think
	there is now ample evidence that, if one has a writing
	system that has a "sentence" construction with sentences
	separated by some character and the keyboards used with
	that writing system do not have ASCII period on them,
	then there will be pressure to treat that sentence
	separator as a dot.   And, unfortunately, the list is
	likely to grow as Unicode adds writing systems and
	scripts.
	
	(ii) Some of those identifier contexts may require the
	ability to parse domain names into labels (e.g., to
	convert from dot-separated form to length-string lists)
	even if they don't actually resolve the names.
	Expecting an identifier mechanism that is merely
	UTF8-aware to somehow know what all of the possible
	dot-substitutes are (especially if that list changes
	over time) is probably profoundly unrealistic.  But, if
	the applications of such mechanisms cannot accurately,
	consistently, and interoperably parse domain names into
	labels, we are in big trouble indeed. 

I take the combination of these to indicate that we can't retain
the current approach, but perhaps the WG can find another
solution.
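
To make the parsing problem in (ii) concrete, here is a minimal
sketch in Python.  It assumes the four dot-equivalents listed in
RFC 3490, section 3.1 (U+002E, U+3002, U+FF0E, U+FF61); the
function name split_labels and its separators parameter are
hypothetical and only for illustration, not part of any
specification or library.

```python
# Dot-equivalents that RFC 3490 (section 3.1) says must be recognized:
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP.
RFC3490_DOTS = "\u002e\u3002\uff0e\uff61"


def split_labels(name, separators=RFC3490_DOTS):
    """Split a domain name into labels on any character in `separators`.

    Illustrative helper only (hypothetical name): no validation,
    no mapping, no ACE conversion.
    """
    labels = []
    current = []
    for ch in name:
        if ch in separators:
            # A separator closes the label collected so far.
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


# An IDN-aware parser splits on IDEOGRAPHIC FULL STOP (U+3002).
idn_aware = split_labels("example\u3002com")

# A parser that is merely UTF8-aware knows only the ASCII dot,
# so it sees the same string as a single label.
utf8_only = split_labels("example\u3002com", separators=".")

# ARABIC FULL STOP (U+06D4) is not in the RFC 3490 list, so even
# the IDN-aware parser does not treat it as a label separator.
arabic = split_labels("example\u06d4com")
```

Two parsers that disagree about the separator set produce
different label lists for the same string, which is exactly the
interoperability failure described in (ii), and every addition
to the list widens that gap.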

That example is worth raising here, rather than just as part of
the IDNAbis discussion, because it illustrates what I think is
an important point.  We can and should discuss what should be in
the charter.  We are doing that, although, as with many other
IETF Charters, we should be careful that nit-picking charter
language doesn't become a substitute for, or impediment to,
getting the actual work done.   However, I believe that the only
possible discussions are about how we revise IDNA2003 (the set
of documents associated with RFC 3490) and what the changes
should be.  Several comments in this discussion have suggested
that IDNA2003 is good enough and that no further work is needed.
This example with label-separators (which we hadn't fully
understood even when RFC 4690 was completed well over a year
ago) conclusively demonstrates that some work is needed to
preserve interoperability in practice, so we must, I believe,
discuss what changes are to be made (a topic for a WG) rather
than whether or not to open IDNA2003 at all.

>>> The WG will work to ensure practical stability of the
>>> validity algorithms for IDNs (whether based on character
>>> properties or inclusion/exclusion lists).
>> 
>> This is ambiguous.  If this is meant to say that the WG can
>> decide after starting its work that it must abandon the
>> character properties design direction and go to
>> inclusion/exclusion lists, then the   statement
>> above giving design direction needs to be changed.  If this
>> is meant to say "backwards compatibility with X" what X is is
>> not clear here.
> 
> I think you're suggesting removing the parenthetical from the
> charter   sentence.  Question for others: does that lose
> something important?    If so how can that be made compatible
> with the design direction  that   the charter suggests the WG
> needs to verify?

I think it is there to be sure that the WG addresses that
tradeoff as one of the questions it needs to ask.   Because I
don't believe that question can be avoided (even though I feel
strongly about the answer if we don't want to have an IDNA
revision every 18 months), I don't think removing the
parenthetical remark would have any net effect.  But I also
believe that spending time deciding the matter gets us fairly
close to the "nit-picking about charters instead of focusing on
the work" point.

     john



