Minimal IDNAbis requirements

John C Klensin klensin at jck.com
Sun Dec 23 15:12:22 CET 2007



--On Friday, 21 December, 2007 18:35 -0800 Erik van der Poel
<erikv at google.com> wrote:

> John,
> 
> I agree that this was a useful exercise, and that it takes us
> pretty close to where we are today with IDNA200X. I think it
> might also be useful, not only for ourselves but also for
> future readers, to do the opposite exercise, i.e. starting
> from the rules that Patrik, Ken and Mark are writing, try to
> come up with a rationale for each rule.

At least for Patrik's list, I thought we had done that in
"issues".  If you don't think so, please tell us what needs to
be explained better.

> I believe that the mathematical slashes (division/fraction
> signs) were the ones that really got us to think about
> disallowing symbols and punctuation, but that example is not
> sufficient to explain the whole set away. Instead, we might
> say that the initial set of allowed characters is purposely
> being limited to a "small" set in the interests of being
> conservative, which is important in networking, both at the
> machine level and at the human level.

Actually, no.  Remember that the original LDH rule permitted
only one symbol (hyphen-minus), or two if one counts the label
separator (".").  That decision was made many years ago for one
of the key reasons we are concerned about symbols today.  They
have poor interchange properties and worse properties for
transfer (in any direction) between oral, handwritten, printed,
and computer-encoded forms. Particular risks in the IDN case
reinforced our view that we should be generalizing from the
principles associated with the LDH rule rather than trying to
see which of those characters was the most problematic.   And,
of course, that aspect of the LDH rule was driven, in part, by
exactly the sort of conservatism that you cite.  And I thought
that the most recent version of "issues" said that.
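For readers unfamiliar with it, the character-repertoire part of the LDH rule can be sketched as below. This is a minimal Python illustration, not text from any specification. `is_ldh_label` is a hypothetical helper name, and the sketch deliberately omits the positional and length constraints that the full hostname rules also impose.

```python
import string

# Character repertoire of the LDH ("letters, digits, hyphen") rule for one
# DNS label: ASCII letters, ASCII digits, and hyphen-minus. The label
# separator "." never appears inside a label, so it is not listed here.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` is non-empty and uses only LDH characters.

    Only the character repertoire is checked; positional and length
    constraints from the full hostname rules are deliberately left out.
    """
    return len(label) > 0 and all(ch in LDH_CHARS for ch in label)


for candidate in ["example", "my-host", "a/b", "caf\u00e9"]:
    # ascii() keeps the output printable on any console encoding.
    print(ascii(candidate), is_ldh_label(candidate))
```

Hyphen-minus is the only symbol that passes; "/" and any non-ASCII letter are rejected.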

> We might even want to think of NEVER as being "probably never",
> without actually writing that down. Future generations may
> well find ways to introduce some of the symbols or even
> punctuation into IDNs without causing any real harm.

The difficulty with "probably never" is in implementation.  The
current documents assume that the NEVER list will be checked by
applications that look things up in the DNS, such that labels
containing those characters will never be looked up.  That is
important precisely to prevent unscrupulous registries from
entering obviously dangerous strings into the DNS (think of "/"
as an example here, but only one such example).  If the NEVER
list is checked at lookup time, such strings will never be found.
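A minimal sketch of what checking the NEVER list at lookup time means for an application. It assumes a hypothetical three-character subset of the list (U+002F SOLIDUS plus the two mathematical slashes mentioned above) and uses an in-memory dictionary as a stand-in for the DNS. The names `NEVER`, `check_label`, and `lookup` are illustrative only, and the sketch skips the U-label to A-label conversion a real application would perform before querying.

```python
from typing import Dict, Optional

# Hypothetical, illustrative subset of a NEVER list. The real list is
# defined by the protocol documents and is NOT reproduced here.
NEVER = frozenset({
    "/",       # U+002F SOLIDUS
    "\u2044",  # U+2044 FRACTION SLASH
    "\u2215",  # U+2215 DIVISION SLASH
})


class NeverCharacterError(ValueError):
    """Raised when a label contains a character on the NEVER list."""


def check_label(label: str) -> None:
    """Raise NeverCharacterError if `label` contains any NEVER character."""
    bad = sorted({ch for ch in label if ch in NEVER})
    if bad:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise NeverCharacterError(f"label contains NEVER characters: {codes}")


def lookup(name: str, zone: Dict[str, str]) -> Optional[str]:
    """Check every label before querying; a rejected name is never looked up.

    `zone` is a hypothetical in-memory stand-in for the DNS so the sketch
    runs without network access. A real application would also convert
    U-labels to A-labels before sending a query.
    """
    for label in name.split("."):
        check_label(label)
    return zone.get(name)


# A registry has (hypothetically) entered a dangerous-looking name.
zone = {"example.com": "192.0.2.1", "pay\u2215bank.example": "192.0.2.66"}
print(lookup("example.com", zone))  # 192.0.2.1
try:
    lookup("pay\u2215bank.example", zone)
except NeverCharacterError as err:
    print("refused:", err)  # the zone entry is never reached
```

Because the check runs before any query is sent, the entry for "pay∕bank.example" is simply never found, whatever the registry has put into its zone.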

Of course, one could argue from that to the conclusion that the
only characters on the NEVER list should be those that can be proven
dangerous on a character-by-character basis.  But I think we
have ample reason to believe that we should not go there.

Once applications reject characters on the NEVER list and refuse
to look up labels containing any of them, we just do not see a
way to accept those characters later without newer application
implementations accepting (and looking up) characters that older
ones reject, leaving users in a highly unpredictable situation.
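The unpredictability argument can be made concrete with a small sketch. It assumes two hypothetical application releases whose NEVER lists differ only in that the newer one has dropped U+2215 DIVISION SLASH; the lists, the zone contents, and the `resolve` helper are all invented for illustration.

```python
from typing import Dict, FrozenSet, Optional

# Two hypothetical releases of an application's NEVER list. The newer one
# has dropped U+2215 DIVISION SLASH, i.e. the kind of later relaxation the
# text argues against.
NEVER_OLD: FrozenSet[str] = frozenset({"/", "\u2044", "\u2215"})
NEVER_NEW: FrozenSet[str] = frozenset({"/", "\u2044"})

# Hypothetical in-memory stand-in for the DNS.
ZONE: Dict[str, str] = {"pay\u2215bank.example": "192.0.2.66"}


def resolve(name: str, never: FrozenSet[str]) -> Optional[str]:
    """Return the address for `name`, or None if it is refused or absent.

    A name is refused (never looked up) if any label contains a character
    from `never`.
    """
    if any(ch in never for label in name.split(".") for ch in label):
        return None
    return ZONE.get(name)


name = "pay\u2215bank.example"
old_result = resolve(name, NEVER_OLD)
new_result = resolve(name, NEVER_NEW)
print("older application:", old_result)  # older application: None
print("newer application:", new_result)  # newer application: 192.0.2.66
```

The same user typing the same name gets a failure from one application and an answer from the other, which is the situation described above.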

> By the way, I'm guessing from your lack of response to my
> suggestion of a new term "V-label" for variants as supported
> in browsers today, that you'd rather not legitimize these
> things by giving them an official name. Fair enough. But it
> still would be good to get the UTF-8 SMTP drafts to explicitly
> say whether U-labels are required, and whether non-ASCII dots
> are allowed.

Please don't reach that conclusion about anything to which I
have not responded in the last week, or even the last month.
I've been on almost continual travel (returned from Asia less
than 12 hours ago) and simply have not been able to read every
message in these threads carefully.  I hope to be able to catch
up in the next week or two.  So far, I haven't read anything
about your "V-label" suggestion other than what is above.  My
instinct from the above is that this doesn't belong in the
protocol documents, but might be very useful in the "advice to
UI implementers" piece that we've discussed.  But I may not
understand the proposal (yet).

   best,
    john



