Comments on IDNA Bidi
mark.davis at icu-project.org
Tue Jan 15 02:19:48 CET 2008
The paragraph in question originated in Harald's document:
> One particular example of the last case is if a program chooses to
> examine the last character (in network order) of a string in order to
> determine its directionality, rather than its first; if it finds an
> NSM character and tries to display the string as if it was a left-to-
> right string, the resulting display may be interesting, but not
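To make the quoted failure mode concrete, here is a minimal Python sketch with hypothetical function names. It contrasts a check that looks for the first strong directional character, loosely modelled on rule P2 of the Unicode Bidirectional Algorithm (isolates are ignored), with the naive check of the last character. It uses only the standard `unicodedata` module. A Hebrew letter followed by a combining point, whose bidi class is NSM, is enough to mislead the naive version.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}

def first_strong_direction(s: str) -> str:
    """Direction from the first strong bidi character (simplified P2/P3).

    Falls back to 'ltr' when no strong character is found (an assumption)."""
    for ch in s:
        bidi = unicodedata.bidirectional(ch)
        if bidi in STRONG_RTL:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"

def last_char_direction(s: str) -> str:
    """The naive approach from the quoted text: inspect only the last character."""
    return "rtl" if s and unicodedata.bidirectional(s[-1]) in STRONG_RTL else "ltr"

label = "\u05d0\u05b0"  # HEBREW LETTER ALEF + HEBREW POINT SHEVA
print([unicodedata.bidirectional(c) for c in label])  # ['R', 'NSM']
print(first_strong_direction(label))  # rtl
print(last_char_direction(label))     # ltr: the trailing NSM hides the RTL letter
```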
I was speaking loosely of a URL when I should have said IRI. I was operating
at the same level as Harald's original text, which is the Unicode character
level, not the Punycode level. So read IRI wherever I said URL. Sorry for the
confusion.
Harald's text must have been referring to IRIs as well, since NSMs don't
occur in URLs (which are restricted to ASCII).
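The IRI/URL distinction can be illustrated with a small Python sketch of the IRI-to-URI mapping, using hypothetical function names. Non-ASCII characters outside the host are percent-encoded as UTF-8 (a simplification of RFC 3987, section 3.1), and the host is converted with the standard library's `idna` codec, which implements IDNA2003; RFC 3987 also permits converting the host with ToASCII this way. The sketch assumes a plain registered-name host with no userinfo. Whatever bidi properties the characters had, the result is pure ASCII, so an NSM can only survive as percent-escapes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def pct_encode_non_ascii(s: str) -> str:
    # Leave ASCII untouched; percent-encode the UTF-8 bytes of each non-ASCII char.
    return "".join(c if ord(c) < 0x80 else quote(c, safe="") for c in s)

def iri_to_uri(iri: str) -> str:
    # Sketch only: assumes a registered-name host and no userinfo component.
    parts = urlsplit(iri)
    netloc = (parts.hostname or "").encode("idna").decode("ascii")  # IDNA2003 ToASCII
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc,
                       pct_encode_non_ascii(parts.path),
                       pct_encode_non_ascii(parts.query),
                       pct_encode_non_ascii(parts.fragment)))

uri = iri_to_uri("http://bücher.de/\u05d0\u05b0")  # path: ALEF + SHEVA (an NSM)
print(uri)  # http://xn--bcher-kva.de/%D7%90%D6%B0
print(uri.isascii())  # True
```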
Much of what you wrote was directed at something that I didn't mean, so
I'll skip over that. There are a few parts I'll comment on below.
On Jan 13, 2008 11:20 AM, John C Klensin <klensin at jck.com> wrote:
> Certainly, as you, Martin, Erik, and others have pointed out in
> various ways, there are many places in which strings appear that
> look like URLs and don't conform to URL rules. It may be
> perfectly reasonable in some contexts to have a string that
> looks like a URL but that contains non-ASCII characters. But,
> unless it is an IRI in a context in which IRIs are permitted,
> one gets from such a string to a URL via exactly the sort of
> preprocessing that we've been discussing as "user agent"
> functionality in the IDNAbis context.
I don't think it's as simple as calling it a "UI" context. Calling it a
"preprocessing" step (as you do below) is clearer. For more, see below.
> It is also possible that I misunderstand what you mean by
> "assume". Neither an implementation of IDNA2003 nor an
> implementation of IDNA200X is conformant with the intent of
> those specifications if it "assumes" any of these things and
> then goes off and behaves as if they are true. In both cases,
> implementations are expected to test the strings they intend to
> pass (or intend others to pass) to the DNS so that
> non-conforming strings will fail. In IDNA2003, most of the
> testing is built into ToASCII and the operations surrounding it.
> In IDNA200X, much of the testing is more explicit. But neither
> assumes things that it doesn't verify.
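As an illustration of testing being built into ToASCII, the sketch below runs a few labels through `encodings.idna.ToASCII`, CPython's IDNA2003 implementation (RFC 3490 with Nameprep), and reports which ones fail. The helper name is hypothetical. Exception messages differ between Python versions, so only success or failure is relied on. The last two samples trip the bidi requirements of RFC 3454, section 6; the final one is a right-to-left label ending in an NSM, the same kind of string discussed earlier.

```python
import encodings.idna

def idna2003_check(label: str) -> str:
    """Return the ACE form of a single label, or 'rejected' if ToASCII fails."""
    try:
        return encodings.idna.ToASCII(label).decode("ascii")
    except UnicodeError:
        return "rejected"

samples = {
    "bücher": "ordinary U-label",
    "a" * 64: "longer than the 63-octet DNS label limit",
    "\u05d0a\u05d0": "mixes right-to-left and left-to-right letters",
    "\u05d0\u05b0": "right-to-left label ending in an NSM",
}
for label, why in samples.items():
    print(f"{why}: {idna2003_check(label)}")
# ordinary U-label: xn--bcher-kva
# (the other three print "rejected")
```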
I think we may agree on this. Part of my confusion was that Harald's original
text presumed an implementation that made a (false) assumption that IDNAs
were necessarily IDNAbis, so that a change would cause a problem for some
implementation.
> Clearly, there is at least one other issue. It arises for names
> that are valid under IDNA200X but not obviously valid under
> IDNA2003. An IDNA2003 lookup implementation will reject some of
> them as invalid (some or most of those that merely contain
> codepoints that are unassigned in Unicode 3.2 but assigned in
> later versions may slip through). In the long term, the only
> way to make all of the newly-available characters and strings
> available to IDN-using applications is for implementations of
> those applications to upgrade. That would be true of any update
> to IDNA that moves beyond Unicode 3.2, especially since
> registration of strings that contain codepoints that are
> unassigned at registration time is, fairly obviously, the worst
> of bad practices.
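Which characters in a label fall into this category can be checked mechanically. The sketch below, with a hypothetical helper name, uses the Unicode 3.2 database that CPython ships as `unicodedata.ucd_3_2_0` to list code points that were unassigned in Unicode 3.2 but are assigned in the interpreter's own Unicode version. These are the characters an IDNA2003 lookup may reject, or may let through, depending on how it treats unassigned code points.

```python
import unicodedata

def post_unicode_32(label: str) -> list[str]:
    """Code points unassigned (Cn) in Unicode 3.2 but assigned in the running UCD."""
    return [
        f"U+{ord(c):04X}"
        for c in label
        if unicodedata.ucd_3_2_0.category(c) == "Cn"
        and unicodedata.category(c) != "Cn"
    ]

print(unicodedata.unidata_version)  # Unicode version of the running interpreter
print(post_unicode_32("bücher"))        # []
print(post_unicode_32("\u2c30\u2c31"))  # ['U+2C30', 'U+2C31'] (Glagolitic, added in 4.1)
```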
I foresee an indefinitely long period in which many programs like browsers,
emailers, etc. will need to handle both IDNA2003 and IDNAbis.
> Now I'm being a little pedantic here, for which I apologize, but
> I think the point is important. If any of the majority of the
> cases you list above, what the strings occur in is not a URL,
> but something that must be transformed into a URL.
> Now I'm going to make two assumptions with which you may
> disagree. The first is that the IDNA200X model is sufficiently
> different from the IDNA2003 one that few, if any, applications
> are going to switch (or be able or inclined to switch) from
> IDNA2003 to IDNA200X by a completely automatic process without
> anyone thinking about it or noticing.
I would disagree a bit with the first one. Many programs (such as those at
my company, Google) will need to handle both for an indefinite period. I
expect that what we will probably do is:
- See if it works under IDNA2003. If so, fine.
- Otherwise, see if it works under IDNAbis. If so, fine.
- Otherwise, see if the major browsers accept it anyway. If so, we'll need to
  take it anyway.
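The three-step fallback above could be sketched as follows. Only the IDNA2003 step uses a real implementation (CPython's built-in `idna` codec); `idnabis_to_ascii` and `browser_compatible_to_ascii` are hypothetical placeholders supplied by the caller. The convention that a converter signals rejection by raising `UnicodeError` is also an assumption of this sketch.

```python
from typing import Callable, Optional, Tuple

Converter = Callable[[str], str]

def idna2003_to_ascii(domain: str) -> str:
    # CPython's "idna" codec implements IDNA2003 (RFC 3490 + Nameprep).
    return domain.encode("idna").decode("ascii")

def to_ascii_with_fallback(
    domain: str,
    idnabis_to_ascii: Converter,             # hypothetical IDNAbis converter
    browser_compatible_to_ascii: Converter,  # hypothetical "what browsers accept" step
) -> Optional[Tuple[str, str]]:
    """Try IDNA2003, then IDNAbis, then browser-compatible handling.

    Returns (name_of_step_that_succeeded, ascii_domain), or None if all reject."""
    steps = [
        ("IDNA2003", idna2003_to_ascii),
        ("IDNAbis", idnabis_to_ascii),
        ("browser-compatible", browser_compatible_to_ascii),
    ]
    for name, convert in steps:
        try:
            return name, convert(domain)
        except UnicodeError:
            continue  # this step rejected the name; try the next one
    return None

def reject_all(domain: str) -> str:
    raise UnicodeError("stub converter: rejected")

print(to_ascii_with_fallback("Bücher.de", reject_all, reject_all))
# ('IDNA2003', 'xn--bcher-kva.de')
```

Ordering the steps this way means a name is only handed to the more permissive converters after the stricter ones have refused it.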
Take a look at the following table, for example:
     Link                                  Firefox   IE7
  0  <a href="http://bücher.de">           works     works
  1  <a href="http://Bücher.de">           works     works
  2  <a href="http://xn--bcher-kva.de">    works     works
  3  <a href="http://B%C3%BCcher.de">      doesn't
Because Firefox and IE7 both accept (0), (1), and (2), I can't see any way
for Google to avoid handling them too, into the indefinite future, even
though (0) and (1) are not in Punycode. And this is not a UI issue; these
links are in the HTML page. That's why "preprocessing" is a better phrase
than "UI".
The more of the web and net's infrastructure that accepts these variations,
the more that other programs need to accommodate them, so that they
interwork with one another.
What I really don't want to see is an IDNAbis that fails to gain traction
because of this (thinking back to XML 1.1, which failed to gain traction
because of a really rather small incompatibility with XML 1.0).
> The second assumption is that any implementation that now
> depends upon, or offers to users, the input flexibilities of
> IDNA2003 (some applications of IDNA2003 do not) would be stupid
> to implement IDNA200X in a way that simply drops those
> flexibilities. Whether it should quietly retain them, or
> produce more or less subtle warnings to users about the
> conversions becomes a local design matter (and programs that
> communicate with users obviously have choices that are not
> available to ones that do not), it appears to me that we are already
> heading in the direction of applications (and, if that approach
> isn't stopped for other reasons, "smart domain name servers")
> making decisions about some things being safer than others and
> conditioning their actions on those decisions.
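If an application keeps IDNA2003's input flexibilities but wants to warn users, it first has to notice that a mapping happened. A minimal sketch, with a hypothetical function name: round-trip each non-ASCII label through CPython's IDNA2003 codec (ToASCII, then ToUnicode) and compare the result with what was typed. Pure-ASCII labels are skipped because IDNA2003 ToASCII does not apply Nameprep to them.

```python
def idna2003_mapped_labels(domain: str) -> list[tuple[str, str]]:
    """Return (typed, canonical) pairs for labels that Nameprep changed."""
    changed = []
    for label in domain.split("."):
        if label.isascii():
            continue  # ToASCII passes pure-ASCII labels through without Nameprep
        canonical = label.encode("idna").decode("idna")  # ToASCII, then ToUnicode
        if canonical != label:
            changed.append((label, canonical))
    return changed

print(idna2003_mapped_labels("bücher.de"))  # []
print(idna2003_mapped_labels("BÜCHER.de"))  # [('BÜCHER', 'bücher')]
```

Whether a non-empty result leads to a silent conversion or a warning is left to the application, as the quoted text suggests.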
I think we're in agreement here.
> The rationale document doesn't cover that situation nearly well
> enough at -05, but there is a new section and extensive text
> about it in the working version of -06. I don't think anything
> there will come as a surprise, since all of the issues have been
> discussed on this list and much of the text is derived from
> discussions on the list.
> > What I'm saying is that essentially all of the incompatible
> > differences between 2003 and the current bis are potential
> > problems for some implementation, and once we get done with
> > bis, we will need to list them all. So just calling out #5 is
> > insufficient.
> While our perspective on these "incompatible differences" is
> quite different, I hope that the new text in issues-06 will
> address many of your concerns.
Looking forward to it.
> But it is also true that many of
> those differences are differences in how and when IDNA is
> applied that are simply not defined by the original protocol or
> are differences that are important only if applicability
> principles or guidelines about the use of the original protocol
> were violated. If adjustments in those areas are impossible,
> then we are in very difficult waters indeed.
Yes, I think we may need to be pragmatic about the changes that we
introduce, because of the established conventions...
> Idna-update mailing list
> Idna-update at alvestrand.no