Esszett, Final Sigma, ZWJ and ZWNJ

Tue Feb 24 03:41:02 CET 2009

--On Monday, February 23, 2009 16:59 -0800 Mark Davis
<mark at macchiato.com> wrote:

> I tend to agree with Andrew that *effectively* this is a
> change to "bits on the wire".
> 
> That is, under IDNA2003 both
> "τιςγλώσσες.com" and
> "τισγλώσσεσ.com",
> for example, go to the same location, while under IDNA2008,
> they go to different locations (unless special registry
> actions are taken that are outside the control of this group).
> 
> For example, in an HTML page posted on the web:
> 
>    href="τιςγλώσσες.com"
> 
> gets interpreted differently by an IDNA2003 browser than by an
> IDNA2008 browser.

Mark, Paul, Andrew,

Making a "wire" argument here opens up a topic that has far
broader scope than this WG and raises some questions about the
nature of IDNA (and, in the process, IRIs, URIs, and other
contexts in which domain names can appear).  One way to read
IDNA2003 and the URI spec is that there isn't any "wire": the
ToASCII and ToUnicode operations are carried out locally and the
only thing that goes onto the wire is the ACE form.  Those who
were around for the original IDN WG will probably recall that,
even after the IDNA concept became clear, there were proposals
to send Unicode strings to intermediate servers which would then
perform the ToASCII function, do some caching as an
optimization, perform the DNS lookup, and then return the
results of that lookup to the calling host.  Those proposals
were rather firmly rejected, but that scenario would clearly
have represented an "on the wire" situation.

If only A-labels go over the wire, then it is still possible to
have one IDNA client/implementation interpret a string differently from
another.  And that is still a problem, but it is a somewhat
different problem -- internal to and at the application layer
and not in the global DNS.
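A minimal Python sketch of that situation, under stated assumptions. Python's standard `idna` codec implements IDNA2003 ToASCII (RFC 3490, with Nameprep), so it stands in for an IDNA2003 client here. The helper `idna2008_style_to_ascii` is hypothetical and deliberately simplified: it stands in for an IDNA2008 client by doing no mapping and no RFC 5892 validity checks, and simply Punycode-encodes each non-ASCII label. Both clients put only ASCII on the wire, yet they disagree about which A-label a given Unicode string denotes.

```python
def idna2003_to_ascii(name: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 ToASCII, including
    # Nameprep, which maps final sigma to sigma and eszett to "ss".
    return name.encode("idna").decode("ascii")


def idna2008_style_to_ascii(name: str) -> str:
    # Hypothetical simplification of IDNA2008: no mapping at all. Each
    # non-ASCII label is assumed to already be a valid U-label (lowercase,
    # NFC, permitted code points) and is Punycode-encoded as-is. Real
    # IDNA2008 also checks RFC 5892 properties and contextual rules.
    out = []
    for label in name.split("."):
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


final_sigma = "τιςγλώσσες.com"
medial_sigma = final_sigma.replace("ς", "σ")  # every final sigma made medial

# straße.de -> IDNA2003: strasse.de, IDNA2008-style: xn--strae-oqa.de
for name in (final_sigma, medial_sigma, "straße.de"):
    print(name, idna2003_to_ascii(name), idna2008_style_to_ascii(name))
```

Under the IDNA2003 rules the two sigma spellings collapse to a single A-label (and "straße" becomes plain "strasse"), while the simplified IDNA2008-style conversion keeps them distinct. Nothing different happens in the DNS itself; the disagreement is entirely in the local, application-layer conversion.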

In that model, IRIs are a common set of user interface
conventions, but do not become protocol elements.  Alternately,
one could take the position that the real and appropriate intent
of IRIs is to replace URIs for most practical purposes and that
they are appropriately protocol elements with non-ASCII strings
usable globally.  That has other implications of which IDNs are
merely a narrow case and not the most problematic one at that.

These different ways of looking at context have been a source of
confusion about the WG and the IDNA2008 work since the
beginning.   For example, on the registration side, most of the
registries at the top level require registration requests to
contain the putative A-label.  If they do that, and validate
those A-labels, the nature of the relationship between IDNA2003
and IDNA2008 looks a lot different from the way it looks to a
registry that accepts native character Unicode strings to which
it applies Stringprep.  Similarly, the compatibility problem
looks different to an application that puts only A-labels on the
wire and that does early conversions to that form -- reserving
native character forms for display and other user interface
activities -- than it does to one that tries to use the native
character form internally as much as possible.
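On the registration side, a similar sketch (again with Python's `idna` codec standing in for IDNA2003) suggests how the same submitted A-label can be judged differently depending on the validator. `accepts_as_idna2003_a_label` relies on the codec's ToUnicode, which decodes the Punycode, re-runs ToASCII with Nameprep, and rejects the label if the round trip does not reproduce it. `accepts_as_idna2008_style_a_label` is a hypothetical, simplified check that only requires the Punycode to decode and re-encode to the same string; a real IDNA2008 validator would also check the decoded code points against RFC 5892.

```python
def accepts_as_idna2003_a_label(a_label: str) -> bool:
    # Python's "idna" codec implements IDNA2003 ToUnicode (RFC 3490): it
    # decodes the Punycode, re-applies ToASCII (with Nameprep), and raises
    # if the result does not match the submitted label.
    try:
        a_label.encode("ascii").decode("idna")
    except UnicodeError:
        return False
    return True


def accepts_as_idna2008_style_a_label(a_label: str) -> bool:
    # Hypothetical simplified check: the label must carry the ACE prefix and
    # its Punycode must decode and re-encode to exactly the same string.
    # Real IDNA2008 validation also checks the decoded code points
    # (RFC 5892) and contextual rules; this sketch does not.
    a_label = a_label.lower()
    if not a_label.startswith("xn--") or len(a_label) == 4:
        return False
    try:
        u_label = a_label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return "xn--" + u_label.encode("punycode").decode("ascii") == a_label


for candidate in ("bücher".encode("idna").decode("ascii"), "xn--strae-oqa"):
    print(candidate,
          accepts_as_idna2003_a_label(candidate),
          accepts_as_idna2008_style_a_label(candidate))
```

With these assumptions, "xn--strae-oqa" (the Punycode form of "straße") fails the IDNA2003 check, because Nameprep turns the decoded label into "strasse" and the round trip breaks, but it passes the simplified IDNA2008-style check. A label that contains no mapped characters, such as the one for "bücher", passes both.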

I don't know how much this helps with the immediate problem, but
it is important to understand that there are fundamental
differences in perspective and assumptions that color the
discussions.

    john