Completely off-topic: what about legacy UTF-8 DNS and IDNA2003?

John C Klensin klensin at jck.com
Mon Mar 2 22:18:12 CET 2009


Let me add two things, and a plea, to Andrew's excellent
summary...

--On Monday, March 02, 2009 15:10 -0500 Andrew Sullivan
<ajs at shinkuro.com> wrote:

> On Mon, Mar 02, 2009 at 11:48:49AM -0800, Shawn Steele wrote:
> 
>> I've had some questions asked about how punycode names and
>> UTF-8 names should interoperate in environments where there's
>> a history of UTF-8 DNS.  (Yea, I know it'll take a bajillion
>...
 
> As a matter of protocol, the DNS was never 7 bit.  It is
> supposed to be 8 bit, but 1034/1035 note that there are other
> restrictions on network names that ought to be taken into
> consideration.  As a result, some people treated the DNS as
> effectively 7 bit, and that's roughly how we ended up with the
> LDH rule.  (I have grossly oversimplified this.  I think John
> put up somewhere a fairly lengthy discussion of the history.)

Yes, but I don't remember where well enough to give a pointer to
it.  The important thing for this discussion is that there
are/were two other issues in the development of the LDH rule.
One is that it derives very directly from the pre-DNS rules
about host names and the host table, both of which were very
explicitly ASCII.  The DNS transition plan necessarily moved
those names and the associated rules forward, even though many
of them have evolved over time.  My memory/reading of that
history is different from that of Mark Andrews but, on this
issue, we ended up in just about the same place.

The second was that the DNS came along well before Unicode, at a
time when various national "8 bit ASCII" proposals were emerging
(I think before 8859 settled in, but that would be easy to
check if anyone cared) and when code-page switching schemes
(based on ISO 2022 and otherwise) were alive and well.  The
result was that we had clear rules for ASCII (and the DNS
clearly assumes that any octet with the 8th bit off is ASCII),
including the case-matching rules that were also carried forward
from the earlier host name specs, but no way to write rules at
all for strings with the 8th bit turned on.  For those
characters, there was no way to tell which CCS or code page they
were drawn from, much less any way to solve the "what matches
and what doesn't" problems that have been a concern on this list
lately.  So we have the rather odd situation in which
characters with the 8th bit off are matched in a
case-insensitive way (although case is preserved throughout DNS
operations) and octets with the 8th bit on are not, even when
the case relationships might be obvious (in the right CCS).
Those issues are discussed in more detail in RFC 4343.
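
A minimal sketch of the matching behaviour described above, in
Python and purely for illustration.  The function names are
hypothetical and are not taken from any DNS implementation; the
only rule assumed is the one stated in the text: octets for A-Z
are folded to a-z, and every other octet, including any with the
8th bit on, is compared exactly.

```python
def fold_ascii(label: bytes) -> bytes:
    """Fold only the octets 0x41-0x5A (ASCII A-Z) to lower case.

    Every other octet, including those with the 8th bit set,
    is passed through unchanged.
    """
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def labels_match(a: bytes, b: bytes) -> bool:
    """Compare two labels case-insensitively for ASCII letters only."""
    return fold_ascii(a) == fold_ascii(b)


# ASCII letters match regardless of case.
ascii_match = labels_match(b"Example", b"eXAMPLE")

# In ISO 8859-1, 0xE9 and 0xC9 are lower- and upper-case e-acute,
# but the comparison cannot know which CCS is in use, so the
# octets are simply different.
latin1_match = labels_match(b"\xe9", b"\xc9")

# The same octet means different characters in different code
# pages, which is why no case rule can be written for it.
as_latin1 = b"\xe9".decode("latin-1")
as_cyrillic = b"\xe9".decode("iso8859-5")
```

Here ascii_match is true and latin1_match is false, and the last
two lines decode the same octet to two unrelated characters.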

Systems with more than one octet per character, whether UTF-8 or
something else, of course make that situation even more
difficult.  If we were really to move toward full UTF-8 support
in the DNS, and if RFC 4343 reflects consensus in the DNS
community (I have no reason to believe that it does not), we
would probably need both a new label type and a new class to
fully support what
people would expect (and, again, we would still have serious
problems getting global consensus on what matched and what
didn't, at least without defining RR types that would
accommodate and require language information... the difficulties
with which I've tried to explain earlier).
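
To make the UTF-8 versus punycode question from the start of the
thread concrete, here is a small Python sketch.  It assumes
Python's built-in "idna" codec, which implements IDNA2003, and it
uses an arbitrary example label that does not come from this
thread.  It only shows that the two forms of one name are
different octet strings, so an octet-wise comparison of the kind
the DNS performs would not treat them as the same label.

```python
# Hypothetical example label, chosen only for illustration.
name = "bücher"

# Legacy "UTF-8 DNS" form: the raw UTF-8 octets of the label.
utf8_label = name.encode("utf-8")

# IDNA2003 ACE form: an all-ASCII label with the "xn--" prefix,
# produced here by Python's built-in IDNA2003 codec.
ace_label = name.encode("idna")

# The two octet strings differ, so a server holding one form
# will not answer a query for the other without some conversion
# step outside the DNS comparison itself.
same_octets = utf8_label == ace_label

# The ACE form converts back to the original Unicode label.
round_trip = ace_label.decode("idna")
```

Any conversion between the two forms therefore has to happen in
the application or resolver library rather than in the DNS
comparison itself.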

>...

Many of us who have worked on IDNs for a long time have
recognized all along that they are never going to be a solution
to the broad range of language-based issues, partially because
the DNS never will be, regardless of how IDNs are supported.  To
do everything really well, one needs slightly-fuzzy matching,
display hints, localization on lookup, and probably elimination
of DNS requirements for a strictly hierarchical system and
left-to-right label ordering.  There are ways to do all of those
things, but most of them lie "above" or outside the DNS rather
than in trying to change the DNS to do the job.

The boundary -- how far the DNS can or should be altered to
better support an internationalized world -- is a subject that
can be debated endlessly, both in the context of "how to fit it
in" (new label types and/or classes, trick RRs, etc.) and of
"time to design and deploy DNS-II".  Andrew, Mark, and a few
others notwithstanding, this WG doesn't have the expertise
required to sensibly have those discussions (independent of how
far out of scope they are).

The plea (not a demand or requirement -- I don't have the
authority and wouldn't exercise it if I did):

I would like to have the documents in as good shape as possible
before San Francisco.  In order to do that, I'm trying to absorb
relevant comments and suggestions.  But I'm also trying to
follow the list, just in case something comes up that ought to
be reflected in the documents.  I'm also hoping that people will
actually read and comment on the details and text of the posted
versions (either the IDNA2008 ones or even Paul's alternative).
It would really help with that if the notes that involve
speculation about DNS changes, etc., just stopped for a while or
were taken elsewhere.

     john



More information about the Idna-update mailing list