ASCII- vs non-ASCII mappings (was: Punycode Mixed-case annotation)

Andrew Sullivan ajs at shinkuro.com
Mon Jun 29 20:03:47 CEST 2009


On Mon, Jun 29, 2009 at 07:21:22PM +0200, Marie-France Berny wrote:
> 2009/6/29 Andrew Sullivan <ajs at shinkuro.com>
> >
> > Please don't hijack this thread.
> 
> 
> ????

I mean that the thread was talking about one thing, and you have
introduced a different topic.  It appears you're doing so unwittingly,
but I don't want to conflate these two topics.
 
> > The mapping of lower-case non-ASCII characters with respect to upper-case
> > apparently-ASCII characters is _not_ the same question as the effects of
> > lower- and upper-case ASCII across the U-label/A-label boundary.
> 
> 
> I am sorry. I have not the slightest idea of what you are talking about. I
> read an attempt to come to a quick conclusion regarding punycode and where
> to carry mapping. Or am I wrong?

Wrong, I'm afraid.  The specific question was about ASCII characters
that _remain ASCII_ when using Punycode to transform the label.  So
for instance, in

    abcdé

and

    ABCDé

the 'abcd' and 'ABCD' parts are not, strictly speaking, touched by
Punycode.  Under IDNA2003 there's a simple answer for this, because
Nameprep case-folds the whole label before Punycode is applied, so
both forms end up as the same A-label.  Under IDNA2008, the earliest
proposals did no mapping at all, and we haven't settled what mapping,
if any, will happen.  Therefore, there is a question about what to do
with these particular cases.
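
To make the two cases concrete, here is a minimal sketch using only
Python's standard "punycode" and "idna" codecs (the latter implements
IDNA2003).  The variable names are mine, chosen for illustration.  It
shows that raw Punycode copies the ASCII part through with its case
intact, while the IDNA2003 path folds case first.

```python
# Sketch: how the ASCII part of a label passes through Punycode unchanged.
# Uses only Python's standard "punycode" and "idna" (IDNA2003) codecs.

lower = "abcd\u00e9"   # abcdé
upper = "ABCD\u00e9"   # ABCDé

# Raw Punycode (RFC 3492): basic (ASCII) code points are copied as-is,
# followed by "-" and the encoded non-ASCII part.
puny_lower = lower.encode("punycode")
puny_upper = upper.encode("punycode")

# Split each result into the copied ASCII part and the encoded tail.
basic_lower, _, tail_lower = puny_lower.rpartition(b"-")
basic_upper, _, tail_upper = puny_upper.rpartition(b"-")
print(puny_lower, puny_upper)

# IDNA2003 ToASCII: Nameprep case-folds the label before Punycode runs,
# so both spellings end up as the same A-label.
alabel_lower = lower.encode("idna")
alabel_upper = upper.encode("idna")
print(alabel_lower, alabel_upper)
```

The two raw Punycode strings differ only in the case of the leading
ASCII part, which is exactly the part the question is about.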
 
> As far as I understand, there is one clarification missing. It is what do
> you define as "global" in here. Are French (and possibly Persian, and
> probably many others...) included?

Yes, in the sense that there is one giant domain name system under
which everything has to fit, because the whole system is a tree
structure with one root.  (I'll leave aside for the moment the
possibility of "alternate roots", since every actual example of that
is in fact just a change of the servers holding the "unique root", and
not a change to the principle that there is a spot where the namespace
starts.)  

If you mean, "Will it support French, Persian, English, Chinese,
Arabic, and any other language Unicode supports in ways that are
completely natural to the readers and writers of those languages?" the
answer is, "No, and that was never the goal."  As several people have
said several times, the goal is not to be able to write literature in
the DNS.  The goal is just to internationalize the DNS, subject to the
limitations of the existing DNS. 

One of those limitations turns out to be the (in my opinion
unfortunate) DNS property that it is case-preserving but
case-insensitive.  As a historical fact, ExAmPlE.org, example.org,
EXAMPLE.org, EXAMPLE.ORG, and example.ORG are all "equivalent" for the
matching rules.  On my interpretation, the DNS server ought to return
an answer to any of those queries with the name as it appears in the
zone file, but some do other things (such as return a pointer to the
question section, which means you get back the form as you asked it).
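
The matching rule itself can be sketched in a few lines.  This is an
illustration only, not how any particular server implements it;
dns_fold and dns_equal are hypothetical helper names.  The point is
that only the ASCII letters A-Z are folded and nothing else is touched.

```python
import string

# Map only A-Z to a-z; every other character is left untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def dns_fold(name: str) -> str:
    """Fold ASCII letters only (hypothetical helper for illustration)."""
    return name.translate(_ASCII_FOLD)

def dns_equal(a: str, b: str) -> bool:
    """True if two names match under ASCII case-insensitive comparison."""
    return dns_fold(a) == dns_fold(b)

forms = ["ExAmPlE.org", "example.org", "EXAMPLE.org",
         "EXAMPLE.ORG", "example.ORG"]

# Every spelling compares equal to the first one.
all_match = all(dns_equal(forms[0], f) for f in forms)
print(all_match)
```

Because the fold is limited to ASCII, a pair such as é and É is not
matched by this rule at all.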

What you are asking is, I'm sure, a completely natural extension of
that principle in your view: you want école.fra to match ECOLE.FRA.
The problem is that this doesn't work the same way, because ecole.fra
and ECOLE.FRA also match each other.  ECOLE.FRA would then have to
match two distinct names, école.fra and ecole.fra, which is
ambiguous.  And that's only in the case where you actually know the
label is "in French" -- already an extremely complicated problem,
since we don't have a universally agreed-upon authority as to what
language any given word is in.  (You can't learn it from the DNS
without either an additional query or special processing on the server
side, both of which are, as far as I understand, ruled out for the
current work.)
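
The collision can be sketched as follows.  The accent-stripping rule
here is an assumption made up for the illustration (strip_accents and
proposed_key are hypothetical names); it is not a mapping anyone has
specified.  It just shows what happens once an accented label is
allowed to match its unaccented upper-case form.

```python
import unicodedata

def strip_accents(label: str) -> str:
    """Drop combining marks after NFD decomposition (hypothetical mapping)."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def proposed_key(label: str) -> str:
    """Matching key if accents were ignored as well as case (assumption)."""
    return strip_accents(label).casefold()

accented = "\u00e9cole.fra"   # école.fra
plain = "ecole.fra"
shouted = "ECOLE.FRA"

# Unicode case folding alone keeps the two lower-case names apart ...
distinct_today = accented.casefold() != plain.casefold()

# ... but a rule that lets ECOLE.FRA match école.fra collapses all three.
collide = proposed_key(accented) == proposed_key(plain) == proposed_key(shouted)

print(distinct_today, collide)
```

Two names that are distinct today would become one under such a rule,
and nothing in the label says which of them the user meant.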

Note that, in some contexts in English, it would be very surprising
if case didn't matter.  If case were not important in English, then
we would have lost it some time ago (also, a significant body of
poetic work would be affected).  This is not a battle between people
who speak English and whose every natural impulse is accommodated
vs. everyone else.  It's just a matter of finding the set of
compromises that will fit within the compromises that were already set
when the DNS became successful.

All of the above said, as far as I know the mapping document is still
open for comment.  If you know some way by which these mappings are
achievable, I'm sure everyone would love to hear it.

Best regards,

A

-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.

