ASCII- vs non-ASCII mappings (was: Punycode Mixed-case annotation)

Wil Tan dready at gmail.com
Tue Jun 30 12:24:23 CEST 2009


Thanks for the thoughtful explanation, Andrew. I agree with every aspect of
it.
=wil

On Tue, Jun 30, 2009 at 4:03 AM, Andrew Sullivan <ajs at shinkuro.com> wrote:

> On Mon, Jun 29, 2009 at 07:21:22PM +0200, Marie-France Berny wrote:
> > 2009/6/29 Andrew Sullivan <ajs at shinkuro.com>
> > >
> > > Please don't hijack this thread.
> >
> >
> > ????
>
> I mean that the thread was talking about one thing, and you have
> introduced a different topic.  It appears you're doing so unwittingly,
> but I do not want to conflate these two topics.
>
> > > The mapping of lower-case non-ASCII characters with respect to upper-case
> > > apparently-ASCII characters is _not_ the same question as the effects of
> > > lower- and upper-case ASCII across the U-label/A-label boundary.
> >
> >
> > I am sorry. I have not the slightest idea of what you are talking about. I
> > read an attempt to come to a quick conclusion regarding punycode and where
> > to carry mapping. Or am I wrong?
>
> Wrong, I'm afraid.  The specific question was about ASCII characters
> that _remain ASCII_ when using Punycode to transform the label.  So
> for instance, in
>
>    abcdé
>
> and
>
>    ABCDé
>
> the 'abcd' and 'ABCD' parts are not, strictly speaking, touched by
> Punycode.  Under IDNA2003 there's a simple answer for this, because of
> the way it works.  Under IDNA2008, the earliest proposals did no
> mapping at all, and we haven't settled what mapping if any will
> happen.  Therefore, there is a question about what to do with these
> particular cases.
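
To illustrate the point about ASCII characters that remain ASCII, here is a
minimal sketch using the `punycode` codec from Python's standard library. It
encodes only the bare label (no `xn--` prefix and no IDNA mapping step), and
the variable names are chosen purely for illustration.

```python
# Encode two labels that differ only in the case of their ASCII part.
lower_label = "abcd\u00e9"   # abcdé
upper_label = "ABCD\u00e9"   # ABCDé

lower_encoded = lower_label.encode("punycode")
upper_encoded = upper_label.encode("punycode")

# The basic (ASCII) code points are copied through unchanged, so the
# case difference survives the transformation.
print(lower_encoded)
print(upper_encoded)
print(lower_encoded == upper_encoded)                  # False
print(lower_encoded.lower() == upper_encoded.lower())  # True
```

Whether the two results should be treated as the same label is exactly the
open question: nothing in the Punycode step itself decides it.
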
>
> > As far as I understand, there is one clarification missing. It is what do
> > you define as "global" in here. Are French (and possibly Persian, and
> > probably many others...) included?
>
> Yes, in the sense that there is one giant domain name system under
> which everything has to fit, because the whole system is a tree
> structure with one root.  (I'll leave aside for the moment the
> possibility of "alternate roots", since every actual example of that
> is in fact just a change of the servers holding the "unique root", and
> not a change to the principle that there is a spot where the namespace
> starts.)
>
> If you mean, "Will it support French, Persian, English, Chinese,
> Arabic, and any other language Unicode supports in ways that are
> completely natural to the readers and writers of those languages?" the
> answer is, "No, and that was never the goal."  As several people have
> said several times, the goal is not to be able to write literature in
> the DNS.  The goal is just to internationalize the DNS, subject to the
> limitations of the existing DNS.
>
> One of those limitations turns out to be the (in my opinion
> unfortunate) DNS property that it is case-preserving but
> case-insensitive.  As a historical fact, ExAmPlE.org, example.org,
> EXAMPLE.org, EXAMPLE.ORG, and example.ORG are all "equivalent" for the
> matching rules.  On my interpretation, the DNS server ought to return
> an answer to any of those queries with the name as it appears in the
> zone file, but some do other things (such as return a pointer to the
> question section, which means you get back the form as you asked it).
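
The matching rule described above can be sketched as a comparison that folds
only the ASCII letters A-Z. The function name `dns_names_match` is
hypothetical, and the sketch ignores everything else a real server does (wire
format, label length limits, and so on).

```python
import string

# Translation table that lower-cases A-Z and leaves every other
# character, including non-ASCII letters, untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def dns_names_match(left: str, right: str) -> bool:
    """Compare two names the way classic DNS matching does: ASCII case only."""
    return left.translate(_ASCII_FOLD) == right.translate(_ASCII_FOLD)


variants = ["ExAmPlE.org", "example.org", "EXAMPLE.org",
            "EXAMPLE.ORG", "example.ORG"]
all_equivalent = all(dns_names_match(variants[0], v) for v in variants)
print(all_equivalent)  # True
```

The names are compared without being rewritten, which mirrors the
"case-preserving but case-insensitive" behaviour: the stored form is kept, and
only the comparison ignores case.
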
>
> What you are asking is, I'm sure, a completely natural extension of
> that principle in your view: you want école.fra to match ECOLE.FRA.
> The problem is that this doesn't work the same way, because ecole.fra
> and ECOLE.FRA also match each other, so now we have an ambiguous
> combination.  And that's only in the case where you actually know the
> label is "in French" -- already an extremely complicated problem,
> since we don't have a universally agreed-upon authority as to what
> language any given word is in.  (You can't learn it from the DNS
> without either an additional query or special processing on the server
> side, both of which are, as far as I understand, antirequisites
> for the current work.)
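
The ambiguity can be made concrete with a small sketch. It assumes a
hypothetical mapping, `accent_insensitive_key`, that strips combining marks and
lower-cases the result; no IDNA document specifies such a mapping, and it is
shown only to make the collision visible.

```python
import unicodedata


def accent_insensitive_key(name: str) -> str:
    """Hypothetical mapping: drop combining marks, then lower-case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.lower()


accented = "\u00e9cole.fra"   # école.fra
plain = "ecole.fra"
upper = "ECOLE.FRA"

# All three names collapse to the same key under this mapping...
keys = {accent_insensitive_key(n) for n in (accented, plain, upper)}
print(keys)

# ...even though the accented and plain forms are different labels.
print(accented == plain)  # False
```

If ECOLE.FRA must match both école.fra and ecole.fra, then those two can no
longer be registered as distinct names, which is the ambiguous combination
described above.
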
>
> Note that, in some contexts in English, it would be very surprising
> if case didn't matter.  If case were not important in English, then
> we would have lost it some time ago (also, a significant body of
> poetic work would be affected).  This is not a battle between people
> who speak English and whose every natural impulse is accommodated
> vs. everyone else.  It's just a matter of finding the set of
> compromises that will fit within the compromises that were already set
> when the DNS became successful.
>
> All of the above said, as far as I know the mapping document is still
> open for comment.  If you know some way by which these mappings are
> achievable, I'm sure everyone would love to hear them.
>
> Best regards,
>
> A
>
> --
> Andrew Sullivan
> ajs at shinkuro.com
> Shinkuro, Inc.
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>