Thanks for the thoughtful explanation, Andrew. I agree with every aspect of it.<div><br></div><div>=wil<br><br><div class="gmail_quote">On Tue, Jun 30, 2009 at 4:03 AM, Andrew Sullivan <span dir="ltr"><<a href="mailto:ajs@shinkuro.com">ajs@shinkuro.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">On Mon, Jun 29, 2009 at 07:21:22PM +0200, Marie-France Berny wrote:<br>
> 2009/6/29 Andrew Sullivan <<a href="mailto:ajs@shinkuro.com">ajs@shinkuro.com</a>><br>
> ><br>
> > Please don't hijack this thread.<br>
><br>
><br>
> ????<br>
<br>
I mean that the thread was talking about one thing, and you have<br>
introduced a different topic. It appears you're doing so unwittingly,<br>
but I don't want to conflate these two topics.<br>
<br>
> > The mapping of lower-case non-ASCII characters with respect to upper-case<br>
> > apparently-ASCII characters is _not_ the same question as the effects of<br>
> > lower- and upper-case ASCII across the U-label/A-label boundary.<br>
><br>
><br>
> I am sorry. I have not the slightest idea of what you are talking about. I<br>
> read an attempt to come to a quick conclusion regarding punycode and where<br>
> to carry mapping. Or am I wrong?<br>
<br>
Wrong, I'm afraid. The specific question was about ASCII characters<br>
that _remain ASCII_ when using Punycode to transform the label. So<br>
for instance, in<br>
<br>
abcdé<br>
<br>
and<br>
<br>
ABCDé<br>
<br>
the 'abcd' and 'ABCD' parts are not, strictly speaking, touched by<br>
Punycode. Under IDNA2003 there's a simple answer for this, because of<br>
the way it works. Under IDNA2008, the earliest proposals did no<br>
mapping at all, and we haven't settled what mapping if any will<br>
happen. Therefore, there is a question about what to do with these<br>
particular cases.<br>
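<br>
To make the 'abcd' point concrete, here is a minimal sketch using the<br>
punycode codec in Python's standard library (my choice of tool for<br>
illustration only). It covers just the raw Punycode step, with no xn--<br>
prefix and no IDNA mapping of any kind, and it suggests that the ASCII<br>
part is copied through with its case intact while only the non-ASCII<br>
part is encoded:<br>
<pre>
```python
# Sketch: raw Punycode (RFC 3492) via Python's built-in codec.
# No IDNA processing and no "xn--" prefix are applied here.
lower = "abcd\u00e9".encode("punycode")   # abcd followed by e-acute
upper = "ABCD\u00e9".encode("punycode")   # ABCD followed by e-acute

# Split each result at the last hyphen: the ASCII part comes first,
# the encoded non-ASCII part follows the delimiter.
lower_ascii, _, lower_tail = lower.rpartition(b"-")
upper_ascii, _, upper_tail = upper.rpartition(b"-")

print(lower_ascii, upper_ascii)   # the ASCII parts, case untouched
print(lower_tail == upper_tail)   # whether the encoded suffixes agree
```
</pre>
The two encoded labels differ only in the case of their ASCII part,<br>
which is exactly the residue the question is about.<br>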
<br>
> As far as I understand, there is one clarification missing. It is what do<br>
> you define as "global" in here. Are French (and possibly Persian, and<br>
> probably many others...) included?<br>
<br>
Yes, in the sense that there is one giant domain name system under<br>
which everything has to fit, because the whole system is a tree<br>
structure with one root. (I'll leave aside for the moment the<br>
possibility of "alternate roots", since every actual example of that<br>
is in fact just a change of the servers holding the "unique root", and<br>
not a change to the principle that there is a spot where the namespace<br>
starts.)<br>
<br>
If you mean, "Will it support French, Persian, English, Chinese,<br>
Arabic, and any other language Unicode supports in ways that are<br>
completely natural to the readers and writers of those languages?" the<br>
answer is, "No, and that was never the goal." As several people have<br>
said several times, the goal is not to be able to write literature in<br>
the DNS. The goal is just to internationalize the DNS, subject to the<br>
limitations of the existing DNS.<br>
<br>
One of those limitations turns out to be the (in my opinion<br>
unfortunate) DNS property that it is case-preserving but<br>
case-insensitive. As a historical fact, ExAmPlE.org, <a href="http://example.org" target="_blank">example.org</a>,<br>
EXAMPLE.org, <a href="http://EXAMPLE.ORG" target="_blank">EXAMPLE.ORG</a>, and example.ORG are all "equivalent" for the<br>
matching rules. On my interpretation, the DNS server ought to return<br>
an answer to any of those queries with the name as it appears in the<br>
zone file, but some do other things (such as return a pointer to the<br>
question section, which means you get back the form as you asked it).<br>
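<br>
As a rough sketch of that matching rule (the helper name is made up,<br>
and it works on Python strings rather than on wire-format octets), the<br>
comparison folds only the 26 ASCII capitals and leaves every other<br>
character alone:<br>
<pre>
```python
import string

# Fold only A-Z to a-z, as in traditional DNS name matching.
# Anything outside ASCII letters passes through unchanged.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def dns_names_match(a, b):
    """Hypothetical helper: compare two names, ignoring ASCII case only."""
    return a.translate(_ASCII_FOLD) == b.translate(_ASCII_FOLD)


variants = ["ExAmPlE.org", "example.org", "EXAMPLE.org",
            "EXAMPLE.ORG", "example.ORG"]

# Every variant is compared against the first one.
print(all(dns_names_match(variants[0], v) for v in variants))
```
</pre>
Note that the stored form is never rewritten here; only the<br>
comparison ignores case, which is the "case-preserving but<br>
case-insensitive" behaviour.<br>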
<br>
What you are asking is, I'm sure, a completely natural extension of<br>
that principle in your view: you want école.fra to match ECOLE.FRA.<br>
The problem is that this doesn't work the same way, because ecole.fra<br>
and ECOLE.FRA also match each other, so now we have an ambiguous<br>
combination. And that's only in the case where you actually know the<br>
label is "in French" -- already an extremely complicated problem,<br>
since we don't have a universally agreed-upon authority as to what<br>
language any given word is in. (You can't learn it from the DNS<br>
without either an additional query or special processing on the server<br>
side, both of which are, as far as I understand, antirequisites<br>
for the current work.)<br>
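<br>
To illustrate the ambiguity, here is a sketch of one hypothetical<br>
mapping that would make école.fra match ECOLE.FRA: decompose, drop the<br>
combining marks, then upper-case. Nobody has proposed this exact<br>
function; it is only an assumption for the sake of the example.<br>
<pre>
```python
import unicodedata


def accentless_upper(label):
    """Hypothetical mapping: drop combining marks, then upper-case."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(c for c in decomposed
                       if not unicodedata.combining(c))
    return stripped.upper()


names = ["\u00e9cole.fra", "ecole.fra"]   # with and without e-acute
keys = {name: accentless_upper(name) for name in names}

print(keys)
print(len(set(keys.values())))   # number of distinct keys left
```
</pre>
Under such a mapping the two distinct names end up with the same key,<br>
so ECOLE.FRA could no longer be tied to just one of them.<br>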
<br>
Note that, in some contexts in English, it would be very surprising<br>
that case didn't matter. If case were not important in English, then<br>
we would have lost the distinction some time ago (also, a significant body of<br>
poetic work would be affected). This is not a battle between people<br>
who speak English and whose every natural impulse is accommodated<br>
vs. everyone else. It's just a matter of finding the set of<br>
compromises that will fit within the compromises that were already set<br>
when the DNS became successful.<br>
<br>
All of the above said, as far as I know the mapping document is still<br>
open for comment. If you know some way by which these mappings are<br>
achievable, I'm sure everyone would love to hear about it.<br>
<br>
Best regards,<br>
<br>
A<br>
<font color="#888888"><br>
--<br>
Andrew Sullivan<br>
<a href="mailto:ajs@shinkuro.com">ajs@shinkuro.com</a><br>
Shinkuro, Inc.<br>
_______________________________________________<br>
Idna-update mailing list<br>
<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</font></blockquote></div><br></div>