Touchstones for "Mapping"

Mark Davis mark at macchiato.com
Thu Apr 2 16:33:14 CEST 2009


I think the main storage benefit is human readability. It is much
easier to read:

href="http://εύβοια.el"
rather than
href="http://xn--mxabir3a6f.el"

or in some XML:

<url>http://εύβοια.el</url>
rather than
<url>http://xn--mxabir3a6f.el</url>

But there are other issues: URLs are stored all over the place. If I
have one in an SQL database, I want to be able to do a SELECT Data
WHERE Url LIKE 'http://εύβοια%' and not 'http://xn--mxabir3a6f%'.
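
As a rough sketch of that kind of query, assuming the URLs are stored in
U-label form: the table name Data and column Url follow the SELECT above,
while the in-memory SQLite database and the second row are made up for
illustration.

```python
import sqlite3

# Hypothetical in-memory table; URLs are stored in U-label form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Url TEXT)")
conn.executemany(
    "INSERT INTO Data (Url) VALUES (?)",
    [("http://εύβοια.el",), ("http://example.el",)],
)

# Prefix search written directly in the readable form.
rows = conn.execute(
    "SELECT Url FROM Data WHERE Url LIKE ?", ("http://εύβοια%",)
).fetchall()
print(rows)  # [('http://εύβοια.el',)]
```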

And there are formal problems, because substrings in Unicode space
don't match substrings in Punycode space. Suppose my URL were
"http://εύβοια-ξενοδοχείο.el" (a made up example); then its A-Label
form is "http://xn----vlbedmcdb5a7bjigbc9jyd.el". The SELECT of
'http://xn--mxabir3a6f%' would fail. Moreover, Url LIKE
'http://xn--mxabir3a6f%' can even return false results: strings whose
U-Label form doesn't start with 'http://εύβοια'.
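
The prefix mismatch can be checked with a few lines of Python. This is
only a sketch: to_a_label is a hypothetical helper that applies the raw
"punycode" codec and prepends "xn--", and it assumes the labels are
already lowercase and normalized (it does no IDNA mapping or validation).

```python
def to_a_label(u_label: str) -> str:
    """Hypothetical helper: raw Punycode plus the ACE prefix, no IDNA mapping."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


u_short = "εύβοια"
u_long = "εύβοια-ξενοδοχείο"

a_short = to_a_label(u_short)
a_long = to_a_label(u_long)
print(a_short, a_long)

# The prefix relation holds for the U-labels...
print(u_long.startswith(u_short))  # True
# ...but is lost once both are converted to A-labels.
print(a_long.startswith(a_short))  # False
```

The ASCII hyphen in the longer label is copied to the front of the
Punycode output, followed by the delimiter, which is why that A-label
begins with "xn----" and cannot share a prefix with the shorter one.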

Mark

On Thu, Apr 2, 2009 at 05:50, Vint Cerf <vint at google.com> wrote:
> Martin,
>
> I continue to be somewhat confused by logic that suggests that storage benefits from being in the U-label form. A-labels are almost de facto normative since they work with IDN-aware and IDN-unaware applications. IDN-aware applications should be able to generate the corresponding U-label for presentation. IDN-unaware applications won't even recognize a U-label domain name as valid, I would think. Consequently, storage in A-label form seems the rational choice. If you disagree, it must be because you see a flaw in the reasoning above. Can you clarify? V
>
> ----- Original Message -----
> From: idna-update-bounces at alvestrand.no <idna-update-bounces at alvestrand.no>
> To: Harald Alvestrand <harald at alvestrand.no>
> Cc: Andrew Sullivan <ajs at shinkuro.com>; idna-update at alvestrand.no <idna-update at alvestrand.no>
> Sent: Thu Apr 02 03:37:30 2009
> Subject: Re: Touchstones for "Mapping"
>
> There are two sides here, the protocol correctness and
> the content correctness. By content correctness, I mean
> whether the link e.g. goes to the intended page.
> Completely impossible to check with punycode, of course.
>
> Regards,   Martin.
>
> On 2009/04/02 16:56, Harald Alvestrand wrote:
>> Martin J. Dürst wrote:
>>> I very much agree with Harald. We are working on IDNs because we want
>>> humans to be able to easily read domain names in their script. Storing
>>> them as A-Labels when there is a reasonable chance that humans will
>>> have a look at them (e.g. in HTML or XML source, email source,...)
>>> is against the very intent of IDNs. Authors are humans, too, even
>>> if they work on plain text :-!
>> I can argue the other side of the argument for HTML and XML, though.....
>> the main thing being that humans who *enter* IDNs in Unicode form
>> without the benefit of conformance-enforcing software interfaces will
>> just about always get them wrong (due to bizarrities of case,
>> compatibility characters and other weirdnesses).
>>
>> If they enter A-labels by hand, it's pretty certain they've
>> cut-and-pasted them.
>>
>>               Harald
>>
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>
> --
> #-# Martin J.Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
>

