Touchstones for "Mapping"

Harald Alvestrand harald at alvestrand.no
Thu Apr 2 16:37:21 CEST 2009


Mark Davis wrote:
> I think the main storage benefits are human readability. It is much
> easier to read:
>
> href="http://εύβοια.el"
> rather than
> href="http://xn--mxabir3a6f.el"
>
> or in some XML:
>
> <url>http://εύβοια.el</url>
> rather than
> <url>http://xn--mxabir3a6f.el</url>
>
> But there are other issues: URLs are stored all over the place. If I
> have one in an SQL database, I want to be able to do a SELECT Data
> WHERE Url LIKE 'http://εύβοια%' and not 'http://xn--mxabir3a6f%'.
>
> And there are formal problems, because substrings in Unicode space
> don't match substrings in Punycode space. Suppose that my URL were
> "http://εύβοια-ξενοδοχείο.el" (a made-up example); then its A-Label
> form is "http://xn----vlbedmcdb5a7bjigbc9jyd.el". The SELECT of
> 'http://xn--mxabir3a6f%' would fail. Moreover, Url LIKE
> 'xn--mxabir3a6f%' can even return false results: strings whose U-Label
> doesn't start with 'http://εύβοια%'.
>   
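The prefix mismatch described in the quoted text can be reproduced with a small sketch. It assumes Python's built-in "punycode" codec and a hypothetical helper, to_a_label, that only prepends the "xn--" prefix; it performs no IDNA mapping or validation, so it illustrates the substring problem rather than a full IDNA conversion.

```python
def to_a_label(u_label: str) -> str:
    """Hypothetical helper: raw Punycode plus the ACE prefix, nothing more."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


short_u = "εύβοια"
long_u = short_u + "-ξενοδοχείο"  # the made-up longer label from the quoted text

short_a = to_a_label(short_u)
long_a = to_a_label(long_u)

# In Unicode space the shorter label is a prefix of the longer one.
unicode_prefix_matches = long_u.startswith(short_u)  # True

# In Punycode space it is not: the ASCII hyphen is copied to the front of
# the encoded label, so the long A-label begins with "xn----".
ace_prefix_matches = long_a.startswith(short_a)  # False

print(short_a)
print(long_a)
print(unicode_prefix_matches, ace_prefix_matches)
```

A LIKE 'prefix%' query behaves like startswith here, which is why the same search succeeds on stored U-labels and fails on stored A-labels.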
The interesting part here is, of course, what happens when you search for
εύβοια% and forget the tonos on the second character. It WILL match on a
decent SQL database that supports UTF-8, but (having forgotten the correct
treatment of tonos in IDNA) I don't know if the domain name will match.
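
A rough sketch of the tonos question, under stated assumptions: the accent-insensitive database comparison is simulated by stripping combining marks after NFD decomposition (real collations differ in detail), and to_a_label is a hypothetical helper that applies raw Punycode with no IDNA mapping step. It shows only that the two spellings encode to different ASCII labels unless some mapping reconciles them first; it does not say what IDNA itself prescribes.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Approximate an accent-insensitive comparison key (an assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_a_label(u_label: str) -> str:
    """Hypothetical helper: raw Punycode plus the ACE prefix, no mapping."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


with_tonos = "εύβοια"
without_tonos = strip_marks(with_tonos)  # same word, tonos dropped

# An accent-insensitive comparison treats the two spellings as equal.
same_when_accents_ignored = strip_marks(with_tonos) == strip_marks(without_tonos)

# The raw Punycode forms differ, because the code points differ.
labels_differ = to_a_label(with_tonos) != to_a_label(without_tonos)

print(same_when_accents_ignored, labels_differ)
```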

              Harald



