Touchstones for "Mapping"

Andrew Sullivan ajs at shinkuro.com
Thu Apr 2 16:57:05 CEST 2009


On Thu, Apr 02, 2009 at 07:33:14AM -0700, Mark Davis wrote:
> I think the main storage benefits are human readability. It is much
> easier to read:
> 
> href="http://εύβοια.el"
> rather than
> href="http://xn--mxabir3a6f.el"
> 
> or in some XML:
> 
> <url>http://εύβοια.el</url>
> rather than
> <url>http://xn--mxabir3a6f.el</url>
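
A minimal sketch of converting between the two forms being compared. It assumes Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than the revised rules under discussion here, so treat it as an illustration only. The helper names `to_alabel` and `to_ulabel` are hypothetical.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA 2003 (RFC 3490),
# not the revised IDNA rules being discussed on this list.

def to_alabel(host: str) -> str:
    """Hypothetical helper: return the ASCII (A-label) form of a host name."""
    return host.encode("idna").decode("ascii")


def to_ulabel(host: str) -> str:
    """Hypothetical helper: return the Unicode (U-label) form of an ASCII host name."""
    return host.encode("ascii").decode("idna")


if __name__ == "__main__":
    a = to_alabel("εύβοια.el")
    print(a)             # an all-ASCII name beginning with "xn--"
    print(to_ulabel(a))  # εύβοια.el
```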

Well, if you are doing this with vi, then you already have all sorts
of other problems, and this is one more piece of manual work you need
to do.  If you're doing it with some other editor, then you'd be
well-advised to use one that's IDNA-aware, and get the benefits that
way.  Then storage is always consistent with what could actually be
resolved.
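
One way an IDNA-aware editor might keep storage consistent with what resolves: show the U-label while editing and write the A-label on save. This is a sketch under stated assumptions: it uses Python's built-in (IDNA 2003) `idna` codec, and it assumes the URL authority is a registered name with an optional port (no userinfo, no IPv6 literal). The functions `store_form` and `display_form` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def _with_host(url: str, convert) -> str:
    """Return url with its host passed through convert; any port is kept."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # e.g. a relative reference: no host to convert
    netloc = convert(parts.hostname)
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def store_form(url: str) -> str:
    """Hypothetical: form written to disk, host as A-label (what is resolved)."""
    return _with_host(url, lambda h: h.encode("idna").decode("ascii"))


def display_form(url: str) -> str:
    """Hypothetical: form shown while editing, host as U-label (input ASCII)."""
    return _with_host(url, lambda h: h.encode("ascii").decode("idna"))


if __name__ == "__main__":
    saved = store_form("http://εύβοια.el/index.html")
    print(saved)                # an all-ASCII URL with an "xn--" host
    print(display_form(saved))  # http://εύβοια.el/index.html
```

The person editing never sees the A-label, yet the stored text is exactly what a resolver would be asked for.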

> But there are other issues: URLs are stored all over the place. If I
> have one in an SQL database, I want to be able to do a SELECT Data
> WHERE Url LIKE 'http://εύβοια%' and not 'http://xn--mxabir3a6f%'.

  SELECT data WHERE ulabel(url) LIKE 'http://εύβοια%'

would also be a possibility.  But databases may be a special case,
because I can also imagine breaking up the URL into various bits
inside the database.  Indeed, I might want to store both the U-label
and the actually-used A-label, depending on the application.  I
concede that not all RDBMS support functional indexes on expressions
like idna(url), so I guess there'll be some cases where you
effectively have to store the U-label.
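
A sketch of the query-time conversion described above, assuming SQLite accessed through Python's `sqlite3` module. The table `pages` and the Python function registered as `ulabel` are hypothetical; the URL is stored in A-label form and converted to the U-label only when queried. Where the RDBMS supports functional indexes, one could also index `ulabel(url)`. That step is omitted because support varies, as does whether a LIKE query can use such an index.

```python
import sqlite3
from urllib.parse import urlsplit, urlunsplit


def ulabel(url: str) -> str:
    """Hypothetical SQL helper: return url with its host in U-label form.

    Assumes the stored host is ASCII (an A-label or plain LDH name) and the
    authority has no userinfo or IPv6 literal. Uses the IDNA 2003 codec.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    netloc = parts.hostname.encode("ascii").decode("idna")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def find_pages(pattern: str) -> list:
    conn = sqlite3.connect(":memory:")
    # Make the Python function callable from SQL as ulabel(url).
    conn.create_function("ulabel", 1, ulabel)
    conn.execute("CREATE TABLE pages (url TEXT, data TEXT)")  # hypothetical schema
    # Store the A-label form, i.e. what is actually used on the wire.
    alabel_url = "http://" + "εύβοια.el".encode("idna").decode("ascii") + "/"
    conn.executemany(
        "INSERT INTO pages (url, data) VALUES (?, ?)",
        [(alabel_url, "Euboea page"), ("http://example.com/", "other page")],
    )
    rows = conn.execute(
        "SELECT data FROM pages WHERE ulabel(url) LIKE ?", (pattern,)
    ).fetchall()
    conn.close()
    return [data for (data,) in rows]


if __name__ == "__main__":
    print(find_pages("http://εύβοια%"))  # ['Euboea page']
```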

What I'd prefer to see is that storing in the A-label form is
RECOMMENDED, and that, if the particular application demands storing
UTF-8 instead, storing the U-label (i.e., the canonicalized form) is
RECOMMENDED.
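
To illustrate what "canonicalized form" means for a stored U-label, the sketch below maps several spellings of the same name (uppercase, or with a decomposed accent) to one U-label by converting to the A-label and back. It assumes Python's built-in codec, which applies IDNA 2003 nameprep (case folding plus NFKC). Under IDNA2008 the protocol itself does not perform this mapping; uppercase input, for instance, is not a valid U-label there. `canonical_ulabel` is a hypothetical name.

```python
import unicodedata


def canonical_ulabel(host: str) -> str:
    """Hypothetical helper: return the canonical U-label form of a host name."""
    # ToASCII (nameprep case folding and NFKC, then Punycode) followed by
    # ToUnicode, as implemented by Python's IDNA 2003 codec.
    return host.encode("idna").decode("idna")


if __name__ == "__main__":
    variants = [
        "εύβοια.el",                                     # already canonical
        "ΕΎΒΟΙΑ.el",                                     # uppercase
        unicodedata.normalize("NFD", "εύβοια") + ".el",  # decomposed accent
    ]
    for name in variants:
        print(canonical_ulabel(name))  # εύβοια.el each time
```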
 
A
-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.
