Implementation questions

Erik van der Poel erikv at google.com
Mon Dec 22 18:41:45 CET 2008


On Mon, Dec 22, 2008 at 3:15 AM, Harald Alvestrand <harald at alvestrand.no> wrote:
> When a search engine does indexing, it maps together a lot more URLs
> than the ones that appear superficially syntactically equivalent.
>
> My personal answer is that the URI (with a punycoded domain name) should
> be provided, because that's what the search engine has actually observed
> to cause the page to be fetched, and reconstructing an IRI from a URI
> is an error-prone process (you basically have to guess).
>
> Given that some people have decided that they want to provide IRIs to
> the user, some A-labels have to be converted back to U-labels. In both
> of the cases we have let ourselves be twisted around by (final sigma
> and ß), the punycode form maps back to exactly one U-label.

ZWJ and ZWNJ are two more cases somewhat like that.
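
To make the ß case concrete, here is a rough Python sketch. It assumes
Python's built-in "idna" codec (which implements IDNA2003) stands in for
an IDNA2003 browser, and uses the raw "punycode" codec with no mapping
step as an approximation of what an IDNA2008 client would do for this
one label. The helper names are mine, purely for illustration; this is
not a full IDNA2008 implementation.

```python
# Sketch only: "idna" is Python's IDNA2003 codec (RFC 3490);
# "punycode" is the bare RFC 3492 transform with no nameprep mapping.

ACE_PREFIX = "xn--"


def a_label_to_u_label(a_label: str) -> str:
    """Decode an A-label to the single U-label it represents."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # plain ASCII label, nothing to decode
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def u_label_to_a_label_unmapped(u_label: str) -> str:
    """Encode a U-label without any mapping (assumed IDNA2008-like behaviour)."""
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def u_label_to_a_label_idna2003(u_label: str) -> str:
    """Encode a U-label the way an IDNA2003 client would (ß is mapped to ss)."""
    return u_label.encode("idna").decode("ascii")


u_label = a_label_to_u_label("xn--fa-hia")
print(u_label)                               # faß
print(u_label_to_a_label_unmapped(u_label))  # xn--fa-hia
print(u_label_to_a_label_idna2003(u_label))  # fass
```

Decoding the A-label is unambiguous, but feeding the resulting U-label
to an IDNA2003 client looks up a different name.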

> This U-label will produce a different A-label in an IDNA2003 browser than
> in an IDNA2008 browser. But there's absolutely no ambiguity about which
> U-label to return; there is no possible U-label that one can return to
> an IDNA2003 browser that can cause that browser to go to the IDNA2008 site.
>
> So the answer is simple: Return the A-label form. Preferably by
> abandoning the idea of returning IRIs.

Absolutely. However, the A-labels corresponding to U-labels that
contain eszett, final sigma, ZWJ and ZWNJ do not currently work in
IE7. We just have to pray that Microsoft will fix that, and that
users' copies of IE7 will be updated. If so, we don't need another
prefix.

Erik

