Browser IDN display policy: opinions sought
Paul Hoffman
phoffman at imc.org
Sat Dec 10 18:26:36 CET 2011
First, Mark's correction to Gervase's summary (which still needs to be checked) is an important one. Gervase's original summary was:
On Dec 9, 2011, at 3:12 AM, Gervase Markham wrote:
> The policies fall into 3 approximate buckets:
>
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
>
> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).
>
> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.
Later, Mark Davis said:
On Dec 9, 2011, at 10:10 AM, Mark Davis ☕ wrote:
> I'm not familiar with the code, but I think that (A) may actually be:
>
> A (IE, Chrome): Unicode if the (single) 'script' of the string matches
> one of the scripts of the user's language(s) in the options, Punycode
> otherwise.
>
> It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.
What a few people might be asking for is:
D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.
Restated less tersely:
D: If every character in the label comes from a single script as defined in the Unicode Standard, and every character is displayable by the browser without resorting to "unknown" or "fallback" glyphs, display the label; otherwise show Punycode.
This would give zone owners more assurance that their names are displayed in Unicode, as long as every label is single-script. It requires no option-setting by the user, which is a big win over (A) for multilingual users, and it completely avoids the "TLDs we like" problem of (B).
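To make (D) concrete, here is a rough Python sketch of the per-label decision. It is only an illustration under stated assumptions, not what any browser does: Python's standard library does not expose the Unicode Script property, so the hypothetical helper script_of() approximates it with the first word of the character's Unicode name, and the hypothetical can_render callback stands in for whatever font check a browser would use to detect "unknown" or "fallback" glyphs. Digits and the hyphen are treated as script-neutral, which is also an assumption.

```python
import unicodedata

# Assumption: digits and hyphen are script-neutral and never make a
# label "mixed-script".
NEUTRAL = set("0123456789-")


def script_of(ch):
    """Crude stand-in for the Unicode Script property.

    Uses the first word of the character's Unicode name, e.g. 'LATIN',
    'CYRILLIC', 'GREEK'. A real implementation would use the Script
    property data from the Unicode Character Database.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label):
    """Return the set of (approximate) scripts used in the label."""
    return {script_of(ch) for ch in label if ch not in NEUTRAL}


def display_label(label, can_render=lambda ch: True):
    """Apply policy D to one label.

    can_render is a hypothetical callback: it should return False when
    the browser would have to use an 'unknown' or 'fallback' glyph.
    """
    if label.isascii():
        return label  # nothing to decide for plain ASCII labels
    single_script = len(scripts_in(label)) == 1
    if single_script and all(can_render(ch) for ch in label):
        return label
    # Otherwise fall back to the ASCII (Punycode) form of the label.
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    print(display_label("пример"))        # single script: shown as Unicode
    print(display_label("p\u0430ypal"))   # Latin + Cyrillic: shown as Punycode
```

Note that the only inputs are the label itself and the browser's own rendering ability; no user options and no TLD list are consulted, which is the point of (D).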
Thoughts?
--Paul Hoffman