Browser IDN display policy: opinions sought

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Mon Dec 12 08:10:25 CET 2011


On 2011/12/11 2:26, Paul Hoffman wrote:
> First, Mark's correction (which needs to be checked) is an important one:

I very much think it is correct. Ever since its very timid start with
IDNs, ICANN has had a strong and confusing tendency to cast issues in
terms of "language" even when they are script issues. That has
influenced the wider discussion, too.

> On Dec 9, 2011, at 3:12 AM, Gervase Markham wrote:
>
>> The policies fall into 3 approximate buckets:
>>
>> A (IE, Chrome): Unicode if the (single) 'language' of the string is
>> configured in the options, Punycode otherwise.
>>
>> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
>> otherwise. Arbitrary script mixing permitted (registry policy used to
>> prevent abuse).
>>
>> C (Safari): Unicode if the script is in a whitelist (which by default
>> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
>> script mixing.
>
> Later, Mark Davis said:
>
> On Dec 9, 2011, at 10:10 AM, Mark Davis ☕ wrote:
>
>> I'm not familiar with the code, but I think that (A) may actually be:
>>
>> A (IE, Chrome): Unicode if the (single) 'script' of the string matches one of the scripts of the user's language(s) in the options,
>> Punycode otherwise.
>>
>> It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.
>
> What a few people might be asking for is:
>
> D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.
>
> Restated less tersely:
>
> D: If every character in the label comes from a single script as defined in the Unicode Standard, and every character is displayable by the browser without resorting to "unknown" or "fallback" glyphs, display the label; otherwise show Punycode.

Yes, with the caveat that Patrick gave for punctuation, and the
additional caveat that whole-script confusables (labels that are
confusable as a whole, e.g. where one side is all-Latin and the other
side is all-Cyrillic) should be checked for and addressed.
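
To make policy D and the whole-script caveat concrete, here is a rough
Python sketch. It rests on several assumptions that are not part of the
proposal: the script of a character is approximated by the first word of
its Unicode character name (not the real Script property), the lookalike
table is a hypothetical, deliberately tiny stand-in for proper confusables
data, and "displayable" is reduced to a caller-supplied can_render
predicate. The names rough_script, scripts_of and display_form are
illustrative only.

```python
import unicodedata

# Hypothetical, deliberately tiny table: Cyrillic letters that resemble
# Latin ones (a, e, o, p, c, y, x). Real code would use confusables data.
CYRILLIC_LATIN_LOOKALIKES = set("\u0430\u0435\u043e\u0440\u0441\u0443\u0445")

# ASCII digits and the hyphen are shared by all scripts, so they are skipped.
SHARED = set("0123456789-")


def rough_script(ch):
    """Approximate the script by the first word of the character name.

    This is an assumption for illustration, not the Unicode Script property.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_of(label):
    """Return the set of (approximate) scripts used in a label."""
    return {rough_script(ch) for ch in label if ch not in SHARED}


def display_form(label, can_render=lambda ch: True):
    """Return the label in Unicode under policy D, else its Punycode form."""
    punycode = label.encode("idna").decode("ascii")
    scripts = scripts_of(label)
    # Policy D: exactly one script, and every character is displayable.
    if len(scripts) != 1 or not all(can_render(ch) for ch in label):
        return punycode
    # Additional caveat: an all-Cyrillic label built only from Latin
    # lookalikes is a whole-script confusable, so fall back to Punycode.
    letters = [ch for ch in label if ch not in SHARED]
    if scripts == {"CYRILLIC"} and all(
        ch in CYRILLIC_LATIN_LOOKALIKES for ch in letters
    ):
        return punycode
    return label
```

With this sketch, a single-script label such as "b\u00fccher" is returned
unchanged, while a Latin label containing one Cyrillic letter, or an
all-Cyrillic label made only of Latin lookalikes, comes back in its
"xn--" form. The punctuation caveat is only handled for ASCII digits and
the hyphen here.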

Regards,   Martin.

