
John C Klensin klensin at jck.com
Sat Jun 26 17:32:43 CEST 2010



--On Saturday, June 26, 2010 11:54 +0200 Nick Teint
<nick.teint at googlemail.com> wrote:

> 2010/6/17 Nicolas Williams <Nicolas.Williams at oracle.com>:
>> On the one hand, I agree: ACE leakage into UIs is bad,
>> therefore ACE avoidance is good.
> 
> Sometimes, you _do_ want ACEs to leak into the UI:
> 
> 1. Your user does not know the script. Displaying an ugly ACE
> string is better than displaying some
> known-to-be-unrecognisable characters.*
> 
> 2. You don't have the fonts. Displaying an ugly ACE string is
> better than displaying "???????".

Both of these points have been made many times before.  Note
that "???????" and its equivalents (e.g., a row of little boxes)
are universal confusables -- they can be confused with, and
match in user perception, _any_ string for which the user does
not have fonts.  For the cautious user, that should be a strong
warning.  But for many users, if such strings are seen often
enough and sometimes contain legitimate domain names, the effect
will likely be the same as a pop-up box that says something
incomprehensible followed by an "OK" button (or even "Continue"
or "Cancel" ones).
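To make the "universal confusable" point concrete, here is a small Python sketch. It assumes, purely for illustration, a user whose fonts cover only ASCII (render_without_fonts is a hypothetical helper, not any real rendering API), and it uses Python's built-in "idna" codec, which follows the older IDNA2003 rules rather than IDNA2008, to produce the ACE form. Two different Cyrillic labels collapse to the same row of question marks, while their ACE forms stay distinct.

```python
def render_without_fonts(label: str) -> str:
    """Hypothetical renderer for a user whose fonts cover only ASCII:
    every other character comes out as '?'."""
    return "".join(ch if ch.isascii() else "?" for ch in label)


def ace(label: str) -> str:
    """ACE form of one label via Python's built-in 'idna' codec (IDNA2003 rules)."""
    return label.encode("idna").decode("ascii")


for label in ("пример", "привет"):
    # Both labels render identically without fonts, but their ACE forms differ.
    print(f"{render_without_fonts(label)}  {ace(label)}")
```

The ACE strings are ugly, but unlike the question marks they at least differ; whether a user actually notices that difference is still a matter of perception.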

> 3. The string contains conspicuous confusables. Displaying an
> ugly ACE string is better than displaying a
> maliciously-crafted string.

As long as one can know that the string is maliciously crafted,
sure.  A big warning that says "this is malicious, you aren't
looking at what you think you are, and you are likely to damage
your machine, your identity, your finances, or your soul if you
continue" would be even better and does not require displaying
an ACE.  For less clear cases, the answers are less clear.  And
this is a world filled with edge cases, especially when most
confusables are aspects of the perception of the user,
conspicuous or not.
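The three cases quoted above amount to a display policy: show the Unicode form unless one of the checks fails, then fall back to the ACE. What follows is a minimal sketch under loud assumptions. The script check uses a crude heuristic (the first word of the Unicode character name) instead of the real Unicode Script property. has_glyph and looks_confusable are hypothetical caller-supplied predicates standing in for font coverage and confusable detection, and the latter is exactly the part that, as noted above, one often cannot know. The ACE form again comes from Python's IDNA2003-based "idna" codec.

```python
import unicodedata
from typing import Callable, Set


def ace(label: str) -> str:
    """ACE form of one label via Python's built-in 'idna' codec (IDNA2003 rules)."""
    return label.encode("idna").decode("ascii")


def crude_script(ch: str) -> str:
    """Rough script guess: the first word of the character's Unicode name.

    Assumption for this sketch only; real code should use the Unicode Script
    property. ASCII letters, digits and hyphen are all lumped into LATIN.
    """
    if ch.isascii():
        return "LATIN"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_form(
    label: str,
    known_scripts: Set[str],
    has_glyph: Callable[[str], bool],          # hypothetical font-coverage check
    looks_confusable: Callable[[str], bool],   # hypothetical confusable detector
) -> str:
    """Return the Unicode label if every check passes, otherwise its ACE form."""
    # Case 1: the user does not know (one of) the label's scripts.
    if not {crude_script(ch) for ch in label} <= known_scripts:
        return ace(label)
    # Case 2: some character has no glyph and would render as "?" or a box.
    if not all(has_glyph(ch) for ch in label):
        return ace(label)
    # Case 3: the label has been flagged as confusable by some other means.
    if looks_confusable(label):
        return ace(label)
    return label


if __name__ == "__main__":
    always = lambda _: True
    never = lambda _: False
    print(display_form("bücher", {"LATIN"}, always, never))
    print(display_form("пример", {"LATIN"}, always, never))
    print(display_form("пример", {"LATIN", "CYRILLIC"}, always, never))
```

Note that the whole policy is only as good as its predicates; in particular, a looks_confusable that could reliably recognize malicious strings would make the ACE fallback unnecessary, since a direct warning would serve better.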

> PS: * For this purpose, it might even make sense to define
> Script-Compatible Encodings (SCEs) for scripts other than
> Latin/ASCII.

I'd be interested in understanding what you have in mind.
However, the conventional/historical way to do this is by
transliteration into Latin characters.  If we thought that was
satisfactory, we wouldn't need (or even want) IDNs.  And, of
course, transliterations would introduce a new family of
opportunities for confusion.

   john







More information about the Idna-update mailing list