Visually confusable characters (was: Re: Unicode 7.0.0, (combining) Hamza Above, and normalization for comparison)
John C Klensin
klensin at jck.com
Thu Aug 7 19:37:40 CEST 2014
Since my earlier note either crossed this in the virtual mail or
didn't have any effect, let me try again with a changed subject
line and some bullet points that I hope are easy to follow.
(1) The question of visually confusable characters has almost
nothing to do with the issue of combining Hamza Above, U+08A1,
and/or other code points that appear to consist of a combination
of an Arabic base character and a Hamza above it. Confusing
the two just makes discussions more difficult if not impossible.
(2) The question of how one identifies "visually confusable"
characters is a complex matter. It can involve not only what
those characters look like in presentation forms (e.g., type
styles) that are optimized for distinguishability (e.g., I
presume those used in The Unicode Standard), but also some very
subjective issues about, e.g., what people expect to see.
Personally, I don't believe an objective standard and
categorization is possible unless one constrains the problem to
the point of making it uninteresting (e.g., by believing that
the world can, in practice, be forced into a single universal
type style or type family).
(3) The question of what one does once one identifies a pair (or
set) of characters as "visually confusable" is quite separate
from how those characters are identified (something that both
the JET effort and ICANN got right, by the time of the VIP
activity if not earlier). There are lots of choices including
blocking all of them (which Mark's note seems to suggest),
letting one of the group be registered and then blocking the
others, making sure that all of them are allocated to or
controlled by the same party, trying to link them together at
either the DNS or application layers, and other, often more
complex, strategies. I have trouble imagining any basis on
which the IETF, or an IETF-derived WG list, would be the right
place for deciding on those strategies... even if we might help
identify some of the possibilities.
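To make the separation in (3) concrete, here is a minimal Python
sketch of two of the strategies listed above: "register one and
block the others" and "allocate all of them to the same party".
Everything in it is an assumption for illustration. The mapping
table is a tiny hand-picked set of Cyrillic/Latin lookalikes, not
any registry's or Unicode's actual data, and the names `skeleton`
and `Registry` are hypothetical. The point is only that the
identification step (the table) and the policy step (the
`same_party_ok` flag) can be changed independently.

```python
# Hypothetical, hand-picked lookalike table for illustration only.
# A real registry would plug in its own identification method here.
CONFUSABLE_TO_PROTOTYPE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label: str) -> str:
    """Map each character to its prototype so lookalike labels collide."""
    return "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in label.lower())


class Registry:
    """Toy registry keyed by skeleton rather than by exact string."""

    def __init__(self) -> None:
        # skeleton -> (registrant, set of labels held under that skeleton)
        self._holders: dict[str, tuple[str, set[str]]] = {}

    def register(self, label: str, registrant: str,
                 same_party_ok: bool = False) -> bool:
        """Return True if the label is accepted under the chosen policy.

        same_party_ok=False: the first registration blocks all lookalikes.
        same_party_ok=True:  lookalikes are allowed for the same registrant.
        """
        key = skeleton(label)
        holder = self._holders.get(key)
        if holder is None:
            self._holders[key] = (registrant, {label})
            return True
        owner, labels = holder
        if same_party_ok and owner == registrant:
            labels.add(label)
            return True
        return False
```

With this sketch, registering "paypal" for one party and then
attempting the mixed-script lookalike "p\u0430yp\u0430l" for a
different party is refused under either policy, while the same
party may hold both only when `same_party_ok=True`.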
p.s. RFC 5895 was never intended to be standards track. Talking
about "denying" that status to it is misleading at best.
--On Thursday, August 07, 2014 18:18 +0200 JFC Morfin
<jefsey at jefsey.com> wrote:
> At 15:20 06/08/2014, Mark Davis ☕️ wrote:
>> P1: Any characters that are visually confusable with others
>> should be excluded from domain names.
> the internet must be neutral and transparent and the DNS is on
> a first come first serve basis managed by TLD Manager. The
> principle is: TLD Managers need a tool that permits them to
> avoid confusing registrations. This means that only the first
> confusable registration is to be accepted, not the others.
> This is not a filtering, it is an extension of the definition
> of what is already registered. Up to now, already registered
> means the exact string, it should mean the confusable string.
> This means that this belongs to the TLD Manager registrations
> terms, not to standardization.
> RFC 5895 states "As unusual as this may be for a document
> concerning Internet protocols, it is necessary to describe
> this operation for implementors ... which conflates 
> user-input operation into the protocol.". This is the case of
> the discussed option. IESG has denied RFC 5895 the standard
> track. This should be the same case here.