Visually confusable characters

Sat Aug 9 04:20:49 CEST 2014

On 8/8/2014 11:32 AM, Jefsey wrote:
>
>> *At 02:27 08/08/2014, Andrew Sullivan wrote:*
>> I don't think that's a fair characterization.  Nobody is 
>> "second-guessing" anything.  It's rather that we -- John, actually -- 
>> discovered that there's a consequence of this case that we did not 
>> previously understand, and it has uncomfortable consequences for the 
>> way we had previously relied on Unicode, because it didn't work the 
>> way we thought. 
>
> Dear Andrew,
> May be time to reconsider the idea of an IETF Unicode including our 
> exception management through an additional protocol rather than only 
> by Patrik's tables? 

"Additional protocol" sounds like it's headed in the right direction.

There are already several levels to this:

  * Unicode (repertoire and basic normalization)
  * IDNA (including repertoire and context rules)
  * Label Generation Rulesets (including repertoire, context rules and
    blocked variants)
  * String Review (case by case)

Of these, Label Generation Rulesets allow issues like the current one 
to be addressed without the need to pick an arbitrary preferred 
encoding. They provide ways to specify a first-come, first-served, but 
mutually exclusive, selection among alternatives, which is much less 
"linguistically damaging" than blunt restrictions on repertoire alone.

What is missing, and what keeps surfacing in the discussions around 
creating the LGR for the Root Zone, is the need for enforceable "best 
practices" on LGRs.

If there were an "additional protocol" where problematic cases could be 
identified and translated into a binding requirement on LGRs (and 
therefore on registration policies) either to disallow all but one of 
the alternatives, or to have a robust way of mutually excluding labels 
from registration (using the blocked variant mechanism), then it would 
seem that you get the effect of robust lookup without having to 
arbitrarily play linguistic favorites.
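
As a rough sketch of what mutual exclusion via blocked variants could 
look like at registration time (in Python; the variant sets, the 
Registry class and index_label are invented for illustration and are 
not taken from any actual LGR or registry implementation):

```python
# Hypothetical variant sets: code points in the same set are treated as
# mutually blocking alternatives. Illustrative only, not from a real LGR.
VARIANT_SETS = [
    {"\u0061", "\u0430"},  # LATIN SMALL LETTER A, CYRILLIC SMALL LETTER A
    {"\u006F", "\u043E"},  # LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O
]

# Internal bookkeeping only: map each code point to one member of its set.
# The choice of member is never exposed and does not make it "preferred".
REPRESENTATIVE = {cp: min(s) for s in VARIANT_SETS for cp in s}


def index_label(label):
    """Return a key that is identical for all variants of a label."""
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in label)


class Registry:
    """First-come, first-served registration with blocked variants."""

    def __init__(self):
        self._by_index = {}  # index label -> label actually registered

    def register(self, label):
        """Register label unless it or one of its variants is taken."""
        key = index_label(label)
        if key in self._by_index:
            return False  # blocked by an earlier registration
        self._by_index[key] = label
        return True

    def holder(self, label):
        """Return the registered label that blocks this one, if any."""
        return self._by_index.get(index_label(label))
```

Whichever spelling is registered first wins and blocks the others; 
neither the Latin nor the Cyrillic form is disallowed up front. For 
example, after registry.register("pay") succeeds, a later attempt to 
register "p\u0430y" (with the Cyrillic letter) is refused, and the 
reverse holds if the Cyrillic spelling arrives first.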

The same protocol could be applied to handle any new registrations for 
the many similar cases of homoglyphs and homographs, whether across 
scripts or within scripts.

Being less "linguistically damaging", this approach lends itself to 
being used in a wider selection of cases as well.

A./
