Visually confusable characters (3)

Asmus Freytag asmusf at ix.netcom.com
Mon Aug 11 10:16:07 CEST 2014


On 8/10/2014 9:44 PM, Patrik Fältström wrote:
> On 10 aug 2014, at 21:15, Asmus Freytag <asmusf at ix.netcom.com> wrote:
>
>> The most important is the ability to create equivalence classes among
>> code point (and sequences), known as variant sets.
> Variants have nothing to do with the equivalence that normalization does, and you can never ever replace lack of normalization with an equivalence set.

How so?


>
> As John has explained, the issue here is that we have two sets of representations that might be treated the same, without any normalization that says they are equivalent.
>
> IETF has decided that IETF is to follow the rules that Unicode Consortium has created.
>
> This basic rule led to the change in IDNA2008 that ß is not to be treated the same as 'ss', as they were equivalent in IDNA2003 due to case folding rules (one of the things removed in IDNA2008 so that A-label and U-label are 1:1 mappings and translation between the two is reversible).
>
> IDNA do have a mechanism for exceptions, and the whole idea for that is that we should be able to have these discussions.

IDNA's exception mechanism cannot actually amend normalization and force 
something like ß to be treated the same as 'ss'. However, it could (in 
principle) disallow either of them, because it is fundamentally not an 
exception to, or extension of, normalization, but an exception on repertoire.
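To make the distinction concrete, here is a minimal illustration (using only the Python standard library) of why this is not a normalization problem: no Unicode normalization form equates ß with 'ss'; only the case folding that IDNA2003 applied, and IDNA2008 removed, did so.

```python
import unicodedata

# Unicode normalization, in every form, leaves U+00DF (ß) unchanged:
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, "\u00df") == "\u00df"

# By contrast, the full case folding that IDNA2003 relied on maps ß to "ss":
assert "stra\u00dfe".casefold() == "strasse"

# Under IDNA2008, ß is a valid code point in its own right, so
# "straße" and "strasse" are distinct labels unless a registry's
# policy (e.g. a variant set) chooses to link them.
```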
>
> So can we please stay with this discussion on what is to be used in DNS?
>
> Variants have nothing to do with that. Variants have to do with *registration*policy*for the root zone, and then maybe a few TLDs.
>
> Nothing else.

I think it is useful to distinguish the technology from the 
implementation. And it's useful and illuminating to consider what that 
technology can deliver that is different from the exception mechanism in 
IDNA. And, of course, also whether it has technical drawbacks.

From a technical point of view, (blocked) variants are relatively similar 
to normalization. Both define equivalence sets, but one leaves the choice 
of "preferred" element open, while the other doesn't.
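The parallel can be sketched in a few lines. Note that this is a hypothetical registry-side check, not any actual registry's policy or part of the IDNA protocol; the variant set, the representative-picking rule, and the function names are all illustrative assumptions.

```python
# Hypothetical sketch of a "blocked variant" registration check.
# The variant set below is an illustrative assumption, not real policy.
VARIANT_SETS = [
    {"\u00df", "ss"},  # e.g. a registry that treats ß and "ss" as variants
]

def canonicalize(label: str) -> str:
    # Map every member of a variant set to one fixed representative,
    # mirroring how normalization picks a canonical form. Unlike
    # normalization, which element is "preferred" is a policy choice.
    for variants in VARIANT_SETS:
        rep = min(variants)  # arbitrary but stable representative
        for v in variants:
            label = label.replace(v, rep)
    return label

def conflicts(new_label: str, registered: set[str]) -> bool:
    # A new registration is blocked if any already-registered label
    # falls into the same equivalence class.
    key = canonicalize(new_label)
    return any(canonicalize(r) == key for r in registered)
```

With "strasse" registered, an attempted registration of "straße" would be blocked, while an unrelated label would not: the equivalence class behaves like a normalization class, except that the registry rather than the protocol decided its membership.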

I'm tacitly assuming that we are considering only variants of the 
homoglyph/homograph nature, because the other kind(s) are a different 
kettle of fish altogether.
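For a classic example of the homoglyph kind, consider Latin "a" versus Cyrillic "а": visually near-identical, but distinct code points that no normalization form will ever unify, which is exactly why they fall to registry policy rather than to normalization.

```python
import unicodedata

# Latin "a" (U+0061) and Cyrillic "а" (U+0430) look alike but are
# different characters; normalization never maps one to the other.
latin_a, cyrillic_a = "a", "\u0430"

assert latin_a != cyrillic_a
assert unicodedata.normalize("NFKC", cyrillic_a) != latin_a
assert unicodedata.name(latin_a) == "LATIN SMALL LETTER A"
assert unicodedata.name(cyrillic_a) == "CYRILLIC SMALL LETTER A"
```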

On top of any technical differences come the differences in who 
implements variants vs. exceptions.

Given those differences, the following quote from a parallel thread is 
illuminating:

> For example, on Mon, Dec 22, 2008 at 8:03 AM, John C Klensin 
> <klensin at jck.com> wrote:
> ...
>
>     (i) What is, and is not, look-alike, is a very subjective
>     business. 
>
> ...
>
>     The bottom line is that we've concluded that character
>     combinations that are specifically phishing issues should be
>     dealt with by registries, who presumably know what they are
>     doing with scripts they choose to support, and by application
>     implementers who can warn people against hazardous combinations
>     (and potentially against registries who persistently permit
>     registration of strings that have no real value other than to
>     create phishing opportunities).
>
> ...
>
>     These decisions were the result of explicit (and quite lengthy)
>     discussion, not an "oversight".
>

Reading that, I would expect the IDNA protocol's exception mechanism to 
be used in places where the issues are either so universal or so grave 
as to warrant baking the solution in at the front end. And to defer to 
other mechanisms available to registries (such as variants) to handle 
less clear cut cases that are not of universal concern (but still concerns).

I believe that is a very proper discussion to have.

Given the facts of the case: that the sequences and singleton in 
question are relatively obscure; that the singleton was encoded later, 
but is potentially the more practically useful one; and that many 
parallel cases were not addressed via the exception mechanism, I am 
somewhat doubtful whether the current case meets the criteria of 
importance and consistency that would require it to be addressed in IDNA.

The more I learn about the particulars of the case, the more I keep 
thinking that (despite looking like a normalization problem) it really 
isn't and is more like the class of problem addressed in John's quote 
from 2008.

A./
