Visually confusable characters (3)
Asmus Freytag
asmusf at ix.netcom.com
Mon Aug 11 10:16:07 CEST 2014
On 8/10/2014 9:44 PM, Patrik Fältström wrote:
> On 10 aug 2014, at 21:15, Asmus Freytag <asmusf at ix.netcom.com> wrote:
>
>> The most important is the ability to create equivalence classes among
>> code point (and sequences), known as variant sets.
> Variants have nothing to do with the equivalence that normalization does, and you can never ever replace lack of normalization with an equivalence set.
How so?
>
> As John has explained, the issue here is that we have two sets of representations that might be treated the same, without any normalization that says they are equivalent.
>
> IETF has decided that IETF is to follow the rules that the Unicode Consortium has created.
>
> This basic rule led to the change in IDNA2008 that ß is not to be treated the same as 'ss', as they were equivalent in IDNA2003 due to case folding rules (one of the things removed in IDNA2008 so that A-label and U-label are 1:1 mappings and translation between the two is reversible).
>
> IDNA does have a mechanism for exceptions, and the whole idea for that is that we should be able to have these discussions.
IDNA's exception mechanism cannot actually amend normalization and force
something like ß to be treated the same as 'ss'. It could, however, (in
principle) disallow either of them, because it is not fundamentally an
exception to, or extension of, normalization, but an exception on repertoire.
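A quick Python sketch may make the distinction concrete. This is illustrative only, not the IDNA mapping tables themselves: it shows that it was Unicode case folding (the IDNA2003-style preprocessing) that equated ß and 'ss', while Unicode normalization never did:

```python
import unicodedata

# IDNA2003-style preprocessing applied Unicode case folding, which maps
# U+00DF LATIN SMALL LETTER SHARP S to "ss":
assert "stra\u00dfe".casefold() == "strasse"

# Unicode normalization, by contrast, leaves U+00DF alone -- it has no
# decomposition mapping, so even compatibility normalization keeps it:
assert unicodedata.normalize("NFC", "\u00df") == "\u00df"
assert unicodedata.normalize("NFKC", "\u00df") == "\u00df"
```

Since IDNA2008 dropped the case-folding step to keep the A-label/U-label mapping reversible, the ß/'ss' equivalence disappeared with it; normalization was never the mechanism that created it.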
>
> So can we please stay with this discussion on what is to be used in DNS?
>
> Variants have nothing to do with that. Variants have to do with *registration* *policy* for the root zone, and then maybe a few TLDs.
>
> Nothing else.
I think it is useful to distinguish the technology from the
implementation. And it's useful and illuminating to consider what that
technology can deliver that is different from the exception mechanism in
IDNA. And, of course, also whether it has technical drawbacks.
From a technical point of view, (blocked) variants are relatively similar to
normalization. Both define equivalence sets, but one leaves the choice
of "preferred" element open, while the other doesn't.
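To sketch that "equivalence class" view of blocked variants: the variant set, the choice of representative, and the function names below are all hypothetical, invented for illustration rather than taken from any real registry variant table:

```python
# Hypothetical registry-side "blocked variant" check.
# Variant sets are modeled as equivalence classes of code points;
# the set below is an illustrative homoglyph grouping, not real policy data.

VARIANT_SETS = [
    {"\u006f", "\u03bf", "\u043e"},  # LATIN o, GREEK omicron, CYRILLIC o
]

def canonical(ch: str) -> str:
    """Map a character to a stable representative of its variant set."""
    for s in VARIANT_SETS:
        if ch in s:
            return min(s)  # arbitrary but deterministic representative
    return ch

def blocks(existing: str, candidate: str) -> bool:
    """True if an existing registration blocks the candidate label."""
    if len(existing) != len(candidate):
        return False
    return all(canonical(a) == canonical(b)
               for a, b in zip(existing, candidate))
```

Note the structural parallel to normalization: both collapse strings into equivalence classes via a canonical representative. The difference is that normalization fixes the preferred form in the standard, while a registry applying blocked variants leaves the "preferred" member open (whichever label was registered first wins).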
I'm tacitly assuming that we are considering only variants of the
homoglyph/homograph nature, because the other kind(s) are a different
kettle of fish altogether.
On top of any technical differences come the differences regarding who
implements variants vs. exceptions.
Given those differences, the following quote from a parallel thread is
illuminating:
> For example, on Mon, Dec 22, 2008 at 8:03 AM, John C Klensin
> <klensin at jck.com <mailto:klensin at jck.com>> wrote:
> ...
>
> (i) What is, and is not, look-alike, is a very subjective
> business.
>
> ...
>
> The bottom line is that we've concluded that character
> combinations that are specifically phishing issues should be
> dealt with by registries, who presumably know what they are
> doing with scripts they choose to support, and by application
> implementers who can warn people against hazardous combinations
> (and potentially against registries who persistently permit
> registration of strings that have no real value other than to
> create phishing opportunities).
>
> ...
>
> These decisions were the result of explicit (and quite lengthy)
> discussion, not an "oversight".
>
Reading that, I would expect the IDNA protocol's exception mechanism to
be used in places where the issues are either so universal or so grave
as to warrant baking the solution in at the front end. And to defer to
other mechanisms available to registries (such as variants) to handle
less clear-cut cases that are not of universal concern (but still concerns).
I believe that is a very proper discussion to have.
Consider the facts of this case: the sequences and singleton in question
are relatively obscure; the singleton was encoded later, but is
potentially the more practically useful of the two; and many parallel
cases were not addressed via the exception mechanism. Given all this, I
am somewhat doubtful whether the current case meets the criteria of
importance and consistency that would require it to be addressed in IDNA.
The more I learn about the particulars of the case, the more I keep
thinking that (despite looking like a normalization problem) it really
isn't and is more like the class of problem addressed in John's quote
from 2008.
A./