Data on confusables

Mark Davis ⌛ mark at macchiato.com
Thu Jul 30 01:13:23 CEST 2009


comments below.

Mark


On Tue, Jul 28, 2009 at 03:06, Gervase Markham <gerv at mozilla.org> wrote:

> On 27/07/09 21:26, Mark Davis ⌛ wrote:
>
>> I don't have a count of domain names. The figures I gave do part of what
>> you are asking for:
>>
>> A. characters allowed by IDNA2008 that are confusable with /at least
>> one/ other character allowed by IDNA2008
>>
>
                                      Raw%    WeightedWeb% WeightedIdn%
pValidHasPValidConfusable:            4.17%     94.53%      99.9974%
cValidHasCValidConfusable:           +0.08%     +0.05%     +1.57E-05%

A is the sum of the above, for each column


>>
>> B.  characters allowed by IDNA200*3* that are confusable with /at least/
>> one other character allowed by IDNA200*3* (/and/ not in A)
>
>
                                      Raw%    WeightedWeb% WeightedIdn%
pValid2003HasPValid2003Confusable:   +1.36%     +0.20%     +1.08E-05%

B is the sum of all three lines, for each column.
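
As a quick check of the arithmetic, here is a minimal sketch that adds up the
rows per column. The figures are copied from the two tables above; the names
ROWS, COLUMNS and column_sums are made up for this illustration and are not
part of any tool used to produce the data.

```python
# Percentages copied from the tables above, one tuple per row,
# in column order: Raw%, WeightedWeb%, WeightedIdn%.
COLUMNS = ("Raw%", "WeightedWeb%", "WeightedIdn%")
ROWS = {
    "pValidHasPValidConfusable": (4.17, 94.53, 99.9974),
    "cValidHasCValidConfusable": (0.08, 0.05, 1.57e-05),
    "pValid2003HasPValid2003Confusable": (1.36, 0.20, 1.08e-05),
}


def column_sums(row_names):
    """Sum the named rows column by column (hypothetical helper)."""
    return tuple(
        sum(ROWS[name][i] for name in row_names)
        for i in range(len(COLUMNS))
    )


# A: the two IDNA2008 rows; B: all three rows.
A = column_sums(["pValidHasPValidConfusable", "cValidHasCValidConfusable"])
B = column_sums(list(ROWS))

for label, values in (("A", A), ("B", B)):
    print(label, ", ".join(f"{c} = {v:.7f}" for c, v in zip(COLUMNS, values)))
```

That gives A as roughly 4.25%, 94.58% and 99.9974157%, and B as roughly
5.61%, 94.78% and 99.9974265%, for the three columns respectively.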


>>
> Forgive me, but I'm having trouble relating these two definitions to the
> numbers in your original post. Could you tell me the values of A and B?


Sorry; it was a bit complicated because I separated out the PVALID from
PVALID+CONTEXT, and did them as differences, because the numbers in the
final column are easier to understand that way. Maybe a graph will make that
clearer; I attached one. I hope it helps.

I think the main conclusion is that the narrowing of PVALID characters from
IDNA2003 to IDNA2008, and the checks in CONTEXTO make no material difference
in the opportunities for spoofing.


>
>  I'm showing no additional characters in that group; that is, any
>> PVALID2008 character with a confusable in PVALID2003 also has a
>> confusable in PVALID2008. (The number of other characters that each
>> could be confused with does grow, but that doesn't change whether or not
>> they can be spoofed.)
>>
>
> OK, that's interesting. It does reinforce the point that registry policy
> still has a large part to play; what we've done is made it easier for
> registries to formulate that policy because they have to consider fewer
> characters.
>

I don't think that IDNA2008 will change much regarding spoofing. Some
registries may be bound by the terms of IDNA2008, but most will not be. They
could choose to abide by it strictly, or they could allow characters like
HEART if they are in demand, or for compatibility with IDNA2003. Excluding
symbols and punctuation or checking CONTEXTO characters will do almost
nothing in terms of disabling spoof IDNs.

Conversely, the client side can't depend on the registries all doing "the
right thing", and clients will need to supply their own tests for spoofing;
for them as well, excluding symbols or checking for CONTEXTO accomplishes
almost nothing as far as detecting spoofs.

Note that the Unicode security guidelines have for some time also
recommended flagging in clients the use of symbols and punctuation in domain
names, and cautioning against their use by registries. So in that sense
those guidelines are quite similar to what is now in IDNA2008. The above
information on the magnitude of the issue was a surprise to me, so we'll
need to caution people also in those guidelines about how much good one can
expect such restrictions to do. (Those guidelines also, however, go much
further in terms of dealing with confusables -- however, not in a way that
you would want to (or be able to) bake into a protocol.)
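
To make the contrast concrete, here is a rough sketch of the two kinds of
client-side test. It is only an illustration: TOY_CONFUSABLES is a tiny
hand-picked set of Cyrillic/Latin lookalikes (a real check would use the full
Unicode confusables data), and the function names are invented for this
example.

```python
import unicodedata

# Illustrative subset only: a few well-known Cyrillic letters that look
# like Latin ones. NOT the real confusables data.
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def has_symbol_or_punctuation(label):
    """True if any character other than '-' has a general category
    starting with S (symbol) or P (punctuation)."""
    return any(
        ch != "-" and unicodedata.category(ch)[0] in "SP"
        for ch in label
    )


def toy_skeleton(label):
    """Map each character to its lookalike in the toy table, if any."""
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in label)


genuine = "paypal"
spoof = "p\u0430ypal"      # second letter is CYRILLIC SMALL LETTER A
heart = "i\u2665ny"        # contains BLACK HEART SUIT

print(has_symbol_or_punctuation(heart))   # symbol test flags the heart label
print(has_symbol_or_punctuation(spoof))   # but not the lookalike label
print(spoof != genuine and toy_skeleton(spoof) == toy_skeleton(genuine))
```

The symbol test catches the heart label but lets the Cyrillic lookalike
through; only the confusables comparison notices that the two labels collapse
to the same string.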

On the other hand, there are two processes in IDNA2008 that are quite
valuable for reducing spoofing. The first is the BIDI rules, and the second
is the CONTEXTJ rules. Unfortunately, neither of those is required in the
client, although we can hope that most clients will do them anyway.



>
> Gerv
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image-1.png
Type: image/png
Size: 22042 bytes
Desc: not available
Url : http://www.alvestrand.no/pipermail/idna-update/attachments/20090729/b30bf27c/attachment-0001.png 

