Mapping and Variants

Mon Mar 9 14:25:21 CET 2009

Erik, et al,

This is plainly a "side of the bus" problem. Each argument that opens
up another portion of the Unicode glyph space for use with IDNs
increases the combinatoric implications for bundling or for abusive
registrations.

Martin,

Rather than focusing solely on the example that John used, I think it
is probably more useful to think about the evident side effects of
incorporating IPA characters as PVALID under IDNA rules. I am not
arguing here that they should be excluded, only that if they are
included, we must think about how best to deal with the kinds of
confusion that Erik and others have described.

I think we all understand that we cannot avoid all forms of confusion
by relying on protocol-level constraints alone. We already know about
the zero/"oh" and one/"ell" confusions even within the LDH-constrained
set, for example, and with the inclusion of the new Unicode characters,
the opportunities for confusing registrations are vastly larger.
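To illustrate why protocol rules alone cannot prevent this, here is a
minimal Python sketch that folds labels to a crude "skeleton" and flags
two different labels that collide. The CONFUSABLES table is a
hypothetical, hand-picked example; real confusable data (for instance,
the data published with Unicode Technical Standard #39) is far larger,
and nothing here is part of IDNA2008 itself.

```python
# Hypothetical, hand-picked confusable pairs for illustration only;
# this is NOT an authoritative table.
CONFUSABLES = {
    "0": "o",       # digit zero vs. letter "oh" (already within LDH)
    "1": "l",       # digit one vs. letter "ell" (already within LDH)
    "\u0430": "a",  # CYRILLIC SMALL LETTER A vs. Latin a
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA vs. Latin a
    "\u0251": "a",  # LATIN SMALL LETTER ALPHA (IPA Extensions) vs. Latin a
}


def skeleton(label):
    """Lowercase a label and replace each listed confusable with its stand-in."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())


def confusable(a, b):
    """True if two different labels collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(confusable("google", "g00gle"))         # True: LDH-only confusion
print(confusable("example", "ex\u0430mple"))  # True: Cyrillic a
print(confusable("abc", "abd"))               # False
```

The zero/"oh" and one/"ell" entries are needed even for pure LDH
labels; each additional block of PVALID characters can only add
entries to a table like this.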

If you buy the argument that we can't solve this problem entirely with
protocol rules, then we have to rely on educating registries,
registrars, and registrants at all levels of the hierarchical DNS that
these problems exist. Of course, there will be those who will exploit
any opportunity to use PVALID characters to create misleading domain
names.

However, it does seem useful to make sure that inclusion of a  
potentially confusing block of Unicode characters is explicitly  
considered.

In the case of IPA, despite the ample and clear potential for  
confusion, it is my understanding that Mark Davis has pointed out that  
some (many?) of these characters in the International Phonetic  
Alphabet are used in written African (others?) languages. If it were  
the case that these glyphs were used ONLY for phonetic  
representations, I would argue against their inclusion in the PVALID  
set of IDNA characters. But if it is correct that they are or are  
expected to be used in written languages, one can understand an  
argument for their inclusion. What is painful is the combinatoric
effect these characters produce if one tries to counter their
abuse by treating them as variants (i.e., bundling or other restrictive
registration policies). Perhaps that is a price we have to pay for  
attempting to be open to including written languages not yet a part of  
the Unicode system?
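To make the combinatoric effect concrete, here is a minimal Python
sketch. It assumes a hypothetical variant table (the sets below are
illustrative, not any registry's actual policy) and assumes variants
are substituted independently at each position, so the size of a
bundle is the product of the number of variants at each position.
Admitting just two IPA letters as variants quadruples the bundle for a
label like "google".

```python
from itertools import product
from math import prod

# Hypothetical variant sets for illustration only; a real registry's
# variant table would be defined by its own policy.
BASE_VARIANTS = {
    "a": {"a", "\u0430", "\u03b1"},  # Latin a, Cyrillic a, Greek alpha
    "o": {"o", "\u043e", "\u03bf"},  # Latin o, Cyrillic o, Greek omicron
    "e": {"e", "\u0435"},            # Latin e, Cyrillic ie
}

# The same table after also admitting two IPA Extensions letters as variants.
IPA_VARIANTS = {
    **BASE_VARIANTS,
    "a": BASE_VARIANTS["a"] | {"\u0251"},  # + LATIN SMALL LETTER ALPHA
    "g": {"g", "\u0261"},                  # + LATIN SMALL LETTER SCRIPT G
}


def bundle_size(label, table):
    """Number of labels in the bundle: product of per-position variant counts."""
    return prod(len(table.get(ch, {ch})) for ch in label)


def bundle(label, table):
    """Enumerate every label in the bundle (feasible only for small bundles)."""
    choices = [sorted(table.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


print(bundle_size("google", BASE_VARIANTS))  # 18
print(bundle_size("google", IPA_VARIANTS))   # 72
```

Because the bundle size is a product, adding variants for a single
frequently used letter multiplies the size of every bundle in which
that letter appears, rather than adding to it.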

vint

Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com

On Mar 9, 2009, at 8:57 AM, Erik van der Poel wrote:

> I'm not sure why John hasn't responded to this, but let me give my own
> reason for agreeing that this is an issue. Note that John said that
> Greek small alpha and Latin small a must be treated as variants (i.e.
> bundling), not mapping.
>
> John didn't mention keyboard input explicitly, but that is what I
> thought of when I agreed. I.e. a user might accidentally type a Greek
> A where a Latin A was "supposed" to be, and if the registrant wants
> all users to reach their site no matter what keyboard accidents they
> might make, then the registrant must perform a bundling operation to
> make that work.
>
> My keyboard example may be a little contrived, but not outrageous, in
> my opinion. John may have a different point of view or a different
> reason for suggesting the bundling.
>
> Erik
>
> On Mon, Mar 9, 2009 at 1:25 AM, Martin Duerst  
> <duerst at it.aoyama.ac.jp> wrote:
>> John said in an earlier mail
>> (http://www.alvestrand.no/pipermail/idna-update/2009-March/003751.html,
>> second to last paragraph) that he thinks that if we do mapping,
>> we have to map all of upper and lower case Latin a and Greek alpha
>> to the same thing.
>>
>> The only thing I want is to very, very strongly question the above.
>>
>> Of course, somebody will register AΑ, where the first is Latin
>> and the second is Greek, e.g. on a third or fourth level, just
>> because they can, but what I'm trying to say is that this is not
>> a typical use case, and not one that we have to design mapping for
>> (independently of whether mapping is part of the protocol, which
>> is most probably not the case, or is done otherwise).
>>
>> Regards,   Martin.
>>
>>
>> At 15:08 09/03/09, Patrik Fältström wrote:
>>>> [...] Can you give an example that makes a bit more
>>>> sense than just "AA"?
>>>
>>> Martin, people will most certainly register this, "just because they
>>> can". Because of this, I think the example is valid.
>>>
>>> You also have to remember that people do have an interest in mixing
>>> scripts, for example various scripts with Latin.
>>>
>>> To limit the problems, IDNA2008 has two things that protect
>>> against them:
>>>
>>> - We have defined what a U-label and an A-label are, and because of
>>> this, there is a very, very clear signal as to which code points
>>> should be used. If we also have mappings, fine, but it is clear that
>>> the characters that are mapped are in a gray area as to whether they
>>> should be used, for example, in publications.
>>>
>>> - For the most problematic situations, we have regular expressions
>>> that limit the use of some code points that create real problems if
>>> they are used in an unintended context.
>>>
>>> What more do you want? More regular expressions? Do you want to
>>> reopen the discussion on mixing scripts?
>>>
>>>    Patrik
>>>
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>