Mapping and Variants
vint at google.com
Mon Mar 9 14:25:21 CET 2009
Erik, et al,
this is plainly a "side of the bus" problem. Each argument that opens
up another portion of the Unicode glyph space to use with IDNs
increases the combinatoric implications for bundling or for abusive
Rather than focusing solely on the example that John used, I think it
is probably more useful to think about the evident side-effects of
incorporating IPA characters as PVALID under IDNA rules. I am not
arguing here that they should be excluded but only that if they are
included, we must think how best to deal with the kinds of confusion
that Erik and others have described.
I think we all understand that we cannot avoid all forms of confusion
by relying on protocol-level constraints alone. We already know about
the zero/one "oh"/"ell" confusion even with the LDH-constrained set,
for example, and with the inclusion of the new Unicode characters, the
opportunities for confusing registrations are vastly larger.
If you buy the argument that we can't solve this problem entirely with
protocol rules, then we have to rely on educating registries/
registrars/registrants using all levels of the hierarchical DNS that
these problems exist. Of course, there will be those who will exploit
any opportunity to use PVALID characters to create misleading domain
names.
However, it does seem useful to make sure that inclusion of a
potentially confusing block of Unicode characters is explicitly
In the case of IPA, despite the ample and clear potential for
confusion, it is my understanding that Mark Davis has pointed out that
some (many?) of these characters in the International Phonetic
Alphabet are used in written African (others?) languages. If it were
the case that these glyphs were used ONLY for phonetic
representations, I would argue against their inclusion in the PVALID
set of IDNA characters. But if it is correct that they are or are
expected to be used in written languages, one can understand an
argument for their inclusion. What is painful is the combinatoric
effect these characters produce if one is to try to counter their
abuse through treatment as variants (i.e. bundling, or other restrictive
registration policies). Perhaps that is a price we have to pay for
attempting to be open to including written languages not yet a part of
the Unicode system?
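
To make the combinatoric effect above concrete, here is a minimal sketch in Python. The variant table is purely hypothetical (two Latin/Greek look-alike pairs chosen for illustration, not taken from any registry policy or from the IDNA drafts), and the function names are my own. It only shows how the size of a bundle grows when each character in a label has interchangeable variants.

```python
from itertools import product

# Hypothetical variant table, for illustration only: each character maps to
# the characters a registry might choose to treat as interchangeable with it.
VARIANTS = {
    "a": ("a", "\u03b1"),  # Latin small a, Greek small alpha
    "o": ("o", "\u03bf"),  # Latin small o, Greek small omicron
}


def variant_labels(label):
    """Return every label formed by substituting variants position by position."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def bundle_size(label):
    """Count the labels in the bundle without enumerating them."""
    size = 1
    for ch in label:
        size *= len(VARIANTS.get(ch, (ch,)))
    return size


if __name__ == "__main__":
    for label in ("aa", "banana", "aaaaaaaaaa"):
        print(label, bundle_size(label))
```

With only two variants per affected character the bundle doubles for each such character in the label, so every additional confusable block that becomes PVALID multiplies the number of labels a registrant would have to bundle.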
1818 Library Street, Suite 400
Reston, VA 20190
vint at google.com
On Mar 9, 2009, at 8:57 AM, Erik van der Poel wrote:
> I'm not sure why John hasn't responded to this, but let me give my own
> reason for agreeing that this is an issue. Note that John said that
> Greek small alpha and Latin small a must be treated as variants (i.e.
> bundling), not mapping.
> John didn't mention keyboard input explicitly, but that is what I
> thought of when I agreed. I.e. a user might accidentally type a Greek
> A where a Latin A was "supposed" to be, and if the registrant wants
> all users to reach their site no matter what keyboard accidents they
> might make, then the registrant must perform a bundling operation to
> make that work.
> My keyboard example may be a little contrived, but not outrageous, in
> my opinion. John may have a different point of view or a different
> reason for suggesting the bundling.
> On Mon, Mar 9, 2009 at 1:25 AM, Martin Duerst
> <duerst at it.aoyama.ac.jp> wrote:
>> John said in an earlier mail
>> second to last paragraph) that he thinks that if we do mapping,
>> we have to map all of upper and lower case Latin a and Greek alpha
>> to the same thing.
>> The only thing I want is to very, very strongly question the above.
>> Of course, somebody will register AΑ, where the first is Latin
>> and the second is Greek, e.g. on a third or fourth level, just
>> because they can, but what I'm trying to say is that this is not
>> a typical use case, and not one that we have to design mapping for
>> (independent of whether mapping is part of the protocol
>> (most probably not) or otherwise).
>> Regards, Martin.
>> At 15:08 09/03/09, Patrik Fältström wrote:
>>>> [...] μvolt. Can you give an example that makes a bit more
>>>> sense than just "AA"?
>>> Martin, people will most certainly register this, "just because they
>>> can". Because of this, I think the example is valid.
>>> You also have to remember that people do have an interest in mixing
>>> scripts, for example various scripts and Latin.
>>> To limit the problems we do have in IDNA2008 two things that protect
>>> against problems:
>>> - We have defined what is a U-label and A-label, and because of that
>>> it is a very very clear signal what codepoints should be used. If we
>>> also have mappings, fine, but it is clear that those characters are in
>>> the gray area whether they should be used for example in
>>> - We have for the most problematic situations regular expressions that
>>> limit the use of some codepoints that create real problems if they are
>>> used in a non-intended-context.
>>> What do you want more? You want more regular expressions? You want to
>>> reopen the discussion on mixing scripts again?
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp