Mapping and Variants
Erik van der Poel
erikv at google.com
Mon Mar 9 13:57:47 CET 2009
I'm not sure why John hasn't responded to this, but let me give my own
reason for agreeing that this is an issue. Note that John said that
Greek small alpha and Latin small a must be treated as variants (i.e.
bundling), not mapping.
John didn't mention keyboard input explicitly, but that is what I
thought of when I agreed. I.e. a user might accidentally type a Greek
A where a Latin A was "supposed" to be, and if the registrant wants
all users to reach their site no matter what keyboard accidents they
might make, then the registrant must perform a bundling operation to
make that work.
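To make the bundling idea concrete, here is a minimal sketch in Python of how a registrant (or registry) might enumerate the labels in a bundle. The VARIANTS table and the bundle() helper are hypothetical names of my own; only the Latin a / Greek alpha pair comes from this thread, and a real variant table would be defined by registry policy, not by this code.

```python
from itertools import product

# Hypothetical variant table: each character maps to the characters a
# registry might treat as interchangeable with it. Only the Latin a /
# Greek alpha pair is taken from the discussion; it is an assumption
# that a registry would define its table this way.
VARIANTS = {
    "a": ("a", "\u03b1"),  # LATIN SMALL LETTER A, GREEK SMALL LETTER ALPHA
}


def bundle(label):
    """Return every label obtained by swapping characters for their variants."""
    # For each position, list the characters that may appear there.
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    # The bundle is the set of all combinations of those choices.
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    for variant in sorted(bundle("data")):
        print(variant)
```

Note that nothing is rewritten here: every label in the bundle would be registered to the same registrant, which is what distinguishes variants (bundling) from mapping. The bundle grows exponentially with the number of characters that have variants, so a real registry would presumably cap or restrict it.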
My keyboard example may be a little contrived, but not outrageous, in
my opinion. John may have a different point of view or a different
reason for suggesting the bundling.
On Mon, Mar 9, 2009 at 1:25 AM, Martin Duerst <duerst at it.aoyama.ac.jp> wrote:
> John said in an earlier mail
> (second to last paragraph) that he thinks that if we do mapping,
> we have to map all of upper and lower case Latin a and Greek alpha
> to the same thing.
> The only thing I want is to very, very strongly question the above.
> Of course, somebody will register AΑ, where the first is Latin
> and the second is Greek, e.g. on a third or fourth level, just
> because they can, but what I'm trying to say is that this is not
> a typical use case, and not one that we have to design mapping for
> (independent of whether mapping is part of the protocol
> (most probably not) or otherwise).
> Regards, Martin.
> At 15:08 09/03/09, Patrik Fältström wrote:
>>> [...] μvolt. Can you give an example that makes a bit more
>>> sense than just "AA"?
>>Martin, people will most certainly register this, "just because they
>>can". Because of this, I think the example is valid.
>>You also have to remember that people do have interest in mixing
>>scripts, for example various scripts and Latin.
>>To limit the problems we do have in IDNA2008 two things that protect
>>- We have defined what is a U-label and A-label, and because of this,
>>it is a very very clear signal what codepoints should be used. If we
>>also have mappings, fine, but it is clear that those characters are in
>>the gray area whether they should be used for example in publications.
>>- We have for the most problematic situations regular expressions that
>>limit the use of some codepoints that create real problems if they are
>>used in a non-intended-context.
>>What do you want more? You want more regular expressions? You want to
>>reopen the discussion on mixing scripts again?
>>Idna-update mailing list
>>Idna-update at alvestrand.no
> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp