Mapping and Variants
vint at google.com
Sat Mar 7 12:11:43 CET 2009
Unless prohibited either at registration time or by protocol,
it is likely that any bad cases will be exercised by people
looking to fool others into doing the wrong thing with domain
names. So I guess I would lean towards finding ways to
confine permitted behaviors to those less likely to be
troublesome. I would include in "bad cases" script
mixing, even though it might have some exotic appeal
for some cases that aren't intentionally "bad".
I hope that makes sense.
1818 Library Street, Suite 400
Reston, VA 20190
vint at google.com
On Mar 7, 2009, at 3:31 AM, Martin Duerst wrote:
> At 06:06 09/03/06, John C Klensin wrote:
>> When IDNA2003 was written, no one (as far as I know) anticipated
>> the need to create elaborate variant (bundling) systems to
>> associate potentially-confusing labels within a zone so that
>> they could be given special treatment.
> Maybe the exact details weren't anticipated, but lots of
> discussion surrounding the issues definitely went on way
> before IDNA2003 was final. Whether we called it 'bundling'
> or whatever else, I'm pretty sure people such as Ken and
> me who were sceptical (and, as it turned out, right) on a
> central, uniform solution for CJK simplified/traditional
> mappings were mentioning solutions in this direction.
>> For scripts with case differences, IDNA2003 also chose to
>> concentrate on lower case, partially because there was better
>> differentiation of those characters. It has often been
>> observed, for example, that Greek lower case ("SMALL LETTER")
>> alpha and beta don't look nearly enough like their Latin
>> counterparts ("a" and "b") to be confusing to anyone, but that
>> the capital character pairs are identical.
>> Unfortunately, if one has a situation in which Greek and Latin
>> scripts are considered today and chooses to use variants _and_
>> has the expectation of case-mapping, GREEK SMALL LETTER ALPHA
>> (U+03B1) must be treated as a variant of LATIN SMALL LETTER A
>> (U+0061) because a user might be looking at the combination of
>> GREEK CAPITAL LETTER ALPHA (U+0391) and LATIN CAPITAL LETTER A
>> (U+0041) which map (CaseFold) into the lower case pair. That
>> sort of relationship exists for a significant number of
>> Latin-Greek pairs and for a much larger number of Cyrillic-Greek
>> pairs. For Cyrillic, it just about doubles the number of
>> variants in the table.
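
The case-folding argument above can be checked with a short Python sketch. It assumes Python's built-in str.casefold() as a stand-in for the CaseFold mapping mentioned in the text; the list of capital look-alikes is a hand-picked illustration, not a real variant table, and the function name is hypothetical.

```python
import unicodedata

# Capital pairs that look identical in most fonts.
# Assumption: an illustrative, hand-picked list, not a registry's table.
CAPITAL_LOOKALIKES = [
    ("\u0391", "\u0041"),  # GREEK CAPITAL LETTER ALPHA / LATIN CAPITAL LETTER A
    ("\u0392", "\u0042"),  # GREEK CAPITAL LETTER BETA  / LATIN CAPITAL LETTER B
]


def implied_lowercase_variants(capital_pairs):
    """Return the lower-case pairs that case folding pulls into the variant table."""
    return [(a.casefold(), b.casefold()) for a, b in capital_pairs]


lower_pairs = implied_lowercase_variants(CAPITAL_LOOKALIKES)
for x, y in lower_pairs:
    # Print code point and character name for each implied variant pair.
    print(f"U+{ord(x):04X} {unicodedata.name(x)} <-> U+{ord(y):04X} {unicodedata.name(y)}")
```

The lower-case letters end up paired even though they are visually distinct, because the label a user sees in capitals folds to them.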
> Is this some highly theoretical discussion, or do you actually
> expect that this would be needed in practice? In my view, it
> should clearly be treated as the former, but I would have
> expected you to say so if you thought so.
> Why do I think so? It is well accepted now that script mixing
> is a bad idea, exactly because of cases such as the above.
> So a label consisting of a Latin and a Greek small letter
> a/alpha just doesn't make much sense to start with.
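
A rough sketch of what rejecting script mixing in a single label might look like. Python's standard library does not expose the Unicode Script property, so this assumes the first word of each character's Unicode name ("LATIN", "GREEK", "CYRILLIC") is a good enough proxy for these three scripts; a real registry check would use the actual Script property. Both function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Collect a rough script tag for every letter in the label.

    Assumption: the first word of the Unicode character name stands in
    for the script. Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


def is_mixed_script(label):
    """True if the letters of the label come from more than one script."""
    return len(scripts_in_label(label)) > 1


# A label made of Latin "a" followed by Greek alpha is flagged as mixed.
print(is_mixed_script("a\u03b1"))
print(is_mixed_script("example-1"))
```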
> It is also well-known that some carefully chosen letter
> combinations in one script, in particular in upper case,
> are difficult or impossible to visually distinguish from
> potentially completely different letter combinations in
> other scripts. But these are few and far between, in particular
> if they are of a certain length and contain some bits of
> I would also like to point out that with your approach
> above, you may not be able to stop at letter pairs. As
> an example, in script fonts and handwriting, Cyrillic
> Ts (both upper and lower case) may look similar to Latin
> Ms, but in print fonts, Cyrillic and Latin Ms look alike.
> So suddenly, you have to group Cyrillic Ts and Ms with
> Latin Ms. Not sure anybody will use such a system, at
> least not for Cyrillic :-(.
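
The "cannot stop at letter pairs" point can be illustrated with a small union-find sketch: once look-alike pairs are treated as variants, groups merge transitively. The two pairs below are taken from the example in the message (after case folding); the function name and data layout are assumptions made for illustration.

```python
def group_variants(pairs):
    """Merge look-alike pairs into variant groups (transitive closure)."""
    parent = {}

    def find(x):
        # Find the representative of x, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


LOOKALIKE_PAIRS = [
    ("\u0442", "m"),  # CYRILLIC SMALL LETTER TE ~ Latin m (script fonts, handwriting)
    ("\u043c", "m"),  # CYRILLIC SMALL LETTER EM ~ Latin m (print fonts, via capitals)
]

groups = group_variants(LOOKALIKE_PAIRS)
print(groups)
```

Neither pair relates the two Cyrillic letters directly, yet both land in the same group as Latin m.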
> Regards, Martin.
> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp