Mapping and Variants

Martin Duerst duerst at it.aoyama.ac.jp
Sat Mar 7 13:43:34 CET 2009


Hello Vint,

I agree that prohibiting script mixing should be the default
for any registry, apart from a few exceptions such as Japanese.
That's my main argument for why I think what John was talking
about in his mail is highly (if not completely) theoretical.
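
As a rough illustration of such a registry default, here is a minimal sketch of a script-mixing check. Everything in it is an assumption made for illustration: the names `ALLOWED_MIXES`, `rough_script`, `scripts_in`, and `is_acceptable` are hypothetical, and the script of a letter is only approximated by the first word of its Unicode character name, not taken from the real Unicode Script property. A production registry would use proper script data and its own policy table.

```python
import unicodedata

# Hypothetical registry policy: sets of scripts that may appear together.
# The single entry models the Japanese exception (kanji plus both kana).
ALLOWED_MIXES = [frozenset({"CJK", "HIRAGANA", "KATAKANA"})]


def rough_script(ch):
    """Approximate a letter's script by the first word of its Unicode name.

    This is a stand-in for the Unicode Script property, which the Python
    standard library does not expose.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label):
    """Return the set of rough scripts used by the letters in a label.

    Digits, hyphens, and other non-letters are ignored.
    """
    return {rough_script(ch) for ch in label
            if unicodedata.category(ch).startswith("L")}


def is_acceptable(label):
    """Accept single-script labels and labels inside an allowed mix."""
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return True
    return any(scripts <= allowed for allowed in ALLOWED_MIXES)
```

With this policy, `is_acceptable("a\u03b1")` (Latin a followed by Greek alpha) is rejected, while a label that combines kanji, hiragana, and katakana is accepted.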

Regards,    Martin.

At 20:11 09/03/07, Vint Cerf wrote:
>Martin,
>
>unless prohibited either at registration time or by protocol,
>it is likely that any bad cases will be exercised by people
>looking to fool others into doing the wrong thing with domain
>names. So I guess I would lean towards finding ways to
>confine permitted behaviors to those less likely to be
>troublesome. I would include in "bad cases" script
>mixing, even though it might have some exotic appeal
>for some cases that aren't intentionally "bad".
>
>I hope that makes sense.
>
>v
>
>
>Vint Cerf
>Google
>1818 Library Street, Suite 400
>Reston, VA 20190
>202-370-5637
>vint at google.com
>
>
>
>
>On Mar 7, 2009, at 3:31 AM, Martin Duerst wrote:
>
>> At 06:06 09/03/06, John C Klensin wrote:
>>
>>> When IDNA2003 was written, no one (as far as I know) anticipated
>>> the need to create elaborate variant (bundling) systems to
>>> associate potentially-confusing labels within a zone so that
>>> they could be given special treatment.
>>
>> Maybe the exact details weren't anticipated, but lots of
>> discussion surrounding the issues definitely went on way
>> before IDNA2003 was final. Whether we called it 'bundling'
>> or whatever else, I'm pretty sure people such as Ken and
>> me who were sceptical (and, as it turned out, right) on a
>> central, uniform solution for CJK simplified/traditional
>> mappings were mentioning solutions in this direction.
>>
>>
>>> For scripts with case differences, IDNA2003 also chose to
>>> concentrate on lower case, partially because there was better
>>> differentiation of those characters.  It has often been
>>> observed, for example, that Greek lower case ("SMALL LETTER")
>>> alpha and beta don't look nearly enough like their Latin
>>> counterparts ("a" and "b") to be confusing to anyone, but that
>>> the capital character pairs are identical.
>>>
>>> Unfortunately, if one has a situation in which Greek and Latin
>>> scripts are considered today and chooses to use variants _and_
>>> has the expectation of case-mapping, GREEK SMALL LETTER ALPHA
>>> (U+03B1) must be treated as a variant of LATIN SMALL LETTER A
>>> (U+0061) because a user might be looking at the combination of
>>> GREEK CAPITAL LETTER ALPHA (U+0391) and LATIN CAPITAL LETTER A
>>> (U+0041) which map (CaseFold) into the lower case pair.  That
>>> sort of relationship exists for a significant number of
>>> Latin-Greek pairs and for a much larger number of Cyrillic-Greek
>>> pairs.  For Cyrillic, it just about doubles the number of
>>> variants in the table.
>>
>> Is this some highly theoretical discussion, or do you actually
>> expect that this would be needed in practice? In my view, it
>> should clearly be treated as the former, but I would have
>> expected you to say so if you thought so.
>>
>> Why do I think so? It is well accepted now that script mixing
>> is a bad idea, exactly because of cases such as the above.
>> So a label consisting of a Latin and a Greek small letter
>> a/alpha just doesn't make much sense to start with.
>>
>> It is also well-known that some carefully chosen letter
>> combinations in one script, in particular in upper case,
>> are difficult or impossible to visually distinguish from
>> potentially completely different letter combinations in
>> other scripts. But these are few and far between, in particular
>> if they are of a certain length and contain some bits of
>> meaning.
>>
>> I would also like to point out that with your approach
>> above, you may not be able to stop at letter pairs. As
>> an example, in script fonts and handwriting, Cyrillic
>> Ts (both upper and lower case) may look similar to Latin
>> Ms, but in print fonts, Cyrillic and Latin Ms look alike.
>> So suddenly, you have to group Cyrillic Ts and Ms with
>> Latin Ms. Not sure anybody will use such a system, at
>> least not for Cyrillic :-(.
>>
>> Regards,    Martin.
>>
>>
>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>
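
The two mechanisms discussed in the quoted text can be sketched in a few lines: case folding turns look-alike capital pairs into lower-case variant pairs, and look-alike pairs chain into larger groups. This is only a sketch under stated assumptions. The pair lists are short illustrative samples, not a real variant table, and the function names `induced_lowercase_variants` and `variant_groups` are hypothetical. Python's built-in `str.casefold()` stands in for the CaseFold mapping.

```python
# Capital letters assumed, for illustration, to look identical across
# Latin and Greek. This is a sample, not a complete table.
UPPERCASE_LOOKALIKES = [
    ("\u0041", "\u0391"),  # LATIN CAPITAL LETTER A / GREEK CAPITAL LETTER ALPHA
    ("\u0042", "\u0392"),  # LATIN CAPITAL LETTER B / GREEK CAPITAL LETTER BETA
]


def induced_lowercase_variants(pairs):
    """Case-fold each look-alike pair to the pair a variant table would hold."""
    return [(a.casefold(), b.casefold()) for a, b in pairs]


def variant_groups(pairs):
    """Merge look-alike pairs into groups, following chains transitively."""
    groups = []
    for a, b in pairs:
        # Find every existing group that already contains either character.
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


# Lower-case pairs that end up as variants although the small letters
# themselves are easy to tell apart.
greek_latin = induced_lowercase_variants(UPPERCASE_LOOKALIKES)

# Assumed resemblances from the Cyrillic example: small te (U+0442) in
# script fonts and small em (U+043C) in print fonts both resemble Latin m.
cyrillic_latin = variant_groups([("m", "\u0442"), ("m", "\u043c")])
```

Here `greek_latin` pairs U+0061 with U+03B1 and U+0062 with U+03B2, and `cyrillic_latin` collapses into a single group holding Latin m together with both Cyrillic letters, which is the kind of growth the Cyrillic example warns about.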


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


