Mapping and Variants

Tina Dam tina.dam at icann.org
Mon Mar 9 22:35:56 CET 2009


Forgot to mention one thing:

It's not just script mixing (which is now prohibited) that can cause problems. Imagine a zone in which Cyrillic, Latin, and Greek are all supported, although scripts are never mixed within a single label. There can still be problems: for example, who knows which script aaa.tld belongs to? I guess it only matters if more than one of those domains is registered, which of course should not be the case.
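
A minimal sketch of the kind of registration-time check this implies, under stated assumptions: the confusables table below is a tiny hypothetical sample (a real IDN table would be chosen by the registry and be far larger), script membership is guessed from Unicode character names, and the function names are made up for illustration. It rejects mixed-script labels and also blocks a single-script label when a look-alike in another script is already registered.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table: maps a few Cyrillic and
# Greek lowercase letters to the Latin letter they resemble. Not a real
# registry IDN table.
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def scripts_of(label):
    """Return the scripts used in a label, judged by Unicode character names.

    This is a heuristic: digits and hyphens match none of the three
    script names and are simply ignored.
    """
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found


def skeleton(label):
    """Replace each known look-alike with its Latin counterpart."""
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in label)


def can_register(label, registered):
    """Refuse mixed-script labels and look-alikes of registered labels."""
    if len(scripts_of(label)) > 1:
        return False
    return all(skeleton(label) != skeleton(other) for other in registered)
```

With this sketch, a Cyrillic "ааа" (three times U+0430) is accepted in an empty zone but refused once the Latin "aaa" is registered, which is the "only one of those domains" rule described above.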

This is being built into the requirements for IDN tables, and possibly also into the Guidelines, to provide more detail to operators that are implementing this.

> -----Original Message-----
> From: idna-update-bounces at alvestrand.no [mailto:idna-update-
> bounces at alvestrand.no] On Behalf Of Tina Dam
> Sent: Monday, March 09, 2009 2:31 PM
> To: Martin Duerst; Vint Cerf
> Cc: idna-update at alvestrand.no; John C Klensin
> Subject: RE: Mapping and Variants
> 
> Hi everybody, sorry for catching up late on this thread. I was a bit
> occupied last week at the ICANN Mexico meeting.
> 
> The IDN Guidelines correctly state that mixing of scripts is not
> allowed at registration time unless there is a linguistic reason for
> doing so (such as in the case of Japanese). It should further be noted
> that while the IDN Guidelines are not a requirement for ccTLD operators
> today (but only for gTLD operators), they will, looking forward, be a
> requirement for any new TLDs allocated in either the new gTLD Program
> or the IDN ccTLD Fast Track Program.
> 
> In these two processes it is also a requirement for applicants to
> develop and submit their IDN Tables (this is ICANN terminology for
> where variants are identified). While TLD operators can choose to
> block, bundle, reserve or otherwise deal with variant strings at
> registration time, ICANN has proposed one method forward when it comes
> to the top-level, that is:
> 
> - strings will be allocated if they have the same meaning as the
> applied-for string (for example a country name) and otherwise fulfill
> all the string requirements.
> - all other variant strings will be blocked for allocation.
> 
> 
> I am wondering if there is anything in these processes or requirements
> that could be made stronger from a technical standpoint and help the
> IDNA process forward?
> 
> Tina
> 
> > -----Original Message-----
> > From: idna-update-bounces at alvestrand.no [mailto:idna-update-
> > bounces at alvestrand.no] On Behalf Of Martin Duerst
> > Sent: Saturday, March 07, 2009 4:44 AM
> > To: Vint Cerf
> > Cc: John C Klensin; idna-update at alvestrand.no
> > Subject: Re: Mapping and Variants
> >
> > Hello Vint,
> >
> > I agree that prohibiting script mixing should be the default
> > for any registry, apart from a few exceptions such as Japanese.
> > That's my main argument for why I think what John was talking
> > about in his mail is highly (if not completely) theoretical.
> >
> > Regards,    Martin.
> >
> > At 20:11 09/03/07, Vint Cerf wrote:
> > >Martin,
> > >
> > >unless prohibited either at registration time or by protocol,
> > >it is likely that any bad cases will be exercised by people
> > >looking to fool others into doing the wrong thing with domain
> > >names. So I guess I would lean towards finding ways to
> > >confine permitted behaviors to those less likely to be
> > >troublesome. I would include in "bad cases" script
> > >mixing, even though it might have some exotic appeal
> > >for some cases that aren't intentionally "bad".
> > >
> > >I hope that makes sense.
> > >
> > >v
> > >
> > >
> > >Vint Cerf
> > >Google
> > >1818 Library Street, Suite 400
> > >Reston, VA 20190
> > >202-370-5637
> > >vint at google.com
> > >
> > >
> > >
> > >
> > >On Mar 7, 2009, at 3:31 AM, Martin Duerst wrote:
> > >
> > >> At 06:06 09/03/06, John C Klensin wrote:
> > >>
> > >>> When IDNA2003 was written, no one (as far as I know) anticipated
> > >>> the need to create elaborate variant (bundling) systems to
> > >>> associate potentially-confusing labels within a zone so that
> > >>> they could be given special treatment.
> > >>
> > >> Maybe the exact details weren't anticipated, but lots of
> > >> discussion surrounding the issues definitely went on way
> > >> before IDNA2003 was final. Whether we called it 'bundling'
> > >> or whatever else, I'm pretty sure people such as Ken and
> > >> me who were sceptical (and, as it turned out, right) on a
> > >> central, uniform solution for CJK simplified/traditional
> > >> mappings were mentioning solutions in this direction.
> > >>
> > >>
> > >>> For scripts with case differences, IDNA2003 also chose to
> > >>> concentrate on lower case, partially because there was better
> > >>> differentiation of those characters.  It has often been
> > >>> observed, for example, that Greek lower case ("SMALL LETTER")
> > >>> alpha and beta don't look nearly enough like their Latin
> > >>> counterparts ("a" and "b") to be confusing to anyone, but that
> > >>> the capital character pairs are identical.
> > >>>
> > >>> Unfortunately, if one has a situation in which Greek and Latin
> > >>> scripts are considered today and chooses to use variants _and_
> > >>> has the expectation of case-mapping, GREEK SMALL LETTER ALPHA
> > >>> (U+03B1) must be treated as a variant of LATIN SMALL LETTER A
> > >>> (U+0061) because a user might be looking at the combination of
> > >>> GREEK CAPITAL LETTER ALPHA (U+0391) and LATIN CAPITAL LETTER A
> > >>> (U+0041) which map (CaseFold) into the lower case pair.  That
> > >>> sort of relationship exists for a significant number of
> > >>> Latin-Greek pairs and for a much larger number of Cyrillic-Greek
> > >>> pairs.  For Cyrillic, it just about doubles the number of
> > >>> variants in the table.
> > >>
> > >> Is this some highly theoretical discussion, or do you actually
> > >> expect that this would be needed in practice? In my view, it
> > >> should clearly be treated as the former, but I would have
> > >> expected you to say so if you thought so.
> > >>
> > >> Why do I think so? It is well accepted now that script mixing
> > >> is a bad idea, exactly because of cases such as the above.
> > >> So a label consisting of a Latin and a Greek small letter
> > >> a/alpha just doesn't make much sense to start with.
> > >>
> > >> It is also well-known that some carefully chosen letter
> > >> combinations in one script, in particular in upper case,
> > >> are difficult or impossible to visually distinguish from
> > >> potentially completely different letter combinations in
> > >> other scripts. But these are few and far between, in particular
> > >> if they are of a certain length and contain some bits of
> > >> meaning.
> > >>
> > >> I would also like to point out that with your approach
> > >> above, you may not be able to stop at letter pairs. As
> > >> an example, in script fonts and handwriting, Cyrillic
> > >> Ts (both upper and lower case) may look similar to Latin
> > >> Ms, but in print fonts, Cyrillic and Latin Ms look alike.
> > >> So suddenly, you have to group Cyrillic Ts and Ms with
> > >> Latin Ms. Not sure anybody will use such a system, at
> > >> least not for Cyrillic :-(.
> > >>
> > >> Regards,    Martin.
> > >>
> > >>
> > >> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> > >> #-#-#  http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
> > >>
> > >> _______________________________________________
> > >> Idna-update mailing list
> > >> Idna-update at alvestrand.no
> > >> http://www.alvestrand.no/mailman/listinfo/idna-update
> > >
> >
> >
> > #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> > #-#-#  http://www.sw.it.aoyama.ac.jp
> > mailto:duerst at it.aoyama.ac.jp
> >

