Mapping and Variants

Thu Mar 5 23:34:14 CET 2009

Hi,

In Arabic script, whether characters are confused with one another 
depends on their position within a sequence of characters and joiners 
(a word), not on case.
If anyone is interested in the current bundling policy for Persian 
domains, see http://www.nic.ir/idn.

Alireza

John C Klensin wrote:
> Hi.
>
> In the process of working on a document that makes
> recommendations about Cyrillic registrations (paralleling RFC
> 4713 for Chinese and the I-D for Arabic language registrations),
> it was forcefully brought to my attention that there is a
> tradeoff, and perhaps an actual conflict, between mapping and
> JET-like variant approaches.  I hope the draft document on
> Cyrillic will be posted by Monday, but it is not within the WG's
> scope and I think I can explain the issue without it.
>
> When IDNA2003 was written, no one (as far as I know) anticipated
> the need to create elaborate variant (bundling) systems to
> associate potentially-confusing labels within a zone so that
> they could be given special treatment.  Since the publication
> of RFC 3743 (the "JET Guidelines"), the practice has become more
> or less widespread, even though more in discussions than in
> implementations.  In the current WG's discussions we have
> included references to variants or bundling as an important
> possibility in many of our discussions about confusing character
> combinations as well as transitional strategies.  We have also
> discussed, although not necessarily agreed upon, the issue of
> variant explosion in which having multiple variants for even a
> small number of characters potentially causes more variant
> labels than a zone might plausibly be able to handle (some of
> the CJK registries and others deal with that by banning some
> variant combinations outright rather than allowing for bundling
> them into a zone).
>
> For scripts with case differences, IDNA2003 also chose to
> concentrate on lower case, partially because there was better
> differentiation of those characters.  It has often been
> observed, for example, that Greek lower case ("SMALL LETTER")
> alpha and beta don't look nearly enough like their Latin
> counterparts ("a" and "b") to be confusing to anyone, but that
> the capital character pairs are identical.
>
> Unfortunately, if one has a situation in which Greek and Latin
> scripts are considered today and chooses to use variants _and_
> has the expectation of case-mapping, GREEK SMALL LETTER ALPHA
> (U+03B1) must be treated as a variant of LATIN SMALL LETTER A
> (U+0061) because a user might be looking at the combination of
> GREEK CAPITAL LETTER ALPHA (U+0391) and LATIN CAPITAL LETTER A
> (U+0041) which map (CaseFold) into the lower case pair.  That
> sort of relationship exists for a significant number of
> Latin-Greek pairs and for a much larger number of Cyrillic-Greek
> pairs.  For Cyrillic, it just about doubles the number of
> variants in the table.
>
> Not a good situation.  But it is one that I think we need to
> consider as we weigh the various tradeoffs associated with
> mapping, even for transitional purposes, since variant methods
> are at least as much part of our landscape today as creative
> interpretations of the specs in web page design.
>
>      john
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
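
The interaction described in the quoted message can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the two-entry lookalike table and the function names are hypothetical,
and Python's str.casefold() (Unicode case folding) stands in for the
IDNA2003 mapping step rather than reproducing it. The sketch shows how
capital letters that look identical force their lower-case forms into
the variant table, and how the number of variant labels then multiplies
with each affected character.

```python
from itertools import product

# Hypothetical, deliberately tiny lookalike table: capital letters that are
# visually identical across Latin and Greek (A/Alpha, B/Beta).
CAPITAL_LOOKALIKES = [("\u0041", "\u0391"), ("\u0042", "\u0392")]


def folded_variant_pairs(capital_pairs):
    """Case-fold each capital pair to get the lower-case pairs that must
    also be treated as variants once case mapping is expected."""
    return [(a.casefold(), b.casefold()) for a, b in capital_pairs]


def variant_labels(label, variant_pairs):
    """Enumerate every label reachable by swapping characters for their
    variants (assumption: variants are symmetric and position-independent)."""
    table = {}
    for a, b in variant_pairs:
        table.setdefault(a, {a}).add(b)
        table.setdefault(b, {b}).add(a)
    choices = [sorted(table.get(ch, {ch})) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    pairs = folded_variant_pairs(CAPITAL_LOOKALIKES)
    for latin, greek in pairs:
        print(f"U+{ord(latin):04X} <-> U+{ord(greek):04X}")
    # Each character with one variant doubles the number of labels.
    print(len(variant_labels("abba", pairs)))
```

With this toy table, a four-character label in which every character
has one variant already yields 2**4 labels, which is the kind of growth
the "variant explosion" discussion refers to. A real registry table
would be far larger and might block some combinations outright instead
of enumerating them.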