UTC Agenda Item: IDNA proposal

Sun Nov 26 21:31:20 CET 2006

Patrik Fältström wrote:
>> Class "Nd" are numbers, and should be included. Class "Lo" is
>> required, except:
>>
>> U+0A8D : ઍ, which can be represented as U+0A85 U+0AC5 : અૅ
> This is a big problem.
>
> My position is this:
>
> The IETF is not the organisation that is going to pick individual 
> code points. Either we use the definitions used by the Unicode 
> Consortium (class, script, block, normalization rule, case folding 
> rule), or we ask the Unicode Consortium to come up with something new 
> (for example a new class).
>
> Maybe (and I say maybe) we have different rules for combinations of 
> {class, script, block}.
>
> Please give me such triples, and I will try them out in new versions
> of the table.

I can see why you wouldn't want to get into individual code-point
picking given the size of Unicode and the concerns involved.

I don't see how this case is any different from Latin diacritical
marks: in both, there are alternate ways of encoding the same thing.
If relying only on the Unicode Consortium's definitions is a hard and
fast rule, then that would imply that the Indic scripts must be deferred
from this revision of the IDNA standard, pending a new Unicode version
with the new data present.

Maybe Mark can comment on what happened, or what the prospects would be
for getting an update.  Mark, http://xrl.us/tgzq is where I list some
specific code points that, due to an apparent absence or oversight in
the Unicode database, lead to NFKC not treating pre-composed Indic
letters as equivalent to their decomposed sequences.  I discuss
Gujarati, but probably all of the Indic scripts have the same issue.
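
To make the comparison concrete, here is a minimal sketch using Python's
standard unicodedata module.  It is only an illustration, not part of
the proposal: the helper names nfkc_equivalent and code_points are made
up for this example, and the result reflects whichever Unicode version
the interpreter ships with.  It checks whether the Latin pair and the
Gujarati pair quoted above come out identical after NFKC.

```python
import unicodedata


def nfkc_equivalent(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def code_points(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Latin: precomposed e-acute vs. base letter plus combining acute accent.
latin_pair = ("\u00e9", "e\u0301")
# Gujarati: the pair quoted above (U+0A8D vs. U+0A85 U+0AC5).
gujarati_pair = ("\u0a8d", "\u0a85\u0ac5")

for name, (pre, seq) in (("Latin", latin_pair), ("Gujarati", gujarati_pair)):
    print(name, code_points(pre), "vs", code_points(seq),
          "-> same after NFKC:", nfkc_equivalent(pre, seq))
```

The Latin pair should normalize to the same string, while the Gujarati
pair should not, since the database carries no decomposition mapping
for U+0A8D.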
-- 
Sam Vilain, Systems Architect, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843