UTC Agenda Item: IDNA proposal

Sun Nov 26 21:33:48 CET 2006

Harald Alvestrand wrote:
>> In fact, it looks like U+0A86 : આ is actually U+0A85 U+0ABE : અા, so I
>> guess there needs to be a Stringprep-like normalisation step for these.
>> So maybe U+0A86 is not needed; e.g. U+0A94 : ઔ could be U+0A85 U+0ABE
>> U+0AC8 : અાૈ. This is not a perfect homograph in the Padmaa font,
>> but it is on the Unicode.org code chart.
>
> Would it be harmful to include those, apart from the confusables problem?

What else do I need to be aware of, other than the confusables issue?

> Or do you think that they "should have had" canonical/compatibility 
> decompositions, so that they would go away under the NFKC rule?

That looks to be the case.  But, as Patrik mentioned in another branch
of this thread, it's not the IETF's job to set Unicode policy.
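
For anyone who wants to check this against the Unicode data, here is a
minimal sketch using Python's standard unicodedata module. The variable
names are illustrative only, and the result reflects whatever Unicode
version the local Python ships with. It shows that U+0A86 carries no
decomposition mapping, so NFKC leaves the precomposed letter and the
two-codepoint sequence as distinct strings.

```python
import unicodedata

# Illustrative names; the code points are the ones discussed above.
precomposed = "\u0A86"     # GUJARATI LETTER AA
sequence = "\u0A85\u0ABE"  # GUJARATI LETTER A + GUJARATI VOWEL SIGN AA

# An empty string means the character has no decomposition mapping.
mapping = unicodedata.decomposition(precomposed)

# Compare the two spellings after NFKC normalisation.
same_after_nfkc = (
    unicodedata.normalize("NFKC", precomposed)
    == unicodedata.normalize("NFKC", sequence)
)

print(repr(mapping))
print(same_after_nfkc)
```

If the two spellings normalised to the same string, the second line
printed would be True; without a decomposition in the Unicode data,
NFKC alone cannot merge them.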

>> Again, U+0AD0 : ૐ is a Sanskrit symbol and its duplication at U+0950 :
>> ॐ is regrettable. Probably the Devanagari version should "win".
> By "win", do you mean that there should be a canonical decomposition of 
> U+0AD0 to U+0950?

Yes, precisely.
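
Since Unicode defines no such decomposition, the effect could only be
had today by an extra mapping step ahead of NFKC, in the spirit of the
Stringprep-like step mentioned above. A rough sketch follows, assuming
a hypothetical mapping table: EXTRA_MAP and fold() are made-up names
for illustration, and nothing here is part of Unicode, Stringprep or
any IDNA specification.

```python
import unicodedata

# Hypothetical extra mappings (an assumption for illustration only;
# not defined by Unicode, Stringprep or IDNA).
EXTRA_MAP = [
    ("\u0AD0", "\u0950"),        # GUJARATI OM -> DEVANAGARI OM
    ("\u0A85\u0ABE", "\u0A86"),  # GUJARATI A + VOWEL SIGN AA -> LETTER AA
]


def fold(label: str) -> str:
    """Apply the hypothetical mappings, then NFKC-normalise the result."""
    for source, target in EXTRA_MAP:
        label = label.replace(source, target)
    return unicodedata.normalize("NFKC", label)


print(fold("\u0AD0") == "\u0950")
print(fold("\u0A85\u0ABE") == "\u0A86")
```

Whether such a table belongs in the protocol at all, or in the Unicode
data, is exactly the policy question raised above.
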
--
Sam Vilain, Systems Architect, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267 PGP ID: 0x66B25843