Tamil Numerals in IDNA - Re: WG Last Call for Four Primary IDNABIS I-Ds

Sat Aug 22 03:59:09 CEST 2009

Thanks to everyone who responded to our request.

Let me summarise the current position from our point of view.

1. One of the principal objectives of IDNA is to avoid registration of
labels which may cause problems, including labels that are visually
similar to other labels.

2. There are characters in many scripts which are visually similar to a 
character in another script. IDNA2008 does not handle such cases. 
Registries which allow mixed-script labels should take appropriate steps 
to avoid such confusion.

3. The WG has recognised that visually similar characters in the *same*
script should generally be avoided. In most cases, this is achieved by
applying Unicode properties. Where this is not possible, as with
ARABIC-INDIC DIGITS and EXTENDED ARABIC-INDIC DIGITS, special rules (A.8
and A.9) have been introduced in idnabis-tables.txt.

4. Registries - especially gTLDs - cannot be expected to be experts on
each script they register; instead, they expect the RFCs to provide
guidance on this matter. It is not reasonable to expect registries to
get together outside the IETF process and form standard sets of rules
for each script. I believe that this should be done by the IETF.

5. Some Tamil digits are very similar to Tamil letters or syllables (see
image in my first message).

6. Tamil digits are not in contemporary use in India, Sri Lanka or 
elsewhere (I think other WG members can verify this).

7. If Tamil digits were a specific Unicode block, or had an identifiable
Unicode property, the WG would be inclined to accede to our request.
However, the WG is disinclined to disallow characters on the basis of
case-by-case character analysis [this is my reading of the comments
received].

8. Tamil digits have the Unicode General_Category "Nd" (decimal number)
and are in the "Tamil" block, and thus cannot be easily differentiated
by a rule. The only way to treat 0BE6..0BEF as DISALLOWED is to add them
to the exceptions table one by one, i.e. add them to section "2.6.
Exceptions (F)" with the explicit value DISALLOWED.
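
To illustrate point 8, here is a minimal Python sketch using the
standard unicodedata module. It lists the General_Category of
0BE6..0BEF and of two other decimal digits chosen only for comparison
(DIGIT ZERO and DEVANAGARI DIGIT ZERO). The helper name describe() is
my own and not part of any draft.

```python
import unicodedata

# Code points proposed for DISALLOWED: U+0BE6..U+0BEF inclusive.
TAMIL_DIGITS = range(0x0BE6, 0x0BF0)


def describe(cp):
    """Return (code point label, character name, General_Category)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))


tamil_rows = [describe(cp) for cp in TAMIL_DIGITS]
for row in tamil_rows:
    print(*row)

# Digits from other scripts carry the same category, so a rule keyed on
# "Nd" alone would catch them as well.
other_digits = [describe(cp) for cp in (0x0030, 0x0966)]
for row in other_digits:
    print(*row)
```

Since every row reports the same category, the property by itself does
not single out the Tamil digits; only the code points do.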

I believe that this case is similar to the ARABIC-INDIC DIGITS, and 
should be treated similarly. However, in this case, the solution is 
simpler, as the characters need only be DISALLOWED and no contextual 
rule is needed.
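
The difference between the two treatments can be sketched as follows.
This is only an illustration under my reading of rules A.8 and A.9 (the
two Arabic digit sets must not appear in the same label); the function
names and the PROPOSED_EXCEPTIONS table are hypothetical and are not
taken from idnabis-tables.

```python
# Code point ranges, end-exclusive as usual for Python ranges.
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9
TAMIL_DIGITS = range(0x0BE6, 0x0BF0)           # U+0BE6..U+0BEF

# Hypothetical extra rows for the exceptions table: one per code point.
PROPOSED_EXCEPTIONS = {cp: "DISALLOWED" for cp in TAMIL_DIGITS}


def arabic_digits_not_mixed(label):
    """Contextual check: needs to look at the whole label."""
    has_ai = any(ord(c) in ARABIC_INDIC for c in label)
    has_eai = any(ord(c) in EXTENDED_ARABIC_INDIC for c in label)
    return not (has_ai and has_eai)


def has_disallowed_tamil_digit(label):
    """Flat check: each character is judged on its own."""
    return any(PROPOSED_EXCEPTIONS.get(ord(c)) == "DISALLOWED" for c in label)
```

The first check depends on which other characters are present in the
label, whereas the second is a plain per-character lookup, which is why
no contextual rule would be needed for the Tamil digits.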

Regards,

Gihan

P.S. I have not addressed other Southern Indic digits.