Tamil Numerals in IDNA - Re: WG Last Call for Four Primary IDNABIS I-Ds
Gihan Dias
gihan at cse.mrt.ac.lk
Sat Aug 22 03:59:09 CEST 2009
Thanks to everyone who responded to our request.
Let me summarise the current position from our point of view.
1. One of the principal objectives of IDNA is to avoid registration of
labels which may cause problems, including being visually similar to
other labels.
2. There are characters in many scripts which are visually similar to a
character in another script. IDNA2008 does not handle such cases.
Registries which allow mixed-script labels should take appropriate steps
to avoid such confusion.
3. The WG has identified that visually similar characters in the *same*
script should generally be avoided. In most cases, this is achieved by
applying Unicode properties. Where this is not possible, as with
ARABIC-INDIC DIGITS and EXTENDED ARABIC-INDIC DIGITS, special rules (A.8
and A.9) have been introduced in idnabis-tables.txt.
4. Registries - especially gTLDs - cannot be expected to be experts on
each script they register; instead, they expect the RFCs to provide
guidance on this matter. It is not reasonable to expect registries to
get together outside the IETF process and form standard sets of rules
for each script. I believe that this should be done by the IETF.
5. Some Tamil digits are very similar to Tamil letters or syllables (see
the image in my first message).
6. Tamil digits are not in contemporary use in India, Sri Lanka or
elsewhere (I think other WG members can verify this).
7. If Tamil digits formed a separate Unicode block, or had a
distinguishing Unicode property, the WG would be inclined to accede to
our request. However, the WG is disinclined to disallow characters on
the basis of a character-by-character analysis [this is my reading of
the comments received].
8. Tamil digits have the Unicode General_Category "Nd" (decimal number)
and are in the "Tamil" block, and thus cannot easily be singled out by a
rule. The only way to treat 0BE6..0BEF as DISALLOWED is to add them to
the exceptions table one by one, i.e. to add them to section "2.6.
Exceptions (F)" with the explicit value DISALLOWED.
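
To illustrate point 8, here is a minimal Python sketch. It is not the
algorithm from idnabis-tables; it models only two of the rules
(Exceptions first, then the LetterDigits general categories), and the
names EXCEPTIONS and derived_property are hypothetical. It shows that
the Tamil digits are indistinguishable from other "Nd" characters by
category alone, so only an explicit per-code-point entry changes the
result.

```python
import unicodedata

# U+0BE6..U+0BEF, TAMIL DIGIT ZERO..TAMIL DIGIT NINE
TAMIL_DIGITS = range(0x0BE6, 0x0BF0)

# Hypothetical exceptions table: each code point listed one by one
# with the explicit value DISALLOWED, as proposed in the message.
EXCEPTIONS = {cp: "DISALLOWED" for cp in TAMIL_DIGITS}

# General categories treated as letters/digits in this simplified model.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def derived_property(cp, exceptions=EXCEPTIONS):
    """Heavily simplified: exceptions take precedence over categories."""
    if cp in exceptions:
        return exceptions[cp]
    if unicodedata.category(chr(cp)) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


# Every Tamil digit has general category Nd, like ASCII "0".."9".
tamil_categories = {unicodedata.category(chr(cp)) for cp in TAMIL_DIGITS}
```

With an empty exceptions table, derived_property(0x0BE7, {}) gives
PVALID in this model, which is why a category-based rule alone cannot
exclude these characters.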
I believe that this case is similar to the ARABIC-INDIC DIGITS, and
should be treated similarly. However, in this case, the solution is
simpler, as the characters need only be DISALLOWED and no contextual
rule is needed.
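
The difference between the two cases can be sketched as follows. This is
an illustration under my own assumptions, not text from the drafts: the
function names are hypothetical, and the first function only
approximates the intent of rules A.8 and A.9 (the two Arabic digit sets
must not be mixed within a label), whereas the Tamil case needs no
context at all.

```python
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9
TAMIL_DIGITS = range(0x0BE6, 0x0BF0)           # U+0BE6..U+0BEF


def arabic_digits_consistent(label):
    """Contextual check: depends on which other digits the label contains."""
    has_ai = any(ord(c) in ARABIC_INDIC for c in label)
    has_eai = any(ord(c) in EXTENDED_ARABIC_INDIC for c in label)
    return not (has_ai and has_eai)


def has_no_tamil_digits(label):
    """Context-free check: each character is judged on its own."""
    return not any(ord(c) in TAMIL_DIGITS for c in label)
```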
Regards,
Gihan
P.S. I have not addressed other Southern Indic digits.