Protocol-08 (and status of Defs-04 and Rationale-06)
Harald Tveit Alvestrand
harald at alvestrand.no
Mon Dec 8 08:32:36 CET 2008
Mark Davis wrote:
> Just to make sure we are all on the same page, the digits in question
> are the following.
>
> (European) U+0030 DIGIT ZERO...
> <http://demo.icu-project.org/icu-bin/ubrowse?ch=0030#here>
> 0 1 2 3 4 5 6 7 8 9
>
>
> Arabic-Indic digits: U+0660 ARABIC-INDIC DIGIT ZERO...
> <http://demo.icu-project.org/icu-bin/ubrowse?ch=0660#here>
> ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
>
>
> Extended Arabic-Indic digits: U+06F0 EXTENDED ARABIC-INDIC DIGIT
> ZERO... <http://demo.icu-project.org/icu-bin/ubrowse?ch=06F0#here>
> ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
>
I note that my system seems to have the wrong font for at least one of
these, since the glyphs for 4, 5 and 6 look very similar in the
Arabic-Indic and Extended Arabic-Indic examples; when I check against
the examples in the Unicode book, it seems that my glyphs for U+0660...
are wrong.
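Since glyphs can mislead like this, one way to tell the two sets apart is to look at the code points rather than the rendering. A small sketch using Python's standard `unicodedata` module (the helper name `describe` is just made up for illustration):

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each non-space character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if not ch.isspace()
    ]

# The digits 4, 5, 6 from the Arabic-Indic set, then from the Extended set,
# written as escapes so the result does not depend on the local font.
sample = "\u0664\u0665\u0666 \u06F4\u06F5\u06F6"
for code_point, name in describe(sample):
    print(code_point, name)
```

This prints the official character names, which distinguish ARABIC-INDIC from EXTENDED ARABIC-INDIC digits even when the local font draws them alike.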
>
> Various Indic digits: U+0966 DEVANAGARI DIGIT ZERO...
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=0966#here> (Devanagari,
> Gujarati, Tamil, ...)
> ० १ २ ३ ४ ५ ६ ७ ८ ९
>
>
> Unfortunately, Europeans often refer to European digits as "Arabic"
> (indicating their source), and Arabs often refer to Arabic digits as
> "Indic" (indicating their source). For that reason, the Unicode names
> for the Arabic digits use "Arabic-Indic" instead of simply "Arabic".
> Importantly, the Arabic digits should not be confused with true Indic
> digits, like those for Devanagari, etc. So use of the above Unicode
> names is recommended to avoid confusion.
>
> So I believe what was meant was the following. (I added one more
> possible option, #4, and added examples, pairing 'a' with European 9,
> 'b' with Arabic-Indic 9, and 'c' with Extended Arabic-Indic 9. Thus
> "b٩" below is using the U+0669 ARABIC-INDIC DIGIT NINE
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=0669#here> while "c۹"
> is using the U+06F9 EXTENDED ARABIC-INDIC DIGIT NINE
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=06F9#here>.)
>
>    1. forbid at protocol level using context rules the mixing of
>       Arabic-Indic, Extended-Arabic-Indic and European digits in any
>       combination.
>           * forbid {a9b٩, a9c۹, b٩c۹}
>    2. forbid at protocol level using context rules the mixing of
>       Arabic-Indic with European and separately forbid mixing
>       Extended-Arabic-Indic with European (but allow
>       mixing Arabic-Indic and Extended-Arabic-Indic).
>           * forbid {a9b٩, a9c۹}, but allow {b٩c۹}
>    3. do not forbid use of digits at protocol level but use registry
>       filters implemented by each registry.
>    4. forbid at protocol level using context rules the mixing of
>       Arabic-Indic with Extended-Arabic-Indic (but allow the mixing of
>       either one alone with European digits).
>           * forbid {b٩c۹}, but allow {a9b٩, a9c۹}
>
Agreed with the listing of alternatives. My favourites are #1 and #3, in
that order.
I have not seen anyone arguing for #4 (I have seen it proposed, but
interpreted it as a strawman).
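
To make the difference between the alternatives concrete, here is a minimal sketch in Python of a per-label check. It assumes each digit set is the ten code points starting at U+0030, U+0660 and U+06F0 respectively; the names `digit_sets` and `allowed` are hypothetical and do not come from any of the drafts, and #3 is modelled as "no protocol-level restriction" since registry filters are out of scope here.

```python
EUROPEAN = "european"
ARABIC_INDIC = "arabic-indic"
EXTENDED = "extended-arabic-indic"

def digit_sets(label):
    """Return the set of digit families that occur in the label."""
    found = set()
    for ch in label:
        cp = ord(ch)
        if 0x0030 <= cp <= 0x0039:
            found.add(EUROPEAN)
        elif 0x0660 <= cp <= 0x0669:
            found.add(ARABIC_INDIC)
        elif 0x06F0 <= cp <= 0x06F9:
            found.add(EXTENDED)
    return found

def allowed(label, option):
    """Return True if the label passes the digit rule of alternative 1-4."""
    families = digit_sets(label)
    if option == 1:
        # No mixing of any two digit families.
        return len(families) <= 1
    if option == 2:
        # European may not mix with either Arabic set; the two may mix.
        return not (EUROPEAN in families and len(families) > 1)
    if option == 3:
        # Nothing forbidden at protocol level (left to registries).
        return True
    if option == 4:
        # Only the two Arabic sets may not mix with each other.
        return not (ARABIC_INDIC in families and EXTENDED in families)
    raise ValueError(f"unknown option: {option}")

# The three example labels from the list, written with escapes.
examples = ["a9b\u0669", "a9c\u06F9", "b\u0669c\u06F9"]
for option in (1, 2, 3, 4):
    print(option, [allowed(label, option) for label in examples])
```

A label containing all three families is rejected under #1, #2 and #4 alike in this sketch, which seems to be the natural reading of the rules as listed.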
Harald