Protocol-08 (and status of Defs-04 and Rationale-06)

Mon Dec 8 08:32:36 CET 2008

Mark Davis wrote:
> Just to make sure we are all on the same page, the digits in question 
> are the following. 
>
> (European) U+0030 DIGIT ZERO... 
> <http://demo.icu-project.org/icu-bin/ubrowse?ch=0030#here>
> 0  	 1  	 2  	 3  	 4  	 5  	 6  	 7  	 8  	 9 
>
>
> Arabic-Indic digits: U+0660 ARABIC-INDIC DIGIT ZERO... 
> <http://demo.icu-project.org/icu-bin/ubrowse?ch=0660#here>
>  ٠  	 ١  	 ٢  	 ٣  	 ٤  	 ٥  	 ٦  	 ٧  	 ٨  	 ٩ 
>
>
> Extended Arabic-Indic digits: U+06F0 EXTENDED ARABIC-INDIC DIGIT 
> ZERO... <http://demo.icu-project.org/icu-bin/ubrowse?ch=06F0#here>
>  ۰  	 ۱  	 ۲  	 ۳  	 ۴  	 ۵  	 ۶  	 ۷  	 ۸  	 ۹ 
>
I note that my system seems to have the wrong font for at least one of 
these, since the digits 4, 5 and 6 look very similar between these two 
examples; when I check against the examples in the Unicode book, it seems 
that my glyphs for U+0660... are wrong.
>
> Various Indic digits: U+0966 DEVANAGARI DIGIT ZERO... 
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=0966#here> (Devanagari, 
> Gujarati, Tamil, ...)
> ०  	 १  	 २  	 ३  	 ४  	 ५  	 ६  	 ७  	 ८  	 ९ 
>
>
> Unfortunately, Europeans often refer to European digits as "Arabic" 
> (indicating their source), and Arabs often refer to Arabic digits as 
> "Indic" (indicating their source). For that reason, the Unicode names 
> for the Arabic digits use "Arabic-Indic" instead of simply "Arabic". 
> Importantly, the Arabic digits should not be confused with true Indic 
> digits, like those for Devanagari, etc. So use of the above Unicode 
> names is recommended to avoid confusion.
>
> So I believe what was meant was the following. (I added one more 
> possible option, #4, and added examples, pairing 'a' with European 9, 
> 'b' with Arabic-Indic 9, and 'c' with Extended Arabic-Indic 9. Thus 
> "b٩" below is using the U+0669 ARABIC-INDIC DIGIT NINE 
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=0669#here> while "c۹" 
> is using the U+06F9 EXTENDED ARABIC-INDIC DIGIT NINE 
> <http://demo.icu-project.org/icu-bin/ubrowse?k1=06F9#here>.)
>
>    1. forbid at protocol level using context rules the mixing of
>       Arabic-Indic, Extended-Arabic-Indic and European digits in any
>       combination.
>           * forbid {a9b٩, a9c۹, b٩c۹}
>    2. forbid at protocol level using context rules the mixing of
>       Arabic-Indic with European and separately forbid mixing
>       Extended-Arabic-Indic with European (but allow
>       mixing Arabic-Indic and Extended-Arabic-Indic).
>           * forbid {a9b٩, a9c۹}, but allow {b٩c۹}
>    3. do not forbid use of digits at protocol level but use registry
>       filters implemented by each registry.
>    4. forbid at protocol level using context rules the mixing of
>       Arabic-Indic with Extended-Arabic-Indic (but allow the mixing of
>       either one alone with European digits).
>           * forbid {b٩c۹}, but allow {a9b٩, a9c۹}
>
Agreed with the listing of alternatives. My favourites are #1 and #3, in 
that order.

I have not seen anyone arguing for #4 (I have seen it proposed, but 
interpreted it as a strawman).
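
For illustration only, the four alternatives quoted above can be sketched 
as a small check over the digit families found in a label. This is a rough 
sketch, not the protocol text: the function names, the family labels and 
the numeric "option" parameter are all made up for this example, and only 
the three code point ranges come from the message quoted above. Option #3 
is modelled as "no protocol-level check", since under it any restriction 
would live in each registry's own filter.

```python
# Hypothetical sketch: names and the "option" parameter are illustrative only.
# The ranges are the three digit blocks named in the quoted message.
EUROPEAN = "european"                 # U+0030..U+0039
ARABIC_INDIC = "arabic-indic"         # U+0660..U+0669
EXTENDED = "extended-arabic-indic"    # U+06F0..U+06F9

_RANGES = {
    EUROPEAN: (0x0030, 0x0039),
    ARABIC_INDIC: (0x0660, 0x0669),
    EXTENDED: (0x06F0, 0x06F9),
}


def digit_families(label):
    """Return the set of digit families that occur in the label."""
    found = set()
    for ch in label:
        cp = ord(ch)
        for name, (lo, hi) in _RANGES.items():
            if lo <= cp <= hi:
                found.add(name)
    return found


def allowed(label, option):
    """Apply alternative #1..#4 from the list to a single label."""
    families = digit_families(label)
    if option == 1:
        # No mixing of the three families in any combination.
        return len(families) <= 1
    if option == 2:
        # European may not mix with either Arabic family;
        # the two Arabic families may mix with each other.
        return not (EUROPEAN in families and len(families) > 1)
    if option == 3:
        # Nothing forbidden at protocol level (left to registry filters).
        return True
    if option == 4:
        # Only the two Arabic families may not mix with each other.
        return not {ARABIC_INDIC, EXTENDED} <= families
    raise ValueError("option must be 1, 2, 3 or 4")


# The three example labels from the quoted list, written with escapes:
A9_B9 = "a9b\u0669"        # European 9 + Arabic-Indic 9
A9_C9 = "a9c\u06f9"        # European 9 + Extended Arabic-Indic 9
B9_C9 = "b\u0669c\u06f9"   # Arabic-Indic 9 + Extended Arabic-Indic 9
```

With these definitions, allowed(B9_C9, 2) is true while allowed(A9_B9, 2) 
is false, matching "forbid {a9b٩, a9c۹}, but allow {b٩c۹}" under #2.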

                            Harald