Follow-up to Monday's discussion of digits

Thu Nov 20 01:50:53 CET 2008

Andrew,

I think your general conclusion mirrors at least the general
consensus in the meeting. The many other digit representations don't
seem to have the complex problem that the two sets, Arabic-Indic and
Extended Arabic-Indic, have: the several languages using various
portions of the Arabic letters and numerals have a significant
potential for confusion if mixing is permitted, while virtually all
the other scripts are pretty much 1:1 with regard to which numerics
go with which scripts. As a consequence, and to avoid combinatorial
explosions in the context rules, it has been proposed to defend
against mixing explicitly in the case of these two numeric sets and
the conventional numbers used by Latin-based character sets. We await
the resolution of the degree of restriction necessary, on the basis
of consultations to be undertaken by Erik and John.
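
To make the proposal concrete, here is a minimal sketch of a check that
rejects a label mixing any two of the three digit series. It is an
illustration only, not the rule text from any draft: the function and
table names are hypothetical, and it assumes the series are exactly
U+0030..U+0039 (ASCII digits), U+0660..U+0669 (Arabic-Indic) and
U+06F0..U+06F9 (Extended Arabic-Indic).

```python
# Hypothetical sketch: detect mixing of digit series within one label.
# The three contiguous code point ranges below are the assumption here.
DIGIT_SERIES = {
    "ascii": range(0x0030, 0x003A),                  # 0-9
    "arabic-indic": range(0x0660, 0x066A),           # U+0660..U+0669
    "extended-arabic-indic": range(0x06F0, 0x06FA),  # U+06F0..U+06F9
}


def digit_series_in(label):
    """Return the names of the digit series that occur in the label."""
    found = set()
    for ch in label:
        cp = ord(ch)
        for name, cps in DIGIT_SERIES.items():
            if cp in cps:
                found.add(name)
    return found


def mixes_digit_series(label):
    """True if the label uses digits from more than one series."""
    return len(digit_series_in(label)) > 1
```

With this, mixes_digit_series("\u0661\u06f2") is true (one Arabic-Indic
digit followed by one Extended Arabic-Indic digit), while a label that
stays within a single series, or has no digits at all, passes. How
strict the real restriction should be is the open question noted above.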

vint

NOTE NEW BUSINESS ADDRESS AND PHONE
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com

On Nov 19, 2008, at 5:51 PM, Andrew Sullivan wrote:

> On Wed, Nov 19, 2008 at 01:57:20PM +0900, Martin Duerst wrote:
>> I do see nothing in the current documents (nor in IDNA2003, for
>> that matter) that would forbid registries to have a policy to
>> restrict registrations in a single label to any one series
>> of digits, or to some desirable (and hopefully non-confusable
>> and therefore non-exploding) combination of series of digits.
>
> I agree with this, and think that the restriction _could_ be done only
> by registries.  That said, the example case may be unusual enough that
> it is worth pushing into the protocol.
>
> I'll probably get the terminology wrong in what follows, but my
> current understanding is that the various ranges of digits always
> contain the complete set of digits, even if the digits are really
> shared.  In other words I think that the extended and non-extended
> Arabic-Indic ranges in some sense contain three characters that would
> have been the same code point had they not been digits.  It's a good
> thing that the code points for digits are always in a contiguous
> range, but it has created this unusual case that happens to be bad for
> domain name label use.  Is that correct?  Because if I understand this
> correctly, it's sufficiently unlike other cases that treating it
> specially in the protocol might be the right trade-off.  This "strange
> case", after all, is part of what was the motivation behind having
> context rules in the first place, no?
>
> A
>
> -- 
> Andrew Sullivan
> ajs at shinkuro.com
> Shinkuro, Inc.
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
