Mapping Other Digits to 0-9

Mon Dec 8 17:16:11 CET 2008

The IM rationale: there exist implementations (of host operating
systems, of language libraries, of applications, ...) which map code
points outside the range 0x30-0x39 to code points in that range.
Mechanisms also exist to provide the reverse maps, e.g., using the
current locale to map code points in the range 0x30-0x39 back to other
code points. Those mechanisms may help us understand the design choices
of such implementations, namely that non-uniqueness of data may be
managed by application context, but the mechanisms themselves are not
available to us.
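To make the non-uniqueness concrete, here is a minimal, hypothetical
Python sketch (not any of the implementations referred to here). It
assumes the forward map folds every Unicode decimal digit (General
Category Nd) onto its ASCII counterpart in 0x30-0x39, using only the
standard unicodedata module. The function name to_ascii_digits and the
sample list THREES are illustrative.

```python
import unicodedata

# Illustrative sample: decimal-digit code points from different
# scripts that all carry the numeric value 3.
THREES = [
    "\u0033",  # DIGIT THREE (ASCII, 0x33)
    "\u0663",  # ARABIC-INDIC DIGIT THREE
    "\u06F3",  # EXTENDED ARABIC-INDIC DIGIT THREE (used for Persian)
    "\u0969",  # DEVANAGARI DIGIT THREE
    "\uFF13",  # FULLWIDTH DIGIT THREE
]

def to_ascii_digits(s: str) -> str:
    """Hypothetical forward map: replace every decimal digit (category
    Nd) with the ASCII digit of the same value; keep other characters."""
    out = []
    for ch in s:
        if unicodedata.category(ch) == "Nd":
            out.append(chr(0x30 + unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)

# Forward map: five distinct code points collapse to a single one.
mapped = {to_ascii_digits(ch) for ch in THREES}
print(mapped)  # {'3'}

# Reverse map: "3" has five preimages, so recovering the original
# requires outside context such as a locale.
preimages = [ch for ch in THREES if to_ascii_digits(ch) == "3"]
print(len(preimages))  # 5
```

The forward direction is trivially well defined; it is the reverse
direction that needs the application context mentioned above.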

I've written these: the UTF-8 locale for Solaris 2.x, along with the
libc routines and the bidi tty driver that attempted, via an obscure
early dynamic kernel mechanism, to get a run-time clue about the current
locale (information available in user space) so as to process character
I/O correctly; and the HP-UX 10.x applications for HP-15 and EUC string
processing, character I/O, and file I/O. There were a couple more, but
my point is this: the decision to map one or more code points outside
the range 0x30-0x39 into that range was an architectural choice. It
assumed that data conversion at the system boundary would have locale
data, or some equivalent context, matching the data conversion inside
the system boundary. That assumption was not globally correct for all
very well known and thoroughly documented applications, the DNS in
particular.
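The boundary assumption can be sketched the same way. This is a
hypothetical Python illustration, not any of the systems named above:
the LOCALE_ZERO table, its locale names, and the functions to_wire and
from_wire are all assumptions made for the example. Digits are folded
to ASCII on the way out and re-expanded from the receiver's locale on
the way in, so the round trip holds only when both ends share the same
context.

```python
import unicodedata

# Hypothetical locale -> "digit zero" table, for illustration only.
LOCALE_ZERO = {
    "ar_EG": 0x0660,  # ARABIC-INDIC DIGIT ZERO
    "fa_IR": 0x06F0,  # EXTENDED ARABIC-INDIC DIGIT ZERO
    "C": 0x0030,      # ASCII DIGIT ZERO
}

def to_wire(s: str) -> str:
    """Outbound conversion at the boundary: fold decimal digits to ASCII."""
    return "".join(
        chr(0x30 + unicodedata.decimal(ch))
        if unicodedata.category(ch) == "Nd" else ch
        for ch in s
    )

def from_wire(s: str, locale: str) -> str:
    """Inbound conversion: re-expand ASCII digits using the receiver's locale."""
    zero = LOCALE_ZERO[locale]
    return "".join(
        chr(zero + int(ch)) if "0" <= ch <= "9" else ch
        for ch in s
    )

original = "shop\u0661\u0662\u0663"  # label typed with Arabic-Indic digits
wire = to_wire(original)             # "shop123"

# The round trip succeeds only if both ends apply the same context.
print(from_wire(wire, "ar_EG") == original)  # True
print(from_wire(wire, "fa_IR") == original)  # False
print(from_wire(wire, "C") == original)      # False
```

A DNS name has no such shared locale between the party that registers
it and the party that looks it up, which is the gap the sketch exposes.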

To paraphrase Andrew: if we accept that this is a very unusual case,
and are therefore willing to put warts on the protocol that we
otherwise wouldn't add (all the while chanting, "MicroSoft, MicroSoft,
and other idiots who don't do Arabic or Farsi or ... digits
correctly"), then we have a solution. But we will also have removed any
need to fix the errors of assumption in distributed system design.

I don't think we need a vendor-specific hack in the protocol, least of
all one whose purpose is to protect a bug.

My two beads' worth,
Eric