Mapping other Digits to 0-9
Eric Brunner-Williams
ebw at abenaki.wabanaki.net
Mon Dec 8 17:16:11 CET 2008
The IM rationale -- there exist implementations, of host operating
systems, of language libraries, of applications, ... which map code
points outside the range 0x3x to code points in that range. The fact
that mechanisms exist to provide reverse maps, e.g., using the current
locale to map code points in the range 0x3x to other code points, may
help us understand the design choices of those implementations, namely
that non-uniqueness of data may be managed by application context. But
the mechanisms themselves are not available to us.
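To make the forward and reverse maps concrete, here is a minimal Python
sketch. It is not taken from any of the implementations discussed; the
function names and the LOCALE_ZERO table are hypothetical, and real
locale data is considerably richer. It only illustrates that the forward
map loses information which the reverse map must recover from context.

```python
import unicodedata

def map_digits_to_ascii(text: str) -> str:
    """Forward map: replace any Unicode decimal digit (category Nd) with 0-9."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )

# Hypothetical per-locale "digit zero" code points (an assumption for
# illustration only): Arabic-Indic for "ar", Extended Arabic-Indic for "fa".
LOCALE_ZERO = {"ar": 0x0660, "fa": 0x06F0}

def map_ascii_to_locale_digits(text: str, locale_tag: str) -> str:
    """Reverse map: needs context (here a locale tag) to choose a digit set."""
    zero = LOCALE_ZERO[locale_tag]
    return "".join(
        chr(zero + ord(ch) - 0x30) if "0" <= ch <= "9" else ch
        for ch in text
    )

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE
extended = "\u06F1\u06F2\u06F3"      # EXTENDED ARABIC-INDIC DIGITS ONE, TWO, THREE

# Both inputs collapse to the same ASCII string, so the map is not one-to-one.
print(map_digits_to_ascii(arabic_indic) == map_digits_to_ascii(extended))
```

Going back from "123" requires knowing which digit set was intended;
map_ascii_to_locale_digits("123", "ar") and
map_ascii_to_locale_digits("123", "fa") return different strings. A
consumer with no such context cannot pick between them.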
I've written these. The UTF-8 locale for Solaris 2.x, and the libc
routines and bidi tty driver that attempted to get a run-time clue as to
the current locale, information available in user-space, via an obscure
early dynamic kernel mechanism, to correctly process character I/O. The
HP-UX 10.x applications for HP-15 and EUC string processing, character
I/O and file I/O. A couple more, but my point is, the decision to map
one or more code points not in the range 0x3x to the range 0x3x was an
architectural choice which assumed that data conversion at the system
boundary would have locale data, or some other context data, equivalent
to that available to data conversion interior to the system boundary,
and that wasn't a globally correct decision for all very well known and
highly documented applications, the DNS in particular.
To paraphrase Andrew, if we accept that this is a very unusual case, and
therefore that we're willing to put warts on the protocol that we
otherwise aren't willing to add (all the while chanting, "MicroSoft,
MicroSoft, and other idiots who don't do Arabic or Farsi or ... digits
correctly"), then we have a solution, but we've also made it unnecessary
for any errors of assumption in distributed system design to be fixed.
I don't think we need a vendor-specific hack in the protocol, and not
one to protect a bug.
My two beads worth,
Eric