Really OT: internationalized email addresses (Was: french orthography (Was: BCP47 Appeals process)
John Cowan
cowan at ccil.org
Wed Sep 24 21:03:24 CEST 2008
Mark Crispin scripsit:
> >> The attempt to "internationalize" these tokens will severely damage
> >> their utility as global tokens. I challenge anyone here to visually
> >> inspect a short text string in Unicode and enter the identical string
> >> on a keyboard. Nobody, not even the "Unicode experts" can reliably
> >> do that. In an attempt to work around that, we talk about such things
> >> as "stringprep" and "canonicalization" utterly ignoring the fact that
> >> these are feeble attempts to lock the barn door while the horse is out.
> > True enough: the problem turned out to be bigger than anyone thought.
>
> WRONG!
"The tactful way," Rod said quietly, "the polite way to disagree
with the Senator would be to say, 'That turns out not to be
the case.'"
> The magnitude of the problem was obvious to anyone who
> understood the issues.
Back in 1988, no one *did* understand the issues, and so Unicode
had to grow by accretion and a fair amount of trial and error.
That's produced a lot of difficulties, some of which have been
patched up, some of which have to be lived with. There's always
a tradeoff in such situations between stability and correctness.
> Uh, not quite. Even in the face of Han unification, there remains an
> enormous duplication of Han characters within Unicode, and it's only
> gotten worse with the SIP.
Normalization Form C deals nicely with that particular problem.
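
A minimal sketch of that claim, assuming Python's standard unicodedata
module and taking U+F900 as one illustrative case (my choice of example,
not one named in this thread). It shows only that a CJK compatibility
ideograph with a canonical decomposition folds into its unified
counterpart under NFC; it says nothing about duplicates that lack such
a mapping.

```python
import unicodedata

# U+F900 is a CJK compatibility ideograph whose canonical
# decomposition is the unified ideograph U+8C48.
compat = "\uF900"
unified = "\u8C48"

# The two strings are distinct code point sequences before normalization.
print(compat == unified)

# NFC applies the canonical (singleton) decomposition, so the
# compatibility form is replaced by the unified one.
nfc = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(nfc):04X}")
print(nfc == unified)
```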
> That's what it is all about. These tokens are written on paper, spoken
> on the telephone, and broadcast on radio and TV. In all cases, to be
> useful, someone has to enter it.
They are also made up on the fly.
> It's happening already with criminal organizations (spam, phish, etc.)
> and I've had word leaked to me of governments planning such.
No evidence, in other words.
> You may be able to type it, but you can not, given a visual representation
> of the character, reliably enter the correct character for all too many
> characters. You can't even do that for Latin.
There is no possible solution to that problem: nobody can tell in
isolation, even in Perfect Cleanicode, what stream of characters caused
this:
ENGLISH TEXT = txet cibara
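
One possible illustration of the point, as a hedged sketch: two different
code point streams that could both produce the right-hand side above.
The function toy_display is a hypothetical stand-in of mine that handles
only a string wrapped entirely in a RIGHT-TO-LEFT OVERRIDE; it is not
the Unicode Bidirectional Algorithm.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Stream 1: the letters stored in the order they appear on screen.
plain = "txet cibara"

# Stream 2: the letters stored in reading order, forced right-to-left.
overridden = RLO + "arabic text" + PDF


def toy_display(s: str) -> str:
    """Toy model of left-to-right screen order (assumption: the whole
    string is either plain or wrapped in a single RLO ... PDF pair)."""
    if s.startswith(RLO) and s.endswith(PDF):
        return s[1:-1][::-1]  # drop the invisible controls, reverse the run
    return s


# Different stored characters, same sequence of glyphs in this toy model.
print(plain == overridden)
print(toy_display(plain) == toy_display(overridden))
```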
--
John Cowan cowan at ccil.org
"Mr. Lane, if you ever wish anything that I can do, all you will have
to do will be to send me a telegram asking and it will be done."
"Mr. Hearst, if you ever get a telegram from me asking you to do
anything, you can put the telegram down as a forgery."