Really OT: internationalized email addresses (Was: french orthography (Was: BCP47 Appeals process)

John Cowan cowan at
Wed Sep 24 21:03:24 CEST 2008

Mark Crispin scripsit:

> >> The attempt to "internationalize" these tokens will severely damage
> >> their utility as global tokens.  I challenge anyone here to visually
> >> inspect a short text string in Unicode and enter the identical string
> >> on a keyboard.  Nobody, not even the "Unicode experts" can reliably
> >> do that.  In an attempt to work around that, we talk about such things
> >> as "stringprep" and "canonicalization" utterly ignoring the fact that
> >> these are feeble attempts to lock the barn door while the horse is out.
> > True enough: the problem turned out to be bigger than anyone thought.

	"The tactful way," Rod said quietly, "the polite way to disagree
	with the Senator would be to say, 'That turns out not to be
	the case.'"

> The magnitude of the problem was obvious to anyone who
> understood the issues.  

Back in 1988, no one *did* understand the issues, and so Unicode
had to grow by accretion and a fair amount of trial and error.
That's produced a lot of difficulties, some of which have been
patched up, some of which have to be lived with.  There's always
a tradeoff in such situations between stability and correctness.

> Uh, not quite.  Even in the face of Han unification, there remains an
> enormous duplication of Han characters within Unicode, and it's only
> gotten worse with the SIP.

Normalization Form C deals nicely with that particular problem.
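
A minimal sketch of what that means in practice, assuming Python's
standard unicodedata module; the sample code points and the helper
name nfc_equal are picked purely for illustration.  U+F900 is a CJK
compatibility ideograph whose canonical decomposition is the unified
ideograph U+8C48, so the two compare equal once both are in NFC.
This covers only duplicates that carry a canonical decomposition.

```python
import unicodedata

# Illustrative pair: U+F900 (CJK COMPATIBILITY IDEOGRAPH-F900) has a
# singleton canonical decomposition to U+8C48 (a unified ideograph).
compat = "\uF900"
unified = "\u8C48"


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(compat == unified)           # compares raw code points
print(nfc_equal(compat, unified))  # compares after NFC normalization
```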

> That's what it is all about.  These tokens are written on paper, spoken
> on the telephone, and broadcast on radio and TV.  In all cases, to be
> useful, someone has to enter it.

They are also made up on the fly.

> It's happening already with criminal organizations (spam, phish, etc.)
> and I've had word leaked to me of governments planning such.

No evidence, in other words.

> You may be able to type it, but you can not, given a visual representation
> of the character, reliably enter the correct character for all too many
> characters.  You can't even do that for Latin.

There is no possible solution to that problem: nobody can tell in
isolation, even in Perfect Cleanicode, what stream of characters caused

		ENGLISH TEXT = txet cibara

John Cowan   cowan at
    "Mr. Lane, if you ever wish anything that I can do, all you will have
        to do will be to send me a telegram asking and it will be done."
    "Mr. Hearst, if you ever get a telegram from me asking you to do
        anything, you can put the telegram down as a forgery."
