Really OT: internationalized email addresses (Was: french orthography (Was: BCP47 Appeals process))

Mark Crispin markrcrispin at
Wed Sep 24 18:09:46 CEST 2008

This is very off-topic, but you have to understand why some people are so opposed to "internationalized" email addresses and "internationalized" domains.

> I would agree that more people would recognize Latin than any other script.

Which makes them ideal for machine tokens such as identifiers.

Let me be clear on this: DNS names, email addresses, etc. are machine tokens and NOT natural language.  They are properly seen as the equivalent of telephone numbers.

The attempt to "internationalize" these tokens will severely damage their utility as global tokens.  I challenge anyone here to visually inspect a short text string in Unicode and enter the identical string on a keyboard.  Nobody, not even the "Unicode experts", can reliably do that.  In an attempt to work around that, we talk about such things as "stringprep" and "canonicalization", utterly ignoring the fact that these are feeble attempts to lock the barn door while the horse is out.
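
To make the point concrete, here is a minimal sketch in Python, using only the standard unicodedata module.  The sample strings are made up for illustration.  It shows two things: strings that render identically can differ in their code points, and normalization (one of the steps stringprep performs) reconciles composed and decomposed forms but does nothing about look-alike letters from different scripts.

```python
import unicodedata

# Two renderings of the same visible word; the strings are illustrative only.
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Two strings that look alike in many fonts but mix scripts.
latin = "paypal"
mixed = "p\u0430yp\u0430l"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after applying NFC normalization to each."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Raw comparison fails even though both strings display the same way.
print(composed == decomposed)

# Normalization reconciles the composed and decomposed forms.
print(same_after_nfc(composed, decomposed))

# Normalization does not reconcile Latin 'a' with Cyrillic 'a'.
print(same_after_nfc(latin, mixed))
```

A reader looking at a printed address has no way to tell which of these forms was intended, which is the problem being described.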

> However, that doesn't make it easy for them to use, or that they would "know how to write their name".

Actually, most people can handle Latin character input from a keyboard if they can use a keyboard at all.

Most people can produce some rendition of their name in Latin script.  It doesn't matter that the rendition may vary, or that it is technically inaccurate; it simply exists.

> You probably are familiar with Greek letters, yet try to spell out English words with only Greek letters -- the fit isn't very good, and won't be unique. And Greek is an easy case. Part of the problem is that there is often no simple mapping between a script and ASCII letters; take Arabic for example, with many more consonants.

That's not important to my argument.  We're not talking about good or unique fit or even accurate fit.

An American who is literate in Greek can come up with some form of his name in Greek.  Most literate people in the world are taught Latin script in school, and one of the things they are taught is some rendition of their name in Latin script.  It doesn't matter that the kids in school A do it this way, and the kids in school B do it another way.

More importantly, it doesn't matter to the overall argument.  A person, upon receipt of a telephone number, can enter that number on any telephone in the world, modulo local variations in actual calling (e.g., the need to press a SEND button).

Similarly, a person, upon receipt of a printed email address or DNS name, can enter that string on any keyboard in the world that has Latin characters on the keytops (and most non-Latin keyboards do have Latin characters on the keytops as well).  "Internationalized email addresses" and "internationalized domains" break this property, all to satisfy rabid nationalists who think that by doing this they are correcting an "English bias".

There is a far more sinister agenda at work: to make it impossible for these tokens to be used outside the country.  There will be the "haves", who have both their "internationalized" email address and a global email address using Latin script, and the "have nots", who have only an "internationalized" (translation: domestic only) email address that nobody outside can access.  Don't think for a moment that the stringprep and canonicalization kludges will actually be obeyed.

And Unicode is happily enabling that agenda.

-- Mark --

More information about the Ietf-languages mailing list