Alternate character sets (was: Re: confusing labels)
xlegoff at gmail.com
Mon Apr 13 14:03:38 CEST 2009
Dear Mr. Klensin,
I do not want to engage in a diatribe. I only wished to underline
the difficulty created by confusing internationalization (which may be
a network application service, layer 7) with localization of the names
used, which is a user application service outside the OSI model.
The real issue now is mail such as this message from Louis Pouzin to
the Civil Society Governance Dynamic Coalition:
"Chers amis, dear friends,
See what a top Indian linguist says about Unicode vagaries:
Here is a list of the Unicode Consortium board members. You get the idea?
• Harald Tveit Alvestrand - GOOGLE (USA)
• Julie Bennett - MICROSOFT (USA)
• Carl Hoffman - Basis Technology (USA)
• Tatsuo L. Kobayashi - JustSystems Corp. (JAPAN)
• Marypat Meuli - MICROSOFT (USA)
• David Richards - OCLC (USA)
• Bill Sullivan - IBM (USA)
• Celia Vigil - APPLE (USA)
• Mark Davis, president - GOOGLE (USA)
• Mike Kernaghan, VP & Treasurer - MICROSOFT (USA)
• plus a few more administrative staff"
This may complicate the task ahead of you.
2009/4/13 John C Klensin <klensin at jck.com>:
> --On Monday, April 13, 2009 05:29 +0200 Xavier Legoff
> <xlegoff at gmail.com> wrote:
>> Dear Mr. Klensin,
>> Another input I find interesting comes from Don Osborn, who calls
>> for organised versatility in headers and algorithms, and for
>> anticipating transition and parallel solutions.
> M. Legoff,
> I am working on a more comprehensive note to you and your
> colleagues. As both a matter of courtesy and to reduce the
> chance of further misunderstanding, I will send it only when a
> French translation has been prepared and verified.
> However, in the hope of quickly giving you at least the outline
> of a response...
> The data in your message is very interesting. However, it is
> not a surprise and it has little or nothing to do with the work
> of this working group. I do have some data on another
> international broadcaster and I know that they try to find out
> which coded character set (CCS) is most in use by the target
> population and then they use that CCS. So, again, I am not
> surprised by what the BBC is doing.
> First of all, I hope you understand already that Internet
> protocols that deal with actual content -- words, sentences,
> paragraphs, and so on -- generally have provisions for
> identifying both the language in which the material is written
> and the character set used to encode it. That is true, in
> particular, for both email and the web which can support the use
> of any well-defined coded character set or language. That is,
> of course, why the BBC can use those systems on its web pages
> and other distributions.
> The domain name system does not share that property. There is
> a long list of reasons why it cannot accommodate more than one
> character coding system and cannot be language-sensitive. In
> practical terms, it is not even clear that a different design
> could have done better as long as many domain names are
> abbreviations, acronyms, or numbered objects rather than words
> in any language: the user who sees a domain name without
> specific context has no way to know what language was intended.
> Because of this, IDNs were basically impossible before Unicode
> and UTF-8 is the only plausible encoding form for them.
> That other note will discuss what can reasonably be done about
> the situation, but the work that is required is well outside the
> scope of this WG.
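
To make the contrast in the quoted message concrete, here is a minimal
sketch using only Python's standard library. It is an illustration under
stated assumptions, not anyone's actual implementation: the sample text,
the language tag, and the domain "bücher.example" are invented for the
example. It shows that an email message can label both the charset and
the language of its content, while a domain name carries no such label
and has to be mapped into a single ASCII-compatible form. Note that
Python's built-in "idna" codec follows the older IDNA 2003 rules.

```python
from email.message import EmailMessage

# Content protocols: the message itself declares charset and language.
msg = EmailMessage()
# set_content() resets Content-* headers, so add the language tag afterwards.
msg.set_content("Voir ce qu'un célèbre linguiste dit.", charset="utf-8")
msg["Content-Language"] = "fr"  # illustrative language tag

content_charset = msg.get_content_charset()
content_language = str(msg["Content-Language"])
print(content_charset, content_language)

# Domain names: no charset or language label travels with the name.
# The non-ASCII label is converted to one ASCII-compatible encoding.
# (hypothetical example domain; stdlib codec implements IDNA 2003)
ascii_name = "bücher.example".encode("idna")
print(ascii_name)
```

The email part can announce any well-defined charset and language in its
headers; the domain name part has nowhere to put that information, which
is why a single agreed repertoire is needed there.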