Here's what I have to say about that…

Jon Hanna jon at
Mon May 26 13:57:33 CEST 2003

> As for what the RFC is for, if we stop trying to be purist about
> seeing it as representing language tags per se, and instead
> see it as a practical (albeit ad hoc and inconsistent) mechanism
> for creating identifying tags for written language forms
> significantly distinct (and sufficiently prominent) to require
> distinct, reliably machine-readable labels for information
> processing needs, then the case for what Mark has been asking
> to be registered is much easier to make.

To my mind this isn't a matter of purity; it's a matter of using tools appropriately. Using 3066 for anything beyond its stated purpose "for the Identification of Languages" is indeed "ad hoc and inconsistent", but the fact that it happens tells us two things:

1. The ability to concisely encode locale information in an architecture-neutral and somewhat human-readable way is needed by many applications.
2. RFC 3066 fulfils some of the requirements of such a mechanism, but not all of them.

I think we have consensus on those two points. Where we don't have consensus is on how to proceed.
The majority opinion seems to favour altering the use of 3066 (some point to registrations like de-1901 as evidence that this isn't really altering 3066 at all).
This majority is split on the details of this, in particular with respect to "default" scripts.
The minority (which I *think* consists of myself, with Michael sympathetic to my position but not in complete agreement with it) does not think 3066 can scale to this requirement without *considerable* alteration. To me the "default script" issue seems to be the result of pushing something further than it can naturally go - there is no resolution because the wrong question is being asked.

I maintain that script is orthogonal to language. What's more I maintain that it is orthogonal *as a practical matter*. The needs to discover, express and store script and to discover, express and store language are independent of each other: many applications care about only one of the two, and very different mechanisms can allow us to determine and/or guess at language and at script, making explicit identification of little importance in some cases.

I want three things to be trivial:

1. Comparing languages,
2. Comparing scripts,
3. Comparing language and script combinations.

I see these three as things people are going to need. Is there any dispute about this?
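
As a rough illustration only, the three comparisons become trivial once language and script are held as separate fields. The class and function names below are hypothetical and not part of any proposal; the sketch assumes case-insensitive matching (as 3066 specifies for language tags) and treats an unstated script as matching nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical record holding language and script independently."""
    language: str                 # a 3066-style tag, e.g. "en-IE"
    script: Optional[str] = None  # a script code, e.g. "latn"; None if unstated


def same_language(a: Label, b: Label) -> bool:
    # Compare the language field only, ignoring case.
    return a.language.lower() == b.language.lower()


def same_script(a: Label, b: Label) -> bool:
    # Assumption: an unstated script never counts as a match.
    if a.script is None or b.script is None:
        return False
    return a.script.lower() == b.script.lower()


def same_combination(a: Label, b: Label) -> bool:
    # Both fields must match.
    return same_language(a, b) and same_script(a, b)


serbian_latin = Label("sr", "latn")
serbian_cyrillic = Label("sr", "cyrl")
irish_english = Label("en-IE", "latn")

print(same_language(serbian_latin, serbian_cyrillic))     # same language, different script
print(same_script(serbian_latin, irish_english))          # same script, different language
print(same_combination(serbian_latin, serbian_cyrillic))  # the combination differs
```

An application that cares only about script can call `same_script` and never inspect the language field, and vice versa.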

I further foresee a need for other information about stuff that goes hand-in-hand with the concept of "locale" (a problematic word, but I'll forgo spending 10 paragraphs debating what it means, and for now define it as "a representation of conventions used when rendering data for human consumption, or when parsing human-readable documents").

Someone mentioned the use of 3066 to infer human readable date formats, currencies and currency symbol usage, etc. I think we generally agreed that this wasn't a terribly good idea, but we have to accept that in the absence of other mechanisms people are going to do exactly that, just like people are now using 3066 to infer script information (though at least script is more tightly bound to language).

My back-of-an-envelope strawman is to define a new locale specifier in which the locale of this email is "en-IE.latn".

Further, it would allow for expansion to include other locale-dependent information. Hence, given my personal preference for the W3C profile of ISO 8601, if I state something like the fact that today is 2003-05-26, then something like "en-IE.latn.dt=8601-w3c" could be used to describe this mail.
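
To make the strawman concrete, here is a minimal parsing sketch. The grammar is an assumption inferred from the two examples above and nothing more: fields are separated by ".", the first field is a 3066 language tag, an optional second field without "=" is the script, and any later fields are key=value extensions. The names `LocaleSpec` and `parse_locale` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocaleSpec:
    language: str                  # 3066 language tag, e.g. "en-IE"
    script: Optional[str] = None   # e.g. "latn"; None if not given
    extensions: Dict[str, str] = field(default_factory=dict)  # e.g. {"dt": "8601-w3c"}


def parse_locale(spec: str) -> LocaleSpec:
    """Split a dotted specifier into language, script and extensions."""
    parts = spec.split(".")
    language = parts[0]
    if not language:
        raise ValueError("missing language tag")
    script = None
    extensions: Dict[str, str] = {}
    for part in parts[1:]:
        if not part:
            raise ValueError("empty field")
        if "=" in part:
            # Assumed extension syntax: key=value.
            key, value = part.split("=", 1)
            extensions[key] = value
        elif script is None and not extensions:
            # A bare field directly after the language is taken as the script.
            script = part
        else:
            raise ValueError(f"unexpected field: {part!r}")
    return LocaleSpec(language, script, extensions)


print(parse_locale("en-IE.latn"))
print(parse_locale("en-IE.latn.dt=8601-w3c"))
print(parse_locale("en-IE"))  # a bare 3066 tag: no script, no extensions
```

Under these assumptions a bare 3066 tag such as "en-IE" parses as a specifier with no script and no extensions, and a consumer that only understands some extension keys can ignore the rest of the `extensions` mapping.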

A system like this (stress on the word "like", this is admittedly back-of-an-envelope stuff):
1. Contains all the information that the people who want to add script information to 3066 require.
2. Is not backward-compatible with 3066, though 3066 is forwards-compatible with it.
3. Pre-empts further requests to extend the scope of 3066, and provides a way in which such extension wouldn't hurt previous implementations.
4. Allows for the separation of responsibility - the management of language tags would not necessarily be done by the same people as the management of script tags, or of any other features added through the extensibility mechanism. This matters both for scalability and because some people might be interested in only some of those matters.
5. Can co-exist with 3066 being used "for the Identification of Languages".

More information about the Ietf-languages mailing list