Using - or _ in language tags and/or locales

A. Vine
Fri, 15 Feb 2002 10:24:05 -0800

I still think that we should be working towards separating locales and
languages, rather than combining them.  What if you've got an Austrian text but
you're in Germany?  Your locale is de_DE (although it might make more sense to
make the locale something like DE_BY for Bavaria in Germany) but your lang is
de-AT, or de-AT-1996 or something similar.  That might mean that although the
text has Austrian-specific words and phrases, you still prefer your date formats
and such to follow German conventions.

That is a stretched example.  The problem becomes far more obvious in other
situations, such as the US.  If we start blurring the distinction between a
language and a locale, the number of combinations in the tag becomes
astronomical.  Instead of pairing a separate language tag of "ja" with a locale
tag of en_US, you'd have to understand "ja_US".  There is currently no
definition for a tag of "ja_US", nor is there one for "es_US" or even "en_JP"
for that matter.  Locales and langs require data, and that data has to be
accessed via an exact-match tag.  Keeping the two separate helps keep that
possible.  Blurring them is a disservice to the potential for customizing the
user experience.  For Web content management, for example, you want to be
drilling down even further, "" (meaning US, California, Santa Clara
county) or whatever the syntax may eventually turn out to be.
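
To make the combinations point concrete, here is a rough Python sketch.  The
tables LANG_DATA and LOCALE_DATA and both function names are made up for
illustration; they do not come from any standard or library, and the entries
are placeholders rather than real registry data.

```python
# Hypothetical tables; the entries are illustrative, not from a real registry.
LANG_DATA = {"ja": "Japanese", "es": "Spanish", "en": "English"}
LOCALE_DATA = {"en_US": "%m/%d/%Y", "ja_JP": "%Y/%m/%d"}


def lookup(lang_tag, locale_tag):
    """Exact-match lookup: each tag must be a key in its own table."""
    return LANG_DATA[lang_tag], LOCALE_DATA[locale_tag]


def combined_tags():
    """Every language/region pairing a single blurred tag would have to cover."""
    regions = sorted({loc.split("_")[1] for loc in LOCALE_DATA})
    return [f"{lang}_{region}" for lang in sorted(LANG_DATA) for region in regions]


# Japanese text, US conventions: two independent exact matches.
print(lookup("ja", "en_US"))
# Separate tables grow additively, the blurred form multiplicatively.
print(len(LANG_DATA) + len(LOCALE_DATA), "entries when kept separate")
print(len(combined_tags()), "tags when combined, including ja_US and es_US")
```

With three languages and two regions the difference is small, but the
separate tables grow as a sum while the combined tags grow as a product.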

If all the info you have is a language tag, then it's worth using it to guess
the locale.  But we should be working towards allowing the separation to provide
the additional information:

time zone

At times, these things are related, and at times they're not.
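
The fallback I have in mind could look something like the Python sketch below.
The function names are hypothetical, and it assumes the second subtag of the
language tag is a two-letter region code; anything else gives no guess at all.

```python
def guess_locale(lang_tag):
    """Guess a POSIX-style locale from a language tag such as "de-AT".

    Assumption: a guess is only made when the second subtag is a
    two-letter region code. Otherwise (e.g. plain "ja") return None.
    """
    parts = lang_tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 2 and parts[1].isalpha():
        return parts[0].lower() + "_" + parts[1].upper()
    return None


def effective_locale(lang_tag, locale_tag=None):
    """Prefer an explicitly supplied locale; fall back to the guess."""
    return locale_tag if locale_tag is not None else guess_locale(lang_tag)


# Austrian text read in Germany: the explicit locale wins.
print(effective_locale("de-AT", "de_DE"))
# Only a language tag available: guess from it.
print(effective_locale("de-AT-1996"))
```

The guess is only used when no locale is supplied, so the two pieces of
information stay independent whenever both are available.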

Is the equating of - and _ definitely going into ISO 15897?  I'd be very
interested to talk to the folks involved about how this is supposed to work
when we divide the two functions.

(Really, this isn't all I think about; it just sounds that way on this list.)

Keld Jørn Simonsen wrote:
> On Fri, Feb 15, 2002 at 09:25:07AM +0000, John Clews wrote:
> > Using - or _ in language tags and/or locales
> > [was Re: xx-XX-nnnn vs. xx-nnnn in Chinese and German]
> >
> > In message <p05101001b890a2063ec2@[]> Michael Everson wrote:
> >
> > > The spelling of January in date formats is a locale issue not a
> > > language tagging issue. You can already use de_AT and de_DE for that,
> > > and indeed that is recommended. Are there other differences common
> > > enough to warrant a language tag?
> > > de-AT and de-DE are legal according to the RFC anyway, as are en-GB
> > > and en-US and en-IE.
> >
> > Using _ in "locale tags" and using - in language tags provides a very
> > simple visual distinction between "locale tags" and language tags.
> >
> > However, from memory (and perhaps Keld can clear this up) I recall
> > that some specification being developed under the auspices of
> > JTC1/SC22/WG20 (was it ISO/IEC DTR 14652? I can't remember) proposed
> > that "locale tags" could use - as an alternative to using _ in locales.
> Yes it was proposed for the current draft of ISO 15897 that
> - and _ be considered the same.
> > To me this blurs the distinction between locales and language tags
> > (two different things, though with byte strings in common).
> a language does suggest a specific culture, so there is a clear
> relation. In practice on POSIX systems this tends to be the same,
> and I gather it is the same for other systems.
> Kind regards
> keld