What a Locale is.... (Re: [Fwd]: Response to Mark's message])

Jon Hanna jon at spin.ie
Tue Apr 15 13:12:39 CEST 2003

> I think this provides a basis on an answer for Jon Hanna's question: why
> combine script IDs into the middle of language tags. Given that there are
> existing implementations of language tags in which a single tagging system
> is used for both non-textual and textual distinctions, I think the onus
> should be on those who think we should have distinct systems to make their
> case.

A few points for our case.

1. If you can reasonably avoid changing a current standard then leave it as
it is.

2. The separation between script and language is very "real".
This is true both in terms of how humans and programs view such data.

a. There are systems that only care about one of these two facets (language
or script). We can also have systems that change one while retaining the
other. Even in systems that deal with both, there is likely to be much
internal processing that only cares about one of the two. Reflecting this
separation in the format is likely to benefit both human and electronic
users.

b. Amongst systems that do care about both, the necessity of explicit markup
for each can vary. In cases where we have the text we can deduce the script,
often with 100% accuracy.

c. User selection is often likely to separate language and script, partly
because this is how we normally talk about them in natural language, and
partly because many users would have one of these "fixed" (e.g. there are
many people who use several Western European languages that all use Latin
script, and there are many people who use only Chinese but in a variety of
scripts).
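
To illustrate point 2b, here is a rough Python sketch of deducing the script
from the text itself. The function name guess_script is made up for this
example, and using the first word of each character's Unicode name as its
"script" is only a heuristic, not a real script property lookup. Note that it
cannot tell Traditional from Simplified Chinese (both come out as "CJK"),
which is one reason the deduction is "often" rather than "always" reliable.

```python
import unicodedata
from collections import Counter


def guess_script(text: str) -> str:
    """Return the most common script among the letters of `text`.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. 'LATIN', 'CYRILLIC' or 'CJK',
    stands in for its script.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()  # ignore digits, spaces and punctuation
    )
    # No letters at all means we have nothing to deduce from.
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"
```

A system that only has a language tag could call something like this on the
text and fill in the script for itself, without the tag having to carry it.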

3. Systems that do require both pieces of information may have to
interoperate with systems that, for one reason or another, only supply one.
It will be easier to indicate which piece of information is missing (or
where an assumption has been made) if the format used separates them.

4. The two, while separate, can easily be combined. Conversion between a
format along the lines of zh-HK;Hant and something that treats the language
and script as a single atomic "locale" is trivial. Those that require such a
combined "locale" (for want of a better word) haven't demonstrated why this
is of less use than the proposed zh-Hant-HK format.
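
As a rough sketch of how small that conversion is, assuming tags made of at
most a language, a region and a script (the Tag type and all function names
below are invented for this example, and treating any four-letter subtag as
the script is a simplification):

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    language: str           # e.g. "zh"
    region: Optional[str]   # e.g. "HK", or None if not supplied
    script: Optional[str]   # e.g. "Hant", or None if not supplied


def parse_separated(text: str) -> Tag:
    """Parse the separated form, e.g. 'zh-HK;Hant' or plain 'zh-HK'."""
    lang_part, _, script = text.partition(";")
    language, _, region = lang_part.partition("-")
    return Tag(language, region or None, script or None)


def to_separated(tag: Tag) -> str:
    """Write the separated form, omitting the script if it is unknown."""
    lang_part = "-".join(p for p in (tag.language, tag.region) if p)
    return f"{lang_part};{tag.script}" if tag.script else lang_part


def parse_combined(text: str) -> Tag:
    """Parse the combined form, e.g. 'zh-Hant-HK'.

    Assumption: a four-letter alphabetic subtag is the script.
    """
    language, *rest = text.split("-")
    script = next((p for p in rest if len(p) == 4 and p.isalpha()), None)
    region = next((p for p in rest if p != script), None)
    return Tag(language, region, script)


def to_combined(tag: Tag) -> str:
    """Write the combined form with the script in the middle."""
    return "-".join(p for p in (tag.language, tag.script, tag.region) if p)
```

With these, to_combined(parse_separated("zh-HK;Hant")) gives "zh-Hant-HK" and
to_separated(parse_combined("zh-Hant-HK")) gives "zh-HK;Hant". A tag that
arrives without a script keeps script as None, so the missing piece stays
visible rather than being silently assumed.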

5. The little-endian/big-endian debate about where the script code should go
in a language code would be pre-empted if there were separation.

6. While questions about other i18n issues (date formats, currency
placement, decimal separators etc.) have justifiably been dismissed from
this discussion, I think it is inevitable that they will be raised again. A
format that contains both language and script information, but with a clear
separation between the two, could, with a little foresight, give us a
framework for dealing with these issues.

7. The issues for the continuing administration of language and script tags
are likely to differ. The degree of interest that leads people to get
involved in such work may also vary from one individual to another. The
external changes (such as ISO activity) that will need to be reacted to will
differ. These differences will be easier to handle if the two can be
administered with a strong degree of autonomy. That autonomy will more
easily be provided by a format that clearly defines the separation between
the two.
