Summary of discussion so far on script tags in language tags

Harald Tveit Alvestrand harald at
Mon Apr 14 12:58:44 CEST 2003

Having spent several hours trying to catch up with the message flow on this 
list, I'll try to summarize the discussion so far...

THE ISSUE: Script information as part of a language tag

QUESTION 1: Should language tags that differ only/mainly in script be 

YES: Mark Davis and others
  - The script is often very important to distinguish between an acceptable
    and an unacceptable variant of a text, which is a common current
    usage of language tags
  - The script sometimes aligns more closely with other distinctions than
    other distinguishing features (such as "country")
  - Hierarchic left-hand substring match fallback will often give sensible
    results when user preferences are stated in terms of language without
    script, so the introduction "does no harm"
  - Other systems use script as part of their internal identifiers, so
    it must also be present in 3066 if equivalence is to be maintained
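
To make the fallback argument concrete, here is a minimal sketch of left-hand substring matching. The function names and the example tags ("az-Latn" and so on) are illustrative assumptions, not text from RFC 3066 or from any proposal on this list.

```python
def fallback_chain(tag):
    """Return the tag followed by each shorter left-hand prefix, longest first."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def matches(preference, tag):
    """True if the preference equals the tag, or is a left-hand prefix of it
    that ends on a subtag boundary (case-insensitive)."""
    pref = preference.lower()
    candidate = tag.lower()
    return candidate == pref or candidate.startswith(pref + "-")


# A user who asks only for "az" still matches text tagged with a script.
print(fallback_chain("az-Latn"))
print(matches("az", "az-Latn"))
```

A preference stated without script keeps matching script-tagged text, which is the sense in which the introduction "does no harm".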

NO: Jon Hanna and others
  - Script is not language. It is an orthogonal feature, and should be
    independently represented.
  - When the text itself is present, script can easily be identified by
    looking at the characters used, so a script tag is not needed
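
As a rough illustration of the "look at the characters" argument, the sketch below guesses a script from the text itself. It assumes that the first word of each character's Unicode name is a good enough proxy for its script; that is a simplification made for this example, not a real script property lookup.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Guess the dominant script of a text from its letters.

    Assumption: the first word of the Unicode character name
    (e.g. "CYRILLIC", "LATIN") stands in for the script.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    # Return the most frequent name prefix, or None if there were no letters.
    return counts.most_common(1)[0][0] if counts else None


print(guess_script("Привет"))
print(guess_script("hello"))
```

This only works when the text is at hand; it says nothing about the case where a tag is used to request text that has not yet been seen.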

QUESTION 2: If script differences are allowed, how should they be tagged?

Everyone seems to agree that ISO 15924 is the right source of tags for 
scripts, although there is some debate engendered by the fact that one 
has not agreed to encode "traditional" and "simplified" Chinese as separate 

Everyone also seems to agree that "lang-script" makes sense in many cases - 
ISO 639 code + ISO 15924 code. The more interesting question is what to do
when both country info and script info are needed.

PROPOSAL 1: Lang-Country-Script
  - This is a natural extension of RFC 3066
  - This provides the right fallback if country variant is more important
    than script variant

PROPOSAL 2: Lang-Script-Country
  - This provides the right fallback if script variant is more important
    than country variant
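
The difference between the two proposals shows up in what a left-hand prefix fallback drops first. The sketch below compares them; the subtag values "az", "AZ" and "Latn" are chosen only for illustration.

```python
def fallback_chain(subtags):
    """Join subtags into a tag and list each shorter left-hand prefix."""
    return ["-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]


lang, country, script = "az", "AZ", "Latn"  # illustrative values only

# Proposal 1: Lang-Country-Script drops the script first, keeping the country.
proposal_1 = fallback_chain([lang, country, script])

# Proposal 2: Lang-Script-Country drops the country first, keeping the script.
proposal_2 = fallback_chain([lang, script, country])

print(proposal_1)
print(proposal_2)
```

Whichever subtag comes last is the first one a fallback discards, so the ordering encodes a judgment about which distinction matters more.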

QUESTION 3: If script difference is allowed, and the choice of question 2 
is settled, should the tags be generative or registration-only?

GENERATIVE:
  - No need for pre-registration
  - All combinations of lang + script that can be generated by other
    systems such as MS-Windows have a natural mapping
  - Follows the pattern of the lang + country generative mechanism of
    ISO 639 / RFC 3066

REGISTRATION-ONLY:
  - Generative needs a revision or addendum to 3066 to come into existence
  - There are only about 24 interesting combinations anyway
  - Lots of the combinations would be meaningless, and lots would be
    effective duplicates. Dupes make recipient's task harder.
  - The lang+country generative mess only shows that we should not do
    this again.
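
The two approaches can be contrasted as two ways of validating a tag. In this sketch the language set, the script set and the registered list are small hypothetical samples, not real registry contents.

```python
# Hypothetical sample data, for illustration only.
LANGS = {"az", "sr", "zh"}
SCRIPTS = {"Latn", "Cyrl", "Arab"}
REGISTERED = {"az-Latn", "az-Cyrl", "sr-Latn", "sr-Cyrl"}


def valid_generative(tag):
    """Generative: any known language combined with any known script is valid."""
    parts = tag.split("-")
    return len(parts) == 2 and parts[0] in LANGS and parts[1] in SCRIPTS


def valid_registered(tag):
    """Registration-only: a tag is valid only if it was registered explicitly."""
    return tag in REGISTERED


# The generative rule admits every combination, meaningful or not.
print(len(LANGS) * len(SCRIPTS), "generative combinations")
print(valid_generative("zh-Cyrl"), valid_registered("zh-Cyrl"))
```

The generative rule needs no list to maintain but accepts combinations nobody uses; the registered list stays small but has to be extended by hand.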

QUESTION 4: If the mechanism picked by this list can't be used to let
language tags 
distinguish between Traditional and Simplified Han, have we solved the 

YES - the problem needs solving for Azeri, Serbian and so on
NO - unless we fix the Chinese problem, the solution is unacceptable

A number of other questions, including the need for databases of language 
information for all the (non)hierarchical relationships that language tags 
do NOT capture, the actual status of TC/SC, whether "locale" is a useless 
concept, and the writing traditions of Azerbaijan have been touched on in 
the debate. But I believe those 4 are the essential ones for this list to 

Suggestion for next steps:

- If you think my summary needs refinement, please reply to this message 
suggesting a change to the text.
- If you want to continue the debate, reply to this or another message, but 
- Once we're pretty certain we have the right set of questions, I'll send 
out a request for a poll on the possible answers. The result will likely be 
a list of NAMES for each alternative, not a count - we're looking for a 
simple way to survey people's opinions, not for anonymous voting!

Your comments?

