Re: What RFC 3066 says
Scripts2 at sesame.demon.co.uk
Wed May 28 18:28:29 CEST 2003
Hi Peter - some slight apologies from me.
In <OF8189963B.D67D9118-ON86256D34.0045D0F4-86256D34.0046CF2D at sil.org>
Peter_Constable at sil.org writes:
> John Clews wrote on 05/28/2003 01:11:37 AM:
> > Just read RFC 3066, as that's your basic text.
> > It's fairly obvious. The ideal would be zh-Hans-SG, and not zh-SG-Hans.
> I don't see how that is obvious.
Alright, it's my personal preference rather than self-evident.
I own up to that. Slight apologies for sleight of hand perhaps.
However, I would happily write a document backing that position when
RFC 3066 needs to be rewritten. Right now, I have more urgent
things to do today.
I also wrote (and here I apologise even more):
> > "In the first subtag:
> > - All 2-letter codes are interpreted as ISO 3166 alpha-2
> What document are you reading?
> The text you are quoting is not found in the document at
> http://www.ietf.org/rfc/rfc3066.txt?number=3066.
Whoops! I was a bit sloppy today, in this point as well.
This bit was actually unintentional.
I was looking at a collection of RFCs, and clearly I picked out
RFC 1766 instead of RFC 3066 on that particular occasion.
I think it's the first time I have done that. However
(NB), it doesn't invalidate the basic points I made earlier.
These earlier points still stand - I checked the text and the syntax
of both RFC 1766 and RFC 3066 in detail, comparing the two documents
side by side several times. Apart from a lot of
rewording for _clarification_ (though RFC 1766, being shorter, is
better for _clarity_ in my view :-) the changes are not as many as
you would think, in the section you quote, Peter.
I make brief notes below with >>>> in the margin.
RFC 3066 says (with some bits left out below where essentially text
is duplicated in each, even if with different words or section numbers):
2.1 Language tag syntax
The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags.
>>>> RFC 3066 talks about the primary subtag and second subtag;
>>>> RFC 1766 talked about the tag and subtag.
>>>> RFC 3066 is much clearer, though the effect is identical.
The syntax of this tag in ABNF [RFC 2234] is:
Language-Tag = Primary-subtag *( "-" Subtag )
Primary-subtag = 1*8ALPHA
Subtag = 1*8(ALPHA / DIGIT)
All tags are to be treated as case insensitive; there exist
conventions for capitalization of some of them, but these should not
be taken to carry meaning.
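As a rough illustration of the syntax just quoted (a sketch only, not part of either RFC; the names is_well_formed and same_tag are made up for this example), the ABNF can be turned into a regular expression, with comparison done case-insensitively:

```python
import re

# Language-Tag   = Primary-subtag *( "-" Subtag )
# Primary-subtag = 1*8ALPHA
# Subtag         = 1*8(ALPHA / DIGIT)
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Check the tag against the RFC 3066 syntax only (no registry lookup)."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Tags are case insensitive, so compare them after lowercasing."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    for candidate in ("zh-Hans-SG", "zh-SG-Hans", "sgn-US-MA", "en-", "1en"):
        print(candidate, is_well_formed(candidate))
    print(same_tag("en-US", "EN-us"))
```

Note that both zh-Hans-SG and zh-SG-Hans pass this purely syntactic check; the syntax alone does not settle the ordering question.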
2.2 Language tag sources
The namespace of language tags is administered by the Internet
Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
in section 3 of this document.
The following rules apply to the primary subtag:
- All 2-letter subtags are interpreted according to assignments found
in ISO standard 639, "Code for the representation of names of
languages" [ISO 639], or assignments subsequently made by the ISO
639 part 1 maintenance agency or governing standardization bodies.
- The value "i" is reserved for IANA-defined registrations
- The value "x" is reserved for private use. Subtags of "x" shall
not be registered by the IANA.
>>>> Clearly the paragraph below is a major addition in RFC 3066:
- All 3-letter subtags are interpreted according to assignments found
in ISO 639 part 2, "Codes for the representation of names of
languages -- Part 2: Alpha-3 code [ISO 639-2]", or assignments
subsequently made by the ISO 639 part 2 maintenance agency or
governing standardization bodies.
- Other values shall not be assigned except by revision of this
standard.
The following rules apply to the second subtag:
[See also my note 1]
- All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
codes from [ISO 3166], or subsequently assigned by the ISO 3166
maintenance agency or governing standardization bodies, denoting
the area to which this language variant relates.
- Tags with second subtags of 3 to 8 letters may be registered with
IANA, according to the rules in chapter 5 of this document.
- Tags with 1-letter second subtags may not be assigned except after
revision of this standard.
There are no rules apart from the syntactic ones for the third and
subsequent subtags.
Tags constructed wholly from the codes that are assigned
interpretations by this chapter do not need to be registered with
IANA before use.
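To make the rules quoted above concrete, here is a minimal sketch that classifies the first two subtags by length alone. It is an illustration under assumptions, not an implementation of the RFC: the function names and the returned descriptions are invented for this example, and nothing is looked up in the actual ISO 639, ISO 3166, or IANA lists.

```python
from typing import Optional, Tuple


def describe_primary(subtag: str) -> str:
    """Apply the quoted primary-subtag rules (shape only, no code lists)."""
    s = subtag.lower()
    if s == "i":
        return "reserved for IANA-defined registrations"
    if s == "x":
        return "reserved for private use"
    if s.isascii() and s.isalpha():
        if len(s) == 2:
            return "ISO 639 (part 1) language code"
        if len(s) == 3:
            return "ISO 639-2 language code"
    return "not assignable without revising the standard"


def describe_second(subtag: str) -> str:
    """Apply the quoted second-subtag rules (shape only, no code lists)."""
    s = subtag.lower()
    if s.isascii() and s.isalpha():
        if len(s) == 2:
            return "ISO 3166 alpha-2 country code"
        if 3 <= len(s) <= 8:
            return "may be registered with IANA"
        if len(s) == 1:
            return "not assignable without revising the standard"
    # The quoted rules speak only of letters; anything else is left open here.
    return "no interpretation given by these rules"


def describe(tag: str) -> Tuple[str, Optional[str]]:
    """Describe the primary and (if present) second subtag of a tag."""
    parts = tag.split("-")
    second = describe_second(parts[1]) if len(parts) > 1 else None
    return describe_primary(parts[0]), second


if __name__ == "__main__":
    for tag in ("en-US", "zh-SG-Hans", "zh-Hans-SG", "sgn-US-MA", "i-tsolyani"):
        print(tag, describe(tag))
```

On this reading, the second subtag SG in zh-SG-Hans is taken as a country code, while the second subtag Hans in zh-Hans-SG falls under the 3-to-8-letter rule, so that tag would have to be registered with IANA.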
>>>> Here it is worth comparing RFC 3066 and RFC 1766.
>>>> First of all: RFC 3066 (continued)
The information in a subtag may for instance be:
- Country identification, such as en-US (this usage is described in
ISO 639)
- Dialect or variant information, such as en-scouse
- Languages not listed in ISO 639 that are not variants of any listed
language, which can be registered with the i-prefix, such as i-tsolyani
- Region identification, such as sgn-US-MA (Martha's Vineyard Sign
Language, which is found in the state of Massachusetts, US)
[See also my note 2]
In RFC 1766, the equivalent text is:
The information in the subtag may for instance be:
- Country identification, such as en-US (this usage is
described in ISO 639)
- Dialect or variant information, such as no-nynorsk or
en-cockney
- Languages not listed in ISO 639 that are not variants of
any listed language, which can be registered with the
i-prefix, such as i-cherokee
- Script variations, such as az-arabic and az-cyrillic
>>>> Clearly most of the examples are similar, though a sign
>>>> language appears as an example in RFC 3066, and the script
>>>> example from RFC 1766 has in effect been removed.
>>>> However, despite there being no example, it is clear that
>>>> (a) second subtags representing scripts are still permissible, and
>>>> (b) there is nothing that prevents script preceding country.
Notes by John Clews:
1. In fact all the examples listed relate to the _second_ subtag, as
they also clearly did in RFC 1766. This probably just slipped in
during the wordsmithing from "tag" and "subtag" in RFC 1766, where
"primary tag" and "second subtag" replaced these in RFC 3066, but
in this case they clearly got missed, and that wording is now
This wording should certainly be tightened up in any replacement
of RFC 3066.
2. In passing, the example of Martha's Vineyard (or at least
subsequent registrations in this manner) might be better dealt
with by using United Nations Locodes (Location codes) as tags,
instead of the ISO 3166-2 codes, giving more specificity (the
above example would then instead be sgn-USMVY).
If anybody wants to check out Locodes, try Googling for
UN/ECE RECOMMENDATION 16: UN/LOCODE (UNITED NATIONS CODE FOR
TRADE AND TRANSPORT LOCATIONS)
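A small sketch of how such a tag could be put together, assuming a Locode is a 2-letter country code followed by a 3-character location code (often written with a space, as in "US MVY"). The function name tag_with_locode is hypothetical, and no lookup against the real UN/LOCODE list is attempted:

```python
import re

# Assumed shape of a Locode once the space is removed: 2 letters + 3 alphanumerics.
_LOCODE = re.compile(r"[A-Za-z]{2}[A-Za-z0-9]{3}")
# Primary-subtag = 1*8ALPHA, as in the RFC 3066 syntax.
_PRIMARY = re.compile(r"[A-Za-z]{1,8}")


def tag_with_locode(primary: str, locode: str) -> str:
    """Build a tag such as sgn-USMVY from a primary subtag and a Locode."""
    compact = locode.replace(" ", "")
    if not _PRIMARY.fullmatch(primary):
        raise ValueError(f"not a valid primary subtag: {primary!r}")
    if not _LOCODE.fullmatch(compact):
        raise ValueError(f"not shaped like a Locode: {locode!r}")
    # Five characters fit within the 1*8(ALPHA / DIGIT) limit for a subtag.
    return f"{primary}-{compact.upper()}"


if __name__ == "__main__":
    print(tag_with_locode("sgn", "US MVY"))
```

A 5-character second subtag of this kind would still fall under the rule that tags with second subtags of 3 to 8 letters are registered with IANA.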
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Tel: +44 1423 888 432
mobile: +44 7766 711 395
Email: Scripts at sesame.demon.co.uk
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes