Re: What RFC 3066 says

John Clews Scripts2 at sesame.demon.co.uk
Wed May 28 18:28:29 CEST 2003


Hi Peter - some slight apologies from me.

In <OF8189963B.D67D9118-ON86256D34.0045D0F4-86256D34.0046CF2D at sil.org>
Peter_Constable at sil.org writes:

> John Clews wrote on 05/28/2003 01:11:37 AM:
> 
> > Just read RFC 3066, as that's your basic text.
> >
> > It's fairly obvious. The ideal would be zh-Hans-SG, and not zh-SG-Hans.
> 
> I don't see how that is obvious.

Alright, it's my personal preference rather than self-evident.
I own up to that. Slight apologies for sleight of hand perhaps.

However, I would happily write a document backing that position when
RFC 3066 needs to be rewritten. In that timescale, I have more urgent
things to do today.

I also wrote (and here I apologise even more):

> >    "In the first subtag:
> >
> >     -    All 2-letter codes are interpreted as ISO 3166 alpha-2
> 
> [snip]
> 
> What document are you reading?
> 
> The text you are quoting is not found in the document at
> http://www.ietf.org/rfc/rfc3066.txt?number=3066.

Whoops! I was a bit sloppy today, on this point as well.
This bit was actually unintentional.

I was looking at a collection of RFCs, and clearly I picked out
RFC1766 instead of looking at RFC 3066, on that particular occasion.

I think it's the first time I have done that: however,
(NB): it doesn't invalidate the basic points I made earlier.

These earlier points still stand - I checked the text and the syntax
of both RFC 1766 and RFC 3066 in detail, by making several detailed
comparisons of the text of RFC1766 and RFC 3066. Apart from a lot of
rewording for _clarification_ (though RFC 1766, being shorter, is
better for _clarity_ in my view :-) the changes are not as many as
you would think, in the section you quote, Peter.

I make brief notes below with >>>> in the margin.

RFC 3066 says (with some bits left out below where essentially text
is duplicated in each, even if with different words or section numbers):

2.1 Language tag syntax

   The language tag is composed of one or more parts: A primary language
   subtag and a (possibly empty) series of subsequent subtags.

>>>>    RFC 3066 talks about the primary subtag and second subtag;
>>>>    RFC 1766 talked about the tag and subtag.
>>>>    RFC 3066 is much clearer, though the effect is identical.

   The syntax of this tag in ABNF [RFC 2234] is:

    Language-Tag = Primary-subtag *( "-" Subtag )

    Primary-subtag = 1*8ALPHA

    Subtag = 1*8(ALPHA / DIGIT)

   All tags are to be treated as case insensitive; there exist
   conventions for capitalization of some of them, but these should not
   be taken to carry meaning.
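
As a rough illustration of the ABNF above (a sketch only, not part of either RFC), the syntax can be expressed as a regular expression. The names LANGUAGE_TAG, is_well_formed and same_tag are my own, chosen for this example, and the check covers syntax only, not whether any subtag is actually assigned or registered.

```python
import re

# Sketch of the RFC 3066 section 2.1 ABNF:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are the ASCII ranges defined in RFC 2234.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if the tag matches the section 2.1 syntax (syntax only)."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively, as section 2.1 requires."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    # Both orderings discussed in this thread pass the purely syntactic check.
    for tag in ("zh-Hans-SG", "zh-SG-Hans", "sgn-US-MA", "en-"):
        print(tag, is_well_formed(tag))
```

On syntax alone, then, neither ordering of script and country is excluded; anything further depends on the rules in section 2.2.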

2.2 Language tag sources

   The namespace of language tags is administered by the Internet
   Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
   in section 3 of this document.

   The following rules apply to the primary subtag:

   - All 2-letter subtags are interpreted according to assignments found
     in ISO standard 639, "Code for the representation of names of
     languages" [ISO 639], or assignments subsequently made by the ISO
     639 part 1 maintenance agency or governing standardization bodies.

   - The value "i" is reserved for IANA-defined registrations

   - The value "x" is reserved for private use.  Subtags of "x" shall
     not be registered by the IANA.

>>>>    Clearly the paragraph below is a major addition in RFC 3066:

   - All 3-letter subtags are interpreted according to assignments found
     in ISO 639 part 2, "Codes for the representation of names of
     languages -- Part 2: Alpha-3 code [ISO 639-2]", or assignments
     subsequently made by the ISO 639 part 2 maintenance agency or
     governing standardization bodies.

   - Other values shall not be assigned except by revision of this
     standard.

   The following rules apply to the second subtag:

   [See also my note 1]

   - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
     codes from [ISO 3166], or subsequently assigned by the ISO 3166
     maintenance agency or governing standardization bodies, denoting
     the area to which this language variant relates.

   - Tags with second subtags of 3 to 8 letters may be registered with
     IANA, according to the rules in chapter 5 of this document.

   - Tags with 1-letter second subtags may not be assigned except after
     revision of this standard.

   There are no rules apart from the syntactic ones for the third and
   subsequent subtags.

   Tags constructed wholly from the codes that are assigned
   interpretations by this chapter do not need to be registered with
   IANA before use.
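
To make the rules above concrete, here is a minimal sketch (my own, with hypothetical function names describe_primary and describe_second) that reports which of the quoted section 2.2 rules a given primary or second subtag falls under. It looks only at the length and shape of the subtag; it does not consult the ISO 639, ISO 3166 or IANA lists, and subtags containing digits are reported as not covered, since the rules quoted above speak only of letters.

```python
def describe_primary(subtag: str) -> str:
    """Name the section 2.2 rule that applies to a primary subtag."""
    s = subtag.lower()  # tags are case insensitive
    if not (s.isascii() and s.isalpha() and 1 <= len(s) <= 8):
        return "not a valid primary subtag"
    if s == "i":
        return "IANA-defined registration"
    if s == "x":
        return "private use"
    if len(s) == 2:
        return "ISO 639 2-letter language code"
    if len(s) == 3:
        return "ISO 639-2 3-letter language code"
    return "not assigned except by revision of the standard"


def describe_second(subtag: str) -> str:
    """Name the section 2.2 rule that applies to a second subtag."""
    s = subtag.lower()
    if not (s.isascii() and s.isalnum() and 1 <= len(s) <= 8):
        return "not a valid subtag"
    if not s.isalpha():
        # Assumption: the quoted rules mention letters only.
        return "no rule stated in section 2.2"
    if len(s) == 1:
        return "not assigned except after revision of the standard"
    if len(s) == 2:
        return "ISO 3166 alpha-2 country code"
    return "may be registered with IANA"


if __name__ == "__main__":
    for tag in ("zh-Hans-SG", "zh-SG-Hans"):
        parts = tag.split("-")
        print(tag, "|", describe_primary(parts[0]), "|", describe_second(parts[1]))
```

Under these rules the second subtag of zh-Hans-SG falls under the 3-to-8-letter (registration) rule, while the second subtag of zh-SG-Hans falls under the 2-letter (country code) rule.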

>>>>    Here it is worth comparing RFC 3066 and RFC 1766.
>>>>    First of all: RFC 3066 (continued)

   The information in a subtag may for instance be:

   - Country identification, such as en-US (this usage is described in
     ISO 639)

   - Dialect or variant information, such as en-scouse

   - Languages not listed in ISO 639 that are not variants of any listed
     language, which can be registered with the i-prefix, such as i-tsolyani

   - Region identification, such as sgn-US-MA (Martha's Vineyard Sign
     Language, which is found in the state of Massachusetts, US)

     [See also my note 2]


----------------------------------------------------------------------

In RFC 1766, the equivalent text is:

   The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is
         described in ISO 639)

    -    Dialect or variant information, such as no-nynorsk or en-
         cockney

    -    Languages not listed in ISO 639 that are not variants of
         any listed language, which can be registered with the i-
         prefix, such as i-cherokee

    -    Script variations, such as az-arabic and az-cyrillic

>>>>    Clearly most of the examples are similar, though a sign
>>>>    language example is added in RFC 3066, and the script
>>>>    example from RFC 1766 is, in effect, removed.

However, despite there being no example, it is clear that
(a) second subtags representing scripts are still permissible, and
(b) there is nothing that prevents script preceding country.

John

----------------------------------------------------------------------
Notes by John Clews:

[1] In fact all the examples listed relate to the _second_ subtag, as
    they also clearly did in RFC 1766. This probably just slipped in
    during the wordsmithing from "tag" and "subtag" in RFC 1766, where
    "primary subtag" and "second subtag" replaced these in RFC 3066, but
    in this case they clearly got missed, and that wording is now
    ambiguous.

    This wording should certainly be tightened up in any replacement
    of RFC 3066.


[2] In passing, the example of Martha's Vineyard (or at least
    subsequent registrations in this manner) might be better dealt
    with by using United Nations Locodes (Location codes) as tags,
    instead of the ISO 3166-2 codes, giving more specificity (the
    above example would then instead be sgn-USMVY).

    If anybody wants to check out Locodes, try Googling for
    UN/ECE RECOMMENDATION 16: UN/LOCODE (UNITED NATIONS CODE FOR
    TRADE AND TRANSPORT LOCATIONS)

----------------------------------------------------------------------


--
John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Tel:    +44 1423 888 432
mobile: +44 7766 711 395
Email:  Scripts at sesame.demon.co.uk
Web:    http://www.keytempo.com

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes

