What RFC 3066 says

Wed May 28 07:11:37 CEST 2003

On Monday 26 May 2003, Michael Everson <everson at evertype.com> wrote
via ietf-languages at iana.org, Re: "Timetable for action: May 31 is suggested":

> We *DO* need to have a policy on these matters. We *DO* need to
> have a consensus decision on syntax. Is it to be zh-Hans-SG or
> zh-SG-Hans? Do you know? Do you care? Does it matter?

Just read RFC 3066, as that's your basic text.

It's fairly obvious. The ideal would be zh-Hans-SG, and not zh-SG-Hans.

ISO 15924 didn't properly exist when RFC3066 was written. There was
no way it could refer to script codes really. So it _may_ need to be
replaced/superseded by another RFC which does this.

However, what is being requested, AND zh-Hans-SG and all the rest,
STILL fit into the current RFC 3066, so it probably doesn't need
rewriting.

Country codes from ISO 3166 are _often_ the first subtag (i.e. the one
following the language code from ISO 639) but they _don't_have_to_be._
They can also be "Script variations, such as az-arabic and az-cyrillic"
as RFC 3066 notes (end of page 2, beginning of page 3):

   "In the first subtag:

    -    All 2-letter codes are interpreted as ISO 3166 alpha-2
         country codes denoting the area in which the language is
         used.

    -    Codes of 3 to 8 letters may be registered with the IANA by
         anyone who feels a need for it, according to the rules in
         chapter 5 of this document.

   The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is
         described in ISO 639)

    -    Dialect or variant information, such as no-nynorsk or en-cockney

    -    Languages not listed in ISO 639 that are not variants of
         any listed language, which can be registered with the i-
         prefix, such as i-cherokee

    -    Script variations, such as az-arabic and az-cyrillic"

[Meaningful pause while users of this email list contemplate
(a) RFC 3066's development, with az-arabic and az-cyrillic examples;
(b) the Language Tag Reviewer's reluctance to allocate the tags
    az-Latn, az-Arab, and az-Cyrl ...]

But we move on. RFC 3066 continues:

   "In the second and subsequent subtag, any value can be registered."

The second and subsequent subtags obviously CAN include country codes
from ISO 3166, if they haven't been used in earlier subtags (and
sometimes they can even then).

This doesn't in any way contradict the existing allocations of

 de-1901    German, traditional orthography
 de-1996    German, orthography of 1996
 de-AT-1901 German, Austrian variant, traditional orthography
 de-AT-1996 German, Austrian variant, orthography of 1996
 de-CH-1901 German, Swiss variant, traditional orthography
 de-CH-1996 German, Swiss variant, orthography of 1996
 de-DE-1901 German, German variant, traditional orthography
 de-DE-1996 German, German variant, orthography of 1996

What has happened here is that the usual script, -Latn, is assumed
and left implicit, rather than being given as the first subtag.
Nevertheless, the tag

de-Latn-DE-1996

would still be syntactically valid under RFC 3066 for
German, German variant, orthography of 1996.

Nobody has requested it, but if anybody did, it would fit the
syntax of the RFC.
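
To illustrate the syntax point, here is a minimal Python sketch that
checks a tag against the RFC 3066 grammar: a primary subtag of 1 to 8
letters, followed by any number of subtags of 1 to 8 letters or digits.
The function name is_wellformed_rfc3066 is hypothetical, and the check
covers syntax only. It says nothing about whether a tag has actually
been registered with IANA.

```python
import re

# RFC 3066 grammar: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
# Assumption: this tests well-formedness only, not registration status.
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_wellformed_rfc3066(tag: str) -> bool:
    """Return True if the tag matches the RFC 3066 grammar (syntax only)."""
    return _TAG.fullmatch(tag) is not None

for tag in ("de-DE-1996", "de-Latn-DE-1996", "zh-Hans-SG", "zh-SG-Hans"):
    print(tag, is_wellformed_rfc3066(tag))
```

All four tags pass this check, so the grammar alone does not settle
the ordering question. That is a matter of policy, not syntax.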

That's the type of edge case you have been making the rest depend on,
though in fact there _aren't_ these dependencies.

So if you need a citation order for tags which have not yet been
requested, the obvious one would appear to be
language-script-country-date.
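
As a rough sketch of that citation order, the following Python
function joins whichever parts are present in
language-script-country-date order. The name build_tag and its
parameters are hypothetical, and the order is the one suggested in
this message, not something RFC 3066 itself mandates.

```python
def build_tag(language, script=None, country=None, date=None):
    """Join the parts that are present, in language-script-country-date order."""
    # Assumption: omitted parts (None or empty) are simply skipped.
    parts = [language, script, country, date]
    return "-".join(p for p in parts if p)

print(build_tag("zh", script="Hans", country="SG"))   # zh-Hans-SG
print(build_tag("de", country="AT", date="1901"))     # de-AT-1901
print(build_tag("de", "Latn", "DE", "1996"))          # de-Latn-DE-1996
```

Leaving out the script reproduces the existing registrations such as
de-AT-1901, which is why the suggested order does not conflict with
them.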

Perhaps you might note that as a workable citation order for future
use. Are there any problems with that? Not that I want to dwell on
that aspect, so let's move on.

In passing, there _may_ be a valid reason for excluding "extreme
default script" tags (i.e. extreme ones which cannot be found
outside of fairly short transliteration examples) but that should NOT
prevent you from allocating "non-default script" tags, of the type
that Mark requested (which have at least some body of literature
behind them).

Again, I don't want to dwell on that aspect either, so let's
move on again.

I will add the rest of Mark's requests at the end, below: you'll see
that they don't contradict the syntax described above, nor do they
stray into the realm of "default script" tags, nor do they need
anything beyond the language-script part of the citation order above.

You also wrote:

> And to Addison: the palpable thing you're looking for is that these
> proposals are crap, they are hacks for a specific locale tagging
> system which itself ought to have been rethought and rewritten, and
> now it's dumped on this poor RFC.

And I wrote in reply, earlier:

> And the RFC specifies it. Please read it.

I wish you _had_ read it. It might have saved us all a lot of time,
including you.

And gratuitous insults certainly don't help your case, so I'd also
urge you to give up on those too, please.

What people use the tags for is up to them. It's they that have to
live with the consequences, not you as Language Tag Reviewer.

> And to John Clews: Your Reviewer has about a week in Dublin before
> he has to go to Baltimore for the greater glory of encoding
> Cuneiform, and then when he gets back he goes to Oxford for the
> greater glory of encoding medieval weirdo Latin letters, and he is
> sure that nothing is going to happen between now and the middle of
> June ...

Well, it appears from subsequent emails (from people other than me)
that unless you get your act together and allocate the requested
tags, or provide good reasons why not, rather than the previous
hunches etc., something will happen.

So I would urge you to get on with it, and allocate the tags.
You have a deadline rapidly approaching, according to one of Mark's
recent emails.

> which gives Mark and the rest of you PLENTY of time to talk to
> Peter Edberg and Ken and Harald and come up with a MATURE position
> and policy document, so that Your Reviewer doesn't have to be BLAMED
> when he balks at encoding things that he thinks are dodgy.

RFC 3066 is actually very clear.

And if you allocate them it's clear that you won't be blamed.
And if you do _not_ allocate them it's clear that you will be blamed.

As dates are mentioned in the above text, and I said I'd include
the rest of Mark's requests at the end, here they are.

az-latn; Azeri in Latin script; Azerbaycan dili
az-cyrl; Azeri in Cyrillic script; Azerbaycan
az-arab; Azeri in Arabic script; Azerbaycan

uz-latn; Uzbek in Latin script; U'zbek
uz-cyrl; Uzbek in Cyrillic script; Uzbek

sr-latn; Serbian in Latin script; Srpski
sr-cyrl; Serbian in Cyrillic script; Srpski

zh-hans; simplified Chinese; zhong wen (jian ti zi)
zh-hant; traditional Chinese; zhong wen (fan ti zi)

The date that I have in my own email archive for when they were
submitted by Mark is Wednesday, 30 Apr 2003.

I note that today is Wednesday 28 May 2003.

I'd like to see some new registrations please.

Many thanks in advance

Best regards

John Clews

--
John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Tel:    +44 1423 888 432
mobile: +44 7766 711 395
Email:  Scripts2 at sesame.demon.co.uk
Web:    http://www.keytempo.com

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes