Azeri [Re: LANGUAGE TAG REGISTRATION FORM]

Fri Apr 11 23:54:46 CEST 2003

In message <p05200a24babcc8124122@[195.218.107.108]> Michael Everson writes:

Mark wrote:

> >2. The assumption that software requirements should have no
> >influence on RFC 3066 registrations. If that is the position of the
> >IETF, Harald can let me know right now, and I won't bother pursuing
> >this issue.

I think Mark is right to pursue this. Software requirements are not
limited to locale considerations. A requirement can simply be for
something that enables a particular function, without going as far
as a full locale.

I think Michael is being too restrictive in disallowing some
possibilities. It could be that he's making sure we stay rigorous in
defining our needs, but I think we are past that stage already, and
that Michael should give in.

Michael - if RFC 3066 doesn't allow what users and developers need,
then they will start to bypass it and find other ways around it, to
actually serve their needs.

> Martin has asked for specifics about the software "requirements".
> 
> My view is that 639 and SIL and 3066 provide codes that refer to 
> languages, not to locale bundles. That is what they do. That is what 
> they are there to do. If locale bundlers want to use codes, they 
> should. But they should not force us to register things that don't 
> exist.

Regardless of locales, the 3-letter identifiers in ISO 639 and SIL,
and the 3-or-more-letter identifiers in RFC 3066, provide different
things from one another (though ISO 639-2 and SIL have more or less
identical purposes).

However, while RFC 3066 allows 2-letter and 3-letter language
identifiers (e.g. az from ISO 639-1), it ALSO allows additional
things: the denotation of those languages in specific situations
(e.g. az-IR, and most likely az-Latn and az-Arab as well in its
next version).

These last examples are different from just languages.
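
As a rough illustration of that difference, here is a minimal Python
sketch that splits a tag into its primary language subtag and whatever
follows it. The function name describe_tag and the labels "country",
"script?" and "other" are my own, purely for illustration. Treating a
4-letter second subtag as a script is only an assumption about what the
next version might allow, not something RFC 3066 itself defines.

```python
import re

def describe_tag(tag):
    """Split a language tag into (primary, [(subtag, guess), ...]).

    Hypothetical helper: the "guess" labels are illustrative only.
    """
    parts = tag.split("-")
    primary = parts[0]
    # Primary subtag: 1 to 8 ASCII letters.
    if not re.fullmatch(r"[A-Za-z]{1,8}", primary):
        raise ValueError("bad primary subtag: %r" % primary)
    rest = []
    for i, sub in enumerate(parts[1:]):
        # Later subtags: 1 to 8 ASCII letters or digits.
        if not re.fullmatch(r"[A-Za-z0-9]{1,8}", sub):
            raise ValueError("bad subtag: %r" % sub)
        if i == 0 and len(sub) == 2 and sub.isalpha():
            kind = "country"   # 2-letter second subtag, e.g. IR
        elif i == 0 and len(sub) == 4 and sub.isalpha():
            kind = "script?"   # assumption: script code in a later version
        else:
            kind = "other"
        rest.append((sub, kind))
    return primary.lower(), rest

for example in ("az", "az-IR", "az-Latn", "az-Arab"):
    print(example, describe_tag(example))
```

Only the first of these, plain az, comes back with nothing after the
language itself; the others all carry an extra qualification.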

Locale bundles, as you call them above, are a different animal
again, and I'm avoiding going into that here.

I think you need to allow for a lot more specificity to be used,
where the need arises.

There is also the need to say what various "shortest possible forms"
are generally used for. For example, does az alone generally mean:
(a) Azerbaijani in general, and also
(b) Azerbaijani, in Cyrillic script, in Azerbaijan, which has been
    its predominant use for all of the twentieth century?

Or does it mean something else (i.e. official current use, even
though there's still a lot published just in Cyrillic)?

And what would be the expectation of an HTML page which just listed
az as a language code, and didn't supply any character set
information, but just used an 8-bit single byte character set?

What would you, or a system, expect to see, and why?

> I said, Mark, that it is not for language codes to make all of these 
> distinctions. I never said that distinctions of various kinds did not 
> need to be made.

But _ISO_639_ is the language code mechanism. _RFC_3066_ includes other
concepts too, as noted above.

Best regards

John Clews

--
John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Tel:    +44 1423 888 432
mobile: +44 7766 711 395
Email:  Scripts2 at sesame.demon.co.uk
Web:    http://www.keytempo.com

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes