ISO 639 and other language identifiers

Håvard Hjulstad havard@hjulstad.com
Mon, 6 May 2002 11:07:34 +0200


There has been some discussion lately on "mass encoding" of languages and
the relation between ISO 639, IETF language coding, SIL Ethnologue, and
others. I am project editor of ISO 639-1 and acting chairman of the ISO 639
Registration Authorities' Joint Advisory Committee (JAC). I am also convener
of ISO/TC37/SC2/WG1, and in that capacity I am working on a proposal
concerning ISO language coding.

ISO 639-1 (alpha-2 identifiers; currently FDIS, to replace the 1988 edition)
and ISO 639-2 (alpha-3 identifiers, published 1998) are maintained by the
Registration Authorities. New language identifiers are approved by the JAC.
For information and to view the latest tables, please see
http://lcweb.loc.gov/standards/iso639-2/iso639jac.html. Some documents are
also available at http://www.rtt.org/ISO/TC37/SC2/WG1/. The printed
standards will never be up-to-date; the lists on the web are the official
lists.

Submissions to the JAC should be made using the submission form on the JAC
home page (http://lcweb.loc.gov/standards/iso639-2/iso639jac.html). The JAC
needs to process one language at a time, so a separate submission should be
made for each individual language. However, please feel free to submit
information in other forms as well, e.g. by email to me. You may wish to
make a reference to that information on the form.

ISO 639 alpha-2 and alpha-3 will remain "conservative". We will not add new
language identifiers unless it has been demonstrated that the requirements
for inclusion in ISO 639 are met. We also need to have all relevant
information about language names (English names, French names, and
indigenous names in both Romanized and indigenous form), geographical area
of use, linguistic classification, status, evidence for the existence of
documents, etc. There is normally some correspondence with the submitter to
clarify all this.
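
For submitters who keep their material in electronic form, the list of
required information above could be checked with something like the
following before the form is filled in. This is only a rough sketch: the
class name, field names, and the function missing_items are invented for
illustration and do not correspond to the actual JAC submission form.

```python
from dataclasses import dataclass, fields


@dataclass
class LanguageSubmission:
    """Hypothetical record of the items listed above (one language per record)."""
    english_name: str = ""
    french_name: str = ""
    indigenous_name_romanized: str = ""
    indigenous_name_native: str = ""
    area_of_use: str = ""
    classification: str = ""
    status: str = ""
    document_evidence: str = ""


def missing_items(submission: LanguageSubmission) -> list:
    """Return the names of all items that are still empty, in declaration order."""
    return [
        f.name
        for f in fields(submission)
        if not getattr(submission, f.name).strip()
    ]


# Example: a draft with only the English and French names filled in.
draft = LanguageSubmission(
    english_name="Norwegian Nynorsk",
    french_name="norvégien nynorsk",
)
for item in missing_items(draft):
    print("still needed:", item)
```

A check like this only shows which items are absent. It says nothing about
whether the requirements for inclusion are met, which remains a matter for
the JAC.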

It has been suggested, and this is one of the proposals that I am working
on myself, to add another part to ISO 639. The registration of new
languages in this part would be somewhat faster, and the language
identifiers would have a slightly different "status" in relation to
standardization. However, even this registry would require procedures and
principles. The verification of the reliability of submitted information
will be the key. We have referred to this as an "alpha-4 code", but the
exact format has not been finalized.

Another proposal for ISO 639 additions relates to "language identification
extension mechanisms". The combination of "simple" language identifiers with
geographical information, temporal information, script/orthography, etc. is
a very complex issue.
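
To give a feeling for what such a combination involves, here is a small
sketch of a "simple" language identifier extended with optional components.
It is purely illustrative and is not one of the proposals under discussion:
the class ExtendedLanguageId, its field names, and the rule that an unset
component matches anything are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtendedLanguageId:
    """Hypothetical identifier: a language code plus optional refinements."""
    language: str                  # the "simple" identifier, e.g. an alpha-3 code
    region: Optional[str] = None   # geographical information
    period: Optional[str] = None   # temporal information
    script: Optional[str] = None   # script/orthography

    def matches(self, other: "ExtendedLanguageId") -> bool:
        """True if every component set on self has the same value in other.

        Components left as None on self are treated as "don't care".
        """
        for name in ("language", "region", "period", "script"):
            wanted = getattr(self, name)
            if wanted is not None and wanted != getattr(other, name):
                return False
        return True


# A broad request matches a more specific description, but not the reverse.
broad = ExtendedLanguageId("nor")
specific = ExtendedLanguageId("nor", region="NO", script="Latin")
print(broad.matches(specific))
print(specific.matches(broad))
```

Even this toy version has to decide what an omitted component means and how
matching should behave, which hints at why the real issue is complex.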

ISO/TC37 wants to develop ISO 639 to make it even more useful for all our
users. ISO 639 has many users and uses, including library catalogues,
scholarly linguistic applications, automatic language identification, etc.
It isn't necessarily simple to make all users equally satisfied. But we
will move forward ...

... provided that we can find some sort of project funding. These projects
extend far beyond what can be achieved with traditional "voluntary" work.

Best regards
Håvard Hjulstad
(chairman of ISO/TC37; convener of ISO/TC37/SC2/WG1)

-------------------------
Håvard Hjulstad    mailto:havard@hjulstad.com
  Solfallsveien 31
  NO-1430  Ås, Norway
  tel: +47-64944233  &  +47-64963684
  mob: +47-90145563
  http://www.hjulstad.com/havard/
-------------------------